WorldWideScience

Sample records for partially annotated listing

  1. Algal Functional Annotation Tool: a web-based analysis suite to functionally interpret large gene lists using integrated annotation and expression data

    Directory of Open Access Journals (Sweden)

    Merchant Sabeeha S

    2011-07-01

    Full Text Available Abstract Background Progress in genome sequencing is proceeding at an exponential pace, and several new algal genomes are becoming available every year. One of the challenges facing the community is the association of protein sequences encoded in the genomes with biological function. While most genome assembly projects generate annotations for predicted protein sequences, these annotations are usually limited, integrating functional terms from only a few databases. Another challenge is the use of annotations to interpret large lists of 'interesting' genes generated by genome-scale datasets. Previously, these gene lists had to be analyzed across several independent biological databases, often on a gene-by-gene basis. In contrast, several annotation databases, such as DAVID, integrate data from multiple functional databases and reveal underlying biological themes of large gene lists. While several such databases have been constructed for animals, none is currently available for the study of algae. Due to renewed interest in algae as potential sources of biofuels and the emergence of multiple algal genome sequences, a significant need has arisen for such a database to process the growing compendiums of algal genomic data. Description The Algal Functional Annotation Tool is a web-based comprehensive analysis suite integrating annotation data from several pathway, ontology, and protein family databases. The current version provides annotation for the model alga Chlamydomonas reinhardtii, and in the future will include additional genomes. The site allows users to interpret large gene lists by identifying associated functional terms and their enrichment. Additionally, expression data for several experimental conditions were compiled and analyzed to provide an expression-based enrichment search. A tool to search for functionally related genes based on gene expression across these conditions is also provided. Other features include dynamic visualization of…
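
    The abstract does not say which statistic the tool uses to score term enrichment. As a hedged illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for gene-list enrichment; the function names, toy gene IDs, and term labels are hypothetical, and a real analysis would also correct for multiple testing.

```python
from math import comb


def enrichment_p(hits: int, list_size: int, term_size: int, background_size: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits).

    X counts how many of `list_size` genes drawn without replacement from a
    background of `background_size` genes carry a term that is annotated to
    `term_size` background genes.
    """
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(background_size, list_size)


def enrich(gene_list, annotations, background):
    """Rank terms by enrichment p-value; `annotations` maps term -> set of gene IDs."""
    genes = set(gene_list) & background
    results = []
    for term, members in annotations.items():
        members = members & background
        hits = len(genes & members)
        if hits == 0:
            continue  # terms with no genes in the list are not reported
        p = enrichment_p(hits, len(genes), len(members), len(background))
        results.append((term, hits, p))
    return sorted(results, key=lambda r: r[2])


# Toy example with hypothetical gene IDs and term names.
background = {f"g{i}" for i in range(1, 11)}
annotations = {
    "photosynthesis": {"g1", "g2", "g3"},
    "lipid metabolism": {"g4", "g5", "g6", "g7"},
}
for term, hits, p in enrich(["g1", "g2", "g3", "g4"], annotations, background):
    print(f"{term}: {hits} hits, p = {p:.4f}")
```

    Here all three "photosynthesis" genes fall in a four-gene list drawn from ten, so that term ranks first (p = 7/210), while a single "lipid metabolism" hit is unremarkable.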

  2. LazySorted: A Lazily, Partially Sorted Python List

    Directory of Open Access Journals (Sweden)

    Naftali Harris

    2015-06-01

    Full Text Available LazySorted is a Python C extension implementing a partially and lazily sorted list data structure. It solves a common problem faced by programmers, in which they need just part of a sorted list, like its middle element (the median), but sort the entire list to get it. LazySorted presents them with the abstraction that they are working with a fully sorted list, while actually only sorting the list partially with quicksort partitions to return the requested sub-elements. This enables programmers to use naive "sort first" algorithms but nonetheless attain linear run-times when possible. LazySorted may serve as a drop-in replacement for the built-in sorted function in most cases, and can sometimes achieve run-times more than 7 times faster.
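
    LazySorted itself is a C extension; the pure-Python sketch below is not its code, only an illustration of the idea the abstract describes: quicksort-style partitions can extract a single order statistic (here the median) in expected linear time, leaving the rest of the list only partially ordered. The function names are hypothetical.

```python
import random


def _partition(xs, lo, hi, pivot_idx):
    """Lomuto partition of xs[lo..hi] around xs[pivot_idx]; returns the pivot's final index."""
    xs[pivot_idx], xs[hi] = xs[hi], xs[pivot_idx]
    pivot = xs[hi]
    store = lo
    for i in range(lo, hi):
        if xs[i] < pivot:
            xs[i], xs[store] = xs[store], xs[i]
            store += 1
    xs[store], xs[hi] = xs[hi], xs[store]
    return store


def select(xs, k):
    """Return the k-th smallest element (0-based), partially reordering xs in place."""
    if not 0 <= k < len(xs):
        raise IndexError("k out of range")
    lo, hi = 0, len(xs) - 1
    while True:
        if lo == hi:
            return xs[lo]
        p = _partition(xs, lo, hi, random.randint(lo, hi))
        if k == p:
            return xs[p]
        if k < p:
            hi = p - 1  # answer lies left of the pivot
        else:
            lo = p + 1  # answer lies right of the pivot


def median(values):
    """Lower median via quickselect, without fully sorting a copy of values."""
    xs = list(values)
    return select(xs, (len(xs) - 1) // 2)
```

    For example, `median([5, 1, 4, 2, 3])` returns 3 without sorting the list. A lazy structure in the spirit of LazySorted would additionally remember the pivot positions it has already established, so that later queries on the same list can reuse that work; that bookkeeping is omitted here.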

  3. New Publications for Planning Libraries (List No. 20). Exchange Bibliography 928.

    Science.gov (United States)

    Vance, Mary, Comp.

    This partially annotated bibliography contains current listings on a variety of topics including architecture, economics, energy, environmental education, geography, housing, land use, politics, urban planning, recreation, and transportation. The bulk of the documents are project reports, commercially published books, and studies. Most date from…

  4. Invertebrates of The H.J. Andrews Experimental Forest, Western Cascade Range, Oregon. V: An Annotated List of Insects and Other Arthropods

    Science.gov (United States)

    Gary L. Parson; Gerasimos Cassis; Andrew R. Moldenke; John D. Lattin; Norman H. Anderson; Jeffrey C Miller; Paul Hammond; Timothy D. Schowalter

    1991-01-01

    An annotated list of species of insects and other arthropods that have been collected and studied on the H.J. Andrews Experimental Forest, western Cascade Range, Oregon. The list includes 459 families, 2,096 genera, and 3,402 species. All species have been authoritatively identified by more than 100 specialists. Information is included on habitat type, functional group...

  5. Invertebrates of The H.J. Andrews Experimental Forest, western Cascade Range, Oregon. V: An annotated list of insects and other arthropods.

    Science.gov (United States)

    Gary L. Parson; Gerasimos Cassis; Andrew R. Moldenke; John D. Lattin; Norman H. Anderson; Jeffrey C Miller; Paul Hammond; Timothy D. Schowalter

    1991-01-01

  6. Diverse Image Annotation

    KAUST Repository

    Wu, Baoyuan

    2017-11-09

    In this work we study the task of image annotation, whose goal is to describe an image using a few tags. Instead of predicting the full list of tags, we aim to provide a short list of tags under a limited budget (e.g., 3) that covers as much of the image's information as possible. The tags in such a short list should be representative and diverse: they must not only correspond to the contents of the image but also differ from one another. To this end, we treat image annotation as a subset selection problem based on the conditional determinantal point process (DPP) model, which models representation and diversity jointly. We further explore the semantic hierarchy and synonyms among the candidate tags, and require that two tags in the same semantic hierarchy or in a synonym pair not be selected simultaneously. This requirement is then embedded into the sampling algorithm for the learned conditional DPP model. In addition, we find that traditional metrics for image annotation (e.g., precision, recall, and F1 score) consider only representation and ignore diversity. We therefore propose new metrics, based on the semantic hierarchy and synonyms, to evaluate the quality of the selected subset (i.e., the tag list). A human study on Amazon Mechanical Turk verifies that the proposed metrics are closer to human judgment than traditional metrics. Experiments on two benchmark datasets show that the proposed method can produce more representative and diverse tags than existing image annotation methods.
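
    The paper samples from a learned conditional DPP. The sketch below swaps in a simpler, commonly used greedy MAP approximation to show how a DPP kernel L_ij = q_i * S_ij * q_j trades off representation (per-tag quality q_i) against diversity (pairwise similarity S_ij), and how a synonym/hierarchy constraint can veto pairs. The qualities, similarities, and tag indices are made-up toy values, not learned ones.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result


def greedy_diverse_tags(quality, similarity, k, conflicts=frozenset()):
    """Greedy MAP approximation for a DPP with kernel L_ij = q_i * S_ij * q_j.

    `conflicts` holds frozenset pairs (e.g. synonyms or hierarchy-related
    tags) that may not both be selected.
    """
    chosen = []
    while len(chosen) < k:
        best, best_det = None, 0.0
        for i in range(len(quality)):
            if i in chosen or any(frozenset((i, j)) in conflicts for j in chosen):
                continue
            subset = chosen + [i]
            kernel = [[quality[a] * similarity[a][b] * quality[b] for b in subset] for a in subset]
            d = det(kernel)
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break  # no remaining tag adds positive volume
        chosen.append(best)
    return chosen


# Toy tags: 0 "dog", 1 "puppy" (near-synonym of 0), 2 "grass", 3 "ball".
quality = [0.9, 0.85, 0.6, 0.5]
similarity = [
    [1.0, 0.95, 0.1, 0.1],
    [0.95, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
print(greedy_diverse_tags(quality, similarity, 2))  # [0, 2]: "puppy" is redundant with "dog"
print(greedy_diverse_tags(quality, similarity, 4, {frozenset({0, 1})}))  # [0, 2, 3]
```

    The determinant of the selected submatrix shrinks toward zero when two tags are nearly identical, which is why "puppy" loses to the lower-quality but dissimilar "grass"; the explicit conflict set then removes it entirely.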

  7. Diverse Image Annotation

    KAUST Repository

    Wu, Baoyuan; Jia, Fan; Liu, Wei; Ghanem, Bernard

    2017-01-01

  8. Additions to the annotated list of marine alien biota in the Mediterranean with special emphasis on Foraminifera and Parasites

    Directory of Open Access Journals (Sweden)

    A. ZENETOS

    2008-05-01

    Full Text Available The present work is an update of the annotated list (ZENETOS et al., 2006) based on literature up to April 2008. Emphasis is given to ecofunctional/taxonomic groups poorly addressed in the annotated list, such as foraminiferans and parasites, while macrophytes are critically reviewed following the CIESM Atlas (VERLAQUE et al., in press). Moreover, in this update the biogeographic area addressed includes the Sea of Marmara. The update yields a further 175 alien species in the Mediterranean, bringing the total to 903. As evidenced by recent findings, more and more previously known ‘casual’ aliens are becoming established. Approximately 100 more species have become well established in the region, raising the number of established species to 496 versus 385 until 2005. In the period from January 2006 to April 2008, more than 80 published papers resulted in the recording of 94 new aliens, which is interpreted as a new introduction every 9 days, a rate beyond the worst-case scenario.
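
    The "every 9 days" figure and the other counts can be checked with simple arithmetic. The exact window endpoints are an assumption here (1 January 2006 to 30 April 2008), since the abstract gives only months.

```python
from datetime import date

# Assumption: the reporting window runs from 1 January 2006 to 30 April 2008.
start, end = date(2006, 1, 1), date(2008, 4, 30)
days = (end - start).days        # 850 days
new_aliens = 94
print(f"one new alien every {days / new_aliens:.1f} days")

previous_total = 903 - 175       # 728 species in the list before this update
newly_established = 496 - 385    # 111, i.e. "approximately 100" more established
```

    Shifting either endpoint by a few days changes the rate only slightly, so the "one every 9 days" figure is robust to how the window is defined.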

  9. PIERO ontology for analysis of biochemical transformations: effective implementation of reaction information in the IUBMB enzyme list.

    Science.gov (United States)

    Kotera, Masaaki; Nishimura, Yosuke; Nakagawa, Zen-ichi; Muto, Ai; Moriya, Yuki; Okamoto, Shinobu; Kawashima, Shuichi; Katayama, Toshiaki; Tokimatsu, Toshiaki; Kanehisa, Minoru; Goto, Susumu

    2014-12-01

    Genomics is faced with the issue of many partially annotated putative enzyme-encoding genes for which activities have not yet been verified, while metabolomics is faced with the issue of many putative enzyme reactions for which full equations have not been verified. Knowledge of enzymes has been collected by the IUBMB and made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively by bioinformatics studies. Instead, most bioinformatics studies simply use the enzyme identifiers, i.e., the Enzyme Commission (EC) numbers. We investigated the actual usage of terminology throughout the Enzyme List, and demonstrated that the partial characteristics of reactions cannot be retrieved by simply using EC numbers. Thus, we developed a novel ontology, named PIERO, for annotating biochemical transformations as follows. First, the terminology describing enzymatic reactions was retrieved from the Enzyme List, and was grouped into terms related to overall reactions and terms related to biochemical transformations. Next, these terms were mapped onto the actual transformations taken from enzymatic reaction equations. This ontology was linked to Gene Ontology (GO) and EC numbers, allowing the extraction of common partial reaction characteristics from given sets of orthologous genes and the elucidation of possible enzymes from given transformations. Further development of the PIERO ontology should enhance the Enzyme List to promote the integration of genomics and metabolomics.
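
    The two uses named at the end of the abstract (common partial reaction characteristics for an ortholog group, and candidate enzymes for a given transformation) can be pictured as set operations over an EC-to-term mapping. This is a hypothetical sketch: the EC placeholders, term labels, and gene names are invented and do not reproduce PIERO's actual terms or data model.

```python
from functools import reduce

# Hypothetical mappings; real PIERO terms and EC assignments are not reproduced here.
EC_TO_TERMS = {
    "EC_A": {"dehydrogenation", "C-OH to C=O"},
    "EC_B": {"dehydrogenation", "C-OH to C=O", "NAD+ reduction"},
    "EC_C": {"phosphorylation"},
}
GENE_TO_ECS = {"gene1": {"EC_A"}, "gene2": {"EC_B"}, "gene3": {"EC_A", "EC_B"}}


def terms_for_gene(gene):
    """All transformation terms reachable from a gene's EC assignments."""
    return set().union(*(EC_TO_TERMS[ec] for ec in GENE_TO_ECS[gene]))


def common_characteristics(orthologs):
    """Transformation terms shared by every gene in a (non-empty) ortholog group."""
    return reduce(set.intersection, (terms_for_gene(g) for g in orthologs))


def candidate_enzymes(query_terms):
    """EC entries whose annotated transformations include all query terms."""
    return sorted(ec for ec, terms in EC_TO_TERMS.items() if query_terms <= terms)


print(common_characteristics(["gene1", "gene2", "gene3"]))
print(candidate_enzymes({"C-OH to C=O"}))
```

    The point of the sketch is the granularity: two different EC entries can share a partial characteristic that their EC numbers alone would not reveal.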

  10. Mesotext. Framing and exploring annotations

    NARCIS (Netherlands)

    Boot, P.; Stronks, E.

    2007-01-01

    From the introduction: Annotation is an important item on the wish list for digital scholarly tools. It is one of John Unsworth’s primitives of scholarship (Unsworth 2000). Especially in linguistics, a number of tools have been developed that facilitate the creation of annotations to source material…

  11. Systems Theory and Communication. Annotated Bibliography.

    Science.gov (United States)

    Covington, William G., Jr.

    This annotated bibliography presents annotations of 31 books and journal articles dealing with systems theory and its relation to organizational communication, marketing, information theory, and cybernetics. Materials were published between 1963 and 1992 and are listed alphabetically by author. (RS)

  12. Linking wilderness research and management-volume 3. Recreation fees in wilderness and other public lands: an annotated reading list

    Science.gov (United States)

    Annette Puttkammer; Vita Wright

    2001-01-01

    This annotated reading list provides an introduction to the issue of recreation fees on public lands. With an emphasis on wilderness recreation fees, this compilation of historical and recent publications is divided into the following sections: historical context, arguments for and against fees, pricing mechanisms and the effects of price, public attitudes toward fees...

  13. Special Issue: Annotated Bibliography for Volumes XIX-XXXII.

    Science.gov (United States)

    Pullin, Richard A.

    1998-01-01

    This annotated bibliography lists 310 articles from the "Journal of Cooperative Education" from Volumes XIX-XXXII, 1983-1997. Annotations are presented in the order they appear in the journal; author and subject indexes are provided. (JOW)

  14. Essential Requirements for Digital Annotation Systems

    Directory of Open Access Journals (Sweden)

    ADRIANO, C. M.

    2012-06-01

    Full Text Available Digital annotation systems are usually based on partial scenarios and arbitrary requirements. Accidental and essential characteristics are usually mixed in non-explicit models. Documents and annotations are linked together accidentally according to the current technology, allowing for the development of disposable prototypes but not the support of non-functional requirements such as extensibility, robustness, and interactivity. In this paper we perform a careful analysis of the concept of annotation, studying the scenarios supported by digital annotation tools. We also derive essential requirements based on a classification of annotation systems applied to existing tools. The analysis performed and the proposed classification can be applied and extended to other types of collaborative systems.

  15. 76 FR 70105 - National Oil and Hazardous Substance Pollution Contingency Plan National Priorities List: Partial...

    Science.gov (United States)

    2011-11-10

    ... and Hazardous Substance Pollution Contingency Plan National Priorities List: Partial Deletion of the... appendix of the National Oil and Hazardous Substances Pollution Contingency Plan (NCP). EPA and the State... property PINs listed above. The deletion of these two parcels from the Site affects all surface soils...

  16. Partial list of bipartite Bell inequalities with four binary settings

    International Nuclear Information System (INIS)

    Brunner, Nicolas; Gisin, Nicolas

    2008-01-01

    We give a partial list of 26 tight Bell inequalities for the case where Alice and Bob choose among four two-outcome measurements. All tight Bell inequalities with fewer settings are reviewed as well. For each inequality we compute numerically the maximal quantum violation, the resistance to noise, and the minimal detection efficiency required for closing the detection loophole. Surprisingly, most of these inequalities are outperformed by the CHSH inequality.
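
    To make the quantities above concrete for the familiar CHSH inequality only (not any of the 26 new ones), the sketch below enumerates deterministic local strategies to obtain the classical bound, evaluates the quantum value for the maximally entangled state |Phi+> with measurements in the X-Z plane, where the correlator is E(a, b) = cos(a - b), at the standard optimal angles, and derives the critical visibility. Treating "resistance to noise" as the white-noise visibility threshold is a common convention assumed here; detection efficiency is not modeled.

```python
import itertools
import math


def chsh(E):
    """CHSH expression for a correlator E(x, y) with settings x, y in {0, 1}."""
    return E(0, 0) + E(0, 1) + E(1, 0) - E(1, 1)


def local_bound():
    """Maximize CHSH over all 16 deterministic local strategies (outcomes +/-1)."""
    best = -math.inf
    for a0, a1, b0, b1 in itertools.product((-1, 1), repeat=4):
        A, B = (a0, a1), (b0, b1)
        best = max(best, chsh(lambda x, y: A[x] * B[y]))
    return best


def quantum_value(alice=(0.0, math.pi / 2), bob=(math.pi / 4, -math.pi / 4)):
    """CHSH value for |Phi+> with X-Z plane measurements, using E = cos(a - b)."""
    return chsh(lambda x, y: math.cos(alice[x] - bob[y]))


classical = local_bound()                  # 2
quantum = quantum_value()                  # 2*sqrt(2) (Tsirelson's bound)
critical_visibility = classical / quantum  # violation survives white noise while v > 1/sqrt(2)
print(classical, round(quantum, 4), round(critical_visibility, 4))
```

    Mixing the state with white noise at visibility v scales every correlator by v, so the violation persists only while v times the quantum value exceeds the local bound; that ratio is what the last line computes.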

  17. Sublime Imperfections : Annotated Reading List

    NARCIS (Netherlands)

    Rutten, E.

    2016-01-01

    In this reading list, I share thoughts on scholars and journalists from whom the Sublime Imperfections project takes its inspiration. The authors of the texts that I clustered ponder the nexus between the imperfect and the sublime; they rethink repair and breakdown; they critically interrogate and…

  18. Resources for Achieving Sex Equity: An Annotated Bibliography.

    Science.gov (United States)

    Miller, Susan W., Comp.

    This annotated bibliography provides a list of resources dealing with sex equity in vocational education. The bibliography first provides operational definitions of "sexism," "sex fair," "sex affirmative," "sex bias," and "affirmative action." It then lists resources under the following topics and/or bibliographic forms: (1) sex role definition, (2)…

  19. Persuasion: A Selected, Annotated Bibliography.

    Science.gov (United States)

    McDermott, Steven T.

    Designed to reflect the diversity of approaches to persuasion, this annotated bibliography cites materials selected for their contribution to that diversity as well as for being relatively current and/or especially significant representatives of particular approaches. The bibliography starts with a list of 17 general textbooks on approaches to…

  20. Workshops for the Handicapped, an Annotated Bibliography--No. 3.

    Science.gov (United States)

    Perkins, Dorothy C.; and others

    These 126 annotations are the third volume of a continuing series of bibliographies listing articles appearing in journals and conference, research, and project reports. Listings include tests, test results, staff training programs, guides for counselors and teachers, and architectural planning, and relate to the mentally retarded, emotionally…

  1. Partially nested designs in psychotherapy trials: A review of modeling developments.

    Science.gov (United States)

    Sterba, Sonya K

    2017-07-01

    Individually-randomized psychotherapy trials are often partially nested. For instance, individuals assigned to a treatment arm may be clustered into therapy groups for purposes of treatment administration, whereas individuals assigned to a wait-list control are unclustered. The past several years have seen rapid expansion and investigation of methods for analyzing partially nested data. Yet partial nesting often remains ignored in psychotherapy trials. This review integrates and disseminates developments in the analysis of partially nested data that are particularly relevant for psychotherapy researchers. First, we differentiate among alternative partially nested designs. Then, we present adaptations of multilevel model specifications that accommodate each design. Next, we address how moderation by treatment as well as mediation of the treatment effect can be investigated in partially nested designs. Model fitting results, annotated software syntax, and illustrative data sets are provided and key methodological issues are discussed. We emphasize that cluster-level variability in the treatment arm need not be considered a nuisance; it can be modeled to yield insights about the treatment process.

  2. Legal Information Sources: An Annotated Bibliography.

    Science.gov (United States)

    Conner, Ronald C.

    This 25-page annotated bibliography describes the legal reference materials in the special collection of a medium-sized public library. Sources are listed in 12 categories: cases, dictionaries, directories, encyclopedias, forms, references for the lay person, general, indexes, laws and legislation, legal research aids, periodicals, and specialized…

  3. CAGE_peaks_annotation - FANTOM5 | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available File name: CAGE_peaks_annotation; File URL: ftp://ftp.biosciencedbc.jp/archive/fantom...

  4. Propagating annotations of molecular networks using in silico fragmentation.

    Science.gov (United States)

    da Silva, Ricardo R; Wang, Mingxun; Nothias, Louis-Félix; van der Hooft, Justin J J; Caraballo-Rodríguez, Andrés Mauricio; Fox, Evan; Balunas, Marcy J; Klassen, Jonathan L; Lopes, Norberto Peporine; Dorrestein, Pieter C

    2018-04-18

    The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One alternative approach to annotating an unknown fragmentation mass spectrum is the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to an MS/MS spectrum in spectral libraries. This is accomplished by creating a network consensus of re-ranked structural candidates, using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
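
    The abstract does not spell out NAP's consensus score. The sketch below is a simplified, hypothetical re-ranking in the same spirit: each candidate's in silico score is boosted by its mean best Tanimoto similarity to the candidates of its network neighbors. Representing fingerprints as Python sets of bit indices, the `weight` parameter, and all toy data are assumptions.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rerank(candidates, edges, weight=0.5):
    """Consensus re-ranking of in silico candidates over a molecular network.

    candidates: node -> list of (structure_id, fingerprint, insilico_score)
    edges: iterable of (node_u, node_v) pairs from the molecular network
    Returns node -> structure IDs ordered by the combined score.
    """
    neighbors = {n: set() for n in candidates}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    ranked = {}
    for node, cands in candidates.items():
        scored = []
        for sid, fp, score in cands:
            # Best structural match among each neighbor's candidates.
            support = [
                max(tanimoto(fp, nfp) for _, nfp, _ in candidates[nb])
                for nb in neighbors[node]
                if candidates[nb]
            ]
            consensus = sum(support) / len(support) if support else 0.0
            scored.append((score + weight * consensus, sid))
        ranked[node] = [sid for _, sid in sorted(scored, reverse=True)]
    return ranked


# Toy network: n1's top in silico hit ("B") is unlike anything in its neighbor n2.
candidates = {
    "n1": [("A", frozenset({1, 2, 3}), 0.50), ("B", frozenset({7, 8, 9}), 0.55)],
    "n2": [("C", frozenset({1, 2, 3, 4}), 0.90)],
}
print(rerank(candidates, [("n1", "n2")]))  # n1 now prefers "A"
```

    Network support flips the order for n1 because "A" shares most of its substructure bits with the neighbor's confident candidate, which is the intuition behind propagating annotations along spectral-similarity edges.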

  5. Annotated Bibliography of Textbooks and Reference Materials in Marine Sciences. Provisional Edition. Intergovernmental Oceanographic Commission, Technical Series.

    Science.gov (United States)

    United Nations Educational, Scientific, and Cultural Organization, Paris (France). Intergovernmental Oceanographic Commission.

    Presented is an annotated bibliography based on selected materials from a preliminary survey of existing bibliographies, publishers' listings, and other sources. It is intended to serve educators and researchers, especially those in countries where marine sciences are just developing. One hundred annotated and 450 non-annotated entries are…

  6. SNAD: sequence name annotation-based designer

    Directory of Open Access Journals (Sweden)

    Gorbalenya Alexander E

    2009-08-01

    Full Text Available Abstract Background A growing diversity of biological data is tagged with unique identifiers (UIDs) associated with polynucleotides and proteins to ensure efficient computer-mediated data storage, maintenance, and processing. These identifiers, which are not informative for most people, are often replaced by biologically meaningful names in various presentations to facilitate utilization and dissemination of sequence-based knowledge. This substitution is commonly done manually, which can be a tedious exercise prone to mistakes and omissions. Results Here we introduce SNAD (Sequence Name Annotation-based Designer), which mediates automatic conversion of sequence UIDs (associated with a multiple alignment or phylogenetic tree, or supplied as a plain-text list) into biologically meaningful names and acronyms. This conversion is directed by precompiled or user-defined templates that exploit the wealth of annotation available in cognate entries of external databases. Using examples, we demonstrate how this tool can be used to generate names for practical purposes, particularly in virology. Conclusion A tool for controllable annotation-based conversion of sequence UIDs into biologically meaningful names and acronyms has been developed and placed into service, fostering links between the quality of sequence annotation and the efficiency of communication and knowledge dissemination among researchers.
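
    The template-driven conversion described above can be illustrated on a Newick tree string. This is a hedged sketch only: the `{field}` template syntax, the field names, and the toy UIDs and names are hypothetical and do not reflect SNAD's actual template language or database lookups.

```python
import re


def convert_uids(text, annotations, template):
    """Replace each known UID in `text` with a name built from `template`.

    `annotations` maps UID -> dict of fields; unknown UIDs, or records
    missing a field the template needs, are left unchanged.
    """
    if not annotations:
        return text
    # Longest keys first, with word boundaries, so "UID1" never matches inside "UID10".
    keys = sorted(annotations, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")

    def substitute(match):
        uid = match.group(0)
        try:
            return template.format_map(annotations[uid])
        except KeyError:
            return uid  # incomplete annotation: keep the original identifier

    return pattern.sub(substitute, text)


annotations = {
    "UID1": {"acronym": "VirA", "year": 1999},
    "UID2": {"acronym": "VirB"},  # no "year" field
}
print(convert_uids("((UID1,UID2),UID3);", annotations, "{acronym}_{year}"))
```

    Leaving an identifier untouched when its annotation is incomplete, rather than emitting a partial name, keeps the output traceable back to the source records.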

  7. Nutrition & Adolescent Pregnancy: A Selected Annotated Bibliography.

    Science.gov (United States)

    National Agricultural Library (USDA), Washington, DC.

    This annotated bibliography on nutrition and adolescent pregnancy is intended to be a source of technical assistance for nurses, nutritionists, physicians, educators, social workers, and other personnel concerned with improving the health of teenage mothers and their babies. It is divided into two major sections. The first section lists selected…

  8. Annotated bibliography of coal in the Caribbean region. [Lignite]

    Energy Technology Data Exchange (ETDEWEB)

    Orndorff, R.C.

    1985-01-01

    The purpose of preparing this annotated bibliography was to compile information on coal localities for the Caribbean region used for preparation of a coal map of the region. Also, it serves as a brief reference list of publications for future coal studies in the Caribbean region. It is in no way an exhaustive study or complete listing of coal literature for the Caribbean. All the material was gathered from published literature with the exception of information from Cuba which was supplied from a study by Gordon Wood of the US Geological Survey, Branch of Coal Resources. Following the classification system of the US Geological Survey (Wood and others, 1983), the term coal resources has been used in this report for reference to general estimates of coal quantities even though authors of the material being annotated may have used the term coal reserves in a similar denotation. The literature ranges from 1857 to 1981. The countries listed include Colombia, Mexico, Venezuela, Cuba, the Dominican Republic, Haiti, Jamaica, Puerto Rico, and the countries of Central America.

  9. Water Conservation Resource List.

    Science.gov (United States)

    NJEA Review, 1981

    1981-01-01

    Alarmed by the growing water shortage, the New Jersey State Office of Dissemination has prepared this annotated list of free or inexpensive instructional materials for teaching about water conservation, K-12. A tipsheet for home water conservation is appended. (Editor/SJL)

  10. Annotated bibliography of South African indigenous evergreen forest ecology

    CSIR Research Space (South Africa)

    Geldenhuys, CJ

    1985-01-01

    Full Text Available Annotated references to 519 publications are presented, together with keyword listings and keyword, regional, place name and taxonomic indices. This bibliography forms part of the first phase of the activities of the Forest Biome Task Group....

  11. Persuasion: Attitude/Behavior Change. A Selected, Annotated Bibliography.

    Science.gov (United States)

    Benoit, William L.

    Designed for teachers, students and researchers of the psychological dimensions of attitude and behavior change, this annotated bibliography lists books, bibliographies and articles on the subject ranging from general introductions and surveys through specific research studies, and from theoretical position essays to literature reviews. The 42…

  12. (reprocessed)CAGE_peaks_annotation - FANTOM5 | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available File URLs: ftp://ftp.biosciencedbc.jp/archive/fantom5/datafiles/reprocessed/hg38_latest/extra/CAGE_peaks_annotation/ and ftp://ftp.biosciencedbc.jp/archive/fantom5/datafiles/reprocessed/mm10_latest/extra/CAGE_peaks_annotat...

  13. Astronomy Books of 1985: The Technical List.

    Science.gov (United States)

    Fraknoi, Andrew

    1986-01-01

    Consists of the second part of the 1985 annotated review of technical books done by the Astronomical Society of the Pacific. This listing was primarily designed for graduate students or research scientists. (TW)

  14. TubercuList--10 years after.

    Science.gov (United States)

    Lew, Jocelyne M; Kapopoulou, Adamandia; Jones, Louis M; Cole, Stewart T

    2011-01-01

    TubercuList (http://tuberculist.epfl.ch/), the relational database that presents genome-derived information about H37Rv, the paradigm strain of Mycobacterium tuberculosis, has been active for ten years and now presents its twentieth release. Here, we describe some of the recent changes that have resulted from manual annotation with information from the scientific literature. Through manual curation, TubercuList strives to provide current gene-based information and is thus distinguished from other online sources of genome sequence data for M. tuberculosis. New, mostly small, genes have been discovered and the coordinates of some existing coding sequences have been changed when bioinformatics or experimental data suggest that this is required. Nucleotides that are polymorphic between different sources of H37Rv are annotated and gene essentiality data have been updated. A host of functional information has been gleaned from the literature and many new activities of proteins and RNAs have been included. To facilitate basic and translational research, TubercuList also provides links to other specialized databases that present diverse datasets such as 3D-structures, expression profiles, drug development criteria and drug resistance information, in addition to direct access to PubMed articles pertinent to particular genes. TubercuList has been and remains a highly valuable tool for the tuberculosis research community with >75,000 visitors per month. Copyright © 2010 Elsevier Ltd. All rights reserved.

  15. Listing of Sandia publications in nuclear energy

    International Nuclear Information System (INIS)

    Cochrell, R.C.

    1990-10-01

    This report gives an annotated bibliography of reports published in 1989 by the Nuclear Energy Technology Directorate. A listing is also given of reports published by the staff in the nuclear energy field since 1972.

  16. Deburring: an annotated bibliography. Volume V

    International Nuclear Information System (INIS)

    Gillespie, L.K.

    1978-01-01

    An annotated summary of 204 articles and publications on burrs, burr prevention, and deburring is presented. Thirty-seven deburring processes are listed. Entries cited include English, Russian, French, Japanese, and German language articles. Entries are indexed by deburring process, author, and language. Indexes also indicate which references discuss equipment and tooling, how to use a process, economics, burr properties, and how to design to minimize burr problems. Research studies are identified, as are the materials deburred.

  17. Towards the VWO Annotation Service: a Success Story of the IMAGE RPI Expert Rating System

    Science.gov (United States)

    Reinisch, B. W.; Galkin, I. A.; Fung, S. F.; Benson, R. F.; Kozlov, A. V.; Khmyrov, G. M.; Garcia, L. N.

    2010-12-01

    Interpretation of Heliophysics wave data requires specialized knowledge of wave phenomena. Users of the virtual wave observatory (VWO) will greatly benefit from a data annotation service that will allow querying of data by phenomenon type, thus helping accomplish the VWO goal to make Heliophysics wave data searchable, understandable, and usable by the scientific community. Individual annotations can be sorted by phenomenon type and reduced into event lists (catalogs). However, in contrast to event lists, annotation records allow greater flexibility of collaborative management by more easily admitting addition, revision, or deletion. They can therefore become the building blocks for an interactive Annotation Service with a suitable graphical user interface to the VWO middleware. The VWO Annotation Service vision is an interactive, collaborative sharing of domain expert knowledge with fellow scientists and students alike. An effective prototype of the VWO Annotation Service has been in operation at the University of Massachusetts Lowell since 2001. An expert rating system (ERS) was developed for annotating the IMAGE radio plasma imager (RPI) active sounding data containing 1.2 million plasmagrams. RPI data analysts can use the ERS to submit expert ratings of plasmagram features, such as the presence of echo traces resulting from RPI signals reflected from distant plasma structures. Since its inception in 2001, the RPI ERS has accumulated 7351 expert plasmagram ratings in 16 phenomenon categories, together with free-text descriptions and other metadata. In addition to human expert ratings, the system holds 225,125 ratings submitted by the CORPRAL data prospecting software, which employs a model of human pre-attentive vision to select images potentially containing interesting features. The annotation records proved instrumental in a number of investigations where manual data exploration would have been prohibitively tedious and expensive.
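
    The abstract notes that individual annotations can be sorted by phenomenon type and reduced into event lists. A minimal sketch of one way to perform that reduction, merging overlapping annotated time spans per phenomenon, follows; the record fields, time units, and phenomenon labels are assumptions, not the ERS schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: float     # hypothetical time units, e.g. seconds from some epoch
    end: float
    phenomenon: str  # e.g. a rating category
    source: str      # e.g. "expert" or "CORPRAL"


def to_event_lists(annotations):
    """Group annotations by phenomenon and merge overlapping time spans into events."""
    by_type = defaultdict(list)
    for a in annotations:
        by_type[a.phenomenon].append((a.start, a.end))
    catalog = {}
    for phenomenon, spans in by_type.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:  # overlaps the current event: extend it
                merged[-1][1] = max(merged[-1][1], end)
            else:                       # gap: start a new event
                merged.append([start, end])
        catalog[phenomenon] = [tuple(span) for span in merged]
    return catalog


records = [
    Annotation(0, 10, "echo trace", "expert"),
    Annotation(5, 15, "echo trace", "CORPRAL"),
    Annotation(20, 25, "echo trace", "expert"),
    Annotation(3, 4, "resonance", "expert"),
]
print(to_event_lists(records))
```

    The catalog is derived on demand, so the underlying annotation records stay individually editable, which matches the abstract's point that records are easier to add, revise, or delete than a fixed event list.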

  18. Annotated Bibliography; Freedom of Information Center Reports and Summary Papers.

    Science.gov (United States)

    Freedom of Information Center, Columbia, MO.

    This bibliography lists and annotates almost 400 information reports, opinion papers, and summary papers dealing with freedom of information. Topics covered include the nature of press freedom and increased press efforts toward more open access to information; the press situation in many foreign countries, including France, Sweden, Communist…

  19. An updated and annotated list of Indian lizards (Reptilia: Sauria) based on a review of distribution records and checklists of Indian reptiles

    Directory of Open Access Journals (Sweden)

    P.D. Venugopal

    2010-03-01

    Over the past two decades, many checklists of reptiles of India and adjacent countries have been published. These publications have furthered knowledge of the systematics, distribution and biogeography of Indian reptiles, and of herpetology in India in general. However, the reporting format of most such checklists does not provide a basis for direct verification of the information presented. As a result, mistaken inclusions and omissions of species have been perpetuated, and the exact number of reptile species reported from India remains unclear. A verification of the current listings, based on distributional records and a review of published checklists, revealed that 199 species of lizards (Reptilia: Sauria) are currently validly reported on the basis of distributional records within the boundaries of India. Seventeen other lizard species have erroneously been included in earlier checklists of Indian reptiles. Omissions of species by these checklists have been even more numerous than erroneous inclusions. In this paper, I make a plea for species lists to be reported as annotated checklists that corroborate the inclusion and omission of species with valid source references or notes.

  20. Annotated Bibliography on Koreans in America. Working Papers on Asian American Studies.

    Science.gov (United States)

    Kim, Christopher; Takabashi, Michiko, Ed.

    This annotated bibliography lists bibliographies, directories, articles, unpublished papers, manuals, citations, conference reports, and other documents dealing with Koreans in America. Topics considered include general history, immigration history, deportation cases, Korean students, State and Federal legislation affecting Koreans in America,…

  1. An annotated checklist of the vascular flora of Washington County Mississippi

    Science.gov (United States)

    Field explorations have yielded 257 species new to Washington County, Mississippi and Calandrinia ciliata (Ruiz & Pav.) DC. and Ruellia nudiflora (Engelm. & Gray) Urban new to the state. An annotated list of 796 taxa for Washington County is provided and excludes 62 species that were reported from ...

  2. Ethical Issues in Health Services: A Report and Annotated Bibliography.

    Science.gov (United States)

    Carmody, James

    This publication identifies, discusses, and lists areas for further research for five ethical issues related to health services: 1) the right to health care; 2) death and euthanasia; 3) human experimentation; 4) genetic engineering; and, 5) abortion. Following a discussion of each issue is a selected annotated bibliography covering the years 1967…

  3. Estimating the annotation error rate of curated GO database sequence annotations

    Directory of Open Access Journals (Sweden)

    Brown Alfred L

    2007-05-01

    Abstract Background Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations and applied it to the Gene Ontology (GO) sequence database (GOSeqLite). The method involved artificially adding errors to sequence annotations at known rates and using regression to model the impact on the precision of annotations based on BLAST-matched sequences. Results We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without sequence-similarity-based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with sequence similarity methodology (ISS) had an estimated error rate of 49%. Conclusion While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high-quality source of information.

  4. An Annotated Checklist of the Mammals of Kuwait

    Directory of Open Access Journals (Sweden)

    Peter J. Cowan

    2013-12-01

    An annotated checklist of the mammals of Kuwait is presented, based on the literature, personal communications, a Kuwait website, a blog, and the author’s observations. Twenty-five species occur, a further four are uncommon or rare visitors, six used to occur, and another two are of doubtful provenance. This list should assist those planning desert rehabilitation, animal reintroduction and protected area projects in Kuwait.

  5. Simulator fidelity and training effectiveness: a comprehensive bibliography with selected annotations

    International Nuclear Information System (INIS)

    Rankin, W.L.; Bolton, P.A.; Shikiar, R.; Saari, L.M.

    1984-05-01

    This document contains a comprehensive bibliography on the topic of simulator fidelity and training effectiveness, prepared during the preliminary phases of work on an NRC-sponsored project on the Role of Nuclear Power Plant Simulators in Operator Licensing and Training. Section A of the document is an annotated bibliography consisting of articles and reports relevant to the psychological aspects of simulator fidelity and the effectiveness of training simulators in a variety of settings, including the military. The annotated items are drawn from a more comprehensive bibliography, presented in Section B, that lists documents treating the role of simulators in operator training both in the nuclear industry and elsewhere.

  6. Recruiting and Retaining Army Nurses: An Annotated Bibliography

    OpenAIRE

    Roberts, Benjamin J.; Kocher, Kathryn M.

    1988-01-01

    This listing of annotated references includes studies dealing with the labor market behavior of registered nurses. References describing both the military and the civilian working environments for RNs are contained in the bibliography. Because the Army must recruit and retain nurses in the context of the national labor market for nurses, a broad perspective was maintained in selecting publications. Studies dealing with the factors influential in attracting and retaining Army Active Duty and Re...

  7. Questionnaires for research: an annotated bibliography on design, construction, and use.

    Science.gov (United States)

    Dale R. Potter; Kathryn M. Sharpe; John C. Hendee; Roger N. Clark

    1972-01-01

    Questionnaires are increasingly used as social science tools to study the human aspects of outdoor recreation and other natural resource fields. An annotated bibliography of 193 references, including subjective evaluations of each article and a keyword list, is presented to aid researchers and managers in the design, construction, and use of mail questionnaires.

  8. Annotated list of marine alien species in the Mediterranean with records of the worst invasive species

    Directory of Open Access Journals (Sweden)

    A. ZENETOS

    2005-12-01

    This collaborative effort by many specialists across the Mediterranean presents an updated annotated list of alien marine species in the Mediterranean Sea. Alien species have been grouped into six broad categories, namely established, casual, questionable, cryptogenic, excluded and invasive, and presented in lists of major ecofunctional/taxonomic groups. The establishment success within each group is provided, while the questionable and excluded records are commented on briefly. A total of 963 alien species had been reported from the Mediterranean by December 2005. Of these, 218 (23%) have been classified as excluded, leaving 745 of the recorded species as valid aliens. Of the valid aliens, 385 (52%) are already well established, 262 (35%) are casual records, and 98 species (13%) remain “questionable” records. The species cited in this work belong mostly to the zoobenthos, in particular to Mollusca and Crustacea, while Fish and Phytobenthos are the next two groups that prevail among alien biota in the Mediterranean. The available information depends greatly on the taxonomic group examined. Thus, besides the three groups explicitly addressed in the CIESM atlas series (Fish, Decapoda/Crustacea and Mollusca), which are, however, updated in the present work, Polychaeta, Phytobenthos, Phytoplankton and Zooplankton are also addressed in this study. Other zoobenthic taxa sufficiently covered in this study include Echinodermata, Sipuncula, Bryozoa and Ascidiacea. By contrast, taxa such as Foraminifera, Amphipoda and Isopoda, which are not well studied in the Mediterranean, are insufficiently covered. A knowledge gap is also apparent for parasites. Although parasites are ubiquitous and pervasive in marine systems, their role in marine invasions remains relatively unexplored. In conclusion, the lack of funding for purely systematic studies in the region has led to an underestimation of the number of aliens in the Mediterranean. Emphasis is put on those species that are
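    The counts and percentages in this abstract can be cross-checked with a few lines of arithmetic. The snippet uses only the figures given in the abstract. Excluded records are expressed as a share of all 963 reported species, and the three status categories as shares of the 745 valid aliens.

```python
reported = 963   # alien species reported up to December 2005
excluded = 218   # records classified as excluded
by_status = {"established": 385, "casual": 262, "questionable": 98}

valid = reported - excluded
assert sum(by_status.values()) == valid  # the three categories account for all valid aliens


def pct(part, whole):
    """Percentage rounded to the nearest whole number, as in the abstract."""
    return round(100 * part / whole)


print("valid aliens:", valid, "| excluded %:", pct(excluded, reported))
for status, count in by_status.items():
    print(f"{status}: {count} ({pct(count, valid)}%)")
# valid aliens: 745 | excluded %: 23
# established: 385 (52%)
# casual: 262 (35%)
# questionable: 98 (13%)
```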

  9. Special Project Grants Awarded for Improvement in Nurse Training. A Listing.

    Science.gov (United States)

    National Institutes of Health (DHEW), Bethesda, MD. Div. of Nursing.

    This directory lists, alphabetically by state, the special projects funded by the Title II Nurse Training Act of the Health Manpower Act of 1968, which are awarded for improvement programs in nurse training. Projects funded through June 1971 are listed and briefly annotated, including planning grants awarded for the first time during the fiscal…

  10. annot8r: GO, EC and KEGG annotation of EST datasets

    Directory of Open Access Journals (Sweden)

    Schmid Ralf

    2008-04-01

    Abstract Background The expressed sequence tag (EST) methodology is an attractive option for generating sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets poses a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. There is therefore a need for user-friendly tools that annotate non-model species EST datasets with well-defined ontologies, enabling meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO terms, EC numbers and KEGG pathways. Results annot8r automatically downloads all files relevant for the annotation process and generates a reference database. This database stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotations, and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated, and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion annot8r assigns GO, EC and KEGG annotations to data sets from EST sequencing projects rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non
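    The "parse BLAST results, then retrieve annotations" step described above can be sketched as follows. This is not annot8r's code. A plain dictionary stands in for its PostgreSQL reference database. The e-value cutoff is a hypothetical parameter. Taking the union of terms from all passing hits is an assumed transfer rule. The only external convention relied on is the default 12-column order of BLAST+ tabular output (`-outfmt 6`).

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(lines):
    """Yield one dict per hit, skipping blank and comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield dict(zip(FIELDS, line.rstrip("\n").split("\t")))


def transfer_annotations(blast_lines, reference, max_evalue=1e-10):
    """Give each query the union of terms attached to its subjects with evalue <= max_evalue.

    `reference` maps subject IDs to sets of terms; it stands in for annot8r's
    PostgreSQL reference database. The cutoff and the union rule are assumptions.
    """
    assigned = defaultdict(set)
    for hit in parse_tabular(blast_lines):
        if float(hit["evalue"]) <= max_evalue:
            assigned[hit["qseqid"]].update(reference.get(hit["sseqid"], ()))
    return {query: sorted(terms) for query, terms in sorted(assigned.items())}
```

    With hypothetical subject IDs such as `SUBJ_A` mapped to placeholder terms, a hit with e-value `1e-40` transfers its terms to the query. A weaker hit with e-value `0.001` is ignored under the default cutoff. A real pipeline would typically also weigh hit rank or bit score, which this sketch leaves out.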

  11. Market concentration, corporate governance and innovation: Partial and combined effects in US-listed firms

    Directory of Open Access Journals (Sweden)

    Mehmet Ugur

    2012-10-01

    Existing research on the relationship between market concentration and innovation has produced conflicting findings. In addition, the emerging literature on the relationship between corporate governance and innovation tends to focus only on the partial effects of corporate governance on innovation. We aim to contribute to the debate by investigating both the partial and the combined effects of corporate governance and market concentration on innovation. Using a dataset of 1,400 non-financial US-listed companies and a two-way cluster-robust estimation methodology, we report several findings. First, the relationship between market concentration and innovation is non-linear. Second, the relationship is U-shaped when innovation is measured by an input (research and development (R&D) expenditures), but inverted-U-shaped when the net book value of brands and patents is used as an output measure of innovation. Third, corporate governance indicators such as anti-takeover defences and insider control tend to have a negative partial effect on R&D expenditures but a positive partial effect on the net book value of brands and patents. Finally, when interacted with market concentration, anti-takeover defences and insider control act as complements to market concentration. Hence, as market concentration increases, firms with strong anti-takeover defences and under insider control tend to spend more on R&D but are less able to generate valuable brands and patents. These results are based on two-way cluster-robust estimation, which takes account of both serial and cross-sectional dependence in the error terms.
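    A common way to test for a U-shaped or inverted-U-shaped relationship is to add a squared term to a regression. The specification below is a generic illustration and is not confirmed as the authors' exact model. Here $y_{it}$ is the innovation measure for firm $i$ in year $t$, $C_{it}$ is market concentration, and $\mathbf{x}_{it}$ collects hypothetical controls. The relationship is U-shaped when $\beta_2 > 0$ and inverted-U-shaped when $\beta_2 < 0$. In either case, the turning point follows from setting the derivative to zero.

```latex
y_{it} = \beta_0 + \beta_1 C_{it} + \beta_2 C_{it}^{2} + \boldsymbol{\gamma}^{\top}\mathbf{x}_{it} + \varepsilon_{it},
\qquad
\frac{\partial y_{it}}{\partial C_{it}} = \beta_1 + 2\beta_2 C_{it} = 0
\;\Rightarrow\;
C^{*} = -\frac{\beta_1}{2\beta_2}
```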

  12. Thermal effects on aquatic organisms: annotated bibliography of the 1974 literature

    International Nuclear Information System (INIS)

    Coutant, C.C.; Talmage, S.S.; Carrier, R.F.; Collier, B.N.

    1975-06-01

    The annotated bibliography covers the 1974 literature concerning thermal effects on aquatic organisms. Emphasis is placed on the effects of the release of thermal effluents on aquatic ecosystems. Indexes are provided for: author, keywords, subject category, geographic location, taxon, and title (alphabetical listing of keyword-in-context of the nontrivial words in the title). (CH)
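    The keyword-in-context (KWIC) title index mentioned above can be sketched as a permuted index. Each non-trivial title word is rotated to the front, and the entries are sorted alphabetically by keyword. The stop-word list is a hypothetical stand-in for whatever list the bibliography's indexers used. The "/" wrap marker is one conventional presentation among several.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}  # hypothetical list


def kwic_index(titles, stop_words=STOP_WORDS):
    """Return (keyword, rotated title) pairs sorted alphabetically by keyword.

    Each non-trivial word is rotated to the front of its title; "/" marks where
    the original beginning of the title wraps around.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            keyword = word.strip(",.:;()").lower()
            if not keyword or keyword in stop_words:
                continue
            rotated = words[i:] + (["/"] + words[:i] if i else [])
            entries.append((keyword, " ".join(rotated)))
    return sorted(entries)
```

    For the title "Thermal effects on aquatic organisms", the first entry is `("aquatic", "aquatic organisms / Thermal effects on")`. "on" produces no entry because it is in the stop-word list.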

  13. Phenex: ontological annotation of phenotypic diversity.

    Directory of Open Access Journals (Sweden)

    James P Balhoff

    2010-05-01

    Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.

  14. Phenex: ontological annotation of phenotypic diversity.

    Science.gov (United States)

    Balhoff, James P; Dahdul, Wasila M; Kothari, Cartik R; Lapp, Hilmar; Lundberg, John G; Mabee, Paula; Midford, Peter E; Westerfield, Monte; Vision, Todd J

    2010-05-05

    Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. As such, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load only those ontologies pertinent to a taxonomic group of interest. The graphical user interface was optimized for evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to integrate phenotypic data from different organisms and studies, leveraging decades of work in systematics and comparative morphology.

  15. Annotated checklist of Albanian butterflies (Lepidoptera, Papilionoidea and Hesperioidea)

    Directory of Open Access Journals (Sweden)

    Rudi Verovnik

    2013-08-01

    The Republic of Albania has a rich diversity of flora and fauna. However, due to its political isolation, it has never been studied in great depth, and consequently, the existing list of butterfly species is outdated and in need of radical amendment. In addition to our personal data, we have studied the available literature, and can report a total of 196 butterfly species recorded from the country. For some of the species in the list we have given explanations for their inclusion and made other annotations. Doubtful records have been removed from the list, and changes in taxonomy have been updated and discussed separately. The purpose of our paper is to remove confusion and conflict regarding published records. However, the revised checklist should not be considered complete: it represents a starting point for further research.

  16. Data for constructing insect genome content matrices for phylogenetic analysis and functional annotation

    Directory of Open Access Journals (Sweden)

    Jeffrey Rosenfeld

    2016-03-01

    Twenty-one fully sequenced and well-annotated insect genomes were used to construct genome content matrices for phylogenetic analysis and functional annotation of insect genomes. To examine the role of the e-value cutoff in ortholog determination, we used scaled e-value cutoffs and a single-linkage clustering approach. The present communication includes (1) a list of the genomes used to construct the genome content phylogenetic matrices, (2) a NEXUS file with the data matrices used in phylogenetic analysis, (3) a NEXUS file with the Newick trees generated by phylogenetic analysis, (4) an Excel file listing the core (CORE) genes and unique (UNI) genes found in five insect groups, and (5) a figure with two parts. The first is a plot of consistency index (CI) versus the percentage of unannotated genes that are apomorphies in the data set for gene losses and gains. The second is a set of bar plots of gains and losses for four CI cutoffs.
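    A genome content matrix of the kind described here is a presence/absence table. It has one row per genome and one column per ortholog cluster. The sketch below builds such a matrix from clusters that are assumed to be computed already, since the single-linkage and e-value steps are not shown. It then writes a minimal NEXUS data block. The genome labels are hypothetical. The CORE/UNI rule used here ("present in every genome" / "present in exactly one genome") is an assumed simplification of the paper's group-level definitions.

```python
def genome_content_matrix(clusters, genomes):
    """Presence/absence matrix: one row per genome, one column per ortholog cluster.

    `clusters` maps a cluster ID to (genome, gene) pairs, e.g. the output of a
    single-linkage clustering step that is not shown here.
    """
    cluster_ids = sorted(clusters)
    matrix = {genome: [] for genome in genomes}
    for cid in cluster_ids:
        present = {genome for genome, _gene in clusters[cid]}
        for genome in genomes:
            matrix[genome].append(1 if genome in present else 0)
    return cluster_ids, matrix


def core_and_unique(cluster_ids, matrix):
    """Core = present in every genome; unique = present in exactly one (assumed definitions)."""
    core, unique = [], {}
    for col, cid in enumerate(cluster_ids):
        holders = [genome for genome, row in matrix.items() if row[col]]
        if len(holders) == len(matrix):
            core.append(cid)
        elif len(holders) == 1:
            unique.setdefault(holders[0], []).append(cid)
    return core, unique


def to_nexus(matrix):
    """Serialize the 0/1 matrix as a minimal NEXUS DATA block."""
    nchar = len(next(iter(matrix.values())))
    lines = ["#NEXUS", "BEGIN DATA;",
             f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
             '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
             "  MATRIX"]
    lines += [f"    {genome} {''.join(map(str, row))}" for genome, row in matrix.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines)
```

    Treating gene gains and losses as 0/1 characters in this way lets a standard parsimony program read the matrix directly.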

  17. Ubiquitous Annotation Systems

    DEFF Research Database (Denmark)

    Hansen, Frank Allan

    2006-01-01

    Ubiquitous annotation systems allow users to annotate physical places, objects, and persons with digital information. Much work has been done to implement adaptive and context-aware systems, especially in the field of location-based information systems, but few efforts have focused on the general requirements for linking information to objects in both physical and digital space. This paper surveys annotation techniques from open hypermedia systems, Web-based annotation systems, and mobile and augmented reality systems. The survey illustrates different approaches to four central challenges that ubiquitous annotation systems have to deal with: anchoring, structuring, presentation, and authoring. Each challenge is discussed through a number of examples. HyCon, a context-aware hypermedia framework developed at the University of Aarhus, Denmark, is used to illustrate an integrated approach to ubiquitous annotations...

  18. College and University Rankings: Part 2--An Annotated Bibliography of Analysis, Criticism, and Evaluation.

    Science.gov (United States)

    Hattendorf, Lynn C.

    1987-01-01

    This annotated bibliography of recent articles and books on academic rankings updates an article in the Spring 1986 "RQ." Items are listed by subject and ranking in general; individual guides; subject areas including accounting, advertising, biogeography, business, communications, data communications, economics, music, publishing,…

  19. Articles on Mass Communication in U.S. and Foreign Journals: A Selected Annotated Bibliography

    Science.gov (United States)

    McKerns, Joseph P.; Delahaye, Alfred N.

    1978-01-01

    Lists and annotates 212 journal articles on mass communication, grouped according to topic. Topics include audience and communicator analysis, broadcasting, communication theory, courts and law, criticism and defense of media, journalism education, government and media, history and biography, international topics, and public relations. (GW)

  20. AcEST (EST sequences of Adiantum capillus-veneris and their annotation) - AcEST | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Data name: AcEST (EST sequences of Adiantum capillus-veneris and their annotation). DOI 10.18908/lsdba.nbdc00839-0...01. Description of data contents: EST sequence of Adiantum capillus-veneris and its annotation (clone ID, libr...le search URL: http://togodb.biosciencedbc.jp/togodb/view/archive_acest#en Data acquisition method: Capillary ...ainst UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases). Number of data entries: Adiantum capillus-veneris

  1. DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis

    Directory of Open Access Journals (Sweden)

    Baseler Michael W

    2007-11-01

    Abstract Background Due to the complex and distributed nature of biological research, our current biological knowledge is spread over many redundant annotation databases maintained by many independent groups. Analysts usually need to visit many of these bioinformatics databases to gather comprehensive annotation information for their genes. This becomes a bottleneck, particularly when analyzing a large gene list. A highly centralized and ready-to-use gene-annotation knowledgebase is therefore in demand for high-throughput gene functional analysis. Description The DAVID Knowledgebase is built around the DAVID Gene Concept, a single-linkage method that agglomerates tens of millions of gene/protein identifiers from a variety of public genomic resources into DAVID gene clusters. Grouping these identifiers improves cross-referencing, particularly across the NCBI and UniProt systems. As a result, more than 40 publicly available functional annotation sources can be comprehensively integrated and centralized through the DAVID gene clusters. The simple, pairwise, text-format files that make up the DAVID Knowledgebase are freely downloadable for various data analysis uses. In addition, a well-organized web interface allows users to query different types of heterogeneous annotations in a high-throughput manner. Conclusion The DAVID Knowledgebase is designed to facilitate high-throughput gene functional analysis. For a given gene list, it provides quick access to a wide range of heterogeneous annotation data in a centralized location, and it also enriches the level of biological information available for an individual gene. Moreover, the entire DAVID Knowledgebase is freely downloadable or searchable at http://david.abcc.ncifcrf.gov/knowledgebase/.
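    Single-linkage agglomeration of identifiers, as in the DAVID Gene Concept described above, amounts to finding connected components in a graph. In that graph, each cross-reference pair ("identifier A refers to the same gene as identifier B") is an edge. The sketch below does this with a union-find structure. It is a generic illustration, not DAVID's actual procedure. The identifiers are hypothetical placeholders.

```python
class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # compress the path to speed up later lookups
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_identifiers(pairs):
    """Single-linkage clusters: any chain of cross-references joins identifiers."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups = {}
    for ident in list(uf.parent):
        groups.setdefault(uf.find(ident), set()).add(ident)
    return sorted(sorted(group) for group in groups.values())
```

    Given the hypothetical pairs `("ENTREZ:1", "UNIPROT:A")`, `("UNIPROT:A", "REFSEQ:x")` and `("ENTREZ:2", "UNIPROT:B")`, the first three identifiers form one cluster even though `ENTREZ:1` and `REFSEQ:x` are never paired directly. This transitive chaining is what makes the linkage "single".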

  2. Lynx web services for annotations and systems analysis of multi-gene disorders.

    Science.gov (United States)

    Sulakhe, Dinanath; Taylor, Andrew; Balasubramanian, Sandhya; Feng, Bo; Xie, Bingqing; Börnigen, Daniela; Dave, Utpal J; Foster, Ian T; Gilliam, T Conrad; Maltsev, Natalia

    2014-07-01

    Lynx is a web-based integrated systems biology platform that supports annotation and analysis of experimental data and generation of weighted hypotheses on molecular mechanisms contributing to human phenotypes and disorders of interest. Lynx has integrated multiple classes of biomedical data (genomic, proteomic, pathways, phenotypic, toxicogenomic, contextual and others) from various public databases as well as manually curated data from our group and collaborators (LynxKB). Lynx provides tools for gene list enrichment analysis using multiple functional annotations and network-based gene prioritization. Lynx provides access to the integrated database and the analytical tools via REST-based Web Services (http://lynx.ci.uchicago.edu/webservices.html). These comprise data retrieval services for specific functional annotations, services to search across the complete LynxKB (powered by Lucene), and services to access the analytical tools built within the Lynx platform. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
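    Gene list enrichment, as offered by Lynx and by DAVID-style tools, asks whether a functional term is over-represented in a gene list relative to a background. One widely used test is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The abstract does not state which statistic Lynx uses, so the sketch below is a generic illustration. The term and gene names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / total


def enrichment(gene_list, annotations, background):
    """Return (term, overlap, p-value) for terms hit by the gene list, smallest p first.

    `annotations` maps each term to its set of genes; everything is restricted
    to `background`, the set of genes that could have been selected.
    """
    genes = set(gene_list) & background
    results = []
    for term, members in annotations.items():
        members = members & background
        overlap = len(genes & members)
        if overlap == 0:
            continue
        p = hypergeom_sf(overlap, len(background), len(members), len(genes))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])
```

    For example, consider a background of 20 genes, a term annotated to 4 of them, and a 4-gene list that hits 3 of those. The p-value is (C(4,3)·C(16,1) + C(4,4)·C(16,0)) / C(20,4) = 65/4845 ≈ 0.0134.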

  3. Articles on Mass Communication in U.S. and Foreign Journals: A Selected Annotated Bibliography--October, November, December 1978.

    Science.gov (United States)

    McKerns, Joseph P.; Delahaye, Alfred N.

    1979-01-01

    Lists and annotates more than 200 articles on mass communication, grouped according to topic. Topics include advertising, broadcasting, courts and law, education for journalism, international, management, public relations, and visual communications. (GT)

  4. Hydrologic bibliography of the Columbia River basalts in Washington with selected annotations

    International Nuclear Information System (INIS)

    Tanaka, H.; Wildrick, L.; Pearson, B.

    1979-08-01

    The objective of this compilation is to present a comprehensive listing of the published, unpublished, and open-file references pertaining to the surface and subsurface hydrology of the Columbia River basalts within the State of Washington. It is presented in support of Rockwell's hydrologic data compilation effort for the Basalt Waste Isolation Program. A comprehensive, annotated bibliography of the Pasco Basin (including the Hanford Site) hydrology has been prepared for Rockwell as part of the Pasco Basin hydrology studies. To avoid unnecessary duplication, no effort was made to include a complete list of bibliographic references on Hanford in this volume.

  5. Annotated checklist and database for vascular plants of the Jemez Mountains

    Energy Technology Data Exchange (ETDEWEB)

    Foxx, T. S.; Pierce, L.; Tierney, G. D.; Hansen, L. A.

    1998-03-01

    Studies done in the last 40 years have provided the information needed to construct a checklist of the vascular plants of the Jemez Mountains. The present database and checklist build on the basic list compiled by Teralene Foxx and Gail Tierney in the early 1980s. The checklist is annotated with taxonomic information, geographic and biological information, economic uses, wildlife cover, revegetation potential, and ethnographic uses. Nearly 1000 species have been noted for the Jemez Mountains. This list is cross-referenced with the species names and acronyms of the US Department of Agriculture Natural Resource Conservation Service PLANTS database. All information will soon be available on a Web Page.

  6. Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop.

    Science.gov (United States)

    Brister, James Rodney; Bao, Yiming; Kuiken, Carla; Lefkowitz, Elliot J; Le Mercier, Philippe; Leplae, Raphael; Madupu, Ramana; Scheuermann, Richard H; Schobel, Seth; Seto, Donald; Shrivastava, Susmita; Sterk, Peter; Zeng, Qiandong; Klimke, William; Tatusova, Tatiana

    2010-10-01

    Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world's biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.

  7. Towards Viral Genome Annotation Standards, Report from the 2010 NCBI Annotation Workshop

    Directory of Open Access Journals (Sweden)

    Qiandong Zeng

    2010-10-01

    Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.

  8. Articles on Mass Communication in U.S. and Foreign Journals: A Selected Annotated Bibliography--January, February, March 1979.

    Science.gov (United States)

    McKerns, Joseph P.; Delahaye, Alfred N.

    1979-01-01

    Lists and annotates more than 200 articles on mass communication, grouped according to topic. Topics include advertising, broadcasting, courts and law, government and media, history and biography, international, management, public relations, and visual communication. (GT)

  9. Amino acid sequences of predicted proteins and their annotation for 95 organism species. - Gclust Server | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available Data name: Amino acid sequences of predicted proteins and their annotation for 95 organism species. DOI 10.18908/lsdba.nbdc00464-001. Description of data contents: Amino acid sequences of predicted proteins...

  10. The Suburban Press; First Steps toward an Annotated Bibliography. Suburban Press Research Series No. 16 and 17.

    Science.gov (United States)

    Northern Illinois Univ., De Kalb. Suburban Press Research Center.

    This bibliography lists journal articles concerning various aspects of the suburban press. Annotated selections, arranged alphabetically according to journal title, are gathered from the following periodicals: "Advertising Age," "Business Week," "Columbia Journalism Review," "Editor and Publisher," "Grassroots Editor," "Journalism…

  11. Articles on Mass Communication in U.S. and Foreign Journals: A Selected Annotated Bibliography--July, August, September 1979.

    Science.gov (United States)

    Delahaye, Alfred N.; McKerns, Joseph P.

    1979-01-01

    Lists and annotates more than 200 articles on mass communication, grouped according to topic. Topics include advertising, broadcasting, courts and law, journalism education, history and biography, international, public relations, visual communication, and women and media. (GT)

  12. Guidelines for visualizing and annotating rule-based models

    Science.gov (United States)

    Chylek, Lily A.; Hu, Bin; Blinov, Michael L.; Emonet, Thierry; Faeder, James R.; Goldstein, Byron; Gutenkunst, Ryan N.; Haugh, Jason M.; Lipniacki, Tomasz; Posner, Richard G.; Yang, Jin; Hlavacek, William S.

    2011-01-01

    Rule-based modeling provides a means to represent cell signaling systems in a way that captures site-specific details of molecular interactions. For rule-based models to be more widely understood and (re)used, conventions for model visualization and annotation are needed. We have developed the concepts of an extended contact map and a model guide for illustrating and annotating rule-based models. An extended contact map represents the scope of a model by providing an illustration of each molecule, molecular component, direct physical interaction, post-translational modification, and enzyme-substrate relationship considered in a model. A map can also illustrate allosteric effects, structural relationships among molecular components, and compartmental locations of molecules. A model guide associates elements of a contact map with annotation and elements of an underlying model, which may be fully or partially specified. A guide can also serve to document the biological knowledge upon which a model is based. We provide examples of a map and guide for a published rule-based model that characterizes early events in IgE receptor (FcεRI) signaling. We also provide examples of how to visualize a variety of processes that are common in cell signaling systems but not considered in the example model, such as ubiquitination. An extended contact map and an associated guide can document knowledge of a cell signaling system in a form that is visual as well as executable. As a tool for model annotation, a map and guide can communicate the content of a model clearly and with precision, even for large models. PMID:21647530

  13. Guidelines for visualizing and annotating rule-based models.

    Science.gov (United States)

    Chylek, Lily A; Hu, Bin; Blinov, Michael L; Emonet, Thierry; Faeder, James R; Goldstein, Byron; Gutenkunst, Ryan N; Haugh, Jason M; Lipniacki, Tomasz; Posner, Richard G; Yang, Jin; Hlavacek, William S

    2011-10-01

    Rule-based modeling provides a means to represent cell signaling systems in a way that captures site-specific details of molecular interactions. For rule-based models to be more widely understood and (re)used, conventions for model visualization and annotation are needed. We have developed the concepts of an extended contact map and a model guide for illustrating and annotating rule-based models. An extended contact map represents the scope of a model by providing an illustration of each molecule, molecular component, direct physical interaction, post-translational modification, and enzyme-substrate relationship considered in a model. A map can also illustrate allosteric effects, structural relationships among molecular components, and compartmental locations of molecules. A model guide associates elements of a contact map with annotation and elements of an underlying model, which may be fully or partially specified. A guide can also serve to document the biological knowledge upon which a model is based. We provide examples of a map and guide for a published rule-based model that characterizes early events in IgE receptor (FcεRI) signaling. We also provide examples of how to visualize a variety of processes that are common in cell signaling systems but not considered in the example model, such as ubiquitination. An extended contact map and an associated guide can document knowledge of a cell signaling system in a form that is visual as well as executable. As a tool for model annotation, a map and guide can communicate the content of a model clearly and with precision, even for large models.

  14. Ferns and flowering plants of Klaserie Private Nature Reserve, eastern Transvaal: an annotated checklist

    Directory of Open Access Journals (Sweden)

    N. Zambatis

    1994-10-01

    Full Text Available An annotated checklist of the plant taxa of the Klaserie Private Nature Reserve, eastern Transvaal Lowveld, is presented. Of the 618 infrageneric taxa recorded, six are pteridophytes and the remainder angiosperms. Of these, 161 are monocotyledons and 451 dicotyledons. Five of the latter are currently listed in the Red Data List of the Transvaal, two of which are first records for the Transvaal Lowveld. The vegetation of the reserve shows strong affinities with the Savanna Biome, and to a lesser degree, with the Grassland Biome.

  15. The ART of CSI: An augmented reality tool (ART) to annotate crime scenes in forensic investigation

    NARCIS (Netherlands)

    Streefkerk, J.W.; Houben, M.; Amerongen, P. van; Haar, F. ter; Dijk, J.

    2013-01-01

    Forensic professionals have to collect evidence at crime scenes quickly and without contamination. A handheld Augmented Reality (AR) annotation tool allows these users to virtually tag evidence traces at crime scenes and to review, share and export evidence lists. In a user walkthrough with this

  16. Semi-Semantic Annotation: A guideline for the URDU.KON-TB treebank POS annotation

    Directory of Open Access Journals (Sweden)

    Qaiser ABBAS

    2016-12-01

    Full Text Available This work elaborates the semi-semantic part-of-speech annotation guidelines for the URDU.KON-TB treebank, an annotated corpus. A hierarchical annotation scheme was designed to label parts of speech and was then applied to the corpus. The raw corpus was collected from the Urdu Wikipedia and the Jang newspaper and then annotated with the proposed semi-semantic part-of-speech labels. The corpus contains texts on local and international news, social stories, sports, culture, finance, religion, travel, etc. This exercise contributed part-of-speech annotation to the URDU.KON-TB treebank. Twenty-two main part-of-speech categories are divided into subcategories, which encode morphological and semantic information. This article mainly reports the annotation guidelines; however, it also briefly describes the development of the URDU.KON-TB treebank, including the raw corpus collection, the design and application of the annotation scheme, and finally its statistical evaluation and results. The guidelines presented here will be useful to the linguistic community for annotating sentences not only in the national language, Urdu, but also in other indigenous languages such as Punjabi, Sindhi and Pashto.

  17. Articles on Mass Communication in U.S. and Foreign Journals: A Selected Annotated Bibliography--April, May, June 1979.

    Science.gov (United States)

    McKerns, Joseph P.; Delahaye, Alfred N.

    1979-01-01

    Lists and annotates 200 articles on mass communication, grouped according to topic. Topics include advertising, audience and communicator analysis, broadcasting, courts and law, education for journalism, government and media, international, management, public relations, and visual communication. (GT)

  18. tRNA sequence data, annotation data and curation data - tRNADB-CE | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available tRNA sequence data, annotation data and curation data - tRNADB-CE | LSDB Archive ...

  19. BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments.

    Science.gov (United States)

    López-Fernández, H; Reboiro-Jato, M; Glez-Peña, D; Aparicio, F; Gachet, D; Buenaga, M; Fdez-Riverola, F

    2013-07-01

    Automatic term annotation from biomedical documents and external information linking are becoming a necessary prerequisite in modern computer-aided medical learning systems. In this context, this paper presents BioAnnote, a flexible and extensible open-source platform for automatically annotating biomedical resources. Apart from other valuable features, the software platform includes (i) a rich client enabling users to annotate multiple documents in a user-friendly environment, (ii) an extensible and embeddable annotation meta-server allowing for the annotation of documents with local or remote vocabularies and (iii) a simple client/server protocol which facilitates the use of our meta-server from any other third-party application. In addition, BioAnnote implements a powerful scripting engine able to perform advanced batch annotations. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  20. Eastern gas shales bibliography selected annotations: gas, oil, uranium, etc. Citations in bituminous shales worldwide

    Energy Technology Data Exchange (ETDEWEB)

    Hall, V.S. (comp.)

    1980-06-01

    This bibliography contains 2702 citations, most of which are annotated. They are arranged by author in numerical order with a geographical index following the listing. The work is international in scope and covers the early geological literature, continuing through 1979 with a few 1980 citations in Addendum II. Addendum I contains a listing of the reports, well logs and symposiums of the Unconventional Gas Recovery Program (UGR) through August 1979. There is an author-subject index for these publications following the listing. The second part of Addendum I is a listing of the UGR maps which also has a subject-author index following the map listing. Addendum II includes several important new titles on the Devonian shale as well as a few older citations which were not found until after the bibliography had been numbered and essentially completed. A geographic index for these citations follows this listing.

  1. Chado controller: advanced annotation management with a community annotation system.

    Science.gov (United States)

    Guignon, Valentin; Droc, Gaëtan; Alaux, Michael; Baurens, Franc-Christophe; Garsmeur, Olivier; Poiron, Claire; Carver, Tim; Rouard, Mathieu; Bocs, Stéphanie

    2012-04-01

    We developed a controller that is compliant with the Chado database schema, GBrowse and genome annotation-editing tools such as Artemis and Apollo. It enables the management of public and private data, monitors manual annotation (with controlled vocabularies, structural and functional annotation controls) and stores versions of annotation for all modified features. The Chado controller uses PostgreSQL and Perl. The Chado Controller package is available for download at http://www.gnpannot.org/content/chado-controller and runs on any Unix-like operating system, and documentation is available at http://www.gnpannot.org/content/chado-controller-doc. The system can be tested using the GNPAnnot Sandbox at http://www.gnpannot.org/content/gnpannot-sandbox-form. Contact: valentin.guignon@cirad.fr; stephanie.sidibe-bocs@cirad.fr. Supplementary data are available at Bioinformatics online.

  2. Ion implantation: an annotated bibliography

    International Nuclear Information System (INIS)

    Ting, R.N.; Subramanyam, K.

    1975-10-01

    Ion implantation is a technique for introducing controlled amounts of dopants into target substrates, and has been successfully used for the manufacture of silicon semiconductor devices. Ion implantation is superior to other doping methods, such as thermal diffusion and epitaxy, because of advantages such as a high degree of control, flexibility and amenability to automation. This annotated bibliography of 416 references consists of journal articles, books, and conference papers in English and foreign languages published during 1973-74, on all aspects of ion implantation including range distribution and concentration profile, channeling, radiation damage and annealing, compound semiconductors, structural and electrical characterization, applications, equipment and ion sources. Earlier bibliographies on ion implantation, and national and international conferences in which papers on ion implantation were presented, have also been listed separately.

  3. Articles on Mass Communication in U.S. and Foreign Journals: A Selected Annotated Bibliography--October, November, December 1979.

    Science.gov (United States)

    McKerns, Joseph P.; Delahaye, Alfred N.

    1980-01-01

    Lists and annotates more than 200 articles on mass communication, grouped according to topic. Topics include advertising, broadcasting, courts and law, criticism and defense of media, history and biography, international, public relations, visual communication, and women and media. (GT)

  4. Combining gene prediction methods to improve metagenomic gene annotation

    Directory of Open Access Journals (Sweden)

    Rosen Gail L

    2011-01-01

    Full Text Available Abstract Background Traditional gene annotation methods rely on characteristics that may not be available in short reads generated from next generation technology, resulting in suboptimal performance for metagenomic (environmental) samples. Therefore, in recent years, new programs have been developed that optimize performance on short reads. In this work, we benchmark three metagenomic gene prediction programs and combine their predictions to improve metagenomic read gene annotation. Results We not only analyze the programs' performance at different read lengths like similar studies, but also separate different types of reads, including intra- and intergenic regions, for analysis. The main deficiencies are in the algorithms' ability to predict non-coding regions and gene edges, resulting in more false positives and false negatives than desired. In fact, the specificities of the algorithms are notably worse than the sensitivities. By combining the programs' predictions, we show significant improvement in specificity at minimal cost to sensitivity, resulting in a 4% improvement in accuracy for 100 bp reads and a ~1% improvement in accuracy for 200 bp reads and above. To correctly annotate the start and stop of the genes, we find that a consensus of all the predictors performs best for shorter read lengths while a unanimous agreement is better for longer read lengths, boosting annotation accuracy by 1-8%. We also demonstrate use of the classifier combinations on a real dataset. Conclusions To optimize the performance for both prediction and annotation accuracies, we conclude that the consensus of all methods (or a majority vote) is best for reads 400 bp and shorter, while using the intersection of GeneMark and Orphelia predictions is best for reads 500 bp and longer. We demonstrate that most methods predict over 80% coding (including partially coding) reads on a real human gut sample sequenced by Illumina technology.
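    The two combination rules compared in this abstract, a majority-vote consensus and unanimous agreement (intersection), can be illustrated with a minimal sketch. It assumes each predictor emits one boolean "coding" call per read; the predictor names and toy calls are hypothetical and are not data from the study, and the sketch does not reproduce the GeneMark/Orphelia setup or the gene-boundary evaluation.

```python
from typing import Callable, Dict, List

Rule = Callable[[List[bool]], bool]


def majority_vote(calls: List[bool]) -> bool:
    """Call a read coding when strictly more than half the predictors do."""
    return sum(calls) * 2 > len(calls)


def unanimous(calls: List[bool]) -> bool:
    """Call a read coding only when every predictor does (intersection)."""
    return all(calls)


def combine(predictions: Dict[str, List[bool]], rule: Rule) -> List[bool]:
    """Apply a combination rule read by read across all predictors.

    Each list in `predictions` holds one predictor's coding calls for the
    same reads, in the same order.
    """
    return [rule(list(calls)) for calls in zip(*predictions.values())]


def sensitivity(pred: List[bool], truth: List[bool]) -> float:
    """Fraction of truly coding reads that are called coding."""
    hits = [p for p, t in zip(pred, truth) if t]
    return sum(hits) / len(hits)


def specificity(pred: List[bool], truth: List[bool]) -> float:
    """Fraction of truly non-coding reads that are called non-coding."""
    calls = [p for p, t in zip(pred, truth) if not t]
    return sum(not p for p in calls) / len(calls)


# Hypothetical toy data: six reads, the first three truly coding.
truth = [True, True, True, False, False, False]
predictions = {
    "predictor_a": [True, True, False, True, False, False],
    "predictor_b": [True, False, True, False, True, False],
    "predictor_c": [True, True, True, True, False, True],
}

if __name__ == "__main__":
    for name, rule in [("majority", majority_vote), ("unanimous", unanimous)]:
        combined = combine(predictions, rule)
        print(name, sensitivity(combined, truth), specificity(combined, truth))
```

    On this toy data the unanimous rule reaches perfect specificity at the cost of sensitivity, while the majority vote keeps full sensitivity with lower specificity, which mirrors the specificity/sensitivity trade-off the abstract describes.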

  5. Annotated bibliography of selected reports relating to the isolation of nuclear waste in crystalline rock

    International Nuclear Information System (INIS)

    1988-06-01

    BMI/OCRD-29 is an annotated bibliography of published reports that have been produced for the US Department of Energy Crystalline Repository Project Office or the Swedish-American Cooperative Program on Radioactive Waste Storage in Mined Caverns. This document consists of a main report listing of citations and abstracts and a topical index.

  6. GoGene: gene annotation in the fast lane.

    Science.gov (United States)

    Plake, Conrad; Royer, Loic; Winnenburg, Rainer; Hakenberg, Jörg; Schroeder, Michael

    2009-07-01

    High-throughput screens such as microarrays and RNAi screens produce huge amounts of data. They typically result in hundreds of genes, which are often further explored and clustered via enriched Gene Ontology terms. The strength of such analyses is that they build on high-quality manual annotations provided with the Gene Ontology. However, the weakness is that annotations are restricted to process, function and location and that they do not cover all known genes in model organisms. GoGene addresses this weakness by complementing high-quality manual annotation with high-throughput text mining extracting co-occurrences of genes and ontology terms from literature. GoGene contains over 4,000,000 associations between genes and gene-related terms for 10 model organisms extracted from more than 18,000,000 PubMed entries. It covers not only the process, function and location of genes, but also biomedical categories such as diseases, compounds, techniques and mutations. By bringing it all together, GoGene provides the most recent and most complete facts about genes and can rank them according to novelty and importance. GoGene accepts keywords, gene lists, gene sequences and protein sequences as input and supports search for genes in PubMed, EntrezGene and via BLAST. Since all associations of genes to terms are supported by evidence in the literature, the results are transparent and can be verified by the user. GoGene is available at http://gopubmed.org/gogene.

  7. What's New in Children's Literature for the Children of Louisiana? A Selected Annotated Bibliography with Readability Levels (Selected) and Associated Louisiana Content Standards

    Science.gov (United States)

    Webre, Elizabeth C.

    2011-01-01

    This annotated list links children's books published within the last 15 years and related to Louisiana culture, environment, and economics to the Louisiana Content Standards. Readability levels of selected books are included, providing guidance as to whether a book is appropriate for independent student use. The thirty-three books listed are…

  8. Articles on Mass Communication in U.S. and Foreign Journals: A Selected Annotated Bibliography--January, February, March 1980.

    Science.gov (United States)

    McKerns, Joseph P.; Delahaye, Alfred N.

    1980-01-01

    Lists and annotates more than 250 articles on mass communication, grouped according to topic. Topics include advertising, audience and communicator analysis, broadcasting, community journalism, courts and law, criticism and defense of media, education for journalism, history and biography, international, management, public relations, visual…

  9. Annotated Bibliography of Publications on Watershed Management and Ecological Studies at Coweeta Hydrologic Laboratory, 1934-1994

    Science.gov (United States)

    Patricia L. Stickney; Lloyd W. Swift; Wayne T. Swank

    1994-01-01

    This annotated bibliography spans over 60 years of research at Coweeta from 1934 through part of 1994, and includes earlier papers on forest influences written at the Appalachian Station before the establishment of Coweeta. It is a modification and update of previous compilations of research results at Coweeta and contains a separate section listing theses and dissertations....

  10. MixtureTree annotator: a program for automatic colorization and visual annotation of MixtureTree.

    Directory of Open Access Journals (Sweden)

    Shu-Chuan Chen

    Full Text Available The MixtureTree Annotator, written in Java, allows the user to automatically color any phylogenetic tree in Newick format generated by any phylogeny reconstruction program and to output a Nexus file. By automatically coloring the tree by sequence name, the MixtureTree Annotator offers a unique advantage over other programs that perform a similar function. In addition, the MixtureTree Annotator is the only package that can efficiently annotate the output produced by MixtureTree with mutation information and coalescent time information. The resulting output file is visualized with a modified version of FigTree. Certain popular programs that lack good built-in visualization tools, for example MEGA, Mesquite, PHY-FI, TreeView, treeGraph and Geneious, may give results with human errors, either because colors must be added manually to each node or because of other limitations, such as coloring only by a number (e.g., branch length) or by taxonomy. In addition to allowing the user to automatically color any given Newick tree by sequence name, the MixtureTree Annotator is the only method that allows the user to automatically annotate the tree created by the MixtureTree program. The MixtureTree Annotator is fast and easy to use, while still giving the user full control over the coloring and annotation process.

  11. Partial dependency parsing for Irish

    OpenAIRE

    Uí Dhonnchadha, Elaine; van Genabith, Josef

    2010-01-01

    In this paper we present a partial dependency parser for Irish, in which Constraint Grammar (CG) rules are used to annotate dependency relations and grammatical functions in unrestricted Irish text. Chunking is performed using a regular-expression grammar which operates on the dependency tagged sentences. As this is the first implementation of a parser for unrestricted Irish text (to our knowledge), there were no guidelines or precedents available. Therefore deciding what constitutes a syntac...

  12. An annotated synopsis of the powder post beetles of Iran (Coleoptera: Bostrichoidea: Bostrichidae)

    Directory of Open Access Journals (Sweden)

    Lan-Yu Liu

    2016-07-01

    Full Text Available An annotated synopsis of Iranian Bostrichidae (Coleoptera: Bostrichoidea) is provided as a basis for future studies, with notes on distribution, host plants, biology and economic importance. In total, 31 species from 18 genera and 4 subfamilies (Bostrichinae, Dinoderinae, Lyctinae and Psoinae) are listed from Iran. Sinoxylon anale Lesne, 1897, Sinoxylon perforans (Schrank, 1789), Stephanopachys linearis (Kugelann, 1792) and Xylopertha retusa (Olivier, 1790) are new records for Iran.

  13. A Description and Source Listing of Curriculum Materials in Agricultural Education. 1972-73.

    Science.gov (United States)

    American Vocational Association, Washington, DC. Agricultural Education Div.

    Listed are 246 curriculum material items in ten categories: field crops, horticulture, forestry, animal science, soils, diseases and pests, agricultural engineering, agricultural economics, agricultural occupations, and professional. Most materials are annotated and all are classified according to the AGPEX filing system. Bibliographic and…

  14. Protannotator: a semiautomated pipeline for chromosome-wise functional annotation of the "missing" human proteome.

    Science.gov (United States)

    Islam, Mohammad T; Garg, Gagan; Hancock, William S; Risk, Brian A; Baker, Mark S; Ranganathan, Shoba

    2014-01-03

    The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20,128 proteins for the human proteome, of which 3831 human proteins (∼19%) are considered "missing" according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 "missing" proteins into a semiautomated pipeline to functionally annotate the "missing" human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) "missing" proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) "missing" proteins were also determined. To accelerate the identification of "missing" proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the 3831 "missing" proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 "missing" proteins. The chromosome-wise functional annotation of all "missing" proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).

  15. Evaluating Hierarchical Structure in Music Annotations.

    Science.gov (United States)

    McFee, Brian; Nieto, Oriol; Farbood, Morwaread M; Bello, Juan Pablo

    2017-01-01

    Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for "flat" descriptions, and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. Using this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, investigate the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.

  16. Evaluating Hierarchical Structure in Music Annotations

    Directory of Open Access Journals (Sweden)

    Brian McFee

    2017-08-01

    Full Text Available Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for “flat” descriptions, and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. Using this metric, we investigate inter-annotator agreement on the multilevel annotations of two different music corpora, investigate the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.

  17. Pipeline to upgrade the genome annotations

    Directory of Open Access Journals (Sweden)

    Lijin K. Gopi

    2017-12-01

    Full Text Available The current era of functional genomics is enriched with good-quality draft genomes and annotations for many thousands of species and varieties, supported by advances in next-generation sequencing (NGS) technologies. Around 25,250 genomes of organisms from various kingdoms have been submitted to the NCBI genome resource to date. Each of these genomes was annotated using the tools and knowledge-bases available at the time of annotation. These annotations can be expected to improve if the same genomes are re-annotated using improved tools and knowledge-bases. Here we present a new genome annotation pipeline, strengthened with various tools and knowledge-bases, that is capable of producing better-quality annotations from the consensus of predictions from different tools. This resource also performs various additional annotations beyond the usual gene predictions and functional annotations, covering SSRs, novel repeats, paralogs, proteins with transmembrane helices, signal peptides, etc. The resource is trained to evaluate and integrate all the predictions to resolve overlaps and ambiguities in boundaries. One important highlight of this resource is its capability of predicting the phylogenetic relations of repeats using evolutionary trace analysis and orthologous gene clusters. We also present a case study in which the pipeline is used to upgrade the genome annotation of Nelumbo nucifera (sacred lotus). The results indicate that this resource can produce improved annotations that support a better understanding of the biology of various organisms.

  18. Community annotation and bioinformatics workforce development in concert--Little Skate Genome Annotation Workshops and Jamborees.

    Science.gov (United States)

    Wang, Qinghua; Arighi, Cecilia N; King, Benjamin L; Polson, Shawn W; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F; Page, Shallee T; Rendino, Marc Farnum; Thomas, William Kelley; Udwary, Daniel W; Wu, Cathy H

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome.

  19. Community annotation and bioinformatics workforce development in concert—Little Skate Genome Annotation Workshops and Jamborees

    Science.gov (United States)

    Wang, Qinghua; Arighi, Cecilia N.; King, Benjamin L.; Polson, Shawn W.; Vincent, James; Chen, Chuming; Huang, Hongzhan; Kingham, Brewster F.; Page, Shallee T.; Farnum Rendino, Marc; Thomas, William Kelley; Udwary, Daniel W.; Wu, Cathy H.

    2012-01-01

    Recent advances in high-throughput DNA sequencing technologies have equipped biologists with a powerful new set of tools for advancing research goals. The resulting flood of sequence data has made it critically important to train the next generation of scientists to handle the inherent bioinformatic challenges. The North East Bioinformatics Collaborative (NEBC) is undertaking the genome sequencing and annotation of the little skate (Leucoraja erinacea) to promote advancement of bioinformatics infrastructure in our region, with an emphasis on practical education to create a critical mass of informatically savvy life scientists. In support of the Little Skate Genome Project, the NEBC members have developed several annotation workshops and jamborees to provide training in genome sequencing, annotation and analysis. Acting as a nexus for both curation activities and dissemination of project data, a project web portal, SkateBase (http://skatebase.org) has been developed. As a case study to illustrate effective coupling of community annotation with workforce development, we report the results of the Mitochondrial Genome Annotation Jamborees organized to annotate the first completely assembled element of the Little Skate Genome Project, as a culminating experience for participants from our three prior annotation workshops. We are applying the physical/virtual infrastructure and lessons learned from these activities to enhance and streamline the genome annotation workflow, as we look toward our continuing efforts for larger-scale functional and structural community annotation of the L. erinacea genome. PMID:22434832

  20. Reasoning with Annotations of Texts

    OpenAIRE

    Ma, Yue; Lévy, François; Ghimire, Sudeep

    2011-01-01

    Linguistic and semantic annotations are important features for text-based applications. However, achieving and maintaining good quality in a set of annotations is known to be a complex task. Many ad hoc approaches have been developed to produce various types of annotations, while comparing those annotations to improve their quality is still rare. In this paper, we propose a framework in which both linguistic and domain information can cooperate to reason with annotat...

  1. A selective annotated bibliography for clinical audiology (1988-2008): reference works.

    Science.gov (United States)

    Ferrer-Vinent, Susan T; Ferrer-Vinent, Ignacio J

    2009-06-01

    This is the 1st in a series of 3 planned companion articles that present a selected, annotated, and indexed bibliography of clinical audiology publications from 1988 to 2008. Research and preparation of the bibliography were based on published guidelines, professional audiology experience, and professional librarian experience. This article presents reference works (dictionaries, encyclopedias, handbooks, and manuals). The future planned articles will cover other monographs, periodicals, and online resources. Audiologists and librarians can use these lists as a guide when seeking clinical audiology literature.

  2. Annotation and retrieval system of CAD models based on functional semantics

    Science.gov (United States)

    Wang, Zhansong; Tian, Ling; Duan, Wenrui

    2014-11-01

    CAD model retrieval based on functional semantics is more significant than content-based 3D model retrieval during the mechanical conceptual design phase. However, relevant research has not yet been fully developed. Therefore, a functional semantic-based CAD model annotation and retrieval method is proposed to support mechanical conceptual design and design reuse, inspire designer creativity through existing CAD models, shorten the design cycle, and reduce costs. First, a CAD model functional semantic ontology is constructed to formally represent the functional semantics of CAD models and to describe the mechanical conceptual design space comprehensively and consistently. Second, an approach to represent CAD models as attributed adjacency graphs (AAG) is proposed, in which the geometry and topology data are extracted from STEP models. On the basis of the AAG, the functional semantics of CAD models are annotated semi-automatically by matching them against CAD models containing partial features whose functional semantics have been annotated manually, thereby constructing a CAD Model Repository that supports retrieval based on functional semantics. Third, a CAD model retrieval algorithm that supports multi-function extended retrieval is proposed to explore more potential creative design knowledge at the semantic level. Finally, a prototype system, called the Functional Semantic-based CAD Model Annotation and Retrieval System (FSMARS), is implemented. A case study demonstrates that FSMARS can successfully obtain multiple potential CAD models that conform to the desired function. The proposed research addresses actual needs and presents a new way to acquire CAD models in the mechanical conceptual design phase.
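    The attributed adjacency graph mentioned above can be sketched in Python as a small data structure. This follows the common AAG convention in which B-rep faces are nodes and each shared edge is an arc labelled convex or concave; the paper's exact attribute set is not given, and the depression_candidates heuristic and the toy slot geometry are hypothetical, shown only to make the idea of matching partial features concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Face:
    face_id: int
    surface_type: str  # hypothetical node attribute, e.g. "plane" or "cylinder"


@dataclass
class AAG:
    """Attributed adjacency graph: B-rep faces are nodes, shared edges are arcs."""
    faces: dict = field(default_factory=dict)  # face_id -> Face
    edges: dict = field(default_factory=dict)  # (lo_id, hi_id) -> "convex" | "concave"

    def add_face(self, face: Face) -> None:
        self.faces[face.face_id] = face

    def connect(self, a: int, b: int, convexity: str) -> None:
        # Store each undirected arc once, keyed by the ordered id pair.
        self.edges[(min(a, b), max(a, b))] = convexity

    def neighbours(self, face_id: int) -> list:
        return sorted(b if a == face_id else a
                      for (a, b) in self.edges if face_id in (a, b))

    def concave_degree(self, face_id: int) -> int:
        return sum(1 for (a, b), kind in self.edges.items()
                   if face_id in (a, b) and kind == "concave")


def depression_candidates(graph: AAG, min_concave: int = 2) -> list:
    """Hypothetical heuristic: faces joined to >= min_concave neighbours by
    concave edges are flagged as possible slot or pocket floors."""
    return sorted(fid for fid in graph.faces
                  if graph.concave_degree(fid) >= min_concave)


# Toy slot: top face 1, side walls 2 and 4, floor 3.
g = AAG()
for fid in (1, 2, 3, 4):
    g.add_face(Face(fid, "plane"))
g.connect(1, 2, "convex")
g.connect(1, 4, "convex")
g.connect(2, 3, "concave")
g.connect(3, 4, "concave")

print(g.neighbours(3))           # [2, 4]
print(depression_candidates(g))  # [3]
```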

  3. Semantic annotation of consumer health questions.

    Science.gov (United States)

    Kilicoglu, Halil; Ben Abacha, Asma; Mrabet, Yassine; Shooshan, Sonya E; Rodriguez, Laritza; Masterton, Kate; Demner-Fushman, Dina

    2018-02-06

    Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only be fully expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from natural language questions (question understanding). To develop effective question understanding tools, question corpora semantically annotated for relevant question elements are needed. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to the MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same between the two parts of the corpus; however, we also explain and justify the differences between them. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations. The resulting corpus consists of 2614 questions (CHQA-email: 1740, CHQA-web: 874). Problems are the most frequent named entities, while treatment and general information questions are the most common question types. Inter-annotator agreement was generally modest: question types and topics yielded the highest agreement, while agreement for the more complex frame annotations was lower. Agreement in CHQA-web was consistently higher than that in CHQA-email. Pairwise inter-annotator agreement proved most
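    Because each question above was labelled by two annotators, a chance-corrected statistic such as Cohen's kappa is one standard way to quantify their agreement. The abstract does not say which agreement measure the authors used, so the Python sketch below is an assumption, and the question-type labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators independently pick the same label.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; by convention
        # treat this as perfect agreement (kappa is otherwise undefined).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical question-type labels from two annotators for six questions.
ann_1 = ["treatment", "information", "treatment", "cause", "treatment", "information"]
ann_2 = ["treatment", "information", "cause", "cause", "treatment", "treatment"]
print(round(cohens_kappa(ann_1, ann_2), 3))  # 0.478
```

    Here raw agreement is 4/6, but after subtracting the agreement expected by chance (13/36) kappa is 11/23, which illustrates why a corpus can show high raw overlap yet only modest chance-corrected agreement.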

  4. Predicting word sense annotation agreement

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector; Johannsen, Anders Trærup; Lopez de Lacalle, Oier

    2015-01-01

    High agreement is a common objective when annotating data for word senses. However, a number of factors make perfect agreement impossible, e.g. the limitations of the sense inventories, the difficulty of the examples or the interpretation preferences of the annotators. Estimating potential...... agreement is thus a relevant task to supplement the evaluation of sense annotations. In this article we propose two methods to predict agreement on word-annotation instances. We experiment with a continuous representation and a three-way discretization of observed agreement. In spite of the difficulty...

  5. Alignment-Annotator web server: rendering and annotating sequence alignments.

    Science.gov (United States)

    Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

    2014-07-01

    Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed on the server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interface. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color-code schemes can be applied, and annotations can be added. Annotations can be made manually or imported from BioDAS servers, UniProt, the Catalytic Site Atlas and the PDB. Some edits take effect immediately, while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip archive containing the HTML files. Because the output is HTML, the resulting interactive alignment can be viewed on any platform, including Windows, Mac OS X, Linux, Android and iOS, in any standard web browser. Importantly, neither plugins nor Java are required, and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Chest x-ray screening practices: an annotated bibliography

    International Nuclear Information System (INIS)

    Torchia, M.; DuChez, J.

    1980-03-01

    This annotated bibliography is a review of the scientific literature on the selection of asymptomatic patients for chest x-ray screening examinations. The selected articles cover the period from 1969 through 1979. The articles are organized under 10 main topics, which correspond to various categories of chest x-ray screening examinations performed in the United States today. Within each main topic, the articles are presented in chronological order. To aid the reader in identifying specific citations, an author index and a list of citations by journal have been included. The standard format for each citation includes the title of each article, the author(s), journal, volume, page, date, and abstract.

  7. New Publications for Planning Libraries (List No. 19). Exchange Bibliography 927.

    Science.gov (United States)

    Vance, Mary, Comp.

    This partially annotated bibliography contains current sources on a variety of topics including architecture, economics, environmental education, energy, geography, land use, urban planning, politics, recreation, and open space. The sources date from 1973 through 1975. The bulk of the documents are project reports, commercially published books,…

  8. A Description and Source Listing of Professional Information in Agricultural Education, 1963-64.

    Science.gov (United States)

    Sledge, George W.; And Others

    Brief annotations are given for many of the 107 references listed under the following categories: (1) adult education, (2) agricultural engineering, (3) animal science, (4) curriculum development and curriculum in crops, entomology, farm management, farm mechanics, and livestock, (5) farm business management and marketing, (6) forestry, (7)…

  9. Using Annotated Conceptual Models to Derive Information System Implementations

    Directory of Open Access Journals (Sweden)

    Anthony Berglas

    1994-05-01

    Full Text Available Producing production-quality information systems from conceptual descriptions is a time-consuming process that employs many of the world's programmers. Although most of this programming is fairly routine, the process has not been amenable to simple automation because conceptual models do not provide sufficient parameters to make all the implementation decisions that are required, and numerous special cases arise in practice. Most commercial CASE tools address these problems by essentially implementing a waterfall model in which development proceeds from analysis through design, layout and coding phases in a partially automated manner, but the analyst/programmer must heavily edit each intermediate stage. This paper demonstrates that, by recognising the nature of information systems, it is possible to specify applications completely using a conceptual model that has been annotated with additional parameters that guide automated implementation. More importantly, it will be argued that a manageable number of annotations is sufficient to implement realistic applications, and techniques will be described that enabled the author's commercial CASE tool, the Intelligent Developer, to automate implementation without requiring complex theorem-proving technology.

  10. Annotated Bibliography. Conference Materials: Strategy Conference on Education and the Economy (Washington, D.C., September 22-24, 1981).

    Science.gov (United States)

    Center for Law and Education, Boston, MA.

    The 22 items included in a packet distributed to participants at a 1981 conference on education and the economy are listed in this annotated bibliography, with sources for the items identified. The materials include articles, chapters from books, and monographs, as well as a resource guide and bibliography concerning rural economic and community…

  11. An annotated checklist of the Greek Stonefly Fauna (Insecta: Plecoptera).

    Science.gov (United States)

    Karaouzas, Ioannis; Andriopoulou, Argyro; Kouvarda, Theodora; Murányi, Dávid

    2016-05-17

    An overview of the Greek stonefly (Plecoptera) fauna is presented as an annotated index of all available published records. These records have resulted in an updated species list reflecting current taxonomy and species distributions of the Greek peninsula and islands. Currently, a total of 71 species and seven subspecies belonging to seven families and 19 genera are reported from Greece. There is high species endemicity of the Leuctridae and Nemouridae, particularly on the Greek islands. The endemics known from Greece comprise thirty species representing 42% of the Greek stonefly fauna. The remaining taxa are typical Balkan and Mediterranean species.

  12. Objective-guided image annotation.

    Science.gov (United States)

    Mao, Qi; Tsang, Ivor Wai-Hung; Gao, Shenghua

    2013-04-01

    Automatic image annotation, which is usually formulated as a multi-label classification problem, is one of the major tools used to enhance the semantic understanding of web images. Many multimedia applications (e.g., tag-based image retrieval) can greatly benefit from image annotation. However, the insufficient performance of image annotation methods prevents these applications from being practical. In addition, specific measures are usually designed to evaluate how well an annotation method performs for a specific objective or application, but most image annotation methods do not optimize these measures, so they are inevitably trapped in suboptimal performance on these objective-specific measures. To address this issue, we first summarize a variety of objective-guided performance measures under a unified representation. Our analysis reveals that macro-averaging measures are very sensitive to infrequent keywords, and the Hamming measure is easily affected by skewed distributions. We then propose a unified multi-label learning framework that directly optimizes a variety of objective-specific measures of multi-label learning tasks. Specifically, we first present a multilayer hierarchical structure of learning hypotheses for multi-label problems, based on which a variety of loss functions with respect to objective-guided measures are defined. We then formulate these loss functions as relaxed surrogate functions and optimize them by structural SVMs. Given this analysis of the various measures and the high time complexity of optimizing micro-averaging measures, in this paper we focus on example-based measures, which are tailor-made for image annotation tasks but are seldom explored in the literature. Experiments show consistency with the formal analysis on two widely used multi-label datasets, and demonstrate the superior performance of our proposed method over state-of-the-art baseline methods in terms of example-based measures on four
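    The claim above that macro-averaging is sensitive to infrequent keywords can be checked with a small Python sketch that evaluates three standard multi-label measures. It only computes the measures on hypothetical keyword sets; it does not reproduce the authors' structural-SVM optimization. The F1 value of 1.0 for a keyword with no positives and no predictions is a convention chosen here.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 1.0 when there is nothing to predict or miss."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def hamming_loss(y_true, y_pred, labels):
    """Fraction of (image, keyword) slots predicted wrongly."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(labels))


def macro_f1(y_true, y_pred, labels):
    """Per-keyword F1 averaged over keywords: rare keywords weigh as much as common ones."""
    scores = []
    for lab in labels:
        tp = sum(lab in t and lab in p for t, p in zip(y_true, y_pred))
        fp = sum(lab not in t and lab in p for t, p in zip(y_true, y_pred))
        fn = sum(lab in t and lab not in p for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)


def example_f1(y_true, y_pred):
    """Per-image F1 averaged over images (an example-based measure)."""
    return sum(f1(len(t & p), len(p - t), len(t - p))
               for t, p in zip(y_true, y_pred)) / len(y_true)


labels = ["sky", "tree", "person", "lighthouse"]
y_true = [{"sky", "tree"}, {"sky"}, {"sky", "person"}, {"sky", "lighthouse"}]
y_pred = [{"sky", "tree"}, {"sky"}, {"sky", "person"}, {"sky"}]  # misses the rare keyword

print(f"{hamming_loss(y_true, y_pred, labels):.4f}")  # 0.0625
print(f"{macro_f1(y_true, y_pred, labels):.4f}")      # 0.7500
print(f"{example_f1(y_true, y_pred):.4f}")            # 0.9167
```

    A single miss on the one image tagged with the rare keyword costs a full quarter of the macro-averaged score, while the Hamming loss and the example-based F1 barely move.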

  13. Concept annotation in the CRAFT corpus.

    Science.gov (United States)

    Bada, Michael; Eckert, Miriam; Evans, Donald; Garcia, Kristin; Shipley, Krista; Sitnikov, Dmitry; Baumgartner, William A; Cohen, K Bretonnel; Verspoor, Karin; Blake, Judith A; Hunter, Lawrence E

    2012-07-09

    Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. 
The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.

  14. Association for Teacher Education in Europe (ATEE) Seminar on Practical Experience in Teacher Education (Gargnano, Italy, June 28-July 1, 1995). Select Annotated Bibliography.

    Science.gov (United States)

    Macrae, Sheila, Comp.; Manning, Patricia, Comp.; Moon, Bob, Comp.

    1996-01-01

    This annotated bibliography offers information from presentations at a 1995 seminar on practical teaching experience in preservice teacher education programs throughout Europe. Each entry includes a brief abstract of the presentation and a list of keywords. (SM)

  15. Making web annotations persistent over time

    Energy Technology Data Exchange (ETDEWEB)

    Sanderson, Robert [Los Alamos National Laboratory]; Van De Sompel, Herbert [Los Alamos National Laboratory]

    2010-01-01

    As Digital Libraries (DL) become more aligned with the web architecture, their functional components need to be fundamentally rethought in terms of URIs and HTTP. Annotation, a core scholarly activity enabled by many DL solutions, exhibits a clearly unacceptable characteristic when existing models are applied to the web: because the representations of web resources change over time, an annotation made about a web resource today may no longer be relevant to the representation that is served from that same resource tomorrow. We assume the existence of archived versions of resources, and combine the temporal features of the emerging Open Annotation data model with the capability offered by the Memento framework, which allows seamless navigation from the URI of a resource to archived versions of that resource, to arrive at a solution that provides guarantees regarding the persistence of web annotations over time. More specifically, we provide theoretical solutions and proof-of-concept experimental evaluations for two problems: reconstructing an existing annotation so that the correct archived version is displayed for all resources involved in the annotation, and retrieving all annotations that involve a given archived version of a web resource.
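    The first problem above, displaying the archived version an annotation actually referred to, comes down to picking one memento by date. In Memento (RFC 7089) a TimeGate performs this datetime negotiation server-side and a TimeMap lists the available mementos; the Python sketch below assumes the TimeMap rows have already been parsed into hypothetical (capture time, URI) pairs and applies an assumed policy (latest capture not after the annotation time), which is not necessarily the policy used in the paper. The archive URIs are placeholders.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def select_memento(mementos, annotated_at):
    """Return the URI of the archived version an annotation most likely targeted.

    mementos: list of (capture_datetime, memento_uri) pairs sorted by datetime,
    e.g. rows parsed from a TimeMap (hypothetical structure).
    Assumed policy: the latest capture at or before the annotation time; if the
    annotation predates every capture, fall back to the earliest one.
    """
    if not mementos:
        raise ValueError("no archived versions available")
    capture_times = [t for t, _ in mementos]
    i = bisect_right(capture_times, annotated_at)
    return mementos[i - 1][1] if i > 0 else mementos[0][1]


ARCHIVE = "https://archive.example.org"  # placeholder archive
TARGET = "http://example.org/page"       # placeholder annotated resource
mementos = [
    (datetime(2010, 1, 5, tzinfo=timezone.utc), f"{ARCHIVE}/20100105000000/{TARGET}"),
    (datetime(2010, 3, 1, tzinfo=timezone.utc), f"{ARCHIVE}/20100301000000/{TARGET}"),
    (datetime(2010, 6, 20, tzinfo=timezone.utc), f"{ARCHIVE}/20100620000000/{TARGET}"),
]
annotated_at = datetime(2010, 4, 12, 9, 30, tzinfo=timezone.utc)
print(select_memento(mementos, annotated_at))
# https://archive.example.org/20100301000000/http://example.org/page
```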

  16. Semantic annotation in biomedicine: the current landscape.

    Science.gov (United States)

    Jovanović, Jelena; Bagheri, Ebrahim

    2017-09-22

    The abundance and unstructured nature of biomedical texts, be they clinical or research content, impose significant challenges for the effective and efficient use of the information and knowledge stored in such texts. Annotation of biomedical documents with machine-intelligible semantics facilitates advanced, semantics-based text management, curation, indexing, and search. This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS. As a result, the meaning of those mentions is unambiguously and explicitly defined, and thus made readily available for automated processing. This process is widely known as semantic annotation, and the tools that perform it are known as semantic annotators. Over the last dozen years, the biomedical research community has invested significant effort in the development of biomedical semantic annotation technology. Aiming to establish grounds for further developments in this area, we review a selected set of state-of-the-art biomedical semantic annotators, focusing particularly on general-purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine. We also examine potential directions for further improvements of today's annotators which could make them even more capable of meeting the needs of real-world applications. To motivate and encourage further developments in this area, along the suggested and/or related directions, we review existing and potential practical applications and benefits of semantic annotators.

  17. Radioactive occurrences in veins and igneous and metamorphic rocks of New Mexico with annotated bibliography

    International Nuclear Information System (INIS)

    McLemore, V.T.

    1982-01-01

    From an extensive literature search and field examination of 96 nonsandstone radioactive occurrences, the author compiled an annotated bibliography of over 600 citations and a list of 327 radioactive occurrences in veins and igneous and metamorphic rocks of New Mexico. The citations are indexed by individual radioactive occurrence, geographic area, county, fluorspar deposits and occurrences, geochemical analyses, and geologic maps. In addition, the geology, mineralization, and uranium and thorium potential are summarized for 41 geographic areas in New Mexico that contain either known radioactive occurrences in veins and igneous and metamorphic rocks or host rocks considered favorable for uranium or thorium mineralization. A list of aerial-radiometric, magnetic, hydrogeochemical, and stream-sediment survey reports is included.

  18. Comparison of lists of genes based on functional profiles

    Directory of Open Access Journals (Sweden)

    Salicrú Miquel

    2011-10-01

    Full Text Available Abstract Background How to compare studies on the basis of their biological significance is a problem of central importance in high-throughput genomics. Many methods for performing such comparisons are based on the information in databases of functional annotation, such as those that form the Gene Ontology (GO). Typically, they consist of analyzing gene annotation frequencies in some pre-specified GO classes, in a class-by-class way, followed by p-value adjustment for multiple testing. Enrichment analysis, where a list of genes is compared against a wider universe of genes, is the most common example. Results A new global testing procedure and a method incorporating it are presented. Instead of testing separately for each GO class, a single global test for all classes under consideration is performed. The test is based on the distance between the functional profiles, defined as the joint frequencies of annotation in a given set of GO classes. These classes may be chosen at one or more GO levels. The new global test is more powerful and accurate with respect to type I errors than the usual class-by-class approach. When applied to some real datasets, the results suggest that the method may also provide useful information that complements the tests performed using a class-by-class approach if gene counts are sparse in some classes. An R library, goProfiles, implements these methods and is available from Bioconductor, http://bioconductor.org/packages/release/bioc/html/goProfiles.html. Conclusions The method provides an inferential basis for deciding whether two lists are functionally different. For global comparisons it is preferable to the global chi-square test of homogeneity. Furthermore, it may provide additional information if used in conjunction with class-by-class methods.
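    The core quantity in the method above, a distance between two functional profiles, can be sketched in Python under a simplified reading: a profile is taken here as the relative frequency of annotation hits in each chosen GO class, and the squared Euclidean distance is used as a stand-in. goProfiles' exact profile definition and the null distribution behind its global test are not reproduced, and the gene-to-class mappings are hypothetical.

```python
from collections import Counter


def functional_profile(gene_annotations, go_classes):
    """Relative frequency of annotation hits in each chosen GO class.

    gene_annotations: dict gene -> set of GO class names (hypothetical input).
    """
    counts = Counter()
    for classes in gene_annotations.values():
        for c in classes:
            if c in go_classes:
                counts[c] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genes annotated in the chosen classes")
    return [counts[c] / total for c in go_classes]


def squared_distance(p, q):
    """Squared Euclidean distance between two profiles (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


go_classes = ["metabolic process", "cellular process", "biological regulation"]
list_1 = {"g1": {"metabolic process"},
          "g2": {"metabolic process", "cellular process"},
          "g3": {"cellular process"},
          "g4": {"biological regulation"}}
list_2 = {"h1": {"metabolic process"},
          "h2": {"metabolic process"},
          "h3": {"metabolic process", "cellular process"},
          "h4": {"cellular process"}}

p1 = functional_profile(list_1, go_classes)  # [0.4, 0.4, 0.2]
p2 = functional_profile(list_2, go_classes)  # [0.6, 0.4, 0.0]
print(f"{squared_distance(p1, p2):.4f}")     # 0.0800
```

    A single statistic over all classes is what lets the method run one global test instead of one test per class; deciding whether 0.08 is significant requires the sampling distribution derived in the paper.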

  19. Contributions to In Silico Genome Annotation

    KAUST Repository

    Kalkatawi, Manal M.

    2017-11-30

    Genome annotation is an important topic because it provides the foundation for downstream genomic and biological research. It can be considered a way of summarizing part of the existing knowledge about the genomic characteristics of an organism. Annotating different regions of a genome sequence is known as structural annotation, while identifying the functions of these regions is known as functional annotation. In silico approaches can facilitate both tasks, which would otherwise be difficult and time-consuming. This study contributes to genome annotation by introducing several novel bioinformatics methods, some based on machine learning (ML) approaches. First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived a novel feature set able to characterize properties of the genomic region surrounding the PAS, enabling the development of highly accurate, optimized ML predictive models. DPS considerably outperformed state-of-the-art results. The second contribution concerns developing generic models for structural annotation, i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA. We developed DeepGSR, a systematic framework that facilitates generating ML models to predict GSR with high accuracy. To the best of our knowledge, no generic and automated method exists for such a task that could facilitate the study of newly sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms to derive highly abstract features that depend mainly on proper data representation and hyperparameter calibration. DeepGSR, which was evaluated on the recognition of PAS and translation initiation sites (TIS) in different organisms, yields a simpler and more precise representation of the problem under study than some other hand-tailored models, while producing highly accurate predictions.
Finally
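    PAS recognition of the kind described above typically starts from candidate positions that a classifier then scores. The Python sketch below shows only that candidate-extraction step, under the assumption that candidates are occurrences of AATAAA and ATTAAA, the two most common human PAS hexamers. DPS's feature set and ML models are not reproduced, and the flank width is an arbitrary illustrative parameter.

```python
import re

# The two most common human polyadenylation-signal hexamers.
CANONICAL_PAS = ("AATAAA", "ATTAAA")


def pas_candidates(seq, flank=10):
    """Find PAS hexamer occurrences and cut a flanking window around each.

    Returns sorted (position, motif, window) tuples; the windows are what a
    downstream classifier (not shown) would score.
    """
    seq = seq.upper()
    hits = []
    for motif in CANONICAL_PAS:
        # Zero-width lookahead so overlapping occurrences are all reported.
        for m in re.finditer(f"(?={motif})", seq):
            start = m.start()
            lo = max(0, start - flank)
            hi = min(len(seq), start + len(motif) + flank)
            hits.append((start, motif, seq[lo:hi]))
    return sorted(hits)


hits = pas_candidates("ggcAATAAAgctgATTAAAtt", flank=3)
print([(pos, motif) for pos, motif, _ in hits])  # [(3, 'AATAAA'), (13, 'ATTAAA')]
print(hits[0][2])                                # GGCAATAAAGCT
```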

  20. Active learning reduces annotation time for clinical concept extraction.

    Science.gov (United States)

    Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony

    2017-10-01

    To investigate: (1) the annotation time savings achieved by various active learning query strategies compared to supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating the manual annotation process compared to de novo annotation. The train and test sets of the concept extraction task in the i2b2/VA 2010 challenge contain 73 and 120 discharge summary reports, respectively, provided by the Beth Israel institute. The 73 training reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models. When annotating from scratch, active learning reduced the annotation time by up to 35% and 28% compared to a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations resulted in a further 20% reduction in annotation time compared to de novo annotation. The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches, as demonstrated by the high correlation between time rate and concept annotation rate. Active learning has a key role in reducing the time required to manually annotate domain concepts from clinical free text, either when annotating from scratch or when reviewing active learning-assisted pre-annotations. Copyright © 2017 Elsevier B.V. All rights reserved.
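    The query-strategy step described above, selecting which sequences to annotate next, can be sketched with least-confidence uncertainty sampling. The abstract does not name the strategies that were compared, so this particular strategy is an assumption, as are the sequence ids and the model probabilities (e.g. the probability a sequence labeller assigns to its best labelling). The random baseline mirrors the comparison the study reports.

```python
import random


def least_confidence(pool, k):
    """Pick the k sequences whose best labelling the model trusts least.

    pool: dict sequence_id -> model probability of its most likely labelling.
    Ties are broken by sequence id for determinism.
    """
    return sorted(pool, key=lambda sid: (pool[sid], sid))[:k]


def random_baseline(pool, k, seed=0):
    """Random-sampling baseline with a fixed seed for reproducibility."""
    return random.Random(seed).sample(sorted(pool), k)


# Hypothetical unlabelled pool with model confidences.
pool = {"s1": 0.97, "s2": 0.41, "s3": 0.88, "s4": 0.55, "s5": 0.62}
print(least_confidence(pool, 2))  # ['s2', 's4']
print(random_baseline(pool, 2))
```

    In a pre-annotation workflow, the selected sequences would be shown to the annotator with the model's predicted concepts already filled in, so that reviewing replaces labelling from scratch.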

  1. Procedures for the elicitation of expert judgements in the probabilistic risk analysis of the long-term effects of radioactive waste repositories: an annotated bibliography

    International Nuclear Information System (INIS)

    Watson, S.R.

    1993-01-01

    This annotated bibliography describes the key literature relevant to the elicitation of expert judgements in radioactive waste management. The bibliography is divided into seven sections; section 2 lists the literature exploring the proper interpretation of probabilities used in Probabilistic Risk Analysis (PRA). Section 3 lists literature describing other calculi for handling uncertainty in a numerical fashion. In section 4 comments are given on how to elicit probabilities from individuals as a measure of subjective degrees of belief and section 5 lists the literature concerning how expert judgements can be combined. Sections 6 and 7 list literature giving an overview of the issues involved in PRA for radioactive waste repositories. (author)

  2. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

    Energy Technology Data Exchange (ETDEWEB)

    Brettin, Thomas; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomason, James A.; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  3. RASTtk: a modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes.

    Science.gov (United States)

    Brettin, Thomas; Davis, James J; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomason, James A; Stevens, Rick; Vonstein, Veronika; Wattam, Alice R; Xia, Fangfang

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  4. The STAPL pList

    KAUST Repository

    Tanase, Gabriel

    2010-01-01

    We present the design and implementation of the stapl pList, a parallel container that has the properties of a sequential list, but allows for scalable concurrent access when used in a parallel program. The Standard Template Adaptive Parallel Library (stapl) is a parallel programming library that extends C++ with support for parallelism. stapl provides a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. stapl pContainers are thread-safe, concurrent objects, providing appropriate interfaces (e.g., views) that can be used by generic pAlgorithms. The pList provides STL-equivalent methods, such as insert, erase, and splice, additional methods such as split, and efficient asynchronous (non-blocking) variants of some methods for improved parallel performance. We evaluate the performance of the stapl pList on an IBM Power 5 cluster and on a CRAY XT4 massively parallel processing system. Although lists are generally not considered good data structures for parallel processing, we show that pList methods and pAlgorithms (p-generate and p-partial-sum) operating on pLists provide good scalability on more than 10³ processors and that pList compares favorably with other dynamic data structures such as the pVector. © 2010 Springer-Verlag.

  5. Annotations on the virtual element method for second-order elliptic problems

    Energy Technology Data Exchange (ETDEWEB)

    Manzini, Gianmarco [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]

    2017-01-03

    This document contains working annotations on the Virtual Element Method (VEM) for the approximate solution of diffusion problems with variable coefficients. To read this document you are assumed to have familiarity with concepts from the numerical discretization of Partial Differential Equations (PDEs) and, in particular, the Finite Element Method (FEM). This document is not an introduction to the FEM, for which many textbooks (also free on the internet) are available. Eventually, this document is intended to evolve into a tutorial introduction to the VEM (but this is really a long-term goal).

  6. Computer systems for annotation of single molecule fragments

    Science.gov (United States)

    Schwartz, David Charles; Severin, Jessica

    2016-07-19

    There are provided computer systems for visualizing and annotating single molecule images. Annotation systems in accordance with this disclosure allow a user to mark and annotate single molecules of interest and their restriction enzyme cut sites, thereby determining the restriction fragments of single nucleic acid molecules. The markings and annotations may be automatically generated by the system in certain embodiments, and they may be overlaid translucently onto the single molecule images. An image caching system may be implemented in the computer annotation systems to reduce image processing time. The annotation systems include one or more connectors connecting to one or more databases capable of storing single molecule data as well as other biomedical data. Such a diverse array of data can be retrieved and used to validate the markings and annotations. The annotation systems may be implemented and deployed over a computer network. They may be ergonomically optimized to facilitate user interactions.

  7. Image annotation under X Windows

    Science.gov (United States)

    Pothier, Steven

    1991-08-01

    A mechanism for attaching graphic and overlay annotation to multiple bits/pixel imagery while providing levels of performance approaching that of native mode graphics systems is presented. This mechanism isolates programming complexity from the application programmer through software encapsulation under the X Window System. It ensures display accuracy throughout operations on the imagery and annotation, including zooms, pans, and modifications of the annotation. Trade-offs that affect speed of display, consumption of memory, and system functionality are explored. The use of resource files to tune the display system is discussed. The mechanism makes use of an abstraction consisting of four parts: a graphics overlay, a dithered overlay, an image overlay, and a physical display window. Data structures are maintained that retain the distinction between the four parts so that they can be modified independently, providing system flexibility. A unique technique for associating user color preferences with annotation is introduced. An interface that allows interactive modification of the mapping between image value and color is discussed. A procedure that provides for the colorization of imagery on 8-bit display systems using pixel dithering is explained. Finally, the use of these annotation mechanisms in various applications is discussed.

  8. Motion lecture annotation system to learn Naginata performances

    Science.gov (United States)

    Kobayashi, Daisuke; Sakamoto, Ryota; Nomura, Yoshihiko

    2013-12-01

    This paper describes a learning assistant system that uses motion capture data and annotation to teach "Naginata-jutsu" (the art of the Japanese halberd) performance. Existing video annotation tools, such as YouTube's, are video-based and therefore offer only a single angle of view. Our approach uses motion-captured data, which allows the performance to be viewed from any angle. A lecturer can write annotations related to parts of the body. We compared the effectiveness of the YouTube annotation tool with that of the proposed system. The experimental results showed that our system prompted more annotations than the YouTube annotation tool.

  9. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.

    2015-08-18

    Background Genome annotation is one way of summarizing the existing knowledge about the genomic characteristics of an organism. Interest in computer-based structural and functional genome annotation has increased over the last several decades, and many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on the comparison of functional annotations of prokaryotic genomes. To the best of our knowledge, there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The existence of many AMs and the development of new ones create a need to: (a) compare different annotations for a single genome, and (b) generate annotation by combining individual ones. To address these issues, we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations by combining individual ones. To illustrate BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show with these examples that the extended annotation can increase the number of genes annotated with putative functions by up to 27%, while reducing the number of genes without any function assignment. Conclusions We developed BEACON, a fast tool for the automated, systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/

  10. BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

    Science.gov (United States)

    Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

    2015-08-18

    Genome annotation is one way of summarizing the existing knowledge about the genomic characteristics of an organism. Interest in computer-based structural and functional genome annotation has increased over the last several decades, and many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on the comparison of functional annotations of prokaryotic genomes. To the best of our knowledge, there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The existence of many AMs and the development of new ones create a need to: (a) compare different annotations for a single genome, and (b) generate annotation by combining individual ones. To address these issues, we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations by combining individual ones. To illustrate BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show with these examples that the extended annotation can increase the number of genes annotated with putative functions by up to 27%, while reducing the number of genes without any function assignment. We developed BEACON, a fast tool for the automated, systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
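    The abstract does not describe BEACON's actual rules for combining annotations. As a deliberately simplified, hypothetical illustration of the idea of an "extended annotation", the sketch below fills genes that one annotation method left without a function using assignments from other methods, taken in a fixed priority order. The names `extend_annotation` and `count_unassigned` and the dictionary format are assumptions, not BEACON's code or file format.

```python
from typing import Dict, List, Optional

# Hypothetical format: gene ID -> assigned function, or None if unannotated.
Annotation = Dict[str, Optional[str]]


def extend_annotation(base: Annotation, others: List[Annotation]) -> Annotation:
    """Fill gaps in `base` with the first function found in `others` (priority order)."""
    extended = dict(base)  # copy so the input annotation is not modified
    for gene, function in base.items():
        if function:
            continue  # the base method already assigned a function; keep it
        for other in others:
            candidate = other.get(gene)
            if candidate:
                extended[gene] = candidate
                break  # the first method in the list wins
    return extended


def count_unassigned(annotation: Annotation) -> int:
    """Number of genes without any function assignment."""
    return sum(1 for function in annotation.values() if not function)


if __name__ == "__main__":
    base = {"g1": "DNA polymerase III", "g2": None, "g3": None}
    others = [{"g2": "ABC transporter"}, {"g2": "permease", "g3": "ferredoxin"}]
    merged = extend_annotation(base, others)
    print(merged)
    print(count_unassigned(base), "->", count_unassigned(merged))  # 2 -> 0
```

    A first-wins priority rule is only one possible policy; a real comparison tool would also need to reconcile conflicting assignments rather than simply filling gaps.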

  11. JGI Plant Genomics Gene Annotation Pipeline

    Energy Technology Data Exchange (ETDEWEB)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with large amounts of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, so it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, along with their transcriptomes. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, aided by an RNA-seq transcriptome assembly pipeline. It uses several gene predictors based on homologous peptides and transcript ORFs. See Methods for details. Here we present the genome annotations of the JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for Chlamydomonas (chlamy), which was annotated by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front-page snapshot are shown below.

  12. An annotated check list of the birds of Qwaqwa National Park

    Directory of Open Access Journals (Sweden)

    D.H. De Swardt

    1996-08-01

    Full Text Available This paper presents a check list of 179 bird species occurring in the Qwaqwa National Park, which borders the eastern part of the Golden Gate Highlands National Park. Data on the distribution, status, habitat preferences and breeding were obtained during several visits between December 1992 and March 1995. The following habitats were preferred: grassland, montane grassland, woodland, rocky hillsides, mountain slopes and riverine areas with Phragmites reedbeds. The conservation of waterbirds, raptors and other localised species such as Orangebreasted Rockjumper, Palecrowned Cisticola, Mountain Pipit and Gurney's Sugarbird is important as these species occur in specialised habitats.

  13. How to become a Bayesian in eight easy steps: An annotated reading list

    NARCIS (Netherlands)

    Etz, A.; Gronau, Q.F.; Dablander, F.; Edelsbrunner, P.A.; Baribault, B.

    In this guide, we present a reading list to serve as a concise introduction to Bayesian data analysis. The introduction is geared toward reviewers, editors, and interested researchers who are new to Bayesian statistics. We provide commentary for eight recommended sources, which together cover the

  14. Annotating temporal information in clinical narratives.

    Science.gov (United States)

    Sun, Weiyi; Rumshisky, Anna; Uzuner, Ozlem

    2013-12-01

    Temporal information in clinical narratives plays an important role in patients' diagnosis, treatment and prognosis. In order to represent narrative information accurately, medical natural language processing (MLP) systems need to correctly identify and interpret temporal information. To promote research in this area, the Informatics for Integrating Biology and the Bedside (i2b2) project developed a temporally annotated corpus of clinical narratives. This corpus contains 310 de-identified discharge summaries, with annotations of clinical events, temporal expressions and temporal relations. This paper describes the process followed for the development of this corpus and discusses annotation guideline development, annotation methodology, and corpus quality. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Annotated bibliography National Environmental Policy Act (NEPA) documents for Sandia National Laboratories

    International Nuclear Information System (INIS)

    Harris, J.M.

    1995-04-01

    The following annotated bibliography lists documents prepared by the Department of Energy (DOE) and its predecessor agencies to meet the requirements of the National Environmental Policy Act (NEPA) for activities and facilities at Sandia National Laboratories sites. For each NEPA document, summary information and a brief discussion of its content are provided. This information may be used to reduce the time or cost associated with NEPA compliance for future Sandia National Laboratories projects. This summary may be used to identify model documents, documents to use as sources of information, or documents from which to tier additional NEPA documents.

  16. Annotated bibliography National Environmental Policy Act (NEPA) documents for Sandia National Laboratories

    Energy Technology Data Exchange (ETDEWEB)

    Harris, J.M.

    1995-04-01

    The following annotated bibliography lists documents prepared by the Department of Energy (DOE) and its predecessor agencies to meet the requirements of the National Environmental Policy Act (NEPA) for activities and facilities at Sandia National Laboratories sites. For each NEPA document, summary information and a brief discussion of its content are provided. This information may be used to reduce the time or cost associated with NEPA compliance for future Sandia National Laboratories projects. This summary may be used to identify model documents, documents to use as sources of information, or documents from which to tier additional NEPA documents.

  17. Thermal effects on aquatic organisms: an annotated bibliography of the 1976 literature

    Energy Technology Data Exchange (ETDEWEB)

    Talmage, S.S. (comp.)

    1978-05-01

    This bibliography, containing 784 annotated references on the effects of temperature on aquatic organisms, is part of an assessment of the literature on the effects of thermal power plants on the environment. The effects of thermal discharges at power plant sites are emphasized. Laboratory and field studies on temperature tolerance and the effects of temperature changes on reproduction, development, growth, distribution, physiology, and sensitivity to other stresses are included. Indexes are provided for author, keywords, subject category, geographic location of the study, taxon, and title (alphabetical listing of keywords-in-context of nontrivial words in the title).

  18. Facilitating functional annotation of chicken microarray data

    Directory of Open Access Journals (Sweden)

    Gresham Cathy R

    2009-10-01

    Full Text Available Abstract Background Modeling results from chicken microarray studies is challenging for researchers because little functional annotation is associated with these arrays. The Affymetrix GeneChip chicken genome array, one of the biggest arrays that serve as a key research tool for the study of chicken functional genomics, is among the few arrays that link gene products to Gene Ontology (GO). However, the GO annotation data presented by Affymetrix are incomplete; for example, they do not show references linked to manually annotated functions. In addition, there is no tool that allows microarray researchers to directly retrieve functional annotations for their datasets from the annotated arrays. This costs researchers considerable time searching multiple GO databases for functional information. Results We have improved the breadth of functional annotations of the gene products associated with probesets on the Affymetrix chicken genome array by 45% and the quality of annotation by 14%. We have also identified the most significant diseases and disorders, different types of genes, and known drug targets represented on the Affymetrix chicken genome array. To facilitate functional annotation of other arrays and microarray experimental datasets, we developed an Array GO Mapper (AGOM) tool to help researchers quickly retrieve corresponding functional information for their datasets. Conclusion Results from this study will directly facilitate annotation of other chicken arrays and microarray experimental datasets. Researchers will be able to quickly model their microarray datasets into more reliable biological functional information by using the AGOM tool. The diseases, disorders, gene types and drug targets revealed in the study will allow researchers to learn more about how genes function in complex biological systems and may lead to new drug discovery and development of therapies.
The GO annotation data generated will be available for public use via AgBase website and

  19. 40 CFR Table 7 to Subpart VVVVVV... - Partially Soluble HAP

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 14 2010-07-01 2010-07-01 false Partially Soluble HAP 7 Table 7 to... Pt. 63, Subpt. VVVVVV, Table 7 Table 7 to Subpart VVVVVV of Part 63—Partially Soluble HAP As required... partially soluble HAP listed in the following table. Partially soluble HAP name CAS No. 1. 1,1,1...

  20. Dictionary-driven protein annotation.

    Science.gov (United States)

    Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel

    2002-09-01

    Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were

  1. The effectiveness of annotated (vs. non-annotated) digital pathology slides as a teaching tool during dermatology and pathology residencies.

    Science.gov (United States)

    Marsch, Amanda F; Espiritu, Baltazar; Groth, John; Hutchens, Kelli A

    2014-06-01

    With today's technology, paraffin-embedded, hematoxylin & eosin-stained pathology slides can be scanned to generate high quality virtual slides. Using proprietary software, digital images can also be annotated with arrows, circles and boxes to highlight certain diagnostic features. Previous studies assessing digital microscopy as a teaching tool did not involve the annotation of digital images. The objective of this study was to compare the effectiveness of annotated digital pathology slides versus non-annotated digital pathology slides as a teaching tool during dermatology and pathology residencies. A study group composed of 31 dermatology and pathology residents was asked to complete an online pre-quiz consisting of 20 multiple-choice questions, each associated with a static digital pathology image. After completion, participants were given access to an online tutorial composed of digitally annotated pathology slides and subsequently asked to complete a post-quiz. A control group of 12 residents completed a non-annotated version of the tutorial. Nearly all participants in the study group improved their quiz score, with an average improvement of 17%, versus only 3% (P = 0.005) in the control group. These results support the notion that annotated digital pathology slides are superior to non-annotated slides for the purpose of resident education. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  2. Automatic annotation of head velocity and acceleration in Anvil

    DEFF Research Database (Denmark)

    Jongejan, Bart

    2012-01-01

    We describe an automatic face tracker plugin for the ANVIL annotation tool. The face tracker produces data for velocity and for acceleration in two dimensions. We compare the annotations generated by the face tracking algorithm with independently made manual annotations for head movements. ... The annotations are a useful supplement to manual annotations; they may help human annotators quickly and reliably determine the onset of head movements, and may suggest which kind of head movement is taking place. ...
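    The abstract does not say how the plugin derives velocity and acceleration. Assuming the face tracker yields one (x, y) head position per video frame at a fixed frame interval `dt`, a minimal approach is finite differencing, sketched below. The function name and input format are hypothetical and are not the ANVIL plugin's API.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def finite_differences(positions: List[Point], dt: float) -> Tuple[List[Point], List[Point]]:
    """Return 2D velocity and acceleration series from per-frame positions.

    Uses forward differences: velocity has len(positions) - 1 entries and
    acceleration has len(positions) - 2 entries.
    """
    # Velocity: change in position between consecutive frames, divided by dt.
    velocity = [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
    # Acceleration: change in velocity between consecutive frames, divided by dt.
    acceleration = [
        ((vx1 - vx0) / dt, (vy1 - vy0) / dt)
        for (vx0, vy0), (vx1, vy1) in zip(velocity, velocity[1:])
    ]
    return velocity, acceleration


if __name__ == "__main__":
    track = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
    vel, acc = finite_differences(track, dt=1.0)
    print(vel)  # [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(acc)  # [(1.0, 0.0), (1.0, 0.0)]
```

    Forward differences are the simplest option; because tracker output is usually noisy, smoothing the positions before differencing would likely be needed in practice.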

  3. Annotating images by mining image search results

    NARCIS (Netherlands)

    Wang, X.J.; Zhang, L.; Li, X.; Ma, W.Y.

    2008-01-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, which is a data-driven approach that annotates images by mining their search

  4. Employee Relations Bibliography: Public, Non-Profit and Professional Employment. Essay, Annotated Listing, Indexes.

    Science.gov (United States)

    Tice, Terrence N.

    This comprehensive listing of 2,724 bibliographic items from 1967 through early 1977 includes significant English-language material on the contractual relationship between public employers and employees in the United States and Canada. (There are a few items in French.) Although access is given to the broader areas of public management and…

  5. Potential use of geothermal resources in the Snake River Basin: an environmental overview. Volume II. Annotated bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Spencer, S.G.; Russell, B.F.; Sullivan, J.F. (eds.)

    1979-09-01

    This volume is a partially annotated bibliography of reference materials pertaining to the seven KGRAs (Known Geothermal Resource Areas). The bibliography is divided into sections by program element as follows: terrestrial ecology, aquatic ecology, heritage resources, socioeconomics and demography, geology, geothermal, soils, hydrology and water quality, seismicity, and subsidence. Cross-referencing is available for those references that apply to specific KGRAs. (MHR)

  6. WormBase: Annotating many nematode genomes.

    Science.gov (United States)

    Howe, Kevin; Davis, Paul; Paulini, Michael; Tuli, Mary Ann; Williams, Gary; Yook, Karen; Durbin, Richard; Kersey, Paul; Sternberg, Paul W

    2012-01-01

    WormBase (www.wormbase.org) has been serving the scientific community for over 11 years as the central repository for genomic and genetic information for the soil nematode Caenorhabditis elegans. The resource has evolved from its beginnings as a database housing the genomic sequence and genetic and physical maps of a single species, and now represents the breadth and diversity of nematode research, currently serving genome sequence and annotation for around 20 nematodes. In this article, we focus on WormBase's role of genome sequence annotation, describing how we annotate and integrate data from a growing collection of nematode species and strains. We also review our approaches to sequence curation, and discuss the impact on annotation quality of large functional genomics projects such as modENCODE.

  7. Teaching and Learning Communities through Online Annotation

    Science.gov (United States)

    van der Pluijm, B.

    2016-12-01

    What do colleagues do with your assigned textbook? What do they say or think about the material? Want students to be more engaged in their learning experience? If so, online materials that complement the standard lecture format provide a new opportunity through managed, online group annotation that leverages the ubiquity of internet access while personalizing learning. The concept is illustrated with the new online textbook "Processes in Structural Geology and Tectonics", by Ben van der Pluijm and Stephen Marshak, which offers a platform for sharing experiences, supplementary materials and approaches, including readings, mathematical applications, exercises, challenge questions, quizzes, alternative explanations, and more. The annotation framework used is Hypothes.is, which offers a free, open markup environment for annotating websites and PDF postings. The annotations can be public, grouped or individualized, as desired, including export access and download of annotations. A teacher group, hosted by a moderator/owner, limits access to members of a user group of teachers, so that its members can use, copy or transcribe annotations for their own lesson material. Likewise, an instructor can host a student group that encourages sharing of observations, questions and answers among students and instructor. The instructor can also create one or more closed groups that offer study help and hints to students. Options abound, all of which aim to engage students and to promote greater responsibility for their learning experience. Beyond this new capacity, the ability to analyze student annotations supports individual learners and their needs. For example, student notes can be analyzed for key phrases and concepts to identify misunderstandings, omissions and problems. Example annotations can also be shared to enhance notetaking skills and to help with studying.
Lastly, online annotation allows active application to lecture posted slides, supporting real-time notetaking

  8. Displaying Annotations for Digitised Globes

    Science.gov (United States)

    Gede, Mátyás; Farbinger, Anna

    2018-05-01

    Thanks to the efforts of various globe digitising projects, plenty of old globes can now be examined as 3D models on a computer screen. These globes usually contain many interesting details that an average observer would not notice at first sight. The authors developed a website that can display annotations for such digitised globes. These annotations help observers discover all the important, interesting details of a globe. Each annotation consists of a plain-text title, an HTML-formatted descriptive text and a corresponding polygon, and the annotations are stored in KML format. The website is powered by the Cesium virtual globe engine.
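    The abstract states only that each annotation (plain-text title, HTML description, polygon) is stored in KML. Assuming the standard KML 2.2 Placemark layout (`name`, `description`, `Polygon`), which the abstract does not spell out, a minimal reader using Python's standard library might look like this. The sample data and the `parse_annotations` function are illustrative; the website itself runs on Cesium in the browser, so this is not its code.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Illustrative annotation: title, HTML description (in CDATA) and a closed polygon.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Example cartouche</name>
    <description><![CDATA[<p>Descriptive <b>HTML</b> text.</p>]]></description>
    <Polygon>
      <outerBoundaryIs>
        <LinearRing>
          <coordinates>10,40 20,40 20,50 10,50 10,40</coordinates>
        </LinearRing>
      </outerBoundaryIs>
    </Polygon>
  </Placemark>
</kml>"""


def parse_annotations(kml_text: str) -> List[Tuple[str, str, List[Tuple[float, float]]]]:
    """Return (title, html_description, outer_ring) for every Placemark."""
    root = ET.fromstring(kml_text)
    annotations = []
    for placemark in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        title = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        html = placemark.findtext("kml:description", default="", namespaces=KML_NS)
        coords = placemark.findtext(".//kml:coordinates", default="", namespaces=KML_NS)
        # KML coordinates are whitespace-separated "lon,lat[,alt]" tuples.
        ring = []
        for token in coords.split():
            lon, lat = (float(v) for v in token.split(",")[:2])
            ring.append((lon, lat))
        annotations.append((title, html, ring))
    return annotations


if __name__ == "__main__":
    for title, html, ring in parse_annotations(SAMPLE):
        print(title, "|", html, "|", len(ring), "vertices")
```

    The HTML description is returned as a string, leaving rendering to whatever display layer is used.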

  9. The Dimensions of Composition Annotation.

    Science.gov (United States)

    McColly, William

    English teacher annotations were studied to determine the dimensions and properties of the entire system for writing corrections and criticisms on compositions. Four sets of compositions were written by students in grades 9 through 13. Typescripts of the compositions were annotated by classroom English teachers. Then, 32 English teachers judged…

  10. Evaluation of three automated genome annotations for Halorhabdus utahensis.

    Directory of Open Access Journals (Sweden)

    Peter Bakke

    2009-07-01

    Full Text Available Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems, but no one has systematically compared their output to determine accuracy and inherent errors. Errors in annotations are routinely deposited in databases such as NCBI and then used to validate subsequent annotations, propagating those errors. We submitted the genome sequence of the halophilic archaeon Halorhabdus utahensis to three genome annotation services for analysis. We examined the output from each service in a variety of ways to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to identify the origin of replication and the species-specific consensus ribosome-binding site manually. Additionally, we conducted laboratory experiments to test H. utahensis growth and enzyme activity. Current annotation practices need to improve to reflect a genome's biological potential more accurately. We make specific recommendations that could improve the quality of microbial annotation projects.
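    To make "differ considerably in gene calls" concrete, the following hypothetical Python sketch compares gene predictions from several services by matching calls that share a stop position and strand, then reports pairwise Jaccard agreement and the calls common to all services. The matching rule and the service names are assumptions for illustration, not the comparison method used in the study.

```python
from itertools import combinations


def stop_key(call):
    """Identify a gene call by its stop-codon end and strand.

    call: (start, end, strand) with 1-based inclusive coordinates, start < end.
    Predictors often disagree on start codons, so matching on the stop end
    is a lenient way to pair calls across pipelines (an assumption here).
    """
    start, end, strand = call
    return (end if strand == "+" else start, strand)


def compare_services(calls_by_service):
    """Return pairwise (shared count, Jaccard index) and calls shared by all."""
    keys = {name: {stop_key(c) for c in calls} for name, calls in calls_by_service.items()}
    report = {}
    for a, b in combinations(sorted(keys), 2):
        shared = keys[a] & keys[b]
        union = keys[a] | keys[b]
        report[(a, b)] = (len(shared), len(shared) / len(union) if union else 1.0)
    core = set.intersection(*keys.values()) if keys else set()
    return report, core
```

A low Jaccard index between two services flags loci worth manual review, which is where the study found it necessary to intervene by hand.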

  11. MimoSA: a system for minimotif annotation

    Directory of Open Access Journals (Sweden)

    Kundeti Vamsi

    2010-06-01

    Full Text Available Abstract Background Minimotifs are short peptide sequences within one protein that are recognized by other proteins or molecules. While there are now several minimotif databases, they are incomplete. Many minimotifs reported in the primary literature have yet to be annotated, and entirely novel minimotifs continue to be published on a weekly basis. Our recently proposed function and sequence syntax for minimotifs enables us to build a general tool that facilitates structured annotation and management of minimotif data from the biomedical literature. Results We have built the MimoSA application for minimotif annotation. The application supports management of the Minimotif Miner (MnM) database, literature tracking, and annotation of new minimotifs. MimoSA supports visualization, organization, selection and editing of minimotifs and their attributes in the MnM database. For the literature components, MimoSA provides paper status tracking and scores papers for annotation through a freely available machine learning approach based on word correlation. The paper scoring algorithm is also available as a separate program, TextMine. Form-driven annotation of minimotif attributes enables entry of new minimotifs into the MnM database. Several supporting features increase the efficiency of annotation. The layered architecture of MimoSA allows for extensibility by separating the functions of paper scoring, minimotif visualization, and database management. MimoSA is readily adaptable to other annotation efforts that manually curate literature into a MySQL database. Conclusions MimoSA is an extensible application that facilitates minimotif annotation and integrates with the Minimotif Miner database. We have built MimoSA as an application that integrates dynamic abstract scoring with a high-performance relational model of minimotif syntax. MimoSA's TextMine, an efficient paper-scoring algorithm, can be used to

  12. Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease.

    Science.gov (United States)

    Sifrim, Alejandro; Van Houdt, Jeroen Kj; Tranchevent, Leon-Charles; Nowakowska, Beata; Sakai, Ryo; Pavlopoulos, Georgios A; Devriendt, Koen; Vermeesch, Joris R; Moreau, Yves; Aerts, Jan

    2012-01-01

    The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org.
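    Annotate-it is described as supporting cross-sample queries under different genetic hypotheses. The sketch below illustrates the general idea with two simple inheritance filters over VCF-style genotype strings; the function names, sample labels and variant IDs are hypothetical, and this is not Annotate-it's interface or algorithm.

```python
ALT_GENOTYPES = {"0/1", "1/1"}  # genotypes carrying at least one alternate allele


def matches_model(calls, affected, unaffected, model):
    """Check one variant's per-sample genotype calls against a hypothesis.

    calls maps sample name -> unphased VCF-style genotype ("0/0", "0/1", "1/1").
    Every listed sample is assumed to have a call (KeyError otherwise).
    """
    if model == "recessive":
        # Homozygous alternate in every affected sample, never in unaffected ones.
        return (all(calls[s] == "1/1" for s in affected)
                and all(calls[s] != "1/1" for s in unaffected))
    if model == "dominant":
        # Carried by every affected sample, absent from every unaffected one.
        return (all(calls[s] in ALT_GENOTYPES for s in affected)
                and all(calls[s] == "0/0" for s in unaffected))
    raise ValueError(f"unknown model: {model}")


def filter_variants(genotypes, affected, unaffected, model):
    """Return variant IDs consistent with the chosen inheritance model."""
    return [v for v, calls in genotypes.items()
            if matches_model(calls, affected, unaffected, model)]
```

Switching the `model` argument re-runs the same cross-sample query under a different hypothesis, which is the kind of exploration the abstract describes.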

  13. Annotated chemical patent corpus: a gold standard for text mining.

    Directory of Open Access Journals (Sweden)

    Saber A Akhondi

    Full Text Available Exploring the chemical and biological space covered by patent applications is crucial in early-stage medicinal chemistry activities. Patent analysis can provide an understanding of compound prior art, novelty checking, validation of biological assays, and identification of new starting points for chemical exploration. Manual extraction of chemical and biological entities from patents by expert curators can take a substantial amount of time and resources. Text mining methods can help to ease this process. To validate the performance of such methods, a manually annotated patent corpus is essential. In this study we have produced a large gold standard chemical patent corpus. We developed annotation guidelines and selected 200 full patents from the World Intellectual Property Organization, the United States Patent and Trademark Office, and the European Patent Office. The patents were pre-annotated automatically and made available to four independent annotator groups, each consisting of two to ten annotators. The annotators marked chemicals in different subclasses, diseases, targets, and modes of action. Spelling mistakes and spurious line breaks due to optical character recognition errors were also annotated. A subset of 47 patents was annotated by at least three annotator groups, from which harmonized annotations and inter-annotator agreement scores were derived. One group annotated the full set. The patent corpus includes 400,125 annotations for the full set and 36,537 annotations for the harmonized set. All patents and annotated entities are publicly available at www.biosemantics.org.
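    The abstract mentions inter-annotator agreement scores derived from the harmonized subset. One common way to score agreement between two annotator groups is an F1 score under exact span matching, sketched below; whether the corpus authors used exact or relaxed matching is not stated here, so treat the matching rule and the tuple layout as assumptions.

```python
from itertools import combinations
from statistics import mean


def pairwise_f1(group_a, group_b):
    """F1 agreement between two annotation sets under exact matching.

    Each annotation is a hashable tuple, e.g. (patent_id, start, end, entity_type).
    Treating either group as the reference just swaps precision and recall,
    so F1 = 2|A & B| / (|A| + |B|) is symmetric.
    """
    a, b = set(group_a), set(group_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_f1(groups):
    """Average F1 over all pairs of annotator groups (dict name -> annotations)."""
    return mean(pairwise_f1(groups[x], groups[y]) for x, y in combinations(sorted(groups), 2))
```

Exact matching is the strictest choice; a relaxed variant would count overlapping spans of the same type as agreement and typically yields higher scores.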

  14. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Science.gov (United States)

    2012-01-01

    Background The complete sequences of chloroplast genomes provide rich information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating these genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar full-length protein, cDNA and rRNA sequences, integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes are identified using tRNAscan and ARAGORN, and inverted repeats (IR) using vmatch. Third, it calculates summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results, in GFF3 format, can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS for repeated updates and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while offering several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
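    CPGAVAS exports annotations in GFF3, a nine-column, tab-separated format whose last column holds percent-encoded key=value attributes. A minimal reader is sketched below, assuming well-formed input; it is not part of CPGAVAS, and any feature lines fed to it here are illustrative rather than real server output.

```python
from urllib.parse import unquote


def parse_gff3(lines):
    """Parse GFF3 feature lines into dictionaries (assumes well-formed input)."""
    features = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # blank line, directive or comment
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        attributes = {}
        if attrs != ".":
            for pair in attrs.split(";"):
                if not pair:
                    continue
                key, _, value = pair.partition("=")
                # Multiple values are comma-separated; each is percent-encoded.
                attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
        features.append({
            "seqid": unquote(seqid), "source": source, "type": ftype,
            "start": int(start), "end": int(end),
            "score": None if score == "." else float(score),
            "strand": strand,
            "phase": None if phase == "." else int(phase),
            "attributes": attributes,
        })
    return features
```

After editing a GFF3 file in an external tool, a quick parse like this is a cheap sanity check that the column count and attribute syntax survived before re-uploading.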

  15. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    Directory of Open Access Journals (Sweden)

    Liu Chang

    2012-12-01

    Full Text Available Abstract Background The complete sequences of chloroplast genomes provide rich information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating these genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar full-length protein, cDNA and rRNA sequences, integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes are identified using tRNAscan and ARAGORN, and inverted repeats (IR) using vmatch. Third, it calculates summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results, in GFF3 format, can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS for repeated updates and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while offering several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas.

  16. A fully automatic end-to-end method for content-based image retrieval of CT scans with similar liver lesion annotations.

    Science.gov (United States)

    Spanier, A B; Caplan, N; Sosna, J; Acar, B; Joskowicz, L

    2018-01-01

    The goal of medical content-based image retrieval (M-CBIR) is to assist radiologists in the decision-making process by retrieving medical cases similar to a given image. Lesions and their annotations are of key interest to radiologists, since patient treatment depends on the lesion diagnosis. Therefore, a key feature of M-CBIR systems is the retrieval of scans with the most similar lesion annotations. To be of value, M-CBIR systems should be fully automatic so that they can handle large case databases. We present a fully automatic end-to-end method for the retrieval of CT scans with similar liver lesion annotations. The input is a database of abdominal CT scans labeled with liver lesions, a query CT scan, and optionally one radiologist-specified lesion annotation of interest. The output is an ordered list of the database CT scans with the most similar liver lesion annotations. The method starts by automatically segmenting the liver in the scan. It then extracts a histogram-based feature vector from the segmented region, learns the features' relative importance, and ranks the database scans according to this relative importance measure. The main advantages of our method are that it fully automates the end-to-end querying process, that it uses simple and efficient techniques that scale to large datasets, and that it produces high-quality retrieval results from an unannotated query CT scan. Our experimental results on 9 CT queries on a dataset of 41 volumetric CT scans from the 2014 ImageCLEF Liver Annotation Task yield an average retrieval accuracy (Normalized Discounted Cumulative Gain index) of 0.77 without annotation and 0.84 with annotation. Fully automatic end-to-end retrieval of similar cases based on image information alone, rather than on disease diagnosis, may help radiologists to better diagnose liver lesions.
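    The retrieval pipeline (histogram features, weighted comparison, ranking) and its NDCG evaluation can be sketched as follows. The bin count, intensity range, weighted L1 distance and fixed weights are assumptions standing in for the paper's learned feature importance; only the NDCG computation (each result's relevance discounted by log2(position + 1), normalized by the ideal ordering) follows the standard definition.

```python
import math


def histogram_features(values, bins=8, lo=-100.0, hi=200.0):
    """Normalized intensity histogram of voxels inside a segmented region.

    The range -100..200 (Hounsfield units) and the bin count are illustrative.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def rank_scans(query, database, weights):
    """Sort (scan_id, features) pairs by weighted L1 distance to the query."""
    def distance(features):
        return sum(w * abs(q - f) for w, q, f in zip(weights, query, features))
    return [scan_id for scan_id, _ in sorted(database, key=lambda item: distance(item[1]))]


def ndcg(relevances):
    """NDCG of a ranked list of graded relevances (position i discounted by log2(i + 1))."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(relevances))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

An NDCG of 1.0 means the system returned scans in exactly the ideal relevance order; values such as the reported 0.77 and 0.84 indicate how close the ranking came to that ideal.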

  17. Annotating individual human genomes.

    Science.gov (United States)

    Torkamani, Ali; Scott-Van Zeeland, Ashley A; Topol, Eric J; Schork, Nicholas J

    2011-10-01

    Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. Copyright © 2011 Elsevier Inc. All rights reserved.

  18. Annotating Individual Human Genomes

    Science.gov (United States)

    Torkamani, Ali; Scott-Van Zeeland, Ashley A.; Topol, Eric J.; Schork, Nicholas J.

    2014-01-01

    Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it is not likely to amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants. PMID:21839162

  19. Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL.

    Science.gov (United States)

    Apweiler, R; Gateau, A; Contrino, S; Martin, M J; Junker, V; O'Donovan, C; Lang, F; Mitaritonna, N; Kappus, S; Bairoch, A

    1997-01-01

    SWISS-PROT is a curated protein sequence database that strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporating sequences without proper sequence analysis and annotation, we cannot speed up the incorporation of new incoming data indefinitely. However, as we also want to make the sequences available as fast as possible, we introduced TREMBL (TRanslation of EMBL nucleotide sequence database), a supplement to SWISS-PROT. TREMBL consists of computer-annotated entries in SWISS-PROT format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except for CDS already included in SWISS-PROT. While TREMBL is already of immense value, its computer-generated annotation does not match the quality of SWISS-PROT's. The main difference lies in the protein functional information attached to sequences. With this in mind, we are dedicating substantial effort to developing and applying computer methods that enhance the functional information attached to TREMBL entries.
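    TREMBL entries are described as translations of all EMBL coding sequences not already in SWISS-PROT. The toy Python sketch below shows that selection and translation step using the standard genetic code only; the record IDs are hypothetical, and the real pipeline (alternative genetic codes, annotation transfer, and more) is far more involved.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(seq):
    """Translate a CDS until the first stop codon; unknown codons become 'X'."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def build_supplement(cds_records, curated_ids):
    """Translate every CDS whose ID is not already in the curated set."""
    return {cds_id: translate_cds(seq)
            for cds_id, seq in cds_records.items()
            if cds_id not in curated_ids}
```

Excluding already-curated CDS is what keeps the supplement from duplicating entries, while leaving functional annotation as the open problem the abstract highlights.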

  20. The GATO gene annotation tool for research laboratories

    Directory of Open Access Journals (Sweden)

    A. Fujita

    2005-11-01

    Full Text Available Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, the development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed to meet genomic researchers' frequent need to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying several Web-accessible resources, and the information is stored in a local database that keeps a record of all previous annotation results. GATO may be accessed from anywhere through the internet, or run locally if a large number of sequences are to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Annotation systems usually require experience to install and use and are time-consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to use it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. The minimum free disk space required is 2 MB.
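    GATO is described as querying Web resources and storing every result in a local database so that earlier annotations are reused. A minimal Python sketch of that caching pattern with SQLite follows; GATO itself is written in PHP and Perl, and the `fetch` callable and `AnnotationCache` class here are hypothetical stand-ins, not GATO's code.

```python
import json
import sqlite3


class AnnotationCache:
    """Return stored annotations when available; otherwise fetch and store."""

    def __init__(self, fetch, path=":memory:"):
        self.fetch = fetch  # hypothetical callable: sequence ID -> JSON-serializable result
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS annotation ("
            "seq_id TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    def get(self, seq_id):
        row = self.db.execute(
            "SELECT result FROM annotation WHERE seq_id = ?", (seq_id,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # previous result, no remote query
        result = self.fetch(seq_id)
        with self.db:  # commits the insert as one transaction
            self.db.execute(
                "INSERT INTO annotation (seq_id, result) VALUES (?, ?)",
                (seq_id, json.dumps(result)),
            )
        return result
```

Passing a file path instead of `":memory:"` makes the record of previous results persist across sessions, which is what lets a lab re-query a common set of genes cheaply.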

  1. GSV Annotated Bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Roberts, Randy S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pope, Paul A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Jiang, Ming [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Trucano, Timothy G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aragon, Cecilia R. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ni, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wei, Thomas [Argonne National Lab. (ANL), Argonne, IL (United States); Chilton, Lawrence K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bakel, Alan [Argonne National Lab. (ANL), Argonne, IL (United States)

    2010-09-14

    The following annotated bibliography was developed as part of the geospatial algorithm verification and validation (GSV) project for the Simulation, Algorithms and Modeling program of NA-22. Verification and validation of geospatial image analysis algorithms covers a wide range of technologies. Papers in the bibliography are thus organized into the following five topic areas: image processing and analysis; usability and validation of geospatial image analysis algorithms; image distance measures; scene modeling and image rendering; and transportation simulation models. Many other papers were also studied during the course of the investigation; the annotations for these articles can be found in the paper "On the verification and validation of geospatial image analysis algorithms".

  2. Annotated bibliography of the Russian-language literature on glaciology for 2015

    Directory of Open Access Journals (Sweden)

    V. M. Kotlyakov

    2017-01-01

    Full Text Available The proposed annual bibliography continues the annotated lists of Russian-language literature on glaciology that were regularly published in the past. It includes 245 references grouped into the following ten sections: (1) general issues of glaciology; (2) physics and chemistry of ice; (3) atmospheric ice; (4) snow cover; (5) avalanches and glacial mudflows; (6) sea ice; (7) river and lake ice; (8) icings and ground ice; (9) glaciers and ice caps; (10) palaeoglaciology. In addition to works from the current year, some works from earlier years that, for various reasons, were not included in previous bibliographies have been added.

  3. Solar Tutorial and Annotation Resource (STAR)

    Science.gov (United States)

    Showalter, C.; Rex, R.; Hurlburt, N. E.; Zita, E. J.

    2009-12-01

    We have written a software suite designed to facilitate solar data analysis by scientists, students, and the public, anticipating enormous datasets from future instruments. Our “STAR” suite includes an interactive learning section explaining 15 classes of solar events. Users learn software tools that exploit humans’ ability, superior to that of computers, to identify many events. Annotation tools include time-slice generation to quantify loop oscillations, and interpolation of event shapes using natural cubic splines (for loops, sigmoids, and filaments) and closed cubic splines (for coronal holes). Learning these tools in an environment where examples are provided prepares new users to comfortably use annotation software with new data. Upon completing our tutorial, users are presented with media of various solar events and asked to identify and annotate the images, to test their mastery of the system. Goals of the project include public input into the analysis of very large datasets from future solar satellites, and increased public interest in and knowledge about the Sun. In 2010, the Solar Dynamics Observatory (SDO) will be launched into orbit. SDO’s advances in solar telescope technology will generate a terabyte per day of high-quality data, requiring innovation in data management. While major projects are developing automated feature recognition software so that computers can complete much of the initial event tagging and analysis, that software still cannot annotate features such as sigmoids, coronal magnetic loops and coronal dimming, due to large amounts of data concentrated in relatively small areas. Previously, solar physicists annotated these features manually, but with the imminent influx of data it is unrealistic to expect specialized researchers to examine every image that computers cannot fully process. A new approach is needed to process these data efficiently. Providing analysis tools and data access to students and the public has proven
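    STAR interpolates event outlines such as loops with natural cubic splines. The sketch below implements the textbook natural cubic spline (second derivative zero at both ends) and uses it parametrically to draw a smooth open curve through user-clicked points. It is an illustration, not STAR's code, and the closed (periodic) splines used for coronal holes are not covered.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing; 'natural' means S''(x0) = S''(xn) = 0.
    """
    n = len(xs) - 1
    if n < 1:
        raise ValueError("need at least two points")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Solve the tridiagonal system for the quadratic coefficients c[i].
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    c = [0.0] * (n + 1)  # c[n] = 0 is the natural end condition
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):  # back substitution
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # segment containing x
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def spline_curve(points, samples_per_segment=10):
    """Smooth open curve through (x, y) points via parametric natural splines."""
    t = list(range(len(points)))
    fx = natural_cubic_spline(t, [p[0] for p in points])
    fy = natural_cubic_spline(t, [p[1] for p in points])
    n = len(points) - 1
    return [(fx(u), fy(u))
            for u in (i / samples_per_segment for i in range(n * samples_per_segment + 1))]
```

Parameterizing x and y separately by point index lets the curve double back on itself, which a loop outline needs and a single y = f(x) spline cannot do.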

  4. Discovering gene annotations in biomedical text databases

    Directory of Open Access Journals (Sweden)

    Ozsoyoglu Gultekin

    2008-03-01

    Full Text Available Abstract Background Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is a clear need for automated computational tools that annotate genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. Results In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely used standardized vocabulary of genomic traits. GEANN uses textual "extraction patterns" and a semantic matching framework to locate phrases matching a pattern and to produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision of 78% at a recall of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Using WordNet for semantic pattern matching improves precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. Conclusion GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. Textual extraction patterns constructed from existing annotations achieve high precision. The semantic pattern matching framework provides a more flexible matching scheme than "exact matching", with the advantage of locating approximate
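    To illustrate what a textual extraction pattern with synonym-based matching might look like, here is a deliberately small Python sketch. The verb phrases and the phrase-to-GO-term lexicon are hypothetical stand-ins for GEANN's learned patterns and WordNet-based matching, and the output uses GO term names only.

```python
import re

# Hypothetical synonymous verb phrases standing in for WordNet-based expansion.
VERB_PHRASES = ["involved in", "implicated in", "participates in", "required for"]
# Hypothetical lexicon mapping lowercase surface phrases to GO term names.
GO_LEXICON = {"apoptosis": "apoptotic process", "dna repair": "DNA repair"}

# Pattern: <Gene> is/are <verb phrase> <phrase up to punctuation>
PATTERN = re.compile(
    r"\b(?P<gene>[A-Z][A-Za-z0-9-]+)\s+(?:is|are)\s+"
    r"(?:" + "|".join(re.escape(v) for v in VERB_PHRASES) + r")\s+"
    r"(?P<phrase>[^.,;]+)"
)


def extract_annotations(text):
    """Return (gene, GO term name) pairs found by the pattern and lexicon."""
    found = []
    for m in PATTERN.finditer(text):
        term = GO_LEXICON.get(m.group("phrase").strip().lower())
        if term is not None:  # keep only phrases that map to a known term
            found.append((m.group("gene"), term))
    return found
```

Adding verb synonyms raises recall without touching the pattern's structure, loosely mirroring how the abstract reports semantic matching improving both precision and recall.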

  5. Frame on frames: an annotated bibliography

    International Nuclear Information System (INIS)

    Wright, T.; Tsao, H.J.

    1983-01-01

    The success or failure of any sample survey of a finite population depends largely upon the condition and adequacy of the list or frame from which the probability sample is selected. Much of the published work related to survey sampling has focused on measuring sampling errors and, more recently and to a lesser extent, nonsampling errors. Recent studies on data quality for various types of data collection systems have revealed that nonsampling errors far exceed sampling errors in many cases. While much of this nonsampling error, which is difficult to measure, can be attributed to poor frames, relatively little effort or theoretical work has focused on this contribution to total error. The objective of this paper is to present an annotated bibliography on frames, in the hope that it will bring together for experimenters a number of suggestions for action when sampling from imperfect frames, and that more attention will be given to this area of survey methods research.

  6. High-level waste process and product data annotated bibliography

    International Nuclear Information System (INIS)

    Stegen, G.E.

    1996-01-01

    The objective of this document is to provide information on available issued documents that will assist interested parties in finding available data on high-level waste and transuranic waste feed compositions, properties, behavior in candidate processing operations, and behavior of candidate product glasses made from those wastes. This initial compilation is only a partial list of available references.

  7. Annotating the human genome with Disease Ontology

    Science.gov (United States)

    Osborne, John D; Flatow, Jared; Holko, Michelle; Lin, Simon M; Kibbe, Warren A; Zhu, Lihua (Julie); Danila, Maria I; Feng, Gang; Chisholm, Rex L

    2009-01-01

    Background The human genome has been extensively annotated with Gene Ontology for biological functions, but only minimally annotated computationally for diseases. Results We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We used a comprehensive, disease-focused subset of UMLS structured as a directed acyclic graph (the Disease Ontology) to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion The validation data set suggests a 91% recall rate and a 97% precision rate for disease annotation using GeneRIF, in contrast with 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows comparisons between disease-containing databases and increases the accuracy of disease identification through synonym matching. The much higher recall of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF dramatically increases the coverage of disease annotation of the human genome. PMID:19594883
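    Because the Disease Ontology is a directed acyclic graph, a gene annotated to a specific disease term is often also counted under that term's ancestors when annotation sets are compared. That propagation step is sketched below; the abstract does not say the authors did this, and the small hierarchy and gene name in any example input are toy values, not actual ontology content.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {term: [parent, ...]}."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:  # a DAG node may be reachable by several paths
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(gene_terms, parents):
    """Extend each gene's direct disease terms with all of their ancestors."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in gene_terms.items()
    }
```

Unlike a tree, a DAG lets a term have several parents, so the `seen` set matters: it keeps shared ancestors from being visited twice.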

  8. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins from their sequences requires high-quality data (as a benchmark) as well as tedious preparatory work to generate the sequence parameters required as input for the machine learning methods. Different program settings and incompatible protocols make comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  9. Annotating non-coding regions of the genome.

    Science.gov (United States)

    Alexander, Roger P; Fang, Gang; Rozowsky, Joel; Snyder, Michael; Gerstein, Mark B

    2010-08-01

    Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
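    The steps described in this abstract (a per-base signal, smoothing, segmentation into small blocks, and aggregation into larger annotations) can be sketched in a few lines. The moving-average window, threshold and merge gap are arbitrary illustrative choices rather than values from the paper, and "clustering" is simplified here to merging blocks separated by small gaps.

```python
def smooth(signal, window):
    """Centered moving average; edges use the shorter available window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def segment(signal, threshold):
    """Half-open [start, end) blocks where the signal is at or above threshold."""
    blocks, start = [], None
    for i, v in enumerate(signal):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            blocks.append((start, i))
            start = None
    if start is not None:
        blocks.append((start, len(signal)))
    return blocks


def merge_blocks(blocks, max_gap):
    """Join blocks separated by at most max_gap bases into larger annotations."""
    merged = []
    for s, e in sorted(blocks):
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

A typical run is `merge_blocks(segment(smooth(signal, 5), threshold), max_gap)`; larger windows and gaps trade positional resolution for fewer, broader annotations.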

  10. The surplus value of semantic annotations

    NARCIS (Netherlands)

    Marx, M.

    2010-01-01

    We compare the costs of semantic annotation of textual documents to its benefits for information processing tasks. Semantic annotation can improve the performance of retrieval tasks and facilitate an improved search experience through faceted search, focused retrieval, better document summaries,

  11. Annotation-based enrichment of Digital Objects using open-source frameworks

    Directory of Open Access Journals (Sweden)

    Marcus Emmanuel Barnes

    2017-07-01

    Full Text Available The W3C Web Annotation Data Model, Protocol, and Vocabulary unify approaches to annotation across the web, enabling the aggregation, discovery and persistence of annotations over time. In addition, new JavaScript libraries allow users to annotate multi-format content. In this paper, we describe how we have leveraged these developments to provide annotation features alongside Islandora’s existing preservation, access, and management capabilities. We also discuss our experience developing with the Web Annotation Model as an open web architecture standard, as well as our approach to integrating mature external annotation libraries. The resulting software (the Web Annotation Utility Module for Islandora) accommodates annotation across multiple formats. This solution can be used in various digital scholarship contexts.
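    For readers unfamiliar with the W3C Web Annotation Data Model, the Python sketch below builds two annotations in its JSON-LD serialization: a comment on a quoted text passage (TextQuoteSelector) and a comment on an image region (FragmentSelector using a Media Fragments `xywh` value). The function names, IDs and URLs are hypothetical, and this is not the Islandora module's code.

```python
import json

ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def _annotation(annotation_id, source, selector, comment):
    """Common Web Annotation skeleton: a textual comment on a selected target."""
    return {
        "@context": ANNO_CONTEXT,
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {"source": source, "selector": selector},
    }


def text_annotation(annotation_id, source, exact, comment):
    """Comment on an exact quoted passage of a text resource."""
    return _annotation(annotation_id, source,
                       {"type": "TextQuoteSelector", "exact": exact}, comment)


def image_region_annotation(annotation_id, source, x, y, w, h, comment):
    """Comment on a rectangular pixel region of an image."""
    selector = {
        "type": "FragmentSelector",
        "conformsTo": "http://www.w3.org/TR/media-frags/",
        "value": f"xywh={x},{y},{w},{h}",
    }
    return _annotation(annotation_id, source, selector, comment)


if __name__ == "__main__":
    print(json.dumps(image_region_annotation(
        "https://example.org/anno/2", "https://example.org/globe.jpg",
        10, 20, 100, 50, "Sea monster near the equator"), indent=2))
```

Only the selector changes between formats, which is why a single data model can cover text, images and other media in one repository.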

  12. Small terrestrial mammals of Albania: annotated list and distribution

    Directory of Open Access Journals (Sweden)

    Ferdinand Bego

    2009-02-01

    Abstract: We report new records of small terrestrial mammals (Erinaceomorpha, Soricomorpha, Rodentia) from Albania and outline previously published data. Twenty-four species (one hedgehog, six soricomorphs and 17 rodents) have been collected in 161 localities surveyed throughout the country. Nine species (Neomys anomalus, Crocidura leucodon, Talpa stankovici, Dryomys nitedula, Muscardinus avellanarius, Micromys minutus, Mus macedonicus, Myodes glareolus, and Microtus thomasi) are recorded for Albania for the first time. The present list is far from complete, and the presence of a further 11 species has yet to be confirmed. Summary: Small mammals of Albania: status and distribution. An overview of the distribution of small mammals in Albania is presented, highlighting recently discovered species as well as some previously published data. Examination of 161 localities across the whole country yielded information on the presence of 24 species of small mammals (1 Erinaceomorpha, 6 Soricomorpha and 17 Rodentia). Nine species (Neomys anomalus, Crocidura leucodon, Talpa stankovici, Dryomys nitedula, Muscardinus avellanarius, Micromys minutus, Mus macedonicus, Myodes glareolus and Microtus thomasi) are reported for the first time. The list presented here cannot be considered definitive. Further research may confirm the presence of another 11 species.

  13. Current and future trends in marine image annotation software

    Science.gov (United States)

    Gomes-Pereira, Jose Nuno; Auger, Vincent; Beisiegel, Kolja; Benjamin, Robert; Bergmann, Melanie; Bowden, David; Buhl-Mortensen, Pal; De Leo, Fabio C.; Dionísio, Gisela; Durden, Jennifer M.; Edwards, Luke; Friedman, Ariell; Greinert, Jens; Jacobsen-Stout, Nancy; Lerner, Steve; Leslie, Murray; Nattkemper, Tim W.; Sameoto, Jessica A.; Schoening, Timm; Schouten, Ronald; Seager, James; Singh, Hanumant; Soubigou, Olivier; Tojeira, Inês; van den Beld, Inge; Dias, Frederico; Tempera, Fernando; Santos, Ricardo S.

    2016-12-01

    Given the need to describe, analyze and index large quantities of marine imagery data for exploration and monitoring activities, a range of specialized image annotation tools has been developed worldwide. Image annotation, the process of transposing objects or events represented in a video or still image to the semantic level, may involve human interaction and computer-assisted solutions. Marine image annotation software (MIAS) has enabled over 500 publications to date. We review their functioning, application trends and developments by comparing general and advanced features of 23 different tools used in underwater image analysis. MIAS requiring human input consist essentially of a graphical user interface with a video player or image browser that recognizes a specific time code or image code, allowing users to log events in a time-stamped (and/or geo-referenced) manner. MIAS differ from similar software in their capability to integrate data associated with video collection, the simplest being the position coordinates of the video recording platform. MIAS have three main characteristics: annotating events in real time, annotating after recording (posterior annotation), and interacting with a database. They range from simple annotation interfaces to full onboard data management systems with a variety of toolboxes. Advanced packages allow users to input and display data from multiple sensors or multiple annotators via intranet or internet. Posterior human-mediated annotation often includes tools for data display and image analysis (e.g. length, area, image segmentation, point count) and, in a few cases, the possibility of browsing and editing previous dive logs or analyzing the annotations. The interaction with a database allows the automatic integration of annotations from different surveys, repeated annotation and collaborative annotation of shared datasets, and browsing and querying of data. Progress in the field of automated annotation is mostly in post-processing, for stable platforms or still images.

  14. Combinatory annotation of cell membrane receptors and signalling pathways of Bombyx mori prothoracic glands

    Science.gov (United States)

    Moulos, Panagiotis; Samiotaki, Martina; Panayotou, George; Dedos, Skarlatos G.

    2016-01-01

    The cells of prothoracic glands (PG) are the main site of synthesis and secretion of ecdysteroids, the biochemical products of cholesterol conversion to steroids that shape the morphogenic development of insects. Despite the availability of genome sequences from several insect species and the extensive knowledge of certain signalling pathways that underpin ecdysteroidogenesis, the spectrum of signalling molecules and ecdysteroidogenic cascades is still incompletely characterized. To fill this gap and obtain the complete list of cell membrane receptors expressed in PG cells, we used combinatory bioinformatic, proteomic and transcriptomic analysis and quantitative PCR to annotate and determine the expression profiles of genes identified as putative cell membrane receptors of the model insect species, Bombyx mori, and subsequently enrich the repertoire of signalling pathways that are present in its PG cells. The genome annotation dataset we report here highlights modules and pathways that may be directly involved in ecdysteroidogenesis and aims to disseminate data and assist other researchers in the discovery of the role of such receptors and their ligands. PMID:27576083

  15. PANNZER2: a rapid functional annotation web server.

    Science.gov (United States)

    Törönen, Petri; Medlar, Alan; Holm, Liisa

    2018-05-08

    The unprecedented growth of high-throughput sequencing has led to an ever-widening annotation gap in protein databases. While computational prediction methods are available to make up the shortfall, a majority of public web servers are hindered by practical limitations and poor performance. Here, we introduce PANNZER2 (Protein ANNotation with Z-scoRE), a fast functional annotation web server that provides both Gene Ontology (GO) annotations and free text description predictions. PANNZER2 uses SANSparallel to perform high-performance homology searches, making bulk annotation based on sequence similarity practical. PANNZER2 can output GO annotations from multiple scoring functions, enabling users to see which predictions are robust across predictors. Finally, PANNZER2 predictions scored within the top 10 methods for molecular function and biological process in the CAFA2 NK-full benchmark. The PANNZER2 web server is updated on a monthly schedule and is accessible at http://ekhidna2.biocenter.helsinki.fi/sanspanz/. The source code is available under the GNU General Public License v3.

  16. Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model

    Directory of Open Access Journals (Sweden)

    Zhai Chengxiang

    2010-05-01

    Abstract Background Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process, and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered. Results We propose a statistical method that uses the primary literature, i.e. free text, as the source to perform overrepresentation analysis. The method is based on a mixture-model statistical framework and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment, and added features that facilitate the interactive analysis of gene sets. Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results. Conclusions We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp
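    The method in this record uses a Poisson mixture model. As a simpler point of reference only, the sketch below scores a single concept with a plain (non-mixture) Poisson upper-tail test: the expected count in the gene list's literature is scaled from a background corpus, and the p-value is P(X >= observed). The word-count inputs and function names are hypothetical, and this is not the Genelist Analyzer implementation.

```python
import math
from typing import Tuple


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Subtracting from 1 loses precision for very small tail probabilities;
    a log-space or incomplete-gamma implementation would be preferable there.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) = P(X = i - 1) * lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def concept_enrichment(
    count_in_list: int, list_words: int, count_in_background: int, background_words: int
) -> Tuple[float, float]:
    """Expected count of a concept in the gene-list literature and its Poisson p-value."""
    expected = count_in_background * list_words / background_words
    return expected, poisson_upper_tail(count_in_list, expected)


if __name__ == "__main__":
    # Hypothetical counts: the concept appears 30 times in 10,000 words of
    # gene-list literature versus 100 times in a 1,000,000-word background.
    expected, p = concept_enrichment(30, 10_000, 100, 1_000_000)
    print(f"expected={expected:.2f}, p={p:.3g}")
```

    A single Poisson rate assumes every document mentions the concept at the same background frequency; a mixture model relaxes that assumption, which is one reason the authors' approach is more robust than this baseline.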

  17. An annotated list of the species of the genus Corbicula from Indonesia (Mollusca: Corbiculidae)

    NARCIS (Netherlands)

    Djajasasmita, Machfudz

    1977-01-01

    The species of the genus Corbicula known from Indonesia are alphabetically listed and noted. Sixteen out of the 35 described species are considered valid, i.e. C. gustaviana, C. moltkiana, C. sumatrana, C. tobae and C. tumida from Sumatra; C. javanica, C. pulchella and C. rivalis from Java; C.

  18. MitoBamAnnotator: A web-based tool for detecting and annotating heteroplasmy in human mitochondrial DNA sequences.

    Science.gov (United States)

    Zhidkov, Ilia; Nagar, Tal; Mishmar, Dan; Rubin, Eitan

    2011-11-01

    The use of next-generation sequencing of mitochondrial DNA is becoming widespread in biological and clinical research. This, in turn, creates a need for a convenient tool that detects and analyzes heteroplasmy. Here we present MitoBamAnnotator, a user-friendly web-based tool that allows maximum flexibility and control in heteroplasmy research. MitoBamAnnotator provides the user with a comprehensively annotated overview of mitochondrial genetic variation, allowing in-depth analysis without prior programming knowledge. Copyright © 2011 Elsevier B.V. and Mitochondria Research Society. All rights reserved.

  19. Correction of the Caulobacter crescentus NA1000 genome annotation.

    Directory of Open Access Journals (Sweden)

    Bert Ely

    Bacterial genome annotations are accumulating rapidly in the GenBank database, and the use of automated annotation technologies to create these annotations has become the norm. However, these automated methods commonly produce a small but significant percentage of genome annotation errors. To improve accuracy and reliability, we analyzed the Caulobacter crescentus NA1000 genome using the computer programs Artemis and MICheck to manually examine the third codon position GC content, alignment to a third codon position GC frame plot peak, and matches in the GenBank database. We identified 11 new genes, modified the start site of 113 genes, and changed the reading frame of 38 genes that had been incorrectly annotated. Furthermore, our manual method of identifying protein-coding genes allowed us to remove 112 non-coding regions that had been designated as coding regions. The improved NA1000 genome annotation resulted in reduced use of rare codons, since non-coding regions with atypical codon usage were removed from the annotation and 49 new coding regions were added to the annotation. Thus, a more accurate codon usage table was generated as well. These results demonstrate that comparing the location of third codon position GC content peaks with the location of protein-coding regions could be used to verify the annotation of any genome with a GC content greater than 60%.
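    Third codon position GC content can be computed directly from sequence. The sketch below (hypothetical function names; window and step sizes are illustrative) computes GC3 for an in-frame coding sequence and a simplified frame-plot profile giving the GC fraction at each of the three positions modulo 3 in sliding windows. In GC-rich genomes, the position class with the highest GC fraction tends to coincide with the third codon position of a gene's reading frame. This is not the Artemis or MICheck implementation.

```python
from typing import List, Tuple


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon (frame 0)."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)


def frame_gc_profile(seq: str, window: int, step: int) -> List[Tuple[int, List[float]]]:
    """Simplified frame plot: per window, GC fraction at positions p with p % 3 == 0, 1, 2."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        fractions = []
        for offset in range(3):
            # Absolute coordinates (not window-relative) so frames line up across windows.
            bases = [seq[p] for p in range(start, start + window) if p % 3 == offset]
            gc = sum(b in "GC" for b in bases)
            fractions.append(gc / len(bases) if bases else 0.0)
        profile.append((start, fractions))
    return profile


if __name__ == "__main__":
    print(gc3("ATGGCCAAA"))
    for start, fractions in frame_gc_profile("ATGGCGCTGAAGGCC" * 4, window=30, step=15):
        print(start, [round(f, 2) for f in fractions])
```

    Comparing the peak position class in each window with the frame of an annotated gene is the kind of check the abstract describes: a gene whose frame disagrees with the GC3 peak is a candidate for a frame or start-site correction.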

  20. Annotation of regular polysemy and underspecification

    DEFF Research Database (Denmark)

    Martínez Alonso, Héctor; Pedersen, Bolette Sandford; Bel, Núria

    2013-01-01

    We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods...

  1. PCAS – a precomputed proteome annotation database resource

    Directory of Open Access Journals (Sweden)

    Luo Jingchu

    2003-11-01

    Abstract Background Many model proteomes or "complete" sets of proteins of given organisms are now publicly available. Much effort has been invested in computational annotation of those "draft" proteomes. Motif- or domain-based algorithms play a pivotal role in functional classification of proteins. Employing most available computational algorithms, mainly motif or domain recognition algorithms, we set out to develop an online proteome annotation system with integrated proteome annotation data to complement existing resources. Results We report here the development of PCAS (ProteinCentric Annotation System) as an online resource of pre-computed proteome annotation data. We applied most available motif or domain databases and their analysis methods, including hmmpfam search of HMMs in Pfam, SMART and TIGRFAM, RPS-PSIBLAST search of PSSMs in CDD, pfscan of PROSITE patterns and profiles, as well as PSI-BLAST search of SUPERFAMILY PSSMs. In addition, signal peptides and transmembrane (TM) helices are predicted using SignalP and TMHMM, respectively. We mapped SUPERFAMILY and COGs to InterPro, so the motif or domain databases are integrated through InterPro. PCAS displays table summaries of pre-computed data and a graphical presentation of motifs or domains relative to the protein. At present, PCAS contains the human IPI, mouse IPI, rat IPI, A. thaliana, C. elegans, D. melanogaster, S. cerevisiae, and S. pombe proteomes. PCAS is available at http://pak.cbi.pku.edu.cn/proteome/gca.php Conclusion PCAS gives better annotation coverage for model proteomes by employing a wider collection of available algorithms. Besides presenting the most confident annotation data, PCAS also allows customized queries so that users can inspect statistically less significant boundary information as well. Therefore, besides providing general annotation information, PCAS could be used as a discovery platform. We plan to update PCAS twice a year. We will upgrade PCAS when new proteome annotation algorithms

  2. A semi-automatic annotation tool for cooking video

    Science.gov (United States)

    Bianco, Simone; Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo; Margherita, Roberto; Marini, Gianluca; Gianforme, Giorgio; Pantaleo, Giuseppe

    2013-03-01

    To create a cooking assistant application that guides users in preparing dishes suited to their dietary profiles and food preferences, the video recipes must be accurately annotated by identifying and tracking the food items handled by the cook. These videos present particular annotation challenges, such as frequent occlusions and changes in food appearance. Manually annotating the videos is a time-consuming, tedious and error-prone task. Fully automatic tools that integrate computer vision algorithms to extract and identify the elements of interest are not error-free, and false positive and false negative detections need to be corrected in a post-processing stage. We present an interactive, semi-automatic tool for the annotation of cooking videos that integrates computer vision techniques under the supervision of the user. Annotation accuracy is higher than with fully automatic tools, and human effort is lower than with fully manual ones. The performance and usability of the proposed tool are evaluated on the basis of the time and effort required to annotate the same video sequences.

  3. Experiments with crowdsourced re-annotation of a POS tagging data set

    DEFF Research Database (Denmark)

    Hovy, Dirk; Plank, Barbara; Søgaard, Anders

    2014-01-01

    Crowdsourcing lets us collect multiple annotations for an item from several annotators. Typically, these are annotations for non-sequential classification tasks. While there has been some work on crowdsourcing named entity annotations, researchers have assumed that syntactic tasks such as part-of-speech (POS) tagging cannot be crowdsourced. This paper shows that workers can actually annotate sequential data almost as well as experts. Further, we show that the models learned from crowdsourced annotations fare as well as the models learned from expert annotations in downstream tasks....

  4. MPEG-7 based video annotation and browsing

    Science.gov (United States)

    Hoeynck, Michael; Auweiler, Thorsten; Wellhausen, Jens

    2003-11-01

    The huge amount of multimedia data produced worldwide requires annotation in order to enable universal content access and to provide content-based search-and-retrieval functionalities. Since manual video annotation can be time-consuming, automatic annotation systems are required. We review recent approaches to content-based indexing and annotation of videos for different kinds of sports and describe our approach to automatic annotation of equestrian sports videos. We especially concentrate on MPEG-7 based feature extraction and content description, where we apply different visual descriptors for cut detection. Further, we extract the temporal positions of single obstacles on the course by analyzing MPEG-7 edge information. Once single shot positions and visual highlights have been determined, this information is stored together with meta-textual information in an MPEG-7 description scheme. Based on this information, we generate content summaries that can be used in a user interface to provide content-based access to the video stream, and also for media browsing on a streaming server.

  5. Annotation of two large contiguous regions from the Haemonchus contortus genome using RNA-seq and comparative analysis with Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Roz Laing

    The genomes of numerous parasitic nematodes are currently being sequenced, but their complexity and size, together with high levels of intra-specific sequence variation and a lack of reference genomes, make their assembly and annotation a challenging task. Haemonchus contortus is an economically significant parasite of livestock that is widely used for basic research as well as for vaccine development and drug discovery. It is one of many medically and economically important parasites within the strongylid nematode group. This group of parasites has the closest phylogenetic relationship with the model organism Caenorhabditis elegans, making comparative analysis a potentially powerful tool for genome annotation and functional studies. To investigate this hypothesis, we sequenced two contiguous fragments from the H. contortus genome and undertook detailed annotation and comparative analysis with C. elegans. The adult H. contortus transcriptome was sequenced using an Illumina platform, and RNA-seq was used to annotate a 409 kb overlapping BAC tiling path relating to the X chromosome and a 181 kb BAC insert relating to chromosome I. In total, 40 genes and 12 putative transposable elements were identified. Of the annotated genes, 97.5% had detectable homologues in C. elegans, of which 60% had putative orthologues, a proportion significantly higher than that found by previous EST-based analyses. Gene density appears to be lower in H. contortus than in C. elegans, with annotated H. contortus genes being on average two to three times larger than their putative C. elegans orthologues due to greater intron number and size. Synteny appears high, but gene order is generally poorly conserved, although areas of conserved microsynteny are apparent. C. elegans operons appear to be partially conserved in H. contortus. Our findings suggest that a combination of RNA-seq and comparative analysis with C. elegans is a powerful approach for the annotation and analysis of strongylid

  6. Thermal effects on aquatic organisms: an annotated bibliography of the 1977 literature

    Energy Technology Data Exchange (ETDEWEB)

    Talmage, S.S. (comp.)

    1978-12-01

    This bibliography, containing 537 references from the 1977 literature, is the seventh in a series of annotated bibliographies on the effects of heat on aquatic organisms. The effects of thermal discharges at power plant sites are emphasized. Laboratory and field studies on temperature tolerance and the effects of temperature changes on reproduction, development, growth, distribution, physiology, and sensitivity to other stresses are included. References in the bibliography are divided into three subject categories: marine systems, freshwater systems, and estuaries. The references are arranged alphabetically by first author. Indexes are provided for author, keywords, subject category, geographic location of the study, taxon, and title (alphabetical listing of keywords-in-context of nontrivial words in the title).

  7. Ground Truth Annotation in T Analyst

    DEFF Research Database (Denmark)

    2015-01-01

    This video shows how to annotate the ground truth tracks in the thermal videos. The ground truth tracks are produced to be able to compare them to tracks obtained from a Computer Vision tracking approach. The program used for annotation is T-Analyst, which is developed by Aliaksei Laureshyn, Ph...

  8. Gene calling and bacterial genome annotation with BG7.

    Science.gov (United States)

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key area of investigation in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of non-protein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool tolerates sequencing errors and maintains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure based entirely on cloud computing (Amazon Web Services).

  9. Annotation of the Evaluative Language in a Dependency Treebank

    Directory of Open Access Journals (Sweden)

    Šindlerová Jana

    2017-12-01

    In the paper, we present our efforts to annotate evaluative language in the Prague Dependency Treebank 2.0. The project is a follow-up to a series of annotations of small plaintext corpora. It uses automatic identification of potentially evaluative nodes by mapping a Czech subjectivity lexicon onto syntactically annotated data. These nodes are then manually checked by an annotator and either dismissed as standing in a non-evaluative context or confirmed as evaluative. In the latter case, the annotator adds information about the polarity orientation and the source and target of the evaluation. The annotations revealed several advantages and disadvantages of the chosen framework. The advantages include a more structured and easy-to-handle environment for the annotator, visibility of the syntactic patterning of the evaluative state, effective handling of discontinuous structures, and a new perspective on the influence of good/bad news. The disadvantages include a limited capability to treat cases where evaluation is spread across several syntactically connected nodes at once, a limited capability to treat metaphorical expressions, and disregard for the effects of negation and intensification in the current scheme.

  10. An annotated history of container candidate material selection

    International Nuclear Information System (INIS)

    McCright, R.D.

    1988-07-01

    This paper documents events in the Nevada Nuclear Waste Storage Investigations (NNWSI) Project that have influenced the selection of metals and alloys proposed for fabrication of waste package containers for permanent disposal of high-level nuclear waste in a repository at Yucca Mountain, Nevada. The time period from 1981 to 1988 is covered in this annotated history. The history traces the candidate materials that have been considered at different stages of site characterization planning activities. At present, six candidate materials are considered and described in the 1988 Consultation Draft of the NNWSI Site Characterization Plan (SCP). The six materials are grouped into two alloy families, copper-base materials and iron to nickel-base materials with an austenitic structure. The three austenitic candidates resulted from a 1983 survey of a longer list of candidate materials; the other three candidates resulted from a special request from DOE in 1984 to evaluate copper and copper-base alloys. 24 refs., 2 tabs

  11. The caBIG annotation and image Markup project.

    Science.gov (United States)

    Channin, David S; Mongkolwat, Pattanasak; Kleper, Vladimir; Sepukar, Kastubh; Rubin, Daniel L

    2010-04-01

    Image annotation and markup are at the core of medical interpretation in both the clinical and the research setting. Digital medical images are managed with the DICOM standard format. While DICOM contains a large amount of metadata about who acquired the image and where and how it was acquired, DICOM says little about the content or meaning of the pixel data. An image annotation is the explanatory or descriptive information about the pixel data of an image that is generated by a human or machine observer. An image markup consists of the graphical symbols placed over the image to depict an annotation. While DICOM is the standard for medical image acquisition, manipulation, transmission, storage, and display, there are no standards for image annotation and markup. Many systems expect annotation to be reported verbally, while markups are stored in graphical overlays or proprietary formats. This makes it difficult to extract and compute with either of them. The goal of the Annotation and Image Markup (AIM) project is to develop a mechanism for modeling, capturing, and serializing image annotation and markup data that can be adopted as a standard by the medical imaging community. The AIM project produces both human- and machine-readable artifacts. This paper describes the AIM information model, schemas, software libraries, and tools so as to prepare researchers and developers for their use of AIM.

  12. Interoperable Multimedia Annotation and Retrieval for the Tourism Sector

    NARCIS (Netherlands)

    Chatzitoulousis, Antonios; Efraimidis, Pavlos S.; Athanasiadis, I.N.

    2015-01-01

    The Atlas Metadata System (AMS) employs semantic web annotation techniques in order to create an interoperable information annotation and retrieval platform for the tourism sector. AMS adopts state-of-the-art metadata vocabularies, annotation techniques and semantic web technologies.

  13. Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles.

    Science.gov (United States)

    Cohen, K Bretonnel; Lanfranchi, Arrick; Choi, Miji Joo-Young; Bada, Michael; Baumgartner, William A; Panteleyeva, Natalya; Verspoor, Karin; Palmer, Martha; Hunter, Lawrence E

    2017-08-17

    Coreference resolution is the task of finding strings in text that have the same referent as other strings. Failures of coreference resolution are a common cause of false negatives in information extraction from the scientific literature. In order to better understand the nature of the phenomenon of coreference in biomedical publications and to increase performance on the task, we annotated the Colorado Richly Annotated Full Text (CRAFT) corpus with coreference relations. The corpus was manually annotated with coreference relations, including identity and appositives for all coreferring base noun phrases. The OntoNotes annotation guidelines, with minor adaptations, were used. Interannotator agreement ranges from 0.480 (entity-based CEAF) to 0.858 (Class-B3), depending on the metric that is used to assess it. The resulting corpus adds nearly 30,000 annotations to the previous release of the CRAFT corpus. Differences from related projects include a much broader definition of markables, connection to extensive annotation of several domain-relevant semantic classes, and connection to complete syntactic annotation. Tool performance was benchmarked on the data. A publicly available out-of-the-box, general-domain coreference resolution system achieved an F-measure of 0.14 (B3), while a simple domain-adapted rule-based system achieved an F-measure of 0.42. An ensemble of the two reached an F of 0.46. Following the IDENTITY chains in the data would add 106,263 additional named entities in the full 97-paper corpus, for an increase of 76% in the semantic classes of the eight ontologies that have been annotated in earlier versions of the CRAFT corpus. The project produced a large data set for further investigation of coreference and coreference resolution in the scientific literature. The work raised issues in the phenomenon of reference in this domain and genre, and the paper proposes that many mentions that would be considered generic in the general domain are not
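    B3 (B-cubed) scores such as those reported above are computed per mention: precision compares each mention's response cluster with its key (gold) cluster, recall does the reverse, and both are averaged over mentions. The sketch below assumes key and response are given as lists of mention sets; mentions absent from the response are treated as singletons and response-only mentions are not averaged over, which is a simplification. The function names are hypothetical, and the CRAFT evaluation may handle such cases differently.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple


def _index(clusters: Iterable[Set[Hashable]]) -> Dict[Hashable, FrozenSet[Hashable]]:
    """Map every mention to the (frozen) cluster that contains it."""
    index: Dict[Hashable, FrozenSet[Hashable]] = {}
    for cluster in clusters:
        frozen = frozenset(cluster)
        for mention in frozen:
            index[mention] = frozen
    return index


def b_cubed(
    key: Iterable[Set[Hashable]], response: Iterable[Set[Hashable]]
) -> Tuple[float, float, float]:
    """Mention-averaged B3 precision, recall and F1 of a response against a key."""
    key_of, resp_of = _index(key), _index(response)
    mentions = list(key_of)
    if not mentions:
        return 0.0, 0.0, 0.0
    p_sum = r_sum = 0.0
    for m in mentions:
        k = key_of[m]
        r = resp_of.get(m, frozenset([m]))  # missing from response -> singleton
        overlap = len(k & r)
        p_sum += overlap / len(r)
        r_sum += overlap / len(k)
    precision, recall = p_sum / len(mentions), r_sum / len(mentions)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [{"a", "b", "c"}, {"d"}]
    system = [{"a", "b"}, {"c", "d"}]
    print(b_cubed(gold, system))
```

    Because every mention contributes equally, B3 rewards getting large chains mostly right, whereas entity-based CEAF aligns whole entities; this difference is one reason the agreement figures above vary so widely across metrics.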

  14. A Novel Approach to Semantic and Coreference Annotation at LLNL

    Energy Technology Data Exchange (ETDEWEB)

    Firpo, M

    2005-02-04

    A case is made for the importance of high-quality semantic and coreference annotation. The challenges of providing such annotation are described. Asperger's Syndrome is introduced, and connections are drawn between the needs of text annotation and the abilities of persons with Asperger's Syndrome to meet those needs. Finally, a pilot program is recommended wherein semantic annotation is performed by people with Asperger's Syndrome. The primary points embodied in this paper are as follows: (1) Document annotation is essential to the Natural Language Processing (NLP) projects at Lawrence Livermore National Laboratory (LLNL); (2) LLNL does not currently have a system in place to meet its need for text annotation; (3) Text annotation is challenging for a variety of reasons, many related to its very rote nature; (4) Persons with Asperger's Syndrome are particularly skilled at rote verbal tasks, and behavioral experts agree that they would excel at text annotation; and (5) A pilot study is recommended in which two to three people with Asperger's Syndrome annotate documents and then the quality and throughput of their work is evaluated relative to that of their neuro-typical peers.

  15. ORCAN-a web-based meta-server for real-time detection and functional annotation of orthologs.

    Science.gov (United States)

    Zielezinski, Andrzej; Dziubek, Michal; Sliski, Jan; Karlowski, Wojciech M

    2017-04-15

    ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of protein sequences. The server combines information from the most popular orthology-prediction resources, including four tools and four online databases. Functional annotation uses five additional comparisons between the query and identified homologs: sequence similarity, protein domain architectures, functional motifs, Gene Ontology term assignments and a list of associated articles. Furthermore, the server uses a plurality-based rating system to evaluate the orthology relationships and to rank the reference proteins by their evolutionary and functional relevance to the query. Using a dataset of ∼1 million true yeast orthologs as a sample reference set, we show that combining multiple orthology-prediction tools in ORCAN increases sensitivity and precision by 1-2 percentage points. The service is freely available at http://www.combio.pl/orcan/. Contact: wmk@amu.edu.pl. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
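    The abstract does not specify how ORCAN's plurality-based rating is computed. As an illustration only, the sketch below ranks candidate reference proteins by how many orthology-prediction tools support each one. The tool and protein identifiers are hypothetical, and the functional relevance that the server also considers is not modeled here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def plurality_rank(predictions: Dict[str, Iterable[str]]) -> List[Tuple[str, int]]:
    """Rank candidate proteins by the number of tools that predict them as orthologs.

    Each tool contributes at most one vote per candidate; ties are broken by identifier.
    """
    votes: Counter = Counter()
    for predicted in predictions.values():
        votes.update(set(predicted))  # de-duplicate so a tool cannot vote twice
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical tool outputs for a single query protein.
    calls = {
        "toolA": ["P1", "P2"],
        "toolB": ["P1"],
        "toolC": ["P1", "P2", "P3"],
    }
    for protein, n_tools in plurality_rank(calls):
        print(protein, n_tools)
```

    Counting agreement across independent predictors is a simple way to favor calls that are robust to any single tool's errors, which matches the motivation given in the abstract for combining several orthology resources.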

  16. Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae

    Directory of Open Access Journals (Sweden)

    Deng Jixin

    2009-02-01

    Full Text Available Abstract Background Magnaporthe oryzae is the causal agent of rice blast, the most destructive disease of rice worldwide. The genome of this fungal pathogen has been sequenced and an automated annotation has recently been updated to Version 6 http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/MultiDownloads.html. However, a comprehensive manual curation remains to be performed. Gene Ontology (GO) annotation is a valuable means of assigning functional information using a standardized vocabulary. We report an overview of the GO annotation for Version 5 of the M. oryzae genome assembly. Methods A similarity-based (i.e., computational) GO annotation with manual review was conducted, which was then integrated with a literature-based GO annotation with computational assistance. For similarity-based GO annotation a stringent reciprocal best hits method was used to identify similarity between predicted proteins of M. oryzae and GO proteins from multiple organisms with published associations to GO terms. Significant alignment pairs were manually reviewed. Functional assignments were further cross-validated with manually reviewed data, conserved domains, or data determined by wet lab experiments. Additionally, the biological appropriateness of the functional assignments was manually checked. Results In total, 6,286 proteins received GO term assignments via the homology-based annotation, including 2,870 hypothetical proteins. Literature-based experimental evidence, such as microarray, MPSS, T-DNA insertion mutation, or gene knockout mutation, resulted in 2,810 proteins being annotated with GO terms. Of these, 1,673 proteins were annotated with new terms developed for Plant-Associated Microbe Gene Ontology (PAMGO). In addition, 67 experiment-determined secreted proteins were annotated with PAMGO terms. Integration of the two data sets resulted in 7,412 proteins (57%) being annotated with 1,957 distinct and specific GO terms. Unannotated proteins
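
    The reciprocal best hits idea mentioned in the Methods can be illustrated with a small Python sketch: a pair is kept only when each protein is the other's top-scoring match. The protein IDs and scores are invented for illustration, and the sketch omits any additional stringency criteria (such as score or E-value cutoffs) that the authors' "stringent" method may have applied.

```python
from __future__ import annotations


def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """For each query, return the subject with the highest score (first seen wins ties)."""
    best: dict[str, tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: dict[tuple[str, str], float],
    b_vs_a: dict[tuple[str, str], float],
) -> set[tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit AND a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


# Hypothetical bit scores: M. oryzae proteins (m*) vs. GO-annotated proteins (g*).
a_vs_b = {("m1", "g1"): 250.0, ("m1", "g2"): 90.0, ("m2", "g2"): 120.0}
b_vs_a = {("g1", "m1"): 240.0, ("g2", "m1"): 150.0, ("g2", "m2"): 110.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # {('m1', 'g1')}
```

    Here m2's best hit is g2, but g2's best hit back is m1, so (m2, g2) is rejected; this one-sided match is exactly what the reciprocal test filters out.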

  17. Combined evidence annotation of transposable elements in genome sequences.

    Directory of Open Access Journals (Sweden)

    Hadi Quesneville

    2005-07-01

    Full Text Available Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced by combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other
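
    The nesting statistic can be illustrated with a short Python sketch that flags TE copies lying inside another TE copy and re-derives the quoted percentage. Treating "inserted into at least one other TE" as plain interval containment on one sequence is a simplifying assumption; the coordinates below are invented, and the quadratic scan is chosen for clarity rather than speed.

```python
from __future__ import annotations


def nested_copies(intervals: list[tuple[int, int]]) -> list[int]:
    """Return indices of TE copies lying inside another, distinct TE copy.

    Intervals are (start, end) on a single sequence. Treating "inserted into
    another TE" as plain interval containment is a simplifying assumption.
    """
    nested = []
    for i, (start_i, end_i) in enumerate(intervals):
        for j, (start_j, end_j) in enumerate(intervals):
            contained = start_j <= start_i and end_i <= end_j
            if i != j and contained and (start_j, end_j) != (start_i, end_i):
                nested.append(i)
                break  # one host is enough ("at least one other TE")
    return nested


copies = [(0, 5000), (1200, 1500), (6000, 6300), (6100, 6200)]
print(nested_copies(copies))  # [1, 3]

# The abstract's figure: 518 nested copies out of 6,013 TEs.
print(round(100 * 518 / 6013, 1))  # 8.6
```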

  18. NoGOA: predicting noisy GO annotations using evidences and sparse representation.

    Science.gov (United States)

    Yu, Guoxian; Lu, Chang; Wang, Jun

    2017-07-21

    Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to limited resources, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that a considerable number of noisy (or incorrect) annotations remain. Given the wide application of annotations, how to identify noisy annotations is an important but seldom studied open problem. We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Second, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived in different periods, and then weights entries of the association matrix via the estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. 
Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .
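
    One step in the description, propagating weights from direct annotations to their ancestors in the GO hierarchy, follows the familiar "true path" logic: a gene annotated to a term is implicitly annotated to every ancestor of that term. A minimal Python sketch with a made-up four-term hierarchy and made-up weights is given below; how NoGOA actually combines weights arriving from several descendants is an assumption here (the sketch keeps the maximum).

```python
from __future__ import annotations


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Collect every ancestor of `term` in a DAG given as child -> parent list."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct: dict[str, float], parents: dict[str, list[str]]) -> dict[str, float]:
    """Spread one gene's direct-annotation weights to all ancestor terms.

    An ancestor reached from several direct terms keeps the largest weight;
    this max rule is an illustrative choice, not necessarily NoGOA's.
    """
    weights = dict(direct)
    for term, weight in direct.items():
        for ancestor in ancestors(term, parents):
            weights[ancestor] = max(weights.get(ancestor, 0.0), weight)
    return weights


# Hypothetical ontology: leaf1 -> mid -> root, leaf2 -> root.
parents = {"leaf1": ["mid"], "mid": ["root"], "leaf2": ["root"]}
# Hypothetical weights, e.g. 1 minus an estimated noise ratio per evidence code.
direct = {"leaf1": 0.9, "leaf2": 0.4}
print(sorted(propagate(direct, parents).items()))
# [('leaf1', 0.9), ('leaf2', 0.4), ('mid', 0.9), ('root', 0.9)]
```

    Because GO is a directed acyclic graph rather than a tree, a term can be reached along several paths; the `seen` set ensures each ancestor is visited only once per direct term.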

  19. Plann: A command-line application for annotating plastome sequences.

    Science.gov (United States)

    Huang, Daisie I; Cronk, Quentin C B

    2015-08-01

    Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann's output can be used in the National Center for Biotechnology Information's tbl2asn to create a Sequin file for GenBank submission. Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved.
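
    The core idea, carrying a feature's interval from a reference plastome over to the matching location in a new one, can be sketched in Python as below. This is a deliberately simplified illustration, not Plann's Perl implementation: it assumes exact, forward-strand matches found with `str.find` and 0-based half-open coordinates, whereas a real tool must tolerate sequence differences, reverse-strand features and the plastome's inverted repeats. The sequences and feature names are hypothetical.

```python
from __future__ import annotations


def transfer_features(
    ref_seq: str,
    ref_features: list[tuple[str, int, int]],
    new_seq: str,
) -> list[tuple[str, int, int]]:
    """Shift reference feature intervals to where their sequence occurs in new_seq.

    Features are (name, start, end) in 0-based, half-open coordinates.
    Features whose exact sequence is not found are dropped.
    """
    transferred = []
    for name, start, end in ref_features:
        fragment = ref_seq[start:end]
        position = new_seq.find(fragment)  # first exact match, or -1
        if position != -1:
            transferred.append((name, position, position + (end - start)))
    return transferred


reference = "AAAATGCCCGGGTTT"
features = [("geneX", 4, 9), ("geneY", 12, 15)]  # hypothetical features
draft = "CCAAAATGCCCGGG"
print(transfer_features(reference, features, draft))  # [('geneX', 6, 11)]
```

    Because the sketch keeps only the first exact match, a feature in a repeated region would be placed at its first occurrence, which is one reason real annotation tools rely on alignment rather than plain string search.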

  20. Semantator: annotating clinical narratives with semantic web ontologies.

    Science.gov (United States)

    Song, Dezhao; Chute, Christopher G; Tao, Cui

    2012-01-01

    To facilitate clinical research, clinical data need to be stored in a machine-processable and understandable way. Manually annotating clinical data is time-consuming. Automatic approaches (e.g., Natural Language Processing systems) have been adopted to convert such data into structured formats; however, the quality of such automatically extracted data may not always be satisfactory. In this paper, we propose Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. With a loaded free-text document and an ontology, Semantator supports the creation/deletion of ontology instances for any document fragment and linking/disconnecting instances with the properties in the ontology, and also enables automatic annotation by connecting to the NCBO annotator and cTAKES. By representing annotations in Semantic Web standards, Semantator supports reasoning based upon the underlying semantics of the owl:disjointWith and owl:equivalentClass predicates. We present discussions based on user experiences of using Semantator.

  1. MEETING: Chlamydomonas Annotation Jamboree - October 2003

    Energy Technology Data Exchange (ETDEWEB)

    Grossman, Arthur R

    2007-04-13

    Shotgun sequencing of the nuclear genome of Chlamydomonas reinhardtii (Chlamydomonas throughout) was performed at approximately 10X coverage by JGI. Roughly half of the genome is now contained on 26 scaffolds, all of which are at least 1.6 Mb, and the coverage of the genome is ~95%. There are now over 200,000 cDNA sequence reads that we have generated as part of the Chlamydomonas genome project (Grossman, 2003; Shrager et al., 2003; Grossman et al. 2007; Merchant et al., 2007); other sequences have also been generated by the Kazusa sequence group (Asamizu et al., 1999; Asamizu et al., 2000) or by individual laboratories that have focused on specific genes. Shrager et al. (2003) placed the reads into distinct contigs (an assemblage of reads with overlapping nucleotide sequences), and contigs that group together as part of the same genes have been designated ACEs (assembly of contigs generated from EST information). All of the reads have also been mapped to the Chlamydomonas nuclear genome and the cDNAs and their corresponding genomic sequences have been reassembled; the resulting assemblage is called an ACEG (an Assembly of contiguous EST sequences supported by genomic sequence) (Jain et al., 2007). Most of the unique genes or ACEGs are also represented by gene models that have been generated by the Joint Genome Institute (JGI, Walnut Creek, CA). These gene models have been placed onto the DNA scaffolds and are presented as a track on the Chlamydomonas genome browser associated with the genome portal (http://genome.jgi-psf.org/Chlre3/Chlre3.home.html). Ultimately, the meeting grant awarded by DOE has helped enormously in the development of an annotation pipeline (a set of guidelines used in the annotation of genes) and resulted in high-quality annotation of over 4,000 genes; the annotators were from both Europe and the USA. Some of the people who led the annotation initiative were Arthur Grossman, Olivier Vallon, and Sabeeha Merchant (with many individual

  2. IIS--Integrated Interactome System: a web-based platform for the annotation, analysis and visualization of protein-metabolite-gene-drug interactions by integrating a variety of data sources and tools.

    Science.gov (United States)

    Carazzolle, Marcelo Falsarella; de Carvalho, Lucas Miguel; Slepicka, Hugo Henrique; Vidal, Ramon Oliveira; Pereira, Gonçalo Amarante Guimarães; Kobarg, Jörg; Meirelles, Gabriela Vaz

    2014-01-01

    High-throughput screening of physical, genetic and chemical-genetic interactions opens important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases to the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates an XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by integrating diverse databases to meet the need for appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two

  3. Ontology modularization to improve semantic medical image annotation.

    Science.gov (United States)

    Wennerberg, Pinar; Schulz, Klaus; Buitelaar, Paul

    2011-02-01

    Searching for medical images and patient reports is a significant challenge in a clinical setting. The contents of such documents are often not described in sufficient detail, making it difficult to utilize the inherent wealth of information contained within them. Semantic image annotation addresses this problem by describing the contents of images and reports using medical ontologies. Medical images and patient reports are then linked to each other through common annotations. Subsequently, search algorithms can more effectively find related sets of documents on the basis of these semantic descriptions. A prerequisite to realizing such a semantic search engine is that the data it contains have previously been annotated with concepts from medical ontologies. One major challenge in this regard is the size and complexity of medical ontologies as annotation sources. Manual annotation is particularly time-consuming and labor-intensive in a clinical environment. In this article we propose an approach to reducing the size of clinical ontologies for more efficient manual image and text annotation. More precisely, our goal is to identify smaller fragments of a large anatomy ontology that are relevant for annotating medical images from patients suffering from lymphoma. Our work is in the area of ontology modularization, which is a recent and active field of research. We describe our approach, methods and data set in detail and we discuss our results. Copyright © 2010 Elsevier Inc. All rights reserved.

  4. [Prescription annotations in Welfare Pharmacy].

    Science.gov (United States)

    Han, Yi

    2018-03-01

    Welfare Pharmacy contains medical formulas documented by the government and official prescriptions used by the official pharmacy in the pharmaceutical process. In the last years of the Southern Song Dynasty, anonymous authors added many annotations to the prescriptions, researching the name, source, composition and origin of the prescriptions, supplementing important historical data on medical cases, and researching historical facts. The annotations of Welfare Pharmacy gathered the essence of medical theory and can be used as precious materials for correctly understanding the syndrome differentiation, compatibility regularity and clinical application of prescriptions. This article investigates in depth the style and form of the prescription annotations in Welfare Pharmacy; the names of prescriptions and the evolution of terminology; the major functions of the prescriptions, processing methods, instructions for taking medicine and taboos of prescriptions; the medical cases and clinical efficacy of prescriptions; and the backgrounds, sources, composition and cultural meanings of prescriptions. It proposes that the prescription annotations played an active role in the textual dissemination, patent medicine production and clinical diagnosis and treatment of Welfare Pharmacy. This not only helps understand the changes in the names and terms of traditional Chinese medicines in Welfare Pharmacy, but also provides a basis for understanding the knowledge sources, compatibility regularity, important drug innovations and clinical medications of prescriptions in Welfare Pharmacy. Copyright© by the Chinese Pharmaceutical Association.

  5. A framework for annotating human genome in disease context.

    Science.gov (United States)

    Xu, Wei; Wang, Huisong; Cheng, Wenqing; Fu, Dong; Xia, Tian; Kibbe, Warren A; Lin, Simon M

    2012-01-01

    Identification of gene-disease associations is crucial to understanding disease mechanisms. The rapid growth of the biomedical literature, driven by advances in genome-scale technologies, makes it challenging for manually curated annotation databases to characterize gene-disease associations effectively and in a timely manner. We propose an automatic method, the Disease Ontology Annotation Framework (DOAF), to provide a comprehensive annotation of the human genome using the computable Disease Ontology (DO), the NCBO Annotator service and NCBI Gene Reference Into Function (GeneRIF). DOAF can keep the resulting knowledgebase current by periodically executing an automatic pipeline that re-annotates the human genome using the latest DO and GeneRIF releases at any frequency, such as daily or monthly. Further, DOAF provides a computable and programmable environment that enables large-scale and integrative analysis by working with external analytic software or online service platforms. A user-friendly web interface (doa.nubic.northwestern.edu) is implemented to allow users to efficiently query, download, and view disease annotations and the underlying evidence.

  6. Annotating abstract pronominal anaphora in the DAD project

    DEFF Research Database (Denmark)

    Navarretta, Costanza; Olsen, Sussi Anni

    2008-01-01

    In this paper we present an extension of the MATE/GNOME annotation scheme for anaphora (Poesio 2004) which accounts for abstract anaphora in Danish and Italian. By abstract anaphora we here mean pronouns whose linguistic antecedents are verbal phrases, clauses and discourse segments. The extended scheme, which we call the DAD annotation scheme, allows annotating information about abstract anaphora which is important for investigating their use (see Webber (1988), Gundel et al. (2003), Navarretta (2004)) and which can influence their automatic treatment. Intercoder agreement scores obtained by applying the DAD annotation scheme to texts and dialogues in the two languages are given and show that the information proposed in the scheme can be recognised in a reliable way.

  7. Annotated bibliography

    International Nuclear Information System (INIS)

    1997-08-01

    Under a cooperative agreement with the U.S. Department of Energy's Office of Science and Technology, Waste Policy Institute (WPI) is conducting a five-year research project to develop a research-based approach for integrating communication products in stakeholder involvement related to innovative technology. As part of the research, WPI developed this annotated bibliography, which contains almost 100 citations of articles/books/resources involving topics related to communication and public involvement aspects of deploying innovative cleanup technology. To compile the bibliography, WPI performed online literature searches (e.g., Dialog, International Association of Business Communicators, Public Relations Society of America, Chemical Manufacturers Association, etc.), consulted past years' proceedings of major environmental waste cleanup conferences (e.g., Waste Management), networked with professional colleagues and DOE sites to gather reports or case studies, and received input during the August 1996 Research Design Team meeting held to discuss the project's research methodology. Articles were selected for annotation based upon their perceived usefulness to the broad range of public involvement and communication practitioners

  8. Supporting Keyword Search for Image Retrieval with Integration of Probabilistic Annotation

    Directory of Open Access Journals (Sweden)

    Tie Hua Zhou

    2015-05-01

    Full Text Available The ever-increasing quantities of digital photo resources are annotated with enriching vocabularies to form semantic annotations. Photo-sharing social networks have boosted the need for efficient and intuitive querying to respond to user requirements in large-scale image collections. To help users formulate efficient and effective image retrieval queries, we present a novel integration of a probabilistic model into a keyword query architecture; the model captures the probability distribution of image annotations, allowing users to obtain satisfactory results from image retrieval via the integration of multiple annotations. We focus on the annotation integration step in order to specify the meaning of each image annotation, thus leading to the annotations most representative of the intent of a keyword search. For this demonstration, we show how a probabilistic model has been integrated with semantic annotations to allow users to intuitively define explicit and precise keyword queries in order to retrieve satisfactory image results distributed across heterogeneous large data sources. Our experiments on the SBU database (collected by Stony Brook University) show that (i) our integrated annotation contains higher-quality representatives and semantic matches; and (ii) annotation integration can indeed improve image search result quality.

  9. Quick Pad Tagger: An Efficient Graphical User Interface for Building Annotated Corpora with Multiple Annotation Layers

    OpenAIRE

    Marc Schreiber; Kai Barkschat; Bodo Kraft; Albert Zundorf

    2015-01-01

    More and more domain-specific applications on the internet make use of Natural Language Processing (NLP) tools (e.g., Information Extraction systems). The output quality of these applications relies on the output quality of the NLP tools used. Often, the quality can be increased by annotating a domain-specific corpus. However, annotating a corpus is a time-consuming and exhausting task. To reduce the annotation time we present...

  10. Supplementary Material for: BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

    2015-01-01

    Abstract Background Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results The presence of many AMs and development of new ones introduce needs to: (a) compare different annotations for a single genome, and (b) generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. For the illustration of BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions up to 27 %, while the number of genes without any function assignment is reduced. Conclusions We developed BEACON, a fast tool for an automated and a systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/ .
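
    The extended-annotation idea, filling in genes that one annotation method (AM) leaves without a function using informative calls from other AMs, can be sketched in Python as follows. The rule used here (the first informative function in a fixed priority order wins, and "hypothetical protein" or an empty string counts as no function) is an assumption for illustration; BEACON's actual combination rules are not described in this abstract. Gene IDs and functions are invented.

```python
from __future__ import annotations

# Labels treated as "no function assigned" (an illustrative assumption).
UNKNOWN = {"", "hypothetical protein"}


def is_informative(function: str) -> bool:
    return function.strip().lower() not in UNKNOWN


def extend_annotation(
    primary: dict[str, str], others: list[dict[str, str]]
) -> dict[str, str]:
    """Fill uninformative entries of `primary` from other AMs, in priority order."""
    extended = dict(primary)
    for gene, function in primary.items():
        if is_informative(function):
            continue
        for method in others:
            candidate = method.get(gene, "")
            if is_informative(candidate):
                extended[gene] = candidate
                break
    return extended


primary = {"g1": "DNA gyrase subunit A", "g2": "hypothetical protein", "g3": ""}
others = [
    {"g2": "hypothetical protein", "g3": "ABC transporter"},
    {"g2": "ribosomal protein L7"},
]
result = extend_annotation(primary, others)
print(result["g2"], "|", result["g3"])  # ribosomal protein L7 | ABC transporter
gain = sum(map(is_informative, result.values())) - sum(map(is_informative, primary.values()))
print(gain)  # 2
```

    A fixed priority order is the simplest possible policy; a more careful combination would also flag genes where different AMs assign conflicting functions.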

  11. Extending eScience Provenance with User-Submitted Semantic Annotations

    Science.gov (United States)

    Michaelis, J.; Zednik, S.; West, P.; Fox, P. A.; McGuinness, D. L.

    2010-12-01

    eScience-based systems generate provenance for their data products, related to such things as data processing, data collection conditions, expert evaluation, and data product quality. Recent advances in web-based technology offer users the possibility of making annotations to both data products and steps in accompanying provenance traces, thereby expanding the utility of such provenance for others. These contributing users may have varying backgrounds, ranging from system experts to outside domain experts to citizen scientists. Furthermore, such users may wish to make varying types of annotations, ranging from documenting the purpose of a provenance step to raising concerns about the quality of data dependencies. Semantic Web technologies allow for such kinds of rich annotations to be made to provenance through the use of ontology vocabularies for (i) organizing provenance, and (ii) organizing user/annotation classifications. Furthermore, through Linked Data practices, semantic linkages may be made from provenance steps to external data of interest. A desire for semantically annotated provenance has been motivated by data management issues in the Mauna Loa Solar Observatory’s (MLSO) Advanced Coronal Observing System (ACOS). In ACOS, photometer-based readings are taken of solar activity and subsequently processed into final data products consumable by end users. At intermediate stages of ACOS processing, factors such as evaluations by human experts and weather conditions are logged, which could impact data product quality. If such factors are linked via user-submitted annotations to provenance, it could be significantly beneficial for other users. Likewise, the background of a user could impact the credibility of their annotations. For example, an annotation made by a citizen scientist describing the purpose of a provenance step may not be as reliable as a similar annotation made by an ACOS project member. 
For this work, we have developed a software package that

  12. Harnessing Collaborative Annotations on Online Formative Assessments

    Science.gov (United States)

    Lin, Jian-Wei; Lai, Yuan-Cheng

    2013-01-01

    This paper harnesses collaborative annotations by students as learning feedback on online formative assessments to improve the learning achievements of students. Through the developed Web platform, students can conduct formative assessments, collaboratively annotate, and review historical records in a convenient way, while teachers can generate…

  13. Crowdsourcing and annotating NER for Twitter #drift

    DEFF Research Database (Denmark)

    Fromreide, Hege; Hovy, Dirk; Søgaard, Anders

    2014-01-01

    We present two new NER datasets for Twitter: a manually annotated set of 1,467 tweets (kappa=0.942) and a set of 2,975 expert-corrected, crowdsourced NER-annotated tweets from the dataset described in Finin et al. (2010). In our experiments with these datasets, we observe two important points: (a) language drift on Twitter is significant, and while off-the-shelf systems have been reported to perform well on in-sample data, they often perform poorly on new samples of tweets; (b) state-of-the-art performance across various datasets can be obtained from crowdsourced annotations, making it more feasible...
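
    For readers unfamiliar with the agreement figure quoted above (kappa=0.942), a minimal Python sketch of Cohen's kappa for two annotators is shown below: observed agreement is corrected for the agreement expected by chance from each annotator's label frequencies. Whether the authors computed kappa per token, per entity or with a different variant is not stated in this abstract, and the label sequences below are invented.

```python
from __future__ import annotations

from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two annotators' labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # p_o: fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["PER", "O", "O", "LOC", "O", "O"]
annotator_2 = ["PER", "O", "O", "O", "O", "O"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.6
```

    In this toy example the annotators agree on 5 of 6 tokens (p_o of about 0.83), but because "O" dominates both label sets, chance agreement is already about 0.58, so kappa drops to 0.6; this is why kappa is preferred over raw agreement for skewed NER labels.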

  14. An open annotation ontology for science on web 3.0.

    Science.gov (United States)

    Ciccarese, Paolo; Ocana, Marco; Garcia Castro, Leyla Jael; Das, Sudeshna; Clark, Tim

    2011-05-17

    There is currently a gap between the rich and expressive collection of published biomedical ontologies, and the natural language expression of biomedical papers consumed on a daily basis by scientific researchers. The purpose of this paper is to provide an open, shareable structure for dynamic integration of biomedical domain ontologies with the scientific document, in the form of an Annotation Ontology (AO), thus closing this gap and enabling application of formal biomedical ontologies directly to the literature as it emerges. Initial requirements for AO were elicited by analysis of integration needs between biomedical web communities, and of needs for representing and integrating results of biomedical text mining. Analysis of strengths and weaknesses of previous efforts in this area was also performed. A series of increasingly refined annotation tools was then developed along with a metadata model in OWL, and the ontology was deployed to users at a major pharmaceutical company and a major academic center for feedback and additional requirements. Further requirements and critiques of the model were also elicited through discussions with many colleagues and incorporated into the work. This paper presents Annotation Ontology (AO), an open ontology in OWL-DL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables "stand-off" or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation. AO is freely available under open source license at http://purl.org/ao/, and extensive documentation including screencasts is available on AO's Google Code page: http://code.google.com/p/annotation-ontology/ . 
The Annotation Ontology meets critical requirements for

  15. ACID: annotation of cassette and integron data

    Directory of Open Access Journals (Sweden)

    Stokes Harold W

    2009-04-01

    Full Text Available Abstract Background Although integrons and their associated gene cassettes are present in ~10% of bacteria and can represent up to 3% of the genome in which they are found, very few have been properly identified and annotated in public databases. These genetic elements have been overlooked in comparison to other vectors that facilitate lateral gene transfer between microorganisms. Description By automating the identification of integron integrase genes and of the non-coding cassette-associated attC recombination sites, we were able to assemble a database containing all publicly available sequence information regarding these genetic elements. Specialists manually curated the database and this information was used to improve the automated detection and annotation of integrons and their encoded gene cassettes. ACID (annotation of cassette and integron data) can be searched using a range of queries and the data can be downloaded in a number of formats. Users can readily annotate their own data and integrate it into ACID using the tools provided. Conclusion ACID is a community resource providing easy access to annotations of integrons and making tools available to detect them in novel sequence data. ACID also hosts a forum to prompt integron-related discussion, which can hopefully lead to a more universal definition of this genetic element.

  16. Use of Annotations for Component and Framework Interoperability

    Science.gov (United States)

    David, O.; Lloyd, W.; Carlson, J.; Leavesley, G. H.; Geter, F.

    2009-12-01

    The popular programming languages Java and C# provide annotations, a form of metadata construct. Software frameworks for web integration, web services, database access, and unit testing now take advantage of annotations to reduce the complexity of APIs and the quantity of integration code between the application and framework infrastructure. Adopting annotation features in frameworks has been observed to lead to cleaner and leaner application code. The USDA Object Modeling System (OMS) version 3.0 fully embraces the annotation approach and additionally defines a metadata standard for components and models. In version 3.0, framework/model integration previously accomplished using API calls is now achieved using descriptive annotations. This enables the framework to provide additional functionality non-invasively, such as implicit multithreading and auto-documentation, while achieving a significant reduction in the size of the model source code. Using a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework. Since models and modeling components are not directly bound to the framework by specific APIs and/or data types, they can more easily be reused both within the framework and outside of it. To compare the effectiveness of an annotation-based framework approach with that of other modeling frameworks, a framework-invasiveness study was conducted to evaluate the effects of framework design on model code quality. A monthly water balance model was implemented across several modeling frameworks and several software metrics were collected. The metrics selected were measures of non-invasive design methods for modeling frameworks from a software engineering perspective. It appears that the use of annotations positively impacts several software quality measures. As a next step, the PRMS model was implemented in OMS 3.0 and is currently being implemented for water supply forecasting in the
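    To make the annotation-driven integration idea concrete, here is a minimal Python sketch (assuming Python 3.9+ for `typing.Annotated`). OMS 3.0 itself is a Java framework; the `In`/`Out` markers, the `WaterBalance` component, and the `describe`/`run` helpers below are hypothetical stand-ins. They only illustrate how a framework can discover a component's inputs and outputs from declarative metadata instead of requiring the component to call a framework API.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints

# Hypothetical metadata markers, loosely analogous to Java-style @In / @Out annotations.
@dataclass(frozen=True)
class In:
    description: str
    unit: str = ""

@dataclass(frozen=True)
class Out:
    description: str
    unit: str = ""

class WaterBalance:
    """A toy component: no framework imports or API calls, only annotated fields."""
    precip: Annotated[float, In("monthly precipitation", "mm")]
    pet: Annotated[float, In("potential evapotranspiration", "mm")]
    runoff: Annotated[float, Out("monthly runoff", "mm")]

    def execute(self):
        self.runoff = max(self.precip - self.pet, 0.0)

def describe(component_cls):
    """Framework side: collect inputs and outputs by reading the annotations."""
    inputs, outputs = {}, {}
    for name, hint in get_type_hints(component_cls, include_extras=True).items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, In):
                inputs[name] = meta
            elif isinstance(meta, Out):
                outputs[name] = meta
    return inputs, outputs

def run(component_cls, **values):
    """Framework side: set inputs, execute, and gather outputs without any component-side API."""
    inputs, outputs = describe(component_cls)
    component = component_cls()
    for name in inputs:
        setattr(component, name, values[name])
    component.execute()
    return {name: getattr(component, name) for name in outputs}
```

Calling `run(WaterBalance, precip=80.0, pet=55.0)` returns `{'runoff': 25.0}`. The same `describe` output (descriptions and units) could also feed auto-generated documentation, which is the kind of non-invasive extra functionality the abstract refers to.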

  17. Creating Gaze Annotations in Head Mounted Displays

    DEFF Research Database (Denmark)

    Mardanbeigi, Diako; Qvarfordt, Pernilla

    2015-01-01

    To facilitate distributed communication in mobile settings, we developed GazeNote for creating and sharing gaze annotations in head mounted displays (HMDs). With gaze annotations it is possible to point out objects of interest within an image and add a verbal description. To create an annotation...

  18. Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob; Hohimer, Ryan E.; White, Amanda M.

    2006-06-06

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept classes in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.
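    A toy illustration of the class-recognition step (3): the Python sketch below assumes each word has already been disambiguated to a single sense, and uses a small hand-written hypernym table in place of WordNet. The table, the chosen concept classes, and the function names are all hypothetical; the point is only that a manageable set of concept classes can be reached by walking up a WordNet-style hypernym hierarchy.

```python
# Hypothetical hypernym links (child -> parent); a real system would read WordNet synsets.
HYPERNYM = {
    "rifle": "firearm",
    "firearm": "weapon",
    "weapon": "artifact",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "vehicle": "artifact",
    "colonel": "officer",
    "officer": "person",
}

# A small, manageable set of concept classes chosen from nodes of the hierarchy.
CONCEPT_CLASSES = {"weapon", "vehicle", "person"}

def concept_class(word):
    """Walk up the hypernym chain until a concept class is reached; None if there is none."""
    node = word
    while node is not None:
        if node in CONCEPT_CLASSES:
            return node
        node = HYPERNYM.get(node)
    return None

def annotate(tokens):
    """Assign a concept class (or None) to each already-disambiguated token."""
    return [(token, concept_class(token)) for token in tokens]
```

For instance, `annotate(["colonel", "truck", "the"])` yields `[("colonel", "person"), ("truck", "vehicle"), ("the", None)]`. Collapsing many fine-grained synsets onto a few ancestors is what keeps the class inventory small while still using WordNet's lexical coverage.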

  19. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared with the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges, such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast genome (>30 Mbp). The pezizomycotine (filamentous ascomycete) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. About 12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  20. 76 FR 11350 - National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List...

    Science.gov (United States)

    2011-03-02

    ..., Reporting and recordkeeping requirements, Superfund, Water pollution control, Water supply. Dated: February... and Hazardous Substances Pollution Contingency Plan; National Priorities List: Partial Deletion of the... Mexico, from the National Priorities List (NPL). The NPL, promulgated pursuant to section 105 of the...

  1. Functional annotation of hierarchical modularity.

    Directory of Open Access Journals (Sweden)

    Kanchana Padmanabhan

    Full Text Available In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function: hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology) and the association of individual genes or proteins with these concepts (e.g., GO terms), our method will assign a Hierarchical Modularity Score (HMS) to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a by-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated. We have evaluated our
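    The abstract contrasts HMS with methods that score "enriched" terms in a flat bag of genes. As a point of reference, here is a minimal Python sketch of that conventional flat enrichment test, a one-sided hypergeometric p-value (assuming Python 3.8+ for `math.comb`). It is not the HMS scoring metric, whose definition the abstract does not give, and the function names are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n genes drawn)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total

def term_enrichment(module_genes, term_genes, universe):
    """Flat 'bag of genes' enrichment of one functional term in one module."""
    universe = set(universe)
    module = set(module_genes) & universe
    term = set(term_genes) & universe
    overlap = len(module & term)
    return hypergeom_sf(overlap, len(universe), len(term), len(module))
```

For a 4-gene module drawn from 20 genes, where 5 genes carry the term and all 4 module genes carry it, the p-value is C(5,4)/C(20,4) = 5/4845, about 0.001. Because this test looks at each module in isolation, it ignores how modules nest, which is the gap the hierarchical approach targets.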

  2. Fluid Annotations in an Open World

    DEFF Research Database (Denmark)

    Zellweger, Polle Trescott; Bouvin, Niels Olof; Jehøj, Henning

    2001-01-01

    Fluid Documents use animated typographical changes to provide a novel and appealing user experience for hypertext browsing and for viewing document annotations in context. This paper describes an effort to broaden the utility of Fluid Documents by using the open hypermedia Arakne Environment to layer fluid annotations and links on top of arbitrary HTML pages on the World Wide Web. Changes to both Fluid Documents and Arakne are required.

  3. neXtA5: accelerating annotation of articles via automated approaches in neXtProt.

    Science.gov (United States)

    Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick

    2016-01-01

    The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA5, which prioritizes the literature for specific curation requirements. Our system, neXtA5, is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE along some predefined axes. This report focuses on three axes: Diseases, and the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named entities as stored in the BioMed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is as follows: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein-protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being
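    The final merging step can be illustrated with a small Python sketch of linear score fusion, plus the reciprocal-rank measure used for evaluation. The min-max normalization, the `alpha` weight, and the document IDs are assumptions for illustration; the abstract states only that the two ranked lists are combined linearly with coefficients tuned per axis, and the tuned values are not reproduced here.

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} dict to [0, 1] (an assumed preprocessing step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def fuse(search_scores, axis_scores, alpha):
    """Rank documents by alpha * search + (1 - alpha) * axis; a missing score counts as 0."""
    a, b = normalize(search_scores), normalize(axis_scores)
    docs = set(a) | set(b)
    fused = {d: alpha * a.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}
    return sorted(docs, key=lambda d: (-fused[d], d))  # ties broken by doc id

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, relevant_sets):
    """Average reciprocal rank over several queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets)) / len(rankings)
```

With `search = {"p1": 10, "p2": 8, "p3": 1}` and `axis = {"p3": 5, "p2": 4, "p1": 0}`, `fuse(search, axis, alpha=0.5)` gives `["p2", "p1", "p3"]`: the document that scores well on both lists rises to the top. Tuning `alpha` per axis, as the abstract describes, trades off general query relevance against entity density for that axis.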

  4. Model and Interoperability using Meta Data Annotations

    Science.gov (United States)

    David, O.

    2011-12-01

    Software frameworks and architectures need metadata to efficiently support model integration. Modelers have to know the context of a model, often stepping into modeling semantics and auxiliary information usually not provided in a concise structure and universal format consumable by a range of (modeling) tools. XML often seems the obvious solution for capturing metadata, but its wide adoption to facilitate model interoperability is limited by XML schema fragmentation, complexity, and verbosity outside of a data-automation process. Ontologies seem to overcome those shortcomings; however, the practical significance of their use remains to be demonstrated. OMS version 3 took a different approach to metadata representation. The fundamental building block of a modular model in OMS is a software component representing a single physical process, calibration method, or data access approach. Here, programming language features known as Annotations or Attributes were adopted. Within other (non-modeling) frameworks it has been observed that annotations lead to cleaner and leaner application code. Framework-supported model integration, traditionally accomplished using Application Programming Interface (API) calls, is now achieved using descriptive code annotations. Fully annotated components for various hydrological and Ag-system models now provide information directly for (i) model assembly and building, (ii) data flow analysis for implicit multi-threading or visualization, (iii) automated and comprehensive model documentation of component dependencies and physical data properties, (iv) automated model and component testing, calibration, and optimization, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Such a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework but a strong reference to their originating code. Since models and
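    Item (ii) above, data flow analysis for implicit multi-threading, can be illustrated by deriving an execution schedule from declared inputs and outputs. In this Python sketch (assuming Python 3.9+ for `graphlib`), the component names and variables are hypothetical, and the declarations are given as plain sets rather than read from real OMS annotations. Components that land in the same stage have no data dependency on each other and could, in principle, run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical declarations: component -> (input variables, output variables),
# as an annotation processor might collect them from @In / @Out-style metadata.
COMPONENTS = {
    "Climate": (set(), {"precip", "temp"}),
    "PET": ({"temp"}, {"pet"}),
    "Snow": ({"precip", "temp"}, {"melt"}),
    "Runoff": ({"precip", "pet", "melt"}, {"runoff"}),
}

def dependency_graph(components):
    """Map each component to the set of components producing its inputs."""
    producer = {}
    for name, (_, outputs) in components.items():
        for var in outputs:
            producer[var] = name
    return {name: {producer[v] for v in inputs} for name, (inputs, _) in components.items()}

def parallel_stages(components):
    """Group components into stages; everything within a stage is mutually independent."""
    sorter = TopologicalSorter(dependency_graph(components))
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

Here `parallel_stages(COMPONENTS)` yields `[["Climate"], ["PET", "Snow"], ["Runoff"]]`, so `PET` and `Snow` are candidates for concurrent execution. No component had to call a scheduling API; the schedule follows entirely from the declared data flow.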

  5. Black English Annotations for Elementary Reading Programs.

    Science.gov (United States)

    Prasad, Sandre

    This report describes a program that uses annotations in the teacher's editions of existing reading programs to indicate the characteristics of black English that may interfere with the reading process of black children. The first part of the report provides a rationale for the annotation approach, explaining that the discrepancy between written…

  6. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    Directory of Open Access Journals (Sweden)

    Gustavo Arango-Argoty

    Full Text Available Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.

  7. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.

  8. MIPS: analysis and annotation of genome information in 2007.

    Science.gov (United States)

    Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T

    2008-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  9. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    Science.gov (United States)

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  10. Improving Microbial Genome Annotations in an Integrated Database Context

    Science.gov (United States)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

    2013-01-01

    Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/. PMID:23424620
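    The rule-based validation idea can be sketched as follows. The phenotype rules and feature names in this Python example are hypothetical and much simpler than IMG's actual rules; the sketch only shows the logic of inferring phenotypes from the presence of required reactions or pathways and comparing the predictions with observed phenotypes, where a disagreement flags a possible annotation gap or error.

```python
# Hypothetical rules: phenotype -> alternative sets of required features (any one set suffices).
RULES = {
    "nitrogen_fixer": [{"nitrogenase"}],
    "methanogen": [{"methyl-CoM reductase", "F420 biosynthesis"}],
    "motile": [{"flagellar assembly"}, {"type IV pili twitching"}],
}

def predict_phenotypes(genome_features, rules=RULES):
    """A phenotype is predicted if every feature of at least one alternative set is present."""
    features = set(genome_features)
    return {p for p, alternatives in rules.items() if any(req <= features for req in alternatives)}

def compare(predicted, observed):
    """Split predictions into agreements and the two kinds of disagreement."""
    return {
        "consistent": predicted & observed,
        "predicted_not_observed": predicted - observed,  # possible false annotation
        "observed_not_predicted": observed - predicted,  # possible missing annotation
    }
```

For a genome annotated with `{"nitrogenase", "flagellar assembly", "glycolysis"}`, the predicted phenotypes are `{"nitrogen_fixer", "motile"}`. If the organism is also observed to be a methanogen, `compare` places `"methanogen"` under `observed_not_predicted`, pointing a curator at pathway genes that may be unannotated.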

  11. Improving microbial genome annotations in an integrated database context.

    Directory of Open Access Journals (Sweden)

    I-Min A Chen

    Full Text Available Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at http://img.jgi.doe.gov/.

  12. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.

    Science.gov (United States)

    Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita

    2017-07-03

    BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures, and each such cluster is associated with a hidden Markov model that allows building template-target alignments suitable for structural modeling. A further 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB accession, FASTA sequence, GO term, PFAM domain, organism, PDB and ligand(s). When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
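    The simplest graph-based clustering under pairwise thresholds links every pair of sequences that passes both the identity and coverage criteria and takes the connected components. This Python sketch does that with a union-find structure. The `hits` input format is hypothetical (in practice it would come from an all-against-all alignment), and BAR 3.0's actual clustering procedure may include further steps.

```python
def cluster(sequence_ids, hits, min_identity=0.40, min_coverage=0.90):
    """Connected components of the graph whose edges pass both thresholds.

    hits: iterable of (a, b, identity, coverage), with identity and coverage in [0, 1].
    Returns a sorted list of sorted clusters; unlinked sequences come out as singletons.
    """
    parent = {s: s for s in sequence_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for s in sequence_ids:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())
```

With five sequences and hits `("A","B",0.55,0.95)`, `("B","C",0.42,0.91)`, `("C","D",0.80,0.60)`, and `("D","E",0.35,0.99)`, only the first two edges pass, giving clusters `[["A","B","C"], ["D"], ["E"]]`. Requiring high coverage as well as identity helps keep sequences with different domain architectures apart, which matters when annotations are later transferred cluster-wide.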

  13. BAT: An open-source, web-based audio events annotation tool

    OpenAIRE

    Blai Meléndez-Catalan, Emilio Molina, Emilia Gómez

    2017-01-01

    In this paper we present BAT (BMAT Annotation Tool), an open-source, web-based tool for the manual annotation of events in audio recordings developed at BMAT (Barcelona Music and Audio Technologies). The main feature of the tool is that it provides an easy way to annotate the salience of simultaneous sound sources. Additionally, it allows users to define multiple ontologies to adapt to different tasks and offers the possibility to cross-annotate audio data. Moreover, it is easy to install and deploy...

  14. Annotating images by mining image search results.

    Science.gov (United States)

    Wang, Xin-Jing; Zhang, Lei; Li, Xirong; Ma, Wei-Ying

    2008-11-01

    Although it has been studied for years by the computer vision and machine learning communities, image annotation is still far from practical. In this paper, we propose a novel attempt at model-free image annotation, a data-driven approach that annotates images by mining their search results. Some 2.4 million images with their surrounding text are collected from a few photo forums to support this approach. The entire process is formulated in a divide-and-conquer framework where a query keyword is provided along with the uncaptioned image to improve both effectiveness and efficiency. This is helpful when the collected data set is not dense everywhere. In this sense, our approach contains three steps: 1) the search process to discover visually and semantically similar search results, 2) the mining process to identify salient terms from textual descriptions of the search results, and 3) the annotation rejection process to filter out noisy terms yielded by Step 2. To ensure real-time annotation, two key techniques are leveraged: one is to map the high-dimensional image visual features into hash codes, and the other is to implement the approach as a distributed system, with the search and mining processes provided as Web services. As a typical result, the entire process finishes in less than 1 second. Since no training data set is required, our approach enables annotating with unlimited vocabulary and is highly scalable and robust to outliers. Experimental results on both real Web images and a benchmark image data set show the effectiveness and efficiency of the proposed algorithm. It is also worth noting that, although the entire approach is illustrated within the divide-and-conquer framework, a query keyword is not crucial to our current implementation. We provide experimental results to demonstrate this.
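    The abstract does not say how the visual features are mapped into hash codes. One standard technique for this is random-hyperplane locality-sensitive hashing, sketched here in Python as a swapped-in illustration rather than the authors' method. Each bit records which side of a random hyperplane a feature vector falls on, so similar vectors tend to share bits and can be compared cheaply by Hamming distance. The function names and parameters are hypothetical.

```python
import random

def make_hasher(dim, n_bits, seed=0):
    """Random-hyperplane hashing: bit i is 1 if the vector's dot product with plane i is >= 0."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_code(vector):
        code = 0
        for plane in planes:
            dot = sum(p * v for p, v in zip(plane, vector))
            code = (code << 1) | (1 if dot >= 0 else 0)
        return code

    return hash_code

def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")

def nearest(query_code, database):
    """database: {image_id: code}; returns image ids sorted by Hamming distance to the query."""
    return sorted(database, key=lambda key: (hamming(query_code, database[key]), key))
```

Because only the sign of each projection is kept, scaling a feature vector by a positive factor leaves its code unchanged, and comparing short integer codes is far cheaper than comparing the original high-dimensional features, which is the property that makes real-time search over millions of images feasible.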

  15. Collaborative Paper-Based Annotation of Lecture Slides

    Science.gov (United States)

    Steimle, Jurgen; Brdiczka, Oliver; Muhlhauser, Max

    2009-01-01

    In a study of notetaking in university courses, we found that the large majority of students prefer paper to computer-based media like Tablet PCs for taking notes and making annotations. Based on this finding, we developed CoScribe, a concept and system which supports students in making collaborative handwritten annotations on printed lecture…

  16. Music journals in South Africa 1854-2010: an annotated bibliography

    African Journals Online (AJOL)

    Music journals in South Africa 1854-2010: an annotated bibliography. ... The article focuses on presenting an annotated bibliography of music journalism in South Africa from as early as 1854 until 2010. Most of ... Key words: annotated bibliography, electronic journals, music journals, periodicals, South African music history ...

  17. An annotated corpus with nanomedicine and pharmacokinetic parameters.

    Science.gov (United States)

    Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

    2017-01-01

    A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration's Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.
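    Evaluating a named entity extraction system against an annotated corpus is often done with precision, recall, and F1 over entity spans. The Python sketch below uses strict matching (same document, offsets, and label), which is one common convention and an assumption here; the entity labels are hypothetical, and the paper's exact scoring scheme may differ.

```python
def span_prf(gold, predicted):
    """Strict entity-level precision, recall, and F1.

    gold, predicted: collections of (doc_id, start, end, label) tuples.
    A prediction counts as correct only if all four fields match a gold entity.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with 4 gold entities and 3 predictions of which 2 match exactly, precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571). A prediction with the right span but the wrong label counts as both a false positive and a false negative under this strict scheme.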

  18. Plann: A command-line application for annotating plastome sequences

    Science.gov (United States)

    Huang, Daisie I.; Cronk, Quentin C. B.

    2015-01-01

    Premise of the study: Plann automates the process of annotating a plastome sequence in GenBank format for either downstream processing or for GenBank submission by annotating a new plastome based on a similar, well-annotated plastome. Methods and Results: Plann is a Perl script to be executed on the command line. Plann compares a new plastome sequence to the features annotated in a reference plastome and then shifts the intervals of any matching features to the locations in the new plastome. Plann’s output can be used in the National Center for Biotechnology Information’s tbl2asn to create a Sequin file for GenBank submission. Conclusions: Unlike Web-based annotation packages, Plann is a locally executable script that will accurately annotate a plastome sequence to a locally specified reference plastome. Because it executes from the command line, it is ready to use in other software pipelines and can be easily rerun as a draft plastome is improved. PMID:26312193
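    The interval-shifting idea can be sketched in a few lines. Plann itself is a Perl script that compares sequences more robustly; this Python sketch is a hypothetical simplification that locates each reference feature in the new plastome by exact string match and reports the features it cannot place. It assumes 0-based, end-exclusive coordinates and ignores strand, introns, and the inverted repeats that make real plastome annotation harder.

```python
def transfer_features(reference_seq, reference_features, new_seq):
    """Shift reference feature intervals onto new_seq by exact sequence match.

    reference_features: list of (name, start, end), 0-based and end-exclusive.
    Returns (transferred, missing), where transferred holds (name, new_start, new_end).
    """
    transferred, missing = [], []
    for name, start, end in reference_features:
        segment = reference_seq[start:end]
        pos = new_seq.find(segment)  # first exact occurrence only
        if not segment or pos == -1:
            missing.append(name)
        else:
            transferred.append((name, pos, pos + len(segment)))
    return transferred, missing
```

If the new plastome is the reference with two extra bases at the start, every matched feature shifts by 2. Features that land in `missing` are the ones a human would need to check, for example where the draft assembly has a gap or a divergent copy; with real data, a short feature can also match spuriously at its first occurrence, which is why tools use alignment rather than plain string search.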

  19. Evaluation of web-based annotation of ophthalmic images for multicentric clinical trials.

    Science.gov (United States)

    Chalam, K V; Jain, P; Shah, V A; Shah, Gaurav Y

    2006-06-01

    An Internet browser-based annotation system can be used to identify and describe features in digitized retinal images in multicentric clinical trials, in real time. In this web-based annotation system, the user employs a mouse to draw and create annotations on a transparent layer that encapsulates the observations and interpretations of a specific image. Multiple annotation layers may be overlaid on a single image. These layers may correspond to annotations by different users on the same image, or to annotations of a temporal sequence of images of a disease process over a period of time. In addition, geometrical properties of annotated figures may be computed and measured. The annotations are stored in a central repository database on a server, from which multiple users can retrieve them in real time. This system facilitates objective evaluation of digital images and comparison of double-blind readings of digital photographs, with an identifiable audit trail. Annotation of ophthalmic images allowed clinically feasible and useful interpretation to track properties of an area of fundus pathology. This provided an objective method to monitor properties of pathologies over time, an essential component of multicentric clinical trials. The annotation system also allowed users to view stereo pairs as stereoscopic images. This web-based annotation system is useful and valuable in monitoring patient care, in multicentric clinical trials, telemedicine, teaching and routine clinical settings.
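    For the geometric measurements mentioned above, a common choice for a freehand polygon annotation is the shoelace formula for area plus the sum of edge lengths for perimeter. This Python sketch is an assumption about how such measurements could be computed, not a description of the system evaluated in the paper. Results are in pixel units and would need a calibration factor (hypothetical here) to convert to physical units on the fundus.

```python
from math import hypot

def polygon_area(points):
    """Shoelace formula; points are (x, y) vertices in drawing order, polygon implicitly closed."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0  # abs() makes the result independent of vertex direction

def polygon_perimeter(points):
    """Sum of edge lengths, including the closing edge back to the first vertex."""
    n = len(points)
    return sum(
        hypot(points[(i + 1) % n][0] - points[i][0], points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    )
```

For the rectangle `[(0, 0), (4, 0), (4, 3), (0, 3)]`, the area is 12.0 and the perimeter 14.0 (pixel units). Tracking these values for the same lesion across visits is one simple way to monitor how a pathology changes over time.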

  20. Essential Annotation Schema for Ecology (EASE): A framework supporting the efficient data annotation and faceted navigation in ecology.

    Science.gov (United States)

    Pfaff, Claas-Thido; Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian

    2017-01-01

    Ecology has become a data-intensive science over the last decades, and it often relies on the reuse of data in cross-experimental analyses. However, finding data that qualify for reuse in a specific context can be challenging. It requires good-quality metadata and annotations as well as efficient search strategies. To date, full-text search (often on the metadata only) is the most widely used search strategy, although it is known to be inaccurate. Faceted navigation provides a filter mechanism based on fine-grained metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full-text search creates a system of filters that lets users refine the results and improve their relevance. We developed a framework for efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas from widely accepted metadata standards, textbooks, scientific literature, and vocabularies, as well as expert knowledge contributed by researchers from ecology and adjacent disciplines.
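    The combination of full-text search and facet filters described above can be sketched as follows. This Python example is an illustration only: the facet names and values are hypothetical and are not taken from the EASE schema or vocabulary, and the matching rule (AND across facets, OR within one facet) is an assumed, commonly used convention.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    title: str
    facets: dict = field(default_factory=dict)  # facet name -> set of values


def full_text(datasets, query):
    """Naive case-insensitive substring search over titles."""
    q = query.lower()
    return [d for d in datasets if q in d.title.lower()]


def apply_facets(results, selected):
    """Keep results matching every selected facet (AND across facets, OR within one)."""
    return [
        d for d in results
        if all(d.facets.get(name, set()) & values for name, values in selected.items())
    ]


def facet_counts(results, name):
    """Count how many results carry each value of a facet, for display next to filters."""
    counts = {}
    for d in results:
        for value in d.facets.get(name, set()):
            counts[value] = counts.get(value, 0) + 1
    return counts
```

    Recomputing `facet_counts` on the current result set after each selection is what lets the interface show users how many hits each further refinement would leave.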

  1. Essential Annotation Schema for Ecology (EASE): A framework supporting the efficient data annotation and faceted navigation in ecology.

    Directory of Open Access Journals (Sweden)

    Claas-Thido Pfaff

    Full Text Available Ecology has become a data-intensive science over the last decades, and it often relies on the reuse of data in cross-experimental analyses. However, finding data that qualify for reuse in a specific context can be challenging. It requires good-quality metadata and annotations as well as efficient search strategies. To date, full-text search (often on the metadata only) is the most widely used search strategy, although it is known to be inaccurate. Faceted navigation provides a filter mechanism based on fine-grained metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full-text search creates a system of filters that lets users refine the results and improve their relevance. We developed a framework for efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas from widely accepted metadata standards, textbooks, scientific literature, and vocabularies, as well as expert knowledge contributed by researchers from ecology and adjacent disciplines.

  2. 78 FR 73449 - National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List...

    Science.gov (United States)

    2013-12-06

    ... Substances Pollution Contingency Plan (NCP). This partial deletion pertains to the soil of 1,154 residential...] National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List: Partial Deletion of the Omaha Lead Superfund Site AGENCY: Environmental Protection Agency (EPA). ACTION: Final rule...

  3. 78 FR 69360 - National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List...

    Science.gov (United States)

    2013-11-19

    ...] National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List: Partial... and Hazardous Substances Pollution Contingency Plan (NCP). The EPA and the State of California... Corp Air Station Superfund Site without prior Notice of Intent for Partial Deletion because EPA views...

  4. 75 FR 43115 - National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List...

    Science.gov (United States)

    2010-07-23

    ... and Hazardous Substances Pollution Contingency Plan; National Priorities List: Intent to Partially..., as amended, is an appendix of the National Oil and Hazardous Substances Pollution Contingency Plan... Intent for Partial Deletion because EPA views this as a noncontroversial revision and anticipates no...

  5. Annotating Logical Forms for EHR Questions.

    Science.gov (United States)

    Roberts, Kirk; Demner-Fushman, Dina

    2016-05-01

    This paper discusses the creation of a semantically annotated corpus of questions about patient data in electronic health records (EHRs). The goal is to provide the training data necessary for semantic parsers to automatically convert EHR questions into structured queries. A layered annotation strategy is used that mirrors a typical natural language processing (NLP) pipeline. First, questions are syntactically analyzed to identify multi-part questions. Second, medical concepts are recognized and normalized to a clinical ontology. Finally, logical forms are created using a lambda calculus representation. We use a corpus of 446 questions asking for patient-specific information. From these, 468 specific questions are identified, containing 259 unique medical concepts and requiring 53 unique predicates to represent the logical forms. We further present detailed characteristics of the corpus, including inter-annotator agreement results, and describe the challenges automatic NLP systems will face on this task.
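    To make the idea of a lambda-calculus logical form concrete, the sketch below encodes a question such as "What was the most recent potassium value?" as `latest(λx. lab(x) ∧ concept(x, potassium))` and evaluates it against a toy patient record. The predicate names, the `Event` record type and the plain-string concepts are hypothetical; the corpus's actual predicate inventory and ontology codes are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One hypothetical EHR entry; concept is a plain string, not an ontology code."""
    kind: str           # e.g. "lab" or "medication"
    concept: str        # normalized concept name
    time: int           # e.g. days since admission
    value: object = None


# Hypothetical predicates, written as functions that build λx.P(x).
def kind_is(k):
    return lambda e: e.kind == k


def concept_is(c):
    return lambda e: e.concept == c


def conj(*preds):
    """λx. P1(x) ∧ P2(x) ∧ ..."""
    return lambda e: all(p(e) for p in preds)


def latest(pred):
    """latest(λx.P(x)): the most recent event in a record satisfying P, or None."""
    return lambda record: max(
        (e for e in record if pred(e)), key=lambda e: e.time, default=None
    )


# "What was the most recent potassium value?"
most_recent_potassium = latest(conj(kind_is("lab"), concept_is("potassium")))
```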

  6. Managing and Querying Image Annotation and Markup in XML.

    Science.gov (United States)

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers to sharing image data and knowledge among researchers. The Annotation and Image Markup (AIM) project is developing a standards-based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of the AIM data model pose new challenges for managing such data, in terms of both performance and support for complex queries. In this paper, we present our work on managing AIM data through a native XML approach and supporting complex image and annotation queries through a native extension of the XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.
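    The kind of hierarchical query the AIM work targets can be illustrated with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. This is a stand-in for the native XML database and XQuery extension described in the abstract, not a reproduction of them, and the element and attribute names below are simplified, hypothetical placeholders rather than the real AIM schema. Because ElementTree's XPath cannot compare numbers, the size threshold is applied in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified annotation document (not the AIM schema).
AIM_LIKE = """
<annotations>
  <imageAnnotation id="a1">
    <image sopInstanceUid="1.2.3"/>
    <observation code="lesion"><measurement name="diameter" value="12.5" unit="mm"/></observation>
  </imageAnnotation>
  <imageAnnotation id="a2">
    <image sopInstanceUid="1.2.4"/>
    <observation code="lesion"><measurement name="diameter" value="4.0" unit="mm"/></observation>
  </imageAnnotation>
</annotations>
"""


def annotations_with_large_lesions(xml_text, min_mm):
    """Return ids of annotations with a lesion diameter of at least min_mm."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for ann in root.iter("imageAnnotation"):
        path = "./observation[@code='lesion']/measurement[@name='diameter']"
        for m in ann.findall(path):
            if float(m.get("value")) >= min_mm:  # numeric test done in Python
                hits.append(ann.get("id"))
                break
    return hits
```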

  7. Managing and Querying Image Annotation and Markup in XML

    Science.gov (United States)

    Wang, Fusheng; Pan, Tony; Sharma, Ashish; Saltz, Joel

    2010-01-01

    Proprietary approaches for representing annotations and image markup are serious barriers to sharing image data and knowledge among researchers. The Annotation and Image Markup (AIM) project is developing a standards-based information model for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of the AIM data model pose new challenges for managing such data, in terms of both performance and support for complex queries. In this paper, we present our work on managing AIM data through a native XML approach and supporting complex image and annotation queries through a native extension of the XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid. PMID:21218167

  8. Expressed Peptide Tags: An additional layer of data for genome annotation

    Energy Technology Data Exchange (ETDEWEB)

    Savidor, Alon [ORNL]; Donahoo, Ryan S [ORNL]; Hurtado-Gonzales, Oscar [University of Tennessee, Knoxville (UTK)]; Verberkmoes, Nathan C [ORNL]; Shah, Manesh B [ORNL]; Lamour, Kurt H [ORNL]; McDonald, W Hayes [ORNL]

    2006-01-01

    While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six-frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well-annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed a high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller sub-databases to be searched against, and definition of EPT criteria that accommodate the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While ~77% of Phytophthora EPTs supported the current annotation, a portion of them (7.2% and 12.6% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.
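    The six-frame translation used to build the search database can be sketched as below. This minimal Python version assumes the standard genetic code, reads each of the three offsets on both strands, and marks codons containing ambiguous bases as `X`; it does not implement the peptide search, the sub-database construction, or the EPT classification rules described above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG x TCAG x TCAG order ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(genome):
    """Return translations keyed '+1', '+2', '+3', '-1', '-2', '-3'."""
    genome = genome.upper()
    reverse_complement = genome.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

    In a real pipeline, each frame would typically be split at stop codons (`*`) into open reading frames before being written to a FASTA database for the mass-spectrometry search.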

  9. Gene expression and functional annotation of the human and mouse choroid plexus epithelium.

    Directory of Open Access Journals (Sweden)

    Sarah F Janssen

    Full Text Available BACKGROUND: The choroid plexus epithelium (CPE) is a lobed neuro-epithelial structure that forms the outer blood-brain barrier. The CPE protrudes into the brain ventricles and produces the cerebrospinal fluid (CSF), which is crucial for brain homeostasis. Malfunction of the CPE may be implicated in disorders such as Alzheimer disease, hydrocephalus or glaucoma. To study human genetic diseases and potential new therapies, mouse models are widely used. This requires detailed knowledge of the similarities and differences in gene expression and functional annotation between the species. The aim of this study is to analyze and compare gene expression and functional annotation of healthy human and mouse CPE. METHODS: We performed 44k Agilent microarray hybridizations with RNA derived from laser-dissected healthy human and mouse CPE cells. We functionally annotated and compared the gene expression data of human and mouse CPE using the knowledge database Ingenuity. We searched for common and species-specific gene expression patterns and functions between human and mouse CPE. We also made a comparison with previously published human and mouse CPE gene expression data. RESULTS: Overall, the human and mouse CPE transcriptomes are very similar. Their major functionalities included epithelial junctions, transport, energy production, neuro-endocrine signaling, as well as immunological, neurological and hematological functions and disorders. The mouse CPE presented two additional functions not found in the human CPE: carbohydrate metabolism and a more extensive list of (neural) developmental functions. We found three genes specifically expressed in the mouse CPE compared to human CPE, namely ACE, PON1 and TRIM3, and no human-specific CPE genes compared to mouse CPE. CONCLUSION: Human and mouse CPE transcriptomes are very similar and display many common functionalities. Nonetheless, we also identified a few genes and pathways which suggest that the CPE

  10. 75 FR 55479 - National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List...

    Science.gov (United States)

    2010-09-13

    ... and Hazardous Substances Pollution Contingency Plan; National Priorities List: Partial Deletion of the... . SUPPLEMENTARY INFORMATION: The portion of the site to be deleted from the NPL is the surface media (soil... further actions. List of Subjects in 40 CFR Part 300 Environmental protection, Air pollution control...

  11. Fuzzy Emotional Semantic Analysis and Automated Annotation of Scene Images

    Directory of Open Access Journals (Sweden)

    Jianfang Cao

    2015-01-01

    Full Text Available With advances in electronic and imaging techniques, the production of digital images has increased rapidly, and the extraction and automated annotation of the emotional semantics implied by images have become issues that must be urgently addressed. To better simulate human subjectivity and ambiguity in understanding scene images, the current study proposes an emotional semantic annotation method for scene images based on fuzzy set theory. A fuzzy membership degree was calculated to describe the emotional degree of a scene image and was implemented using the AdaBoost algorithm and a back-propagation (BP) neural network. The automated annotation method was trained and tested using scene images from the SUN Database. The annotation results were then compared with those based on manual annotation. Our method showed an annotation accuracy rate of 91.2% for basic emotional values and 82.4% after extended emotional values were added, which correspond to increases of 5.5% and 8.9%, respectively, compared with the results from using a single BP neural network algorithm. Furthermore, the retrieval accuracy rate based on our method reached approximately 89%. This study attempts to lay a solid foundation for the automated emotional semantic annotation of more types of images and is therefore of practical significance.
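    A fuzzy membership degree assigns an image a graded degree of belonging to each emotional category instead of a single hard label. The study derives these degrees from an AdaBoost and BP neural network model; the Python sketch below substitutes hand-set triangular membership functions over a hypothetical scalar emotion score in [0, 1], purely to show what membership degrees look like. The category names and breakpoints are assumptions.

```python
# Hypothetical emotion categories with triangular breakpoints (a, peak b, c)
# over a scalar score in [0, 1]; not the study's categories or model.
FUZZY_SETS = {
    "negative": (-0.5, 0.0, 0.5),
    "neutral": (0.0, 0.5, 1.0),
    "positive": (0.5, 1.0, 1.5),
}


def triangular(x, a, b, c):
    """Membership rising linearly from 0 at a to 1 at b, then falling to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(score):
    """Map a score to a membership degree for every category."""
    return {label: triangular(score, *abc) for label, abc in FUZZY_SETS.items()}
```

    A score halfway between two peaks belongs equally (0.5) to both neighbouring categories, which is the kind of ambiguity a single hard label cannot express.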

  12. Multiview Hessian regularization for image annotation.

    Science.gov (United States)

    Liu, Weifeng; Tao, Dacheng

    2013-07-01

    The rapid development of computer hardware and Internet technology makes large-scale data-dependent models computationally tractable and opens a bright avenue for annotating images through innovative machine learning algorithms. Semisupervised learning (SSL) has therefore received intensive attention in recent years and has been successfully deployed in image annotation. One representative work in SSL is Laplacian regularization (LR), which smooths the conditional distribution for classification along the manifold encoded in the graph Laplacian. However, it has been observed that LR biases the classification function toward a constant function, which can result in poor generalization. In addition, LR is developed to handle uniformly distributed data (or single-view data), although instances or objects, such as images and videos, are usually represented by multiview features, such as color, shape, and texture. In this paper, we present multiview Hessian regularization (mHR) to address these two problems in LR-based image annotation. In particular, mHR optimally combines multiple HRs, each obtained from a particular view of instances, and steers the classification function toward one that varies linearly along the data manifold. We apply mHR to kernel least squares and support vector machines as two examples for image annotation. Extensive experiments on the PASCAL VOC'07 dataset validate the effectiveness of mHR by comparing it with baseline algorithms, including LR and HR.
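    Laplacian regularization, the baseline that mHR improves on, can be sketched in a multiview setting by building one graph Laplacian per feature view and combining them with weights. The NumPy example below is that LR baseline (regularized least squares with a weighted sum of RBF-graph Laplacians), not the Hessian regularizer proposed in the paper; the RBF bandwidth, view weights and `gamma` are hypothetical hand-set values, whereas mHR learns how to combine the views.

```python
import numpy as np


def rbf_laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W of a fully connected RBF graph."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return np.diag(W.sum(axis=1)) - W


def multiview_lr(views, y, labeled, weights, gamma=0.1, sigma=1.0):
    """Laplacian-regularized least squares over several feature views.

    Minimizes sum over labeled i of (f_i - y_i)^2 + gamma * f^T L f, where
    L = sum_v weights[v] * L_v. Setting the gradient to zero gives
    (J + gamma * L) f = J y, with J the diagonal labeled-indicator matrix.
    """
    L = sum(w * rbf_laplacian(X, sigma) for w, X in zip(weights, views))
    J = np.diag(np.asarray(labeled, dtype=float))
    n = J.shape[0]
    # A tiny ridge keeps the system solvable if a graph component has no label.
    A = J + gamma * L + 1e-9 * np.eye(n)
    return np.linalg.solve(A, J @ np.asarray(y, dtype=float))
```

    Each row of every view matrix describes the same image, so the per-view Laplacians share one node set and can be summed directly.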

  13. Ten steps to get started in Genome Assembly and Annotation

    Science.gov (United States)

    Dominguez Del Angel, Victoria; Hjerde, Erik; Sterck, Lieven; Capella-Gutierrez, Salvador; Notredame, Cedric; Vinnere Pettersson, Olga; Amselem, Joelle; Bouri, Laurent; Bocs, Stephanie; Klopp, Christophe; Gibrat, Jean-Francois; Vlasova, Anna; Leskosek, Brane L.; Soler, Lucile; Binzer-Panchal, Mahesh; Lantz, Henrik

    2018-01-01

    As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to help researchers get started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high-quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR). PMID:29568489

  14. Sharing Map Annotations in Small Groups: X Marks the Spot

    Science.gov (United States)

    Congleton, Ben; Cerretani, Jacqueline; Newman, Mark W.; Ackerman, Mark S.

    Advances in location-sensing technology, coupled with an increasingly pervasive wireless Internet, have made it possible (and increasingly easy) to access and share information in the context of one’s geospatial location. We conducted a four-phase study with 27 students to explore the practices surrounding the creation, interpretation and sharing of map annotations in specific social contexts. We found that annotation authors consider multiple factors when deciding how to annotate maps, including the perceived utility to the audience and how their contributions will reflect on the image they project to others. Consumers of annotations value the novelty of information but must be convinced of the author’s credibility. In this paper we describe our study, present the results, and discuss implications for the design of software for sharing map annotations.

  15. Annotation-based feature extraction from sets of SBML models.

    Science.gov (United States)

    Alm, Rebekka; Waltemath, Dagmar; Wolfien, Markus; Wolkenhauer, Olaf; Henkel, Ron

    2015-01-01

    Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow the identification of additional features for model retrieval tasks, and enable the comparison of sets of models. In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format composed from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable for determining characteristic features of arbitrary sets of models: the selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map to concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate. Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison or model-to-model comparison.
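    One simple way to turn annotations into set-level features, in the spirit of the analysis above, is to propagate each model's ontology terms to all their ancestors and count how many models in the set carry each term. The Python sketch below assumes a tiny hypothetical is_a hierarchy with plain-text term names instead of real GO, ChEBI or SBO identifiers, and it is not necessarily one of the paper's four methods. It does show why general terms such as "metabolism" tend to rise to the top, consistent with the finding that characteristic features sit higher in the hierarchy.

```python
from collections import Counter

# Hypothetical toy is_a hierarchy (child -> parents); not real GO/ChEBI/SBO IDs.
PARENTS = {
    "glycolysis": {"carbohydrate metabolism"},
    "gluconeogenesis": {"carbohydrate metabolism"},
    "carbohydrate metabolism": {"metabolism"},
    "apoptosis": {"cell death"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen


def set_features(models, top=3):
    """Rank terms by how many models in the set carry them after propagation.

    models: list of sets of annotation terms, one set per model.
    """
    counts = Counter()
    for annotations in models:
        # Each model contributes at most once per (propagated) term.
        counts.update(set().union(*(ancestors(t) for t in annotations)))
    return counts.most_common(top)
```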

  16. Annotation of a hybrid partial genome of the Coffee Rust (Hemileia vastatrix) contributes to the gene repertoire catalogue of the Pucciniales

    Directory of Open Access Journals (Sweden)

    Marco Aurelio Cristancho

    2014-10-01

    Full Text Available Coffee leaf rust, caused by the fungus Hemileia vastatrix, is the most damaging disease of coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee-producing countries, resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging, as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion, with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3,921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish

  17. Roadmap for annotating transposable elements in eukaryote genomes.

    Science.gov (United States)

    Permal, Emmanuelle; Flutre, Timothée; Quesneville, Hadi

    2012-01-01

    Current high-throughput techniques have made it feasible to sequence even the genomes of non-model organisms. However, the annotation process now represents a bottleneck to genome analysis, especially when dealing with transposable elements (TE). Combined approaches, using both de novo and knowledge-based methods to detect TEs, are likely to produce reasonably comprehensive and sensitive results. This chapter provides a roadmap for researchers involved in genome projects to address this issue. At each step of the TE annotation process, from the identification of TE families to the annotation of TE copies, we outline the tools and good practices to be used.

  18. The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

    Directory of Open Access Journals (Sweden)

    Saeideh Ahangari

    2010-05-01

    Full Text Available In our modern technological world, Computer-Assisted Language Learning (CALL) is a new realm in language learning in general, and in L2 vocabulary learning in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotations, dynamic picture annotations, and written annotations) on L2 vocabulary learning. To fulfill this objective, the researchers selected sixty-four EFL learners as participants. The participants were randomly assigned to one of four groups: a control group that received no annotations and three experimental groups that received still picture annotations, dynamic picture annotations, or written annotations. Each participant was required to take a pre-test. A vocabulary post-test was also designed and administered to the participants in order to assess the efficacy of each annotation type. First, for each group a paired t-test was conducted between pre-test and post-test scores in order to observe improvement; then the performance of the four groups was compared through an ANCOVA. The results showed that using multimedia annotations resulted in a significant difference in the participants’ vocabulary learning. Based on these results, multimedia annotations are suggested as a vocabulary teaching strategy.
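    The per-group comparison of pre-test and post-test scores uses a paired t-test, whose statistic is the mean within-learner difference divided by its standard error. A minimal Python sketch using only the standard library is below; the scores in the comment are made up for illustration and are not the study's data. Computing a p-value additionally needs the t distribution's CDF, which the standard library lacks (SciPy's `scipy.stats.ttest_rel` returns both the statistic and the p-value).

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for the differences post - pre.

    t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Made-up scores for four learners (not the study's data):
# paired_t([10, 12, 9, 11], [13, 14, 12, 15]) gives t ≈ 7.35 with 3 degrees of freedom.
```

    Pairing each learner with themselves removes between-learner variation from the comparison, which is why this test, rather than an unpaired one, suits pre/post designs.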

  19. Evaluating Functional Annotations of Enzymes Using the Gene Ontology.

    Science.gov (United States)

    Holliday, Gemma L; Davidson, Rebecca; Akiva, Eyal; Babbitt, Patricia C

    2017-01-01

    The Gene Ontology (GO) (Ashburner et al., Nat Genet 25(1):25-29, 2000) is a powerful tool in the informatics arsenal of methods for evaluating annotations in a protein dataset. Everything from identifying the nearest well-annotated homologue of a protein of interest, to predicting where misannotation has occurred, to knowing how confident you can be in the annotations assigned to those proteins, is critical. In this chapter we explore what makes an enzyme unique and how we can use GO to infer aspects of protein function based on sequence similarity. These can range from identification of misannotation or other errors in a predicted function to accurate function prediction for an enzyme of entirely unknown function. Although GO annotation applies to any gene product, we focus here on describing our approach for hierarchical classification of enzymes in the Structure-Function Linkage Database (SFLD) (Akiva et al., Nucleic Acids Res 42(Database issue):D521-530, 2014) as a guide for informed utilisation of annotation transfer based on GO terms.

  20. AutoFACT: An Automatic Functional Annotation and Classification Tool

    Directory of Open Access Journals (Sweden)

    Lang B Franz

    2005-06-01

    Full Text Available Abstract Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, Gene Ontology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1% and 2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in Perl and runs on LINUX/UNIX platforms. AutoFACT is available at http://megasun.bch.umontreal.ca/Software/AutoFACT.htm.
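    The selection of the "most informative" description from several BLAST reports can be sketched as: pool the hits, discard uninformative descriptions, and keep the best remaining E-value. The Python below is an assumed, simplified rule for illustration; the regular expression of uninformative phrases and the E-value cutoff are hypothetical and are not AutoFACT's actual lists or thresholds.

```python
import re

# Hypothetical uninformative-description patterns (not AutoFACT's actual list).
UNINFORMATIVE = re.compile(
    r"hypothetical|unknown|uncharacterized|predicted protein|unnamed", re.IGNORECASE
)


def most_informative(hits, max_evalue=1e-5):
    """Pick the best informative hit pooled from several BLAST reports.

    hits: (database, description, evalue) tuples; max_evalue is a hypothetical cutoff.
    Returns the informative hit with the lowest E-value, or None.
    """
    candidates = [
        h for h in hits
        if h[2] <= max_evalue and not UNINFORMATIVE.search(h[1])
    ]
    return min(candidates, key=lambda h: h[2], default=None)
```

    For instance, if the best hit overall is a "hypothetical protein", a top-hit rule reports that description, whereas this filter falls through to the next informative hit from any of the pooled databases.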

  1. TAM: a method for enrichment and depletion analysis of a microRNA category in a list of microRNAs.

    Science.gov (United States)

    Lu, Ming; Shi, Bing; Wang, Juan; Cao, Qun; Cui, Qinghua

    2010-08-09

    MicroRNAs (miRNAs) are a class of important gene regulators. The number of identified miRNAs has been increasing dramatically in recent years. An emerging major challenge is the interpretation of genome-scale miRNA datasets, including those derived from microarray and deep-sequencing experiments. It is interesting and important to know the common rules or patterns behind a list of miRNAs (i.e., the deregulated miRNAs resulting from a miRNA microarray or deep-sequencing experiment). For this purpose, this study presents a method and develops a tool (TAM) for annotating meaningful human miRNA categories. We first integrated miRNAs into various meaningful categories according to prior knowledge, such as miRNA family, miRNA cluster, miRNA function, miRNA-associated diseases, and tissue specificity. Using TAM, given lists of miRNAs can be rapidly annotated and summarized according to the integrated miRNA categorical data. Moreover, given a list of miRNAs, TAM can be used to predict novel related miRNAs. Finally, we confirmed the usefulness and reliability of TAM by applying it to deregulated miRNAs in acute myocardial infarction (AMI) from two independent experiments. TAM can efficiently identify meaningful categories for given miRNAs. In addition, TAM can be used to identify novel miRNA biomarkers. The TAM tool, source code, and miRNA category data are freely available at http://cmbi.bjmu.edu.cn/tam.
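    Enrichment and depletion of a miRNA category in a list are commonly assessed with a hypergeometric test; whether TAM uses exactly this test is an assumption here. The Python sketch below computes both one-sided p-values from exact binomial coefficients (`math.comb`, Python 3.8+), given the background size, category size, list size and overlap.

```python
from math import comb


def hypergeom_tails(N, K, n, k):
    """One-sided hypergeometric p-values for a category's overlap with a list.

    N: miRNAs in the background, K: background miRNAs in the category,
    n: miRNAs in the input list, k: input miRNAs in the category.
    Returns (P[X >= k], P[X <= k]), i.e. (enrichment p, depletion p).
    """
    total = comb(N, n)

    def pmf(x):
        return comb(K, x) * comb(N - K, n - x) / total

    lo, hi = max(0, n - (N - K)), min(K, n)  # support of X
    enrichment = sum(pmf(x) for x in range(max(k, lo), hi + 1))
    depletion = sum(pmf(x) for x in range(lo, min(k, hi) + 1))
    return enrichment, depletion
```

    When many categories are tested against the same list, the p-values should be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.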

  2. LeARN: a platform for detecting, clustering and annotating non-coding RNAs

    Directory of Open Access Journals (Sweden)

    Schiex Thomas

    2008-01-01

    Full Text Available Abstract Background In the last decade, sequencing projects have led to the development of a number of annotation systems dedicated to the structural and functional annotation of protein-coding genes. These annotation systems manage the annotation of non-protein-coding genes (ncRNAs) in a very crude way, allowing neither the editing of secondary structures nor the clustering of ncRNA genes into families, both of which are crucial for appropriate annotation of these molecules. Results LeARN is a flexible software package that handles the complete process of ncRNA annotation by integrating the layers of automatic detection and human curation. Conclusion This software provides the infrastructure to deal properly with ncRNAs in the framework of any annotation project. It fills the gap between existing prediction software, which detects independent ncRNA occurrences, and public ncRNA repositories, which do not offer the flexibility and interactivity required for annotation projects. The software is freely available from the download section of the website http://bioinfo.genopole-toulouse.prd.fr/LeARN

  3. Essential Annotation Schema for Ecology (EASE): A framework supporting the efficient data annotation and faceted navigation in ecology

    Science.gov (United States)

    Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian

    2017-01-01

    Ecology has become a data-intensive science over the last decades, and it often relies on the reuse of data in cross-experimental analyses. However, finding data that qualify for reuse in a specific context can be challenging. It requires good-quality metadata and annotations as well as efficient search strategies. To date, full-text search (often on the metadata only) is the most widely used search strategy, although it is known to be inaccurate. Faceted navigation provides a filter mechanism based on fine-grained metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full-text search creates a system of filters that lets users refine the results and improve their relevance. We developed a framework for efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas from widely accepted metadata standards, textbooks, scientific literature, and vocabularies, as well as expert knowledge contributed by researchers from ecology and adjacent disciplines. PMID:29023519

  4. MicroScope: a platform for microbial genome annotation and comparative genomics.

    Science.gov (United States)

    Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four-letter alphabet. The role of in silico sequence analysis is to assist biologists in associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community and can be used to identify genomic objects before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, bring together the information contained in numerous genomes simultaneously and allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We developed the MicroScope platform to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with a description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and of the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement to automatic annotation. Several examples illustrate the use of the implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer to browse updated annotation information for available microbial genomes (more than 440 organisms to date) and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improving the quality of…

  5. COGNATE: comparative gene annotation characterizer.

    Science.gov (United States)

    Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver

    2017-07-17

    The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequences and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level that uses a minimum of parameters and applies standardized terms consistently, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from an easy-to-use standard tool. We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes to produce a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow downstream analyses and to provide an overview of the genome and gene repertoire characteristics. COGNATE is written in Perl and freely available at the ZFMK homepage (https://www.zfmk.de/en/COGNATE) and on GitHub (https…

  6. Radioactive occurrences in veins and igneous and metamorphic rocks of New Mexico with annotated bibliography

    International Nuclear Information System (INIS)

    McLemore, V.T.

    1982-02-01

    The primary objectives of this report are to list known radioactive occurrences in veins and igneous and metamorphic rocks in New Mexico and to provide an annotated bibliography of geologic reports concerning these regions. Only plutonic, metamorphic, vein, and Precambrian quartz-pebble conglomerate uranium deposits are considered in this report; other nonsandstone uranium deposits (such as shale, limestone, phosphorite, coal, evaporative precipitates, and fossil placer deposits) will be considered at a later time. These objectives were achieved through a literature search. Field examinations of some of the radioactive occurrences have also been completed. A table of known radioactive occurrences in veins and igneous and metamorphic rocks was compiled from the literature (Appendix I).

  7. Automating Ontological Annotation with WordNet

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Tratz, Stephen C.; Gregory, Michelle L.; Chappell, Alan R.; Whitney, Paul D.; Posse, Christian; Paulson, Patrick R.; Baddeley, Bob L.; Hohimer, Ryan E.; White, Amanda M.

    2006-01-22

    Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with the word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem, as it offers a highly structured lexical conceptual representation that has been used extensively to develop word sense disambiguation algorithms. However, WordNet was not designed as an ontology, and while it can easily be turned into one, doing so would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept classes in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The resulting framework provides an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and support the creation and validation of inference models.

  8. ONEMercury: Towards Automatic Annotation of Earth Science Metadata

    Science.gov (United States)

    Tuarob, S.; Pouchard, L. C.; Noy, N.; Horsburgh, J. S.; Palanisamy, G.

    2012-12-01

    Earth sciences have become more data-intensive, requiring access to heterogeneous data collected from multiple places, times, and thematic scales. For example, research on climate change may involve exploring and analyzing observational data, such as the migration of animals and temperature shifts across the earth, as well as various model-observation intercomparison studies. DataONE, a federated data network built to facilitate access to and preservation of environmental and ecological data, was recently established. ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for discovering and accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple data repositories and makes it searchable via a common search interface built on cutting-edge search engine technology, allowing users to interact with the system, filter the search results intelligently on the fly, and fetch the data from distributed data sources. Linking data from heterogeneous sources always has a cost. One problem ONEMercury faces is the differing levels of annotation in the harvested metadata records. Poorly annotated records tend to be missed during the search process because they lack meaningful keywords. Furthermore, such records are not compatible with the advanced search functionality offered by ONEMercury, as the interface requires metadata records to be semantically annotated. The explosion in the number of metadata records harvested from an increasing number of data repositories makes it impossible to annotate the harvested records manually, creating the need for a tool capable of automatically annotating poorly curated metadata records. In this paper, we propose a topic-model (TM) based approach for automatic metadata annotation. Our approach mines topics in the set of well-annotated records and suggests keywords for poorly annotated records based on topic similarity. We utilize the…

  9. A Selected Annotated Bibliography on Work Time Options.

    Science.gov (United States)

    Ivantcho, Barbara

    This annotated bibliography is divided into three sections. Section I contains annotations of general publications on work time options. Section II presents resources on flexitime and the compressed work week. In Section III are found resources related to these reduced work time options: permanent part-time employment, job sharing, voluntary…

  10. 76 FR 70057 - National Oil and Hazardous Substance Pollution Contingency Plan; National Priorities List...

    Science.gov (United States)

    2011-11-10

    ... and Hazardous Substance Pollution Contingency Plan; National Priorities List: Partial Deletion of the...). Refer to Figures 1 to 3 in the deletion docket to view the location of the two parcels being proposed... Substances Pollution Contingency Plan (NCP). This direct final partial deletion is being published by EPA...

  11. Prepare-Participate-Connect: Active Learning with Video Annotation

    Science.gov (United States)

    Colasante, Meg; Douglas, Kathy

    2016-01-01

    Annotation of video provides students with the opportunity to view and engage with audiovisual content in an interactive and participatory way rather than in passive-receptive mode. This article discusses research into the use of video annotation in four vocational programs at RMIT University in Melbourne, which allowed students to interact with…

  12. Developing Annotation Solutions for Online Data Driven Learning

    Science.gov (United States)

    Perez-Paredes, Pascual; Alcaraz-Calero, Jose M.

    2009-01-01

    Although "annotation" is a widely researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most of the research on the use of DDL methods pays little attention to annotation in the design and implementation…

  13. Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

    Directory of Open Access Journals (Sweden)

    Dorrell Nick

    2007-06-01

    Full Text Available Abstract Background Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. Results Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. Conclusions Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes.

  14. Comparison of concept recognizers for building the Open Biomedical Annotator

    Directory of Open Access Journals (Sweden)

    Rubin Daniel

    2009-09-01

    Full Text Available Abstract The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources, such as datasets from GEO and ArrayExpress, to annotate and index them with concepts from appropriate ontologies. This indexing requires a concept-recognition tool to identify ontology concepts in the resource's textual metadata. In this paper, we present a comparison of two concept recognizers: NLM's MetaMap and the University of Michigan's Mgrep. We use a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability, and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale, service-oriented applications. Based on our analysis, we also suggest areas of potential improvement for Mgrep. We subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of data.

  15. Effects of Reviewing Annotations and Homework Solutions on Math Learning Achievement

    Science.gov (United States)

    Hwang, Wu-Yuin; Chen, Nian-Shing; Shadiev, Rustam; Li, Jin-Sing

    2011-01-01

    Previous studies have demonstrated that making annotations can be a meaningful and useful learning method that promotes metacognition and enhances learning achievement. A web-based annotation system, Virtual Pen (VPEN), which supports the creation and review of annotations and homework solutions, has been developed to foster the learning process…

  16. 75 FR 54821 - National Oil and Hazardous Substance Pollution Contingency Plan; National Priorities List; Intent...

    Science.gov (United States)

    2010-09-09

    ... and Hazardous Substance Pollution Contingency Plan; National Priorities List; Intent for Partial... amended, is an Appendix of the National Oil and Hazardous Substances Pollution Contingency Plan (NCP). The... Superfund Site without prior Notice of Intent for Partial Deletion because EPA views this as a...

  17. Automatic Compound Annotation from Mass Spectrometry Data Using MAGMa.

    Science.gov (United States)

    Ridder, Lars; van der Hooft, Justin J J; Verhoeven, Stefan

    2014-01-01

    The MAGMa software for automatic annotation of mass-spectrometry-based fragmentation data was applied to 16 MS/MS datasets of the CASMI 2013 contest. Eight solutions were submitted in category 1 (molecular formula assignment) and twelve in category 2 (molecular structure assignment). The MS/MS peaks of each challenge were matched with in silico generated substructures of candidate molecules from PubChem, resulting in penalty scores that were used to rank the candidates. In 6 of the 12 solutions submitted in category 2, the correct chemical structure obtained the best score, whereas 3 molecules were ranked outside the top 5. All top-ranked molecular formulas submitted in category 1 were correct. In addition, we present MAGMa results generated retrospectively for the remaining challenges. Successful application of the MAGMa algorithm required inclusion of the relevant candidate molecules, application of the appropriate mass tolerance, and a sufficient degree of in silico fragmentation of the candidate molecules. Furthermore, the effects of the exhaustiveness of the candidate lists and the limitations of substructure-based scoring are discussed.

  18. A Set of Annotation Interfaces for Alignment of Parallel Corpora

    Directory of Open Access Journals (Sweden)

    Singh Anil Kumar

    2014-09-01

    Full Text Available Annotation interfaces for parallel corpora which fit in well with other tools can be very useful. We describe a set of annotation interfaces which fulfill this criterion. This set includes a sentence alignment interface, two different word or word group alignment interfaces and an initial version of a parallel syntactic annotation alignment interface. These tools can be used for manual alignment, or they can be used to correct automatic alignments. Manual alignment can be performed in combination with certain kinds of linguistic annotation. Most of these interfaces use a representation called the Shakti Standard Format that has been found to be very robust and has been used for large and successful projects. It ties together the different interfaces, so that the data created by them is portable across all tools which support this representation. The existence of a query language for data stored in this representation makes it possible to build tools that allow easy search and modification of annotated parallel data.

  19. LocusTrack: Integrated visualization of GWAS results and genomic annotation.

    Science.gov (United States)

    Cuellar-Partida, Gabriel; Renteria, Miguel E; MacGregor, Stuart

    2015-01-01

    Genome-wide association studies (GWAS) are an important tool for the mapping of complex traits and diseases. Visual inspection of genomic annotations may be used to generate insights into the biological mechanisms underlying GWAS-identified loci. We developed LocusTrack, a web-based application that annotates and creates plots of regional GWAS results and incorporates user-specified tracks that display annotations such as linkage disequilibrium (LD), phylogenetic conservation, chromatin state, and other genomic and regulatory elements. Currently, LocusTrack can integrate annotation tracks from the UCSC genome-browser as well as from any tracks provided by the user. LocusTrack is an easy-to-use application and can be accessed at the following URL: http://gump.qimr.edu.au/general/gabrieC/LocusTrack/. Users can upload and manage GWAS results and select from and/or provide annotation tracks using simple and intuitive menus. LocusTrack scripts and associated data can be downloaded from the website and run locally.

  20. "Annotated Lectures": Student-Instructor Interaction in Large-Scale Global Education

    Directory of Open Access Journals (Sweden)

    Roger Diehl

    2009-10-01

    Full Text Available We describe an "Annotated Lectures" system, which will be used in a global virtual teaching and student collaboration event on embodied intelligence presented by the University of Zurich. The lectures will be broadcast via videoconference to lecture halls at different universities around the globe. Among other collaboration features, the "Annotated Lectures" system will be implemented in a 3D collaborative virtual environment and used by participating students to annotate the video-recorded lectures; the annotations will be sent to and answered by the students' supervisors and forwarded to the lecturers in aggregated form. The "Annotated Lectures" system aims to overcome the problem of limited student-instructor interaction in large-scale education and to foster an intercultural and multidisciplinary discourse among students who review the lectures in a group. After presenting the concept of the "Annotated Lectures" system, we discuss a prototype version, including a description of its technical components and its expected benefit for large-scale global education.

  1. Annotation an effective device for student feedback: a critical review of the literature.

    Science.gov (United States)

    Ball, Elaine C

    2010-05-01

    The paper examines hand-written annotation, its many features, difficulties and strengths as a feedback tool. It extends and clarifies the modest evidence in the public domain and offers an evaluation of how to use annotation effectively to support student feedback [Marshall, C.M., 1998a. The Future of Annotation in a Digital (paper) World. Presented at the 35th Annual GLSLIS Clinic: Successes and Failures of Digital Libraries, June 20-24, University of Illinois at Urbana-Champaign, March 24, pp. 1-20; Marshall, C.M., 1998b. Toward an ecology of hypertext annotation. Hypertext. In: Proceedings of the Ninth ACM Conference on Hypertext and Hypermedia, June 20-24, Pittsburgh Pennsylvania, US, pp. 40-49; Wolfe, J.L., Neuwirth, C.M., 2001. From the margins to the centre: the future of annotation. Journal of Business and Technical Communication, 15(3), 333-371; Diyanni, R., 2002. One Hundred Great Essays. Addison-Wesley, New York; Wolfe, J.L., 2002. Marginal pedagogy: how annotated texts affect writing-from-source texts. Written Communication, 19(2), 297-333; Liu, K., 2006. Annotation as an index to critical writing. Urban Education, 41, 192-207; Feito, A., Donahue, P., 2008. Minding the gap annotation as preparation for discussion. Arts and Humanities in Higher Education, 7(3), 295-307; Ball, E., 2009. A participatory action research study on handwritten annotation feedback and its impact on staff and students. Systemic Practice and Action Research, 22(2), 111-124; Ball, E., Franks, H., McGrath, M., Leigh, J., 2009. Annotation is a valuable tool to enhance learning and assessment in student essays. Nurse Education Today, 29(3), 284-291]. Although a significant number of studies examine annotation, they largely concern online tools and computer-mediated communication rather than hand-written annotation, that is, a comment, phrase or sign written on the student essay to provide critique. Little systematic research has been conducted to consider how this latter form…

  2. BioCause: Annotating and analysing causality in the biomedical domain.

    Science.gov (United States)

    Mihăilă, Claudiu; Ohta, Tomoko; Pyysalo, Sampo; Ananiadou, Sophia

    2013-01-16

    Biomedical corpora annotated with event-level information represent an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Unlike work on relation and event extraction, most of which focuses on specific events and named entities, we aim to build a comprehensive resource covering all statements of causal association present in discourse. Causality lies at the heart of biomedical knowledge, such as diagnosis, pathology, or systems biology; thus, automatic causality recognition can greatly reduce the human workload by suggesting possible causal connections and aiding in the curation of pathway models. A biomedical text corpus annotated with such relations is therefore crucial for developing and evaluating biomedical text mining. We have defined an annotation scheme for enriching biomedical domain corpora with causality relations. This scheme has subsequently been used to annotate 851 causal relations to form BioCause, a collection of 19 open-access full-text biomedical journal articles belonging to the subdomain of infectious diseases. These documents had been pre-annotated with named entity and event information in the context of previous shared tasks. We report an inter-annotator agreement rate of over 60% for triggers and of over 80% for arguments using an exact match constraint. These rates increase significantly under a relaxed match setting. Moreover, we analyse and describe the causality relations in BioCause from various points of view. This information can then be leveraged to train automatic causality detection systems. Augmenting named entity and event annotations with information about causal discourse relations could benefit the development of more sophisticated IE systems. These will further influence the development of multiple tasks, such as enabling textual inference to detect entailments, discovering new facts and providing new…

  3. Automatic Function Annotations for Hoare Logic

    Directory of Open Access Journals (Sweden)

    Daniel Matichuk

    2012-11-01

    Full Text Available In systems verification we are often concerned with multiple, inter-dependent properties that a program must satisfy. To prove that a program satisfies a given property, the correctness of intermediate states of the program must be characterized. However, this intermediate reasoning is not always phrased such that it can be easily re-used in the proofs of subsequent properties. We introduce a function annotation logic that extends Hoare logic in two important ways: (1) when proving that a function satisfies a Hoare triple, intermediate reasoning is automatically stored as function annotations, and (2) these function annotations can be exploited in future Hoare logic proofs. This reduces duplication of reasoning between the proofs of different properties, whilst serving as a drop-in replacement for traditional Hoare logic to avoid the costly process of proof refactoring. We explain how this was implemented in Isabelle/HOL and applied to an experimental branch of the seL4 microkernel to significantly reduce the size and complexity of existing proofs.

  4. Automatically Annotated Mapping for Indoor Mobile Robot Applications

    DEFF Research Database (Denmark)

    Özkil, Ali Gürcan; Howard, Thomas J.

    2012-01-01

    This paper presents a new and practical method for mapping and annotating indoor environments for mobile robot use. The method uses 2D occupancy grid maps for metric representation and topology maps to indicate the connectivity of the ‘places of interest’ in the environment. Novel use... localization and mapping in topology space, and fuses camera and robot pose estimations to build an automatically annotated global topo-metric map. It is developed as a framework for a hospital service robot and tested in a real hospital. Experiments show that the method is capable of producing globally... consistent, automatically annotated hybrid metric-topological maps that are needed by mobile service robots.

  5. Annotated checklist of the recent and extinct pythons (Serpentes, Pythonidae), with notes on nomenclature, taxonomy, and distribution

    Science.gov (United States)

    Schleip, Wulf D.; O’Shea, Mark

    2010-01-01

    Abstract McDiarmid et al. (1999) published the first part of their planned taxonomic catalog of the snakes of the world. Since then, several new python taxa have been described in both the scientific literature and non-peer-reviewed publications. This checklist evaluates the nomenclatural status of the names and discusses the taxonomic status of the new taxa, and aims to continue the work of McDiarmid et al. (1999) for the family Pythonidae, covering the period 1999 to 2010. Numerous new taxa are listed, and where appropriate recent synonymies are included and annotations are made. A checklist and a taxonomic identification key of valid taxa are provided. PMID:21594030

  6. AGORA: Organellar genome annotation from the amino acid and nucleotide references.

    Science.gov (United States)

    Jung, Jaehee; Kim, Jong Im; Jeong, Young-Sik; Yi, Gangman

    2018-03-29

    Next-generation sequencing (NGS) technologies have led to the accumulation of high-throughput sequence data from various organisms. To annotate the genes of organellar genomes across a wide range of organisms, better-optimized tools for functional gene annotation are required. Most gene annotation tools focus mainly on the chloroplast genome of land plants or the mitochondrial genome of animals. We have developed a web application, AGORA, for fast, user-friendly, and improved annotation of organellar genomes. AGORA annotates genes based on a BLAST-based homology search and clustering with selected reference sequences from the NCBI database or user-uploaded data. AGORA can annotate the functional genes in almost all mitochondrial and plastid genomes of eukaryotes. Annotation of genomes containing genes with an exon-intron structure or an inverted repeat region is also available. It provides the start and end positions of each gene, BLAST results compared with the reference sequence, and a visualization of the gene map produced by OGDRAW. The software is freely available at https://bigdata.dongguk.edu/gene_project/AGORA/. The main module of the tool is implemented in Python and PHP, and the web page is built with HTML and CSS to support all browsers. gangman@dongguk.edu.

  7. An annotated corpus with nanomedicine and pharmacokinetic parameters

    Directory of Open Access Journals (Sweden)

    Lewinski NA

    2017-10-01

    Full Text Available Nastassja A Lewinski (1), Ivan Jimenez (1), Bridget T McInnes (2). (1) Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, VA; (2) Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA. Abstract: A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods that employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration's Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions covering the physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source, and based on these results, guidelines and suggestions for the future development of additional nanomedicine corpora are provided. Keywords: nanotechnology, informatics, natural language processing, text mining, corpora

  8. Annotated Tsunami bibliography: 1962-1976

    International Nuclear Information System (INIS)

    Pararas-Carayannis, G.; Dong, B.; Farmer, R.

    1982-08-01

    This compilation contains annotated citations to nearly 3000 tsunami-related publications from 1962 to 1976 in English and several other languages. The foreign-language citations have English titles and abstracts.

  9. Assessment of community-submitted ontology annotations from a novel database-journal partnership.

    Science.gov (United States)

    Berardini, Tanya Z; Li, Donghui; Muller, Robert; Chetty, Raymond; Ploetz, Larry; Singh, Shanker; Wensel, April; Huala, Eva

    2012-01-01

    As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org.

  10. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    Genomic selection is widely used in both animal and plant species; however, it is performed with no input from the known genomic or biological roles of genetic variants and is therefore a black-box approach in the genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and the Pig QTL database were used as the source of genomic annotation for the 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...

  11. Creating New Medical Ontologies for Image Annotation: A Case Study

    CERN Document Server

    Stanescu, Liana; Brezovan, Marius; Mihai, Cristian Gabriel

    2012-01-01

    Creating New Medical Ontologies for Image Annotation focuses on the automatic annotation of medical images, a problem the authors solve in an original manner. All the steps of this process are described in detail with algorithms, experiments and results. The original algorithms proposed by the authors are compared with other efficient similar algorithms. In addition, the authors treat the problem of creating ontologies automatically, starting from Medical Subject Headings (MeSH). They present several efficient and relevant annotation models, as well as the basics of the annotation model used by the proposed system: Cross Media Relevance Models. Given a text query, the system retrieves the images that contain objects described by the keywords.

  12. Elucidating high-dimensional cancer hallmark annotation via enriched ontology.

    Science.gov (United States)

    Yan, Shankai; Wong, Ka-Chun

    2017-09-01

    Cancer hallmark annotation is a promising technique that could discover novel knowledge about cancer from the biomedical literature. The automated annotation of cancer hallmarks could reveal relevant cancer transformation processes in the literature or extract the articles that correspond to the cancer hallmark of interest. It acts as a complementary approach that can retrieve knowledge from massive text information, advancing numerous focused studies in cancer research. Nonetheless, the high-dimensional nature of cancer hallmark annotation imposes a unique challenge. To address the curse of dimensionality, we compared multiple cancer hallmark annotation methods on 1580 PubMed abstracts. Based on the insights, a novel approach, UDT-RF, which makes use of ontological features is proposed. It expands the feature space via the Medical Subject Headings (MeSH) ontology graph and utilizes novel feature selections for elucidating the high-dimensional cancer hallmark annotation space. To demonstrate its effectiveness, state-of-the-art methods are compared and evaluated by a multitude of performance metrics, revealing the full performance spectrum on the full set of cancer hallmarks. Several case studies are conducted, demonstrating how the proposed approach could reveal novel insights into cancers. https://github.com/cskyan/chmannot. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Rfam: annotating families of non-coding RNA sequences.

    Science.gov (United States)

    Daub, Jennifer; Eberhardt, Ruth Y; Tate, John G; Burge, Sarah W

    2015-01-01

    The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.

  14. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identifying functional DNA elements responsible for biological features unique to that species. Because of its high rate of allelic polymorphism and ease of genetic manipulation, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents, and a set of genomic intervals was amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the locations of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation. It suggests a path for annotating the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom, and it raises the possibility that resequencing a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.
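    The idea of using intra-species variation to flag constrained sequence can be sketched as follows: for each alignment column, compute the fraction of specimens that differ from the most common base, then report windows whose mean variability falls below a threshold. This per-site diversity is a crude proxy assumed for illustration, not the per-nucleotide mutation-rate estimate used in the study; the window size, the threshold, and the function names are hypothetical, and gap characters are treated like any other base.

```python
from collections import Counter


def site_diversity(alignment):
    """Per-column fraction of sequences that differ from the most common base.

    alignment: list of equal-length aligned sequences (one per specimen).
    This is a crude variability proxy, not a mutation-rate estimate.
    """
    n = len(alignment)
    diversity = []
    for column in zip(*alignment):
        # Count of the most frequent character in this column.
        most_common = Counter(column).most_common(1)[0][1]
        diversity.append(1 - most_common / n)
    return diversity


def constrained_windows(diversity, window, max_mean):
    """Start indices of windows whose mean diversity is at most max_mean."""
    hits = []
    for start in range(len(diversity) - window + 1):
        if sum(diversity[start:start + window]) / window <= max_mean:
            hits.append(start)
    return hits
```

    For the four toy sequences ACGTACGT, ACGTTCAT, ACGTACGA and ACGTGCTT, the first four columns are invariant, so a window of 4 with a threshold of 0.0 flags only the window starting at position 0.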

  15. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome

    Directory of Open Access Journals (Sweden)

    McCarthy Fiona M

    2007-11-01

    Full Text Available Abstract Background The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal-derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly annotated, both structurally and functionally, and many genes do not have standard nomenclature assigned. Results We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology), we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO) functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally supported predicted proteins were further annotated by assigning the proteins standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and

  16. Automatically annotating topics in transcripts of patient-provider interactions via machine learning.

    Science.gov (United States)

    Wallace, Byron C; Laws, M Barton; Small, Kevin; Wilson, Ira B; Trikalinos, Thomas A

    2014-05-01

    Annotated patient-provider encounters can provide important insights into clinical communication, ultimately suggesting how it might be improved to effect better health outcomes. But annotating outpatient transcripts with Roter or General Medical Interaction Analysis System (GMIAS) codes is expensive, limiting the scope of such analyses. We propose automatically annotating transcripts of patient-provider interactions with topic codes via machine learning. We use a conditional random field (CRF) to model utterance topic probabilities. The model accounts for the sequential structure of conversations and the words comprising utterances. We assess predictive performance via 10-fold cross-validation over GMIAS-annotated transcripts of 360 outpatient visits (>230,000 utterances). We then use automated in place of manual annotations to reproduce an analysis of 116 additional visits from a randomized trial that used GMIAS to assess the efficacy of an intervention aimed at improving communication around antiretroviral (ARV) adherence. With respect to 6 topic codes, the CRF achieved a mean pairwise kappa compared with human annotators of 0.49 (range: 0.47-0.53) and a mean overall accuracy of 0.64 (range: 0.62-0.66). With respect to the RCT reanalysis, results using automated annotations agreed with those obtained using manual ones. According to the manual annotations, the median number of ARV-related utterances without and with the intervention was 49.5 versus 76, respectively (paired sign test P = 0.07). When automated annotations were used, the respective numbers were 39 versus 55 (P = 0.04). While moderately accurate, the predicted annotations are far from perfect. Conversational topics are intermediate outcomes, and their utility is still being researched. This foray into automated topic inference suggests that machine learning methods can classify utterances comprising patient-provider interactions into clinically relevant topics with reasonable accuracy.
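    The agreement figure above is a mean pairwise kappa. As a minimal sketch, assuming Cohen's kappa is computed for each pair of label sequences (the abstract does not state which kappa variant was used), the calculation looks like this; the function names and example topic labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length sequences of labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement, from each rater's own label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used the same single label
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(raters):
    """Average Cohen's kappa over all pairs of raters (e.g. a model plus humans)."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(x, y) for x, y in pairs) / len(pairs)
```

    With a = ["ARV", "ARV", "social", "other"] and b = ["ARV", "social", "social", "other"], observed agreement is 0.75, chance agreement is 0.3125, and kappa is 7/11 (about 0.64).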

  17. Annotating risk factors for heart disease in clinical narratives for diabetic patients.

    Science.gov (United States)

    Stubbs, Amber; Uzuner, Özlem

    2015-12-01

    The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, Cardiac Artery Disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 296 patients for risk factors and the times they were present. We designed the annotation task for this track with the goal of balancing annotation load and time with quality, so as to generate a gold standard corpus that can benefit a clinically-relevant task. We applied light annotation procedures and determined the gold standard using majority voting. On average, the agreement of annotators with the gold standard was above 0.95, indicating high reliability. The resulting document-level annotations generated for each record in each longitudinal EMR in this corpus provide information that can support studies of progression of heart disease risk factors in the included patients over time. These annotations were used in the Risk Factor track of the 2014 i2b2/UTHealth shared task. Participating systems achieved a mean micro-averaged F1 measure of 0.815 and a maximum F1 measure of 0.928 for identifying these risk factors in patient records. Copyright © 2015 Elsevier Inc. All rights reserved.
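    Micro-averaging pools true positives, false positives and false negatives over all risk-factor categories before computing precision and recall. The sketch below assumes annotations are held as a mapping from category name to a set of record IDs; it is an illustration, not the official i2b2/UTHealth evaluation script, and the category names used in the example are hypothetical.

```python
def micro_f1(gold, predicted):
    """Micro-averaged precision, recall and F1.

    gold and predicted map each category label to a set of hashable items
    (for example record IDs). Counts are pooled over all categories before
    the scores are computed, which is what distinguishes micro- from
    macro-averaging.
    """
    tp = fp = fn = 0
    for label in set(gold) | set(predicted):
        g = gold.get(label, set())
        p = predicted.get(label, set())
        tp += len(g & p)   # items both annotated and predicted
        fp += len(p - g)   # predicted but not in the gold standard
        fn += len(g - p)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

    For gold = {"hypertension": {"r1", "r2", "r3"}, "smoker": {"r2"}} and predicted = {"hypertension": {"r1", "r2"}, "smoker": {"r2", "r4"}}, the pooled counts are 3 TP, 1 FP and 1 FN, so precision, recall and F1 are all 0.75.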

  18. Annotating Emotions in Meetings

    NARCIS (Netherlands)

    Reidsma, Dennis; Heylen, Dirk K.J.; Ordelman, Roeland J.F.

    We present the results of two trials testing procedures for the annotation of emotion and mental state of the AMI corpus. The first procedure is an adaptation of the FeelTrace method, focusing on a continuous labelling of emotion dimensions. The second method is centered around more discrete

  19. Microtask crowdsourcing for disease mention annotation in PubMed abstracts.

    Science.gov (United States)

    Good, Benjamin M; Nanis, Max; Wu, Chunlei; Su, Andrew I

    2015-01-01

    Identifying concepts and relationships in biomedical text enables knowledge to be applied in computational analyses. Many biological natural language processing (BioNLP) projects attempt to address this challenge, but the state of the art still leaves much room for improvement. Progress in BioNLP research depends on large, annotated corpora for evaluating information extraction systems and training machine learning models. Traditionally, such corpora are created by small numbers of expert annotators often working over extended periods of time. Recent studies have shown that workers on microtask crowdsourcing platforms such as Amazon's Mechanical Turk (AMT) can, in aggregate, generate high-quality annotations of biomedical text. Here, we investigated the use of the AMT in capturing disease mentions in PubMed abstracts. We used the NCBI Disease corpus as a gold standard for refining and benchmarking our crowdsourcing protocol. After several iterations, we arrived at a protocol that reproduced the annotations of the 593 documents in the 'training set' of this gold standard with an overall F measure of 0.872 (precision 0.862, recall 0.883). The output can also be tuned to optimize for precision (max = 0.984 when recall = 0.269) or recall (max = 0.980 when precision = 0.436). Each document was completed by 15 workers, and their annotations were merged based on a simple voting method. In total 145 workers combined to complete all 593 documents in the span of 9 days at a cost of $0.066 per abstract per worker. The quality of the annotations, as judged with the F measure, increases with the number of workers assigned to each task; however, minimal performance gains were observed beyond 8 workers per task. These results add further evidence that microtask crowdsourcing can be a valuable tool for generating well-annotated corpora in BioNLP. Data produced for this analysis are available at http://figshare.com/articles/Disease_Mention_Annotation_with_Mechanical_Turk/1126402.
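    The merging step can be sketched as threshold voting: count how many workers marked each span and keep the spans that reach a minimum number of votes. This is an assumed simplification of the authors' protocol; representing mentions as (start, end) character offsets, the `min_votes` parameter, and the helper names are hypothetical. Raising the threshold is one way to move along the precision/recall trade-off the abstract describes.

```python
from collections import Counter


def merge_by_vote(worker_annotations, min_votes):
    """Keep every span that at least `min_votes` workers annotated.

    worker_annotations: one collection of (start, end) spans per worker.
    A low threshold favours recall; a high threshold favours precision.
    """
    votes = Counter()
    for spans in worker_annotations:
        votes.update(set(spans))  # each worker votes at most once per span
    return {span for span, n in votes.items() if n >= min_votes}


def precision_recall_f(gold, predicted):
    """Exact-match precision, recall and F measure for two sets of spans."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

    With three workers marking {(0, 5), (10, 20)}, {(0, 5)} and {(0, 5), (30, 35)}, and gold spans {(0, 5), (10, 20)}, a threshold of 1 gives precision 2/3 and recall 1.0, while a threshold of 2 keeps only (0, 5) and gives precision 1.0 and recall 0.5.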

  20. Consumer energy research: an annotated bibliography. Vol. 3

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, D.C.; McDougall, G.H.G.

    1983-04-01

    This annotated bibliography attempts to provide a comprehensive package of existing information in consumer-related energy research. A concentrated effort was made to collect unpublished material as well as material from journals and other sources, including governments, utilities, research institutes and private firms. A deliberate effort was made to include agencies outside North America. For the most part the bibliography is limited to annotations of empirical studies. However, it includes a number of descriptive reports which appear to make a significant contribution to understanding consumers and energy use. Each annotation displays the author, date of publication, title and source of the study. Annotations of empirical studies are divided into four parts: objectives, methods, variables and findings/implications. Care was taken to provide a reasonable amount of detail in the annotations to enable the reader to understand the methodology, the results and the degree to which the implications of the study can be generalized to other situations. Studies are arranged alphabetically by author. The content of the studies reviewed is classified in a series of tables which are intended to provide a summary of sources, types and foci of the various studies. These tables are intended to help researchers interested in specific topics locate the studies most relevant to their work. The studies are categorized using a number of different classification criteria, for example, methodology used, type of energy form, type of policy initiative, and type of consumer activity. A general overview of the studies is also presented. 17 tabs.

  1. GENECODIS-Grid: An online grid-based tool to predict functional information in gene lists

    International Nuclear Information System (INIS)

    Nogales, R.; Mejia, E.; Vicente, C.; Montes, E.; Delgado, A.; Perez Griffo, F. J.; Tirado, F.; Pascual-Montano, A.

    2007-01-01

    In this work we introduce GeneCodis-Grid, a grid-based alternative to the bioinformatics tool GeneCodis, which integrates different sources of biological information to search for biological features (annotations) that frequently co-occur in a set of genes and ranks them by statistical significance. GeneCodis-Grid is a web-based application that takes advantage of two independent grid networks and a computer cluster managed by a meta-scheduler, together with a web server that hosts the application. The mining of concurrent biological annotations provides significant information for the functional analysis of gene lists obtained by high-throughput experiments in biology. Because of the tool's popularity (it has registered more than 13,000 visits since its publication in January 2007), there is a strong need to let users from different sites access the system simultaneously. In addition, the complexity of some of the statistical tests used in this approach makes the technique a good candidate for implementation in an opportunistic grid environment. (Author)
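    Ranking co-occurring annotations by statistical significance is commonly done with a one-sided hypergeometric test. The sketch below assumes that test; GeneCodis's exact statistics and any multiple-testing correction are not reproduced here. A "term" may be a single annotation or a combination of annotations, represented as the set of background genes carrying it. All names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. the annotated genome)
    K: background genes carrying the annotation (or annotation combination)
    n: genes in the user's list
    k: genes in the list carrying the annotation
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible draws contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def rank_annotations(gene_list, annotations, background_size):
    """Rank terms (term -> set of background genes) by enrichment p-value."""
    genes = set(gene_list)
    results = []
    for term, term_genes in annotations.items():
        k = len(genes & term_genes)
        if k == 0:
            continue  # term not present in the list at all
        p = hypergeom_enrichment_p(k, len(genes), len(term_genes), background_size)
        results.append((p, term, k))
    return sorted(results)  # smallest p-value (most significant) first
```

    For a background of 10 genes, a list {g1, g2, g4}, term A = {g1, g2, g3} and term B = {g4, ..., g8}, term A has p = 22/120 (about 0.18) and term B has p = 110/120, so A ranks first.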

  2. Image annotation based on positive-negative instances learning

    Science.gov (United States)

    Zhang, Kai; Hu, Jiwei; Liu, Quan; Lou, Ping

    2017-07-01

    Automatic image annotation remains a difficult task in computer vision; its main purpose is to help manage the massive number of images on the Internet and to support intelligent retrieval. This paper designs a new image annotation model based on a visual bag of words, using low-level features such as color and texture together with mid-level features such as SIFT, and combining pic2pic, label2pic and label2label correlations to measure how strongly labels and images are related. We aim to prune the features specific to each label and formalize the annotation task as a learning process based on Positive-Negative Instances Learning. Experiments on the Corel5K dataset show promising results compared with other existing methods.

  3. An annotated list of the flora of the Bisley Area Luquillo Experimental Forest, Puerto Rico 1987 to 1992

    Science.gov (United States)

    Jesus Danilo Chinea; Renee J. Beymer; Carlos Rivera; Ines Sastre de Jeses; F.N. Scatena

    1993-01-01

    Known species of plants, including bryophytes and ferns, are listed for the Bisley experimental watershed area, a subtropical wet forest in the Luquillo Mountains of northeastern Puerto Rico.

  4. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  5. Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

    Directory of Open Access Journals (Sweden)

    Sugano Sumio

    2009-07-01

    Full Text Available Abstract Background Apicomplexan parasites are causative agents of various diseases including malaria and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexan parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance of physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexan parasites. Results In this study, we used a total of 61,056 5'-end single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of previously unidentified transcripts. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the complete full-length transcript level. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified, and confirmed by RT-PCR, 140 previously unidentified transcripts found in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. Conclusion Our data indicate that the current gene models are likely to still be incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of

  6. False positive reduction in protein-protein interaction predictions using gene ontology annotations

    Directory of Open Access Journals (Sweden)

    Lin Yen-Han

    2007-07-01

    Full Text Available Abstract Background Many crucial cellular operations such as metabolism, signalling, and regulation are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for this lack is poor agreement between experimental findings and computational sets, which in turn stems from the large number of false positive predictions made by computational approaches. Reducing false positive predictions and enhancing the true positive fraction of computationally predicted protein-protein interaction datasets based on highly confident experimental results has not been adequately investigated. Results Gene Ontology (GO) annotations were used to reduce false positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organism are 48.32% and 46.49% (averaged over four datasets) in yeast and worm, respectively. Based on the eight top-ranking keywords and co-localization of interacting proteins, two knowledge rules were deduced and applied to remove false positive protein pairs. The 'strength', a measure of the improvement provided by the rules, was defined based on the signal-to-noise ratio and used to measure the applicability of the knowledge rules to the predicted PPI datasets. Depending on the PPI-prediction method employed, the strength ranges from two- to ten-fold that of randomly removing protein pairs from the datasets. Conclusion Gene Ontology annotations along with the deduced knowledge rules could be implemented to partially

  7. Graph-based sequence annotation using a data integration approach

    Directory of Open Access Journals (Sweden)

    Pesch Robert

    2008-06-01

    Full Text Available The automated annotation of data from high-throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process, both for use by expert curators and for predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database AraCyc), which has been established in the ONDEX data integration system. We also present a comparison between different methods for integrating GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation.

  8. EST-PAC a web package for EST annotation and protein sequence prediction

    Directory of Open Access Journals (Sweden)

    Strahm Yvan

    2006-10-01

    Full Text Available Abstract With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a growing number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. To address these issues, we present EST-PAC, a web-oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: (1) searching local or remote biological databases for sequence similarities using BLAST services, (2) predicting protein coding sequences from EST data, and (3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results, and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.

  9. An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets.

    Science.gov (United States)

    Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W

    2010-07-02

    The data produced by an Illumina flow cell with all eight lanes occupied amount to well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. It is easy to be flooded with such a volume of textual, unannotated data, irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, supports INDEL detection, SNP information and allele calling. Extracting from such analysis a measure of gene expression in the form of tag counts, and also annotating the reads, is therefore of significant value. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, signifying how many times sequenced reads were found within the genomic ranges of functional annotations. TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA build and maximize information extraction from a sequencing dataset.
TASE is specially designed to translate sequence data
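    The tag-counting step described above (counting how many sequenced reads fall within the genomic range of each functional annotation) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not TASE's Java/SQL Server implementation: the record layouts, the 1-based inclusive coordinates, and the function name tag_counts are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def tag_counts(annotations, reads):
    """Count read positions falling inside each annotated genomic range.

    annotations: list of (gene_id, chrom, start, end), 1-based inclusive (assumed layout)
    reads: list of (chrom, position) for aligned reads (assumed layout)
    """
    # Group read positions by chromosome and sort them once for binary search.
    by_chrom = {}
    for chrom, pos in reads:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    counts = Counter()
    for gene_id, chrom, start, end in annotations:
        positions = by_chrom.get(chrom, [])
        # Number of positions p with start <= p <= end.
        counts[gene_id] = bisect_right(positions, end) - bisect_left(positions, start)
    return counts
```

    Sorting each chromosome's read positions once lets every annotation be counted with two binary searches, and overlapping annotations are each credited with every read they contain.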

  10. BEACON: automated tool for Bacterial GEnome Annotation ComparisON

    KAUST Repository

    Kalkatawi, Manal M.; Alam, Intikhab; Bajic, Vladimir B.

    2015-01-01

    We developed BEACON, a fast tool for the automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/
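    The abstract does not state BEACON's matching criteria, so the following is only a hypothetical sketch of what comparing two annotations of one genome and building an "extended" annotation could look like. It assumes genes are matched by exact (start, end, strand) coordinates and that "hypothetical protein" marks an unknown function; both are illustrative choices, not BEACON's method.

```python
UNKNOWN = "hypothetical protein"  # assumed placeholder for "function unknown"


def compare_annotations(ann_a, ann_b):
    """Classify genes from two annotations of the same genome.

    Each annotation maps a gene key (start, end, strand) to a function string.
    """
    shared = ann_a.keys() & ann_b.keys()
    return {
        "same_function": sorted(k for k in shared if ann_a[k] == ann_b[k]),
        "different_function": sorted(k for k in shared if ann_a[k] != ann_b[k]),
        "only_in_a": sorted(ann_a.keys() - ann_b.keys()),
        "only_in_b": sorted(ann_b.keys() - ann_a.keys()),
    }


def extend_annotation(ann_a, ann_b, unknown=UNKNOWN):
    """Merge two annotations, preferring an informative function over `unknown`."""
    extended = {}
    for key in ann_a.keys() | ann_b.keys():
        informative = [ann[key] for ann in (ann_a, ann_b)
                       if ann.get(key) not in (None, unknown)]
        extended[key] = informative[0] if informative else unknown
    return extended
```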

  11. Automatic medical image annotation and keyword-based image retrieval using relevance feedback.

    Science.gov (United States)

    Ko, Byoung Chul; Lee, JiHyeon; Nam, Jae-Yeal

    2012-08-01

    This paper presents a novel method for multiple-keyword annotation of medical images, keyword-based medical image retrieval, and relevance feedback to enhance image retrieval performance. For semantic keyword annotation, this study proposes a novel medical image classification method combining local wavelet-based center-symmetric local binary patterns with random forests. For keyword-based image retrieval, our retrieval system uses the confidence score assigned to each annotated keyword, obtained by combining the probabilities of the random forests with a predefined body relation graph. To overcome the limitations of keyword-based image retrieval, we combine our image retrieval system with a relevance feedback mechanism based on visual features and a pattern classifier. Compared with other annotation and relevance feedback algorithms, the proposed method shows both improved annotation performance and accurate retrieval results.
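    The center-symmetric local binary pattern (CS-LBP) named above compares the four pairs of opposite neighbours around each pixel, giving a 4-bit code (16 possible values) whose histogram can serve as a texture feature. The sketch below applies CS-LBP directly to grey levels; the wavelet decomposition and the random-forest classifier from the abstract are not reproduced, and the neighbour ordering and threshold parameter are assumptions.

```python
# Offsets (row, col) of the 8 neighbours in circular order; index i and i + 4 are opposite.
NEIGHBOURS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def cs_lbp(image, threshold=0.0):
    """CS-LBP codes (0-15) for every interior pixel of a 2-D grey-level image."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            code = 0
            for i in range(4):
                (dr1, dc1), (dr2, dc2) = NEIGHBOURS[i], NEIGHBOURS[i + 4]
                # Set bit i when a neighbour exceeds its opposite by more than the threshold.
                if image[r + dr1][c + dc1] - image[r + dr2][c + dc2] > threshold:
                    code |= 1 << i
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def cs_lbp_histogram(image, threshold=0.0):
    """16-bin histogram of CS-LBP codes, usable as a texture feature vector."""
    hist = [0] * 16
    for row in cs_lbp(image, threshold):
        for code in row:
            hist[code] += 1
    return hist
```

    In a pipeline like the one described, such histograms (computed per region or per wavelet sub-band) would be the feature vectors handed to the classifier.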

  12. Annotating the Function of the Human Genome with Gene Ontology and Disease Ontology.

    Science.gov (United States)

    Hu, Yang; Zhou, Wenyang; Ren, Jun; Dong, Lixiang; Wang, Yadong; Jin, Shuilin; Cheng, Liang

    2016-01-01

    Increasing evidence indicates that functional annotation of the human genome at the molecular and phenotype levels is very important for systematic analysis of genes. In this study, we present a framework named Gene2Function to annotate Gene Reference into Functions (GeneRIFs), in which each functional description in GeneRIFs is annotated by the text-mining tool Open Biomedical Annotator (OBA), and each Entrez gene is mapped to its Human Genome Organisation Gene Nomenclature Committee (HGNC) gene symbol. After annotating all GeneRIF records about human genes, 288,869 associations between 13,148 mRNAs and 7,182 terms, 9,496 associations between 948 microRNAs and 533 terms, and 901 associations between 139 long noncoding RNAs (lncRNAs) and 297 terms were obtained as a comprehensive annotation resource for the human genome. High consistency between GeneRIFs and GOA in the term frequency of individual genes (Pearson correlation = 0.6401, p = 2.2e-16) and in the gene frequency of individual terms (Pearson correlation = 0.1298, p = 3.686e-14) shows that our annotation resource is very reliable.
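    The consistency check above compares, gene by gene, how many terms two resources assign and summarises the agreement with a Pearson correlation. A minimal sketch of that comparison follows; the (gene, term) pair format and the function names are hypothetical, and no significance test is computed.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def term_frequency_correlation(assoc_a, assoc_b):
    """Correlate the number of distinct terms per gene in two (gene, term)
    association lists, restricted to genes present in both."""
    freq_a = Counter(gene for gene, _ in set(assoc_a))
    freq_b = Counter(gene for gene, _ in set(assoc_b))
    genes = sorted(freq_a.keys() & freq_b.keys())
    return pearson([freq_a[g] for g in genes], [freq_b[g] for g in genes])
```

    The "gene frequency of individual term" comparison is the same computation with the roles of genes and terms swapped.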

  13. The duplicated genes database: identification and functional annotation of co-localised duplicated genes across genomes.

    Directory of Open Access Journals (Sweden)

    Marion Ouedraogo

    Full Text Available BACKGROUND: There has been a surge in studies linking genome structure and gene expression, with special focus on duplicated genes. Although initially duplicated from the same sequence, duplicated genes can diverge strongly over evolution and take on different functions or regulated expression. However, information on the function and expression of duplicated genes remains sparse. Identifying groups of duplicated genes in different genomes and characterizing their expression and function would therefore be of great interest to the research community. The 'Duplicated Genes Database' (DGD) was developed for this purpose. METHODOLOGY: Nine species were included in the DGD. For each species, BLAST analyses were conducted on peptide sequences corresponding to the genes mapped on the same chromosome. Groups of duplicated genes were defined based on these pairwise BLAST comparisons and the genomic location of the genes. For each group, Pearson correlations between gene expression data and semantic similarities between functional GO annotations were also computed when the relevant information was available. CONCLUSIONS: The Duplicated Gene Database provides a list of co-localised and duplicated genes for several species with the available gene co-expression level and semantic similarity value of functional annotation. Adding these data to the groups of duplicated genes provides biological information that can prove useful to gene expression analyses. The Duplicated Gene Database can be freely accessed through the DGD website at http://dgd.genouest.org.
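    Defining groups from pairwise BLAST hits plus genomic location amounts to finding connected components among gene pairs that are both similar and co-localised. A minimal union-find sketch follows; the BLAST cut-off is assumed to have been applied already, and the max_distance window is a hypothetical parameter, not DGD's published criterion.

```python
def duplicate_groups(genes, blast_pairs, max_distance=100_000):
    """Group co-localised duplicated genes.

    genes: dict gene_id -> (chromosome, start position)
    blast_pairs: iterable of (gene_a, gene_b) pairs that passed a BLAST similarity cut-off
    max_distance: hypothetical co-localisation window in base pairs
    Returns a sorted list of groups (each a sorted list of gene ids) with >= 2 members.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in blast_pairs:
        (chrom_a, pos_a), (chrom_b, pos_b) = genes[a], genes[b]
        # Link only pairs on the same chromosome within the window.
        if chrom_a == chrom_b and abs(pos_a - pos_b) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)
```

    Union-find makes grouping transitive: two genes can share a group through an intermediate duplicate even without a direct hit between them. The per-group expression correlations and GO semantic similarities are not reproduced here.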

  14. Partial differential equations & boundary value problems with Maple

    CERN Document Server

    Articolo, George A

    2009-01-01

    Partial Differential Equations and Boundary Value Problems with Maple presents all of the material normally covered in a standard course on partial differential equations, while focusing on the natural union between this material and the powerful computational software Maple. The Maple commands are so intuitive and easy to learn that students can learn what they need to know about the software in a matter of hours, an investment that provides substantial returns. Maple's animation capabilities allow students and practitioners to see real-time displays of the solutions of partial differential equations. Maple files can be found on the book's website. Ancillary list: Maple files, http://www.elsevierdirect.com/companion.jsp?ISBN=9780123747327. The book provides a quick overview of the software with the simple commands needed to get started; includes review material on linear algebra and ordinary differential equations, and their contribution to solving partial differential equations; and incorporates an early introduction to Sturm-L...

  15. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors for protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding, VD, and posterior decoding, PD) and the PD/VD protocol described here were successfully applied to 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of the identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) that were annotated as GDSL by Pfam profile search (PfamA). Based on these analyses, we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis gave us better insight into the evolutionary relationships of all identified GDSL sequences. Three novel GDSL subfamilies, as well as unreported variations in GDSL motifs, were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte Selaginella moellendorffii.
Finally, we provide a general motif-HMM scanner which is easily accessible through
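    Viterbi decoding, one of the two HMM decoding strategies named above, finds the single most probable hidden-state path for a sequence. The sketch below is a generic log-space Viterbi for a discrete HMM, run on a toy two-state model ("bg" for background, "motif" for motif residues) with invented probabilities and a two-letter alphabet; it is not the authors' PD/VD protocol or their motif HMMs.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state path for `observations` under a discrete HMM (log space)."""
    scores = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at position t.
            prev, best = max(
                ((p, scores[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[t][s] = best + _log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy model with invented probabilities: "x" = any residue, "G" = motif residue.
STATES = ["bg", "motif"]
START = {"bg": 0.9, "motif": 0.1}
TRANS = {"bg": {"bg": 0.8, "motif": 0.2}, "motif": {"bg": 0.2, "motif": 0.8}}
EMIT = {"bg": {"x": 0.9, "G": 0.1}, "motif": {"x": 0.1, "G": 0.9}}
```

    Working in log space avoids numerical underflow on long protein sequences. Posterior decoding would instead pick, position by position, the state with the highest marginal probability.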

  16. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

    Science.gov (United States)

    Holt, Carson; Yandell, Mark

    2011-12-22

    Second-generation sequencing technologies are precipitating major shifts with regard to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality, and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.
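    Evaluating annotation quality against evidence and prioritizing problematic models for review can be illustrated with the annotation edit distance (AED), the measure MAKER2 reports for its gene models. The sketch below computes an AED-style score at the nucleotide level, 1 - (sensitivity + specificity) / 2, from exon intervals. The interval format and function names are assumptions, and MAKER2's own computation over its evidence alignments may differ in detail.

```python
def covered_bases(intervals):
    """Set of positions covered by 1-based, inclusive (start, end) exon intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def annotation_edit_distance(prediction, evidence):
    """AED-style score: 0 = perfect agreement with evidence, 1 = no overlap."""
    pred, evid = covered_bases(prediction), covered_bases(evidence)
    if not pred or not evid:
        return 1.0
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)   # fraction of evidence covered by the model
    specificity = overlap / len(pred)   # fraction of the model supported by evidence
    return 1.0 - (sensitivity + specificity) / 2.0


def prioritize_for_review(models):
    """models: dict name -> (prediction, evidence). Worst-supported models first."""
    return sorted(models, key=lambda name: annotation_edit_distance(*models[name]),
                  reverse=True)
```

    Ranking by this score puts models with little evidential support at the top of a manual-review queue, which matches the prioritization use described in the abstract.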

  17. Annotation: The Savant Syndrome

    Science.gov (United States)

    Heaton, Pamela; Wallace, Gregory L.

    2004-01-01

    Background: Whilst interest has focused on the origin and nature of the savant syndrome for over a century, it is only within the past two decades that empirical group studies have been carried out. Methods: The following annotation briefly reviews relevant research and also attempts to address outstanding issues in this research area.…

  18. Analysis of LYSA-calculus with explicit confidentiality annotations

    DEFF Research Database (Denmark)

    Gao, Han; Nielson, Hanne Riis

    2006-01-01

    Recently there has been an increased research interest in applying process calculi in the verification of cryptographic protocols due to their ability to formally model protocols. This work presents LYSA with explicit confidentiality annotations for indicating the expected behavior of target...... malicious activities performed by attackers as specified by the confidentiality annotations. The proposed analysis approach is fully automatic without the need of human intervention and has been applied successfully to a number of protocols....

  19. First generation annotations for the fathead minnow (Pimephales promelas) genome

    Science.gov (United States)

    Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...

  20. Selected annotated bibliographies for adaptive filtering of digital image data

    Science.gov (United States)

    Mayers, Margaret; Wood, Lynnette

    1988-01-01

    Digital spatial filtering is an important tool both for enhancing the information content of satellite image data and for implementing cosmetic effects which make the imagery more interpretable and appealing to the eye. Spatial filtering is a context-dependent operation that alters the gray level of a pixel by computing a weighted average formed from the gray level values of other pixels in the immediate vicinity. Traditional spatial filtering involves passing a particular filter or set of filters over an entire image. This assumes that the filter parameter values are appropriate for the entire image, which in turn is based on the assumption that the statistics of the image are constant over the image. However, the statistics of an image may vary widely over the image, requiring an adaptive or "smart" filter whose parameters change as a function of the local statistical properties of the image. Then a pixel would be averaged only with more typical members of the same population. This annotated bibliography cites some of the work done in the area of adaptive filtering. The methods usually fall into two categories: (a) those that segment the image into subregions, each assumed to have stationary statistics, and use a different filter on each subregion, and (b) those that use a two-dimensional "sliding window" to continuously estimate the filter in either the spatial or the frequency domain, or in both domains.
They may be used to deal with images degraded by space-variant noise, to suppress undesirable local radiometric statistics while enforcing desirable (user-defined) statistics, to treat problems where space-variant point spread functions are involved, to segment images into regions of constant value for classification, or to "tune" images in order to remove (nonstationary) variations in illumination, noise, contrast, shadows, or haze. Since adaptive filtering, like nonadaptive filtering, is used in image processing to accomplish various goals, this bibliography
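    The idea that "a pixel would be averaged only with more typical members of the same population" can be made concrete with a sliding-window, category (b) filter. The sketch below uses a sigma-filter-style rule: within each window, only values within k local standard deviations of the centre pixel are averaged. This rule, the window radius, and k are our illustrative choices, not a method taken from the bibliography.

```python
import statistics


def sigma_filter(image, radius=1, k=2.0):
    """Adaptive smoothing: average each pixel only with 'typical' window members.

    image: 2-D list of grey levels; radius: half-width of the square window;
    k: how many local standard deviations a neighbour may differ from the centre.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Window clipped at the image borders.
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            spread = k * statistics.pstdev(window)  # local statistics set the filter
            centre = image[r][c]
            similar = [v for v in window if abs(v - centre) <= spread]
            # `similar` always contains the centre pixel, so it is never empty.
            out[r][c] = sum(similar) / len(similar)
    return out
```

    Because the acceptance band is recomputed from each window, the filter smooths homogeneous regions while leaving sharp boundaries largely intact, which a fixed mean filter would blur.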

  1. Extracting Cross-Ontology Weighted Association Rules from Gene Ontology Annotations.

    Science.gov (United States)

    Agapito, Giuseppe; Milano, Marianna; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-01-01

    Gene Ontology (GO) is a structured repository of concepts (GO terms) that are associated with one or more gene products through a process referred to as annotation. The analysis of annotated data is an important opportunity for bioinformatics. Among the different analysis approaches is the use of association rules (AR), which can provide useful knowledge by discovering biologically relevant, previously unknown associations between GO terms. In previous work, we introduced GO-WAR (Gene Ontology-based Weighted Association Rules), a methodology for extracting weighted association rules from ontology-based annotated datasets. Here we adapt the GO-WAR algorithm to mine cross-ontology association rules, i.e., rules that involve GO terms present in the three sub-ontologies of GO. We conduct a thorough performance evaluation of GO-WAR by mining publicly available GO-annotated datasets, showing that GO-WAR outperforms current state-of-the-art approaches.
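    Cross-ontology rules of the kind described above pair a GO term from one sub-ontology with a term from another. The sketch below mines unweighted single-term rules A -> B using standard support and confidence; the weighting that gives GO-WAR its name is not reproduced, and the thresholds and placeholder term identifiers are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cross_ontology_rules(annotations, namespace, min_support=0.1, min_confidence=0.5):
    """Mine single-term rules A -> B whose terms lie in different GO sub-ontologies.

    annotations: dict gene -> set of GO term ids
    namespace: dict GO term id -> sub-ontology ("BP", "MF" or "CC")
    Returns sorted (antecedent, consequent, support, confidence) tuples.
    """
    n_genes = len(annotations)
    term_counts, pair_counts = Counter(), Counter()
    for terms in annotations.values():
        term_counts.update(terms)
        for a, b in permutations(terms, 2):
            if namespace[a] != namespace[b]:  # keep only cross-ontology pairs
                pair_counts[(a, b)] += 1
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n_genes          # fraction of genes annotated with both
        confidence = count / term_counts[a]  # P(B | A) over annotated genes
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)
```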

  2. Pipeline transportation of emerging partially upgraded bitumen

    International Nuclear Information System (INIS)

    Luhning, R.W.; Anand, A.; Blackmore, T.; Lawson, D.S.

    2002-01-01

    The recoverable reserves of Canada's vast oil deposits are estimated to be 335 billion barrels (bbl), most of which are in the Alberta oil sands. Canada was the largest import supplier of crude oil to the United States in 2001, followed by Saudi Arabia. By 2011, oil sands production is expected to increase to 50 per cent of Canada's oil, and conventional oil production will decline as more production is provided by synthetic light oil and bitumen. This paper lists the announced oil sands projects; if all proceed, production would reach 3,445,000 bbl per day by 2011. Three main challenges regarding the transportation and marketing of this new production are described. The first is to expand the physical capacity of existing pipelines. The second is the supply of low-viscosity diluent (such as natural gas condensate or synthetic diluent) to reduce the viscosity and density of the bitumen as it passes through the pipelines. The current pipeline specifications and procedures to transport partially upgraded products are presented. The final challenge is the projected refinery market constraint on processing the bitumen and synthetic light oil into consumer fuel products. These challenges can be addressed by modifying refineries and increasing Canadian access in Petroleum Administration for Defense Districts (PADD) II and IV. The technology for partial upgrading of bitumen to produce pipeline-specification oil, reduce diluent requirements and add sales value is currently under development. The number of existing refineries that could potentially accept partially upgraded product is listed. The partially upgraded bitumen will be in demand for additional upgrading to end-user products, and new opportunities will be presented as additional pipeline capacity is made available to transport crude to U.S. markets and overseas. The paper describes the following emerging partial upgrading methods: the OrCrude upgrading process, rapid thermal processing, CPJ process for

  3. Jannovar: a java library for exome annotation.

    Science.gov (United States)

    Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

    2014-05-01

    Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
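    Jannovar's use of an interval tree to find every transcript affected by a variant can be illustrated with a small static, augmented interval tree: a balanced binary search tree keyed on transcript start, where each node also stores the maximum end coordinate in its subtree so that non-overlapping branches are pruned. This is a minimal Python sketch with hypothetical transcript names and coordinates, not Jannovar's Java implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    start: int
    end: int
    name: str
    max_end: int  # largest end coordinate anywhere in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _build(intervals):
    if not intervals:
        return None
    mid = len(intervals) // 2
    start, end, name = intervals[mid]
    node = Node(start, end, name, end)
    node.left = _build(intervals[:mid])
    node.right = _build(intervals[mid + 1:])
    for child in (node.left, node.right):
        if child is not None:
            node.max_end = max(node.max_end, child.max_end)
    return node


def build_tree(intervals: List[Tuple[int, int, str]]) -> Optional[Node]:
    """Build a balanced tree from (start, end, name) transcript intervals."""
    return _build(sorted(intervals))


def _query(node, pos, hits):
    if node is None or node.max_end < pos:
        return  # nothing in this subtree reaches pos
    _query(node.left, pos, hits)
    if node.start <= pos <= node.end:
        hits.append(node.name)
    if node.start <= pos:  # right subtree starts at or after node.start
        _query(node.right, pos, hits)


def transcripts_at(tree, pos):
    """Names of all transcripts whose interval contains `pos`, ordered by start."""
    hits = []
    _query(tree, pos, hits)
    return hits
```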

  4. Graph-based sequence annotation using a data integration approach.

    Science.gov (United States)

    Pesch, Robert; Lysenko, Artem; Hindle, Matthew; Hassani-Pak, Keywan; Thiele, Ralf; Rawlings, Christopher; Köhler, Jacob; Taubert, Jan

    2008-08-25

    The automated annotation of data from high-throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment both to enrich the annotation process for expert curators and to predict the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps needed to create the core database of integrated reference databases (UniProt, Pfam, PDB, GO and the pathway database AraCyc), which has been established in the ONDEX data integration system. We also present a comparison between different methods for integrating GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system, which is freely available from http://ondex.sf.net/.
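    Using "the network of information supporting the function assignment" can be pictured as enumerating the paths that link a gene to a GO term through integrated records; more independent paths suggest more support. The sketch below does this with a plain adjacency list. The node naming convention (ids beginning with "GO:" are terms), the path-length limit, and the node names in the example are hypothetical, and this is not ONDEX's API.

```python
from collections import defaultdict


def supporting_paths(edges, gene, max_nodes=4):
    """Enumerate simple paths from `gene` to GO-term nodes (ids starting with "GO:").

    edges: iterable of directed (source, target) links between integrated records
    max_nodes: longest path, in nodes, that still counts as evidence
    Returns dict GO term -> list of paths (each a list of node ids).
    """
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    found = defaultdict(list)

    def walk(node, path):
        if node.startswith("GO:") and len(path) > 1:
            found[node].append(path)
            return
        if len(path) >= max_nodes:
            return
        for nxt in graph[node]:
            if nxt not in path:  # keep paths simple (no cycles)
                walk(nxt, path + [nxt])

    walk(gene, [gene])
    return dict(found)
```

    Ranking candidate terms by the number of supporting paths, e.g. sorted(res, key=lambda t: len(res[t]), reverse=True), gives a crude confidence ordering that a curator could then inspect path by path.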

  5. Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator

    Science.gov (United States)

    Seyed, P.; Chastain, K.; McGuinness, D. L.

    2013-12-01

    Use of Semantic Web technologies for data management in the Earth sciences (and beyond) has great potential but is still in its early stages, since the challenges of translating data into a more explicit or semantic form for immediate use within applications have not been fully addressed. In this abstract we help address this challenge by introducing the SemantEco Annotator, which enables anyone, regardless of expertise, to semantically annotate tabular Earth Science data and translate it into linked data format, while applying the logic inherent in community-standard vocabularies to guide the process. The Annotator was conceived out of a desire to unify dataset content from a variety of sources under common vocabularies, for use in semantically enabled web applications. Our current use case employs linked data generated by the Annotator for use in the SemantEco environment, which uses semantics to help users explore, search, and visualize water or air quality measurement and species occurrence data through a map-based interface. The generated data can also be used immediately to facilitate discovery and search capabilities within 'big data' environments. The Annotator provides a method for taking information about a dataset that may only be known to its maintainers and making it explicit, in a uniform and machine-readable fashion, so that a person or information system can more easily interpret the underlying structure and meaning. Its primary mechanism enables a user to formally describe how the columns of a tabular dataset relate to and/or describe entities. For example, if a user identifies columns for latitude and longitude coordinates, we can infer that the data refer to a point that can be plotted on a map. Further, vocabulary assignments can make explicit that measurements of 'nitrate' and 'NO3-' are of the same entity, making it easier to use datasets that follow different nomenclatures. The Annotator provides an extensive and searchable
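    The two examples above (inferring a mappable point from latitude/longitude columns, and treating 'nitrate' and 'NO3-' as the same measured entity) can be sketched as a column-to-vocabulary mapping that emits subject-predicate-object triples. The "ex:" namespace, the column headers, and the function name are hypothetical; geo:lat, geo:long and geo:Point are terms from the W3C Basic Geo vocabulary, and this is not the Annotator's actual code.

```python
# Hypothetical mapping from column headers used by different providers to shared
# properties ("ex:" is an illustrative namespace; "geo:" is W3C Basic Geo).
COLUMN_VOCABULARY = {
    "nitrate": "ex:nitrateConcentration",
    "NO3-": "ex:nitrateConcentration",
    "latitude": "geo:lat",
    "lat": "geo:lat",
    "longitude": "geo:long",
    "lon": "geo:long",
}


def annotate_rows(rows, dataset):
    """Turn rows (dicts keyed by column header) into (subject, predicate, object) triples."""
    triples = []
    for i, row in enumerate(rows):
        subject = f"ex:{dataset}/row/{i}"
        used = set()
        for column, value in row.items():
            predicate = COLUMN_VOCABULARY.get(column)
            if predicate is not None:  # unmapped columns are left out
                triples.append((subject, predicate, value))
                used.add(predicate)
        # Inference step: a row with both coordinates describes a mappable point.
        if "geo:lat" in used and "geo:long" in used:
            triples.append((subject, "rdf:type", "geo:Point"))
    return triples
```

    Because both nitrate headers map to one property, a single query over that property finds measurements from either dataset.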

  6. Apollo telescope mount: a partial listing of scientific publications and presentations, supplement 3

    International Nuclear Information System (INIS)

    Reynolds, J.M.; Fields, S.A.; Snoddy, W.C.

    1979-06-01

    Compilations of bibliographies from the principal investigator groups of the Skylab solar observatory facility that gathered data from May 28, 1973, to February 8, 1974 are presented. The analysis of these data is presently under way. The publications listed are divided into the following categories: (1) journal publications; (2) journal publications submitted; (3) other publications; (4) presentations - national and international meetings; and (5) other presentations.

  7. Annotation of phenotypic diversity: decoupling data curation and ontology curation using Phenex.

    Science.gov (United States)

    Balhoff, James P; Dahdul, Wasila M; Dececchi, T Alexander; Lapp, Hilmar; Mabee, Paula M; Vision, Todd J

    2014-01-01

    Phenex (http://phenex.phenoscape.org/) is a desktop application for semantically annotating the phenotypic character matrix datasets common in evolutionary biology. Since its initial publication, we have added new features that address several major bottlenecks in the efficiency of the phenotype curation process: allowing curators during the data curation phase to provisionally request terms that are not yet available from a relevant ontology; supporting quality control against annotation guidelines to reduce later manual review and revision; and enabling the sharing of files for collaboration among curators. We decoupled data annotation from ontology development by creating an Ontology Request Broker (ORB) within Phenex. Curators can use the ORB to request a provisional term for use in data annotation; the provisional term can be automatically replaced with a permanent identifier once the term is added to an ontology. We added a set of annotation consistency checks to prevent common curation errors, reducing the need for later correction. We facilitated collaborative editing by improving the reliability of Phenex when used with online folder sharing services, via file change monitoring and continual autosave. With the addition of these new features, and in particular the Ontology Request Broker, Phenex users have been able to focus more effectively on data annotation. Phenoscape curators using Phenex have reported a smoother annotation workflow, with much reduced interruptions from ontology maintenance and file management issues.
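    The Ontology Request Broker workflow (request a provisional term, annotate with it right away, and swap in the permanent identifier once the term is added to an ontology) can be sketched as follows. The class, the "PROVISIONAL" prefix, and the "EXAMPLE:" identifiers are hypothetical; this illustrates the decoupling idea and is not Phenex's code.

```python
import itertools


class OntologyRequestBroker:
    """Hypothetical sketch of provisional-term handling (not Phenex's actual code)."""

    def __init__(self, prefix="PROVISIONAL"):
        self._ids = itertools.count(1)
        self._prefix = prefix
        self.pending = {}  # provisional id -> requested label

    def request_term(self, label):
        """Issue a provisional identifier that curators can use immediately."""
        provisional_id = f"{self._prefix}:{next(self._ids):06d}"
        self.pending[provisional_id] = label
        return provisional_id

    def resolve(self, annotations, accepted):
        """Replace provisional ids with permanent ones.

        annotations: list of (character, term_id) pairs
        accepted: dict provisional id -> permanent id, for terms now in the ontology
        """
        resolved = []
        for character, term_id in annotations:
            if term_id in accepted:
                self.pending.pop(term_id, None)  # request fulfilled
            resolved.append((character, accepted.get(term_id, term_id)))
        return resolved
```

    Curation never blocks on ontology maintenance: annotations that use still-pending terms are carried through unchanged until a later resolve call.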

  8. Automated evaluation of annotators for museum collections using subjective logic

    NARCIS (Netherlands)

    Ceolin, D.; Nottamkandath, A.; Fokkink, W.J.; Dimitrakos, Th.; Moona, R.; Patel, Dh.; Harrison McKnight, D.

    2012-01-01

    Museums are rapidly digitizing their collections, and face a huge challenge to annotate every digitized artifact in store. Therefore they are opening up their archives for receiving annotations from experts world-wide. This paper presents an architecture for choosing the most eligible set of

  9. Annotation of the protein coding regions of the equine genome

    DEFF Research Database (Denmark)

    Hestand, Matthew S.; Kalbfleisch, Theodore S.; Coleman, Stephen J.

    2015-01-01

    Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced m...... and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross...

  10. Genome Annotation and Transcriptomics of Oil-Producing Algae

    Science.gov (United States)

    2015-03-16

    AFRL-OSR-VA-TR-2015-0103. Genome Annotation and Transcriptomics of Oil-Producing Algae. Sabeeha Merchant, University of California Los Angeles. Final report, ...2010 to 12-31-2014. Contract number FA9550-10-1-0095. Abstract: Most algae accumulate triacylglycerols (TAGs) when they are starved for essential nutrients like N, S, P (or Si in the case of some

  11. Annotating smart environment sensor data for activity learning.

    Science.gov (United States)

    Szewcyzk, S; Dwan, K; Minor, B; Swedlove, B; Cook, D

    2009-01-01

    The pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track the activities that people perform at home. Machine learning techniques can perform this task, but the software algorithms rely upon large amounts of sample data that is correctly labeled with the corresponding activity. Labeling, or annotating, sensor data with the corresponding activity can be time consuming, may require input from the smart home resident, and is often inaccurate. Therefore, in this paper we investigate four alternative mechanisms for annotating sensor data with a corresponding activity label. We evaluate the alternative methods along the dimensions of annotation time, resident burden, and accuracy using sensor data collected in a real smart apartment.
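    Comparing annotation mechanisms on accuracy requires turning each method's activity intervals into per-event labels and scoring them against a reference. A minimal sketch follows. The event and interval formats, the default label for unannotated events, and the activity names are hypothetical, and annotation time and resident burden are not modelled.

```python
def label_events(events, intervals, default="Other_Activity"):
    """Attach an activity label to each sensor event.

    events: list of (timestamp, sensor_id)
    intervals: list of (start, end, activity) produced by some annotation method
    """
    labelled = []
    for timestamp, sensor in events:
        # First interval containing the event wins; otherwise use the default label.
        activity = next(
            (act for start, end, act in intervals if start <= timestamp <= end), default
        )
        labelled.append((timestamp, sensor, activity))
    return labelled


def accuracy(labelled, reference):
    """Fraction of events whose label matches a reference labelling."""
    if len(labelled) != len(reference):
        raise ValueError("labellings must cover the same events")
    matches = sum(a[2] == b[2] for a, b in zip(labelled, reference))
    return matches / len(reference)
```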

  12. Annotation Method (AM): SE7_AM1 [Metabolonote] [Archive]

    Lifescience Database Archive (English)

    Full Text Available SE7_AM1 PowerGet annotation A1. In the annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected for secondary search using the exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...
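    The two-stage search described in these Metabolonote entries (primary search against KEGG, KNApSAcK and LipidMAPS; only peaks with no primary hit go on to exactMassDB and Pep1000) can be sketched as a tiered mass lookup. The compound names and masses in the usage example are invented, the ppm tolerance is an assumption, and adduct and isotope handling, which a real annotation pipeline would need, is omitted.

```python
def ppm_error(observed, reference):
    """Mass error in parts per million."""
    return abs(observed - reference) / reference * 1e6


def search(mz, databases, tol_ppm):
    """All (database, compound) hits within tol_ppm of the observed m/z."""
    return [(db, name) for db, entries in databases.items()
            for name, mass in entries if ppm_error(mz, mass) <= tol_ppm]


def two_stage_annotation(peaks, primary, secondary, tol_ppm=5.0):
    """peaks: list of m/z values; primary/secondary: dict db name -> [(compound, mass)]."""
    results = {}
    for mz in peaks:
        hits = search(mz, primary, tol_ppm)
        if not hits:  # only peaks with no primary hit go to the secondary search
            hits = search(mz, secondary, tol_ppm)
        results[mz] = hits
    return results
```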

  13. Annotation Method (AM): SE36_AM1 [Metabolonote] [Archive]

    Lifescience Database Archive (English)

    Full Text Available SE36_AM1 PowerGet annotation A1. In the annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected for secondary search using the exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  14. Annotation Method (AM): SE14_AM1 [Metabolonote] [Archive]

    Lifescience Database Archive (English)

    Full Text Available SE14_AM1 PowerGet annotation A1. In the annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected for secondary search using the exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  15. Annotation Method (AM): SE33_AM1 [Metabolonote] [Archive]

    Lifescience Database Archive (English)

    Full Text Available SE33_AM1 PowerGet annotation A1. In the annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected for secondary search using the exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  16. Annotation Method (AM): SE12_AM1 [Metabolonote] [Archive]

    Lifescience Database Archive (English)

    Full Text Available SE12_AM1 PowerGet annotation A1. In the annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected for secondary search using the exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  17. Annotation Method (AM): SE20_AM1 [Metabolonote] [Archive]

    Lifescience Database Archive (English)

    Full Text Available SE20_AM1 PowerGet annotation A1. In the annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected for secondary search using the exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  18. Annotation Method (AM): SE2_AM1 [Metabolonote] [Archive]

    Lifescience Database Archive (English)

    Full Text Available SE2_AM1 PowerGet annotation A1. In the annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected for secondary search using the exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...

  19. Annotation Method (AM): SE28_AM1 [Metabolonote] [Archive]

    Lifescience Database Archive (English)

    Full Text Available SE28_AM1 PowerGet annotation A1. In the annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary database search. Peaks with no hit to these databases are then selected for secondary search using the exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...

  20. Annotation Method (AM): SE11_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE11_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  1. Annotation Method (AM): SE17_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE17_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  2. Annotation Method (AM): SE10_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE10_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  3. Annotation Method (AM): SE4_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE4_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  4. Annotation Method (AM): SE9_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE9_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  5. Annotation Method (AM): SE3_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE3_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  6. Annotation Method (AM): SE25_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE25_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  7. Annotation Method (AM): SE30_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE30_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  8. Annotation Method (AM): SE16_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE16_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  9. Annotation Method (AM): SE29_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE29_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  10. Annotation Method (AM): SE35_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE35_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  11. Annotation Method (AM): SE6_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE6_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  12. Annotation Method (AM): SE1_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE1_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  13. Annotation Method (AM): SE8_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE8_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  14. Annotation Method (AM): SE13_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE13_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  15. Annotation Method (AM): SE26_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE26_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  16. Annotation Method (AM): SE27_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE27_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  17. Annotation Method (AM): SE34_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE34_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  18. Annotation Method (AM): SE5_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available base search. Peaks with no hit to these databases are then selected to secondary se...arch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are ma...SE5_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary data

  19. Annotation Method (AM): SE15_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE15_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  20. Annotation Method (AM): SE31_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE31_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  1. Annotation Method (AM): SE32_AM1 [Metabolonote[Archive

    Lifescience Database Archive (English)

    Full Text Available abase search. Peaks with no hit to these databases are then selected to secondary s...earch using exactMassDB and Pep1000 databases. After the database search processes, each database hits are m...SE32_AM1 PowerGet annotation A1 In annotation process, KEGG, KNApSAcK and LipidMAPS are used for primary dat

  2. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

    Science.gov (United States)

    Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

    2017-01-04

    The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
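Gene list enrichment, as offered by PANTHER's analysis tools, asks whether a functional class is overrepresented in an uploaded list compared with a reference list. Below is a minimal sketch of the core statistic. It assumes a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The function names and counts are hypothetical, this is not PANTHER's code, and a real analysis would also correct for testing many classes at once.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference (background) list
    K: reference genes annotated with the term
    n: genes in the uploaded list
    k: uploaded genes annotated with the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of drawing k or more annotated genes by chance.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Fraction of annotated genes in the list divided by the background fraction."""
    return (k / n) / (K / N)


# Hypothetical counts: 8 of 50 uploaded genes carry a term that 200 of
# 20,000 reference genes carry.
print(enrichment_p_value(8, 50, 200, 20000), fold_enrichment(8, 50, 200, 20000))
```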

  3. Software for computing and annotating genomic ranges.

    Science.gov (United States)

    Lawrence, Michael; Huber, Wolfgang; Pagès, Hervé; Aboyoun, Patrick; Carlson, Marc; Gentleman, Robert; Morgan, Martin T; Carey, Vincent J

    2013-01-01

    We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
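Overlap detection between two sets of ranges can be illustrated on closed integer intervals. This simplified Python sketch is not the IRanges or GenomicRanges API. `find_overlaps` is a hypothetical name loosely modelled on Bioconductor's `findOverlaps`, and the sketch ignores chromosomes and strands. Its sorted-prefix scan is also quadratic in the worst case, unlike the dedicated interval indexes that make the Bioconductor packages efficient.

```python
from bisect import bisect_right
from typing import List, Tuple

Range = Tuple[int, int]  # closed interval [start, end], both ends included


def find_overlaps(query: List[Range], subject: List[Range]) -> List[Tuple[int, int]]:
    """Return sorted (query_index, subject_index) pairs of overlapping ranges."""
    # Sort subject indices by start so the candidates for a query form a prefix.
    order = sorted(range(len(subject)), key=lambda j: subject[j][0])
    starts = [subject[j][0] for j in order]
    hits = []
    for qi, (qs, qe) in enumerate(query):
        # Only subjects that start at or before the query end can overlap it.
        cutoff = bisect_right(starts, qe)
        for j in order[:cutoff]:
            # Among those, a subject overlaps if it ends at or after the query start.
            if subject[j][1] >= qs:
                hits.append((qi, j))
    return sorted(hits)


print(find_overlaps([(1, 5), (10, 12)], [(4, 8), (6, 9), (12, 20)]))  # [(0, 0), (1, 2)]
```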

  4. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

    Science.gov (United States)

    Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. However, for querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions independently of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
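FALDO describes a location as a region with a begin position and an end position on a reference sequence. This matters most on circular sequences, where a feature can span the origin. The sketch below uses plain Python classes loosely named after `faldo:Region`, `faldo:begin`/`faldo:end` and `faldo:ExactPosition`. It is not RDF and does not model strand. It therefore assumes forward-strand coordinates, in which a begin greater than the end can only mean that the region wraps around the origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExactPosition:
    position: int   # 1-based coordinate on the reference sequence
    reference: str  # hypothetical identifier of the sequence record


@dataclass(frozen=True)
class Region:
    begin: ExactPosition
    end: ExactPosition


def region_length(region: Region, seq_length: int, circular: bool) -> int:
    """Length in bases of a forward-strand region (strand is not modelled here)."""
    b, e = region.begin.position, region.end.position
    if region.begin.reference != region.end.reference:
        raise ValueError("begin and end must be on the same reference")
    if not (1 <= b <= seq_length and 1 <= e <= seq_length):
        raise ValueError("position outside the sequence")
    if b <= e:
        return e - b + 1
    if not circular:
        raise ValueError("begin > end is only meaningful on a circular sequence")
    # The region wraps across the origin: begin..seq_length, then 1..end.
    return (seq_length - b + 1) + e


# A hypothetical 5,000 bp plasmid feature that spans the origin.
wrap = Region(ExactPosition(4900, "plasmid_1"), ExactPosition(100, "plasmid_1"))
print(region_length(wrap, 5000, circular=True))  # 201
```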

  5. EXTRACT: interactive extraction of environment metadata and term suggestion for metagenomic sample annotation.

    Science.gov (United States)

    Pafilis, Evangelos; Buttigieg, Pier Luigi; Ferrell, Barbra; Pereira, Emiliano; Schnetzer, Julia; Arvanitidis, Christos; Jensen, Lars Juhl

    2016-01-01

    The microbial and molecular ecology research communities have made substantial progress on developing standards for annotating samples with environment metadata. However, manual sample annotation is a highly labor-intensive process and requires familiarity with the terminologies used. We have therefore developed an interactive annotation tool, EXTRACT, which helps curators identify and extract standard-compliant terms for annotation of metagenomic records and other samples. Behind its web-based user interface, the system combines published methods for named entity recognition of environment, organism, tissue and disease terms. The evaluators in the BioCreative V Interactive Annotation Task found the system to be intuitive, useful, well documented and sufficiently accurate to be helpful in spotting relevant text passages and extracting organism and environment terms. Comparison of fully manual and text-mining-assisted curation revealed that EXTRACT speeds up annotation by 15-25% and helps curators to detect terms that would otherwise have been missed. Database URL: https://extract.hcmr.gr/. © The Author(s) 2016. Published by Oxford University Press.
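Term suggestion of this kind starts with recognizing dictionary terms in free text. Below is a deliberately naive sketch. It assumes a small hand-made lexicon whose identifiers (`ENV_A` and so on) are hypothetical, not real ontology IDs. The tagger prefers the longest match, so "marine sediment" is not also tagged as "sediment". EXTRACT itself combines published named entity recognition methods and is far more robust than this lookup.

```python
import re
from typing import Dict, List, Tuple


def tag_terms(text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched_text, term_id) hits with no overlapping spans."""
    # Try longer names first so a multi-word term wins over its sub-terms.
    names = sorted(lexicon, key=len, reverse=True)
    taken = [False] * len(text)
    hits = []
    for name in names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            s, e = m.span()
            if not any(taken[s:e]):  # skip text already covered by a longer term
                hits.append((s, e, m.group(0), lexicon[name]))
                for i in range(s, e):
                    taken[i] = True
    return sorted(hits)


# Hypothetical lexicon mapping surface names to made-up term identifiers.
LEXICON = {"marine sediment": "ENV_A", "sediment": "ENV_B", "seawater": "ENV_C"}
print(tag_terms("Samples of marine sediment and seawater were collected.", LEXICON))
```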

  6. miRBase: integrating microRNA annotation and deep-sequencing data.

    Science.gov (United States)

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.
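Using read counts as a proxy for relative expression normally requires correcting for sequencing depth. Below is a minimal sketch that assumes each mapped read has already been assigned to one mature microRNA name. Reads per million (RPM) is a common, generic normalization and is not necessarily the measure miRBase displays.

```python
from collections import Counter
from typing import Dict, Iterable


def reads_per_million(mapped: Iterable[str]) -> Dict[str, float]:
    """Map each microRNA name (one entry per mapped read) to reads per million."""
    counts = Counter(mapped)
    total = sum(counts.values())
    # Scale each raw count by library size so different experiments are comparable.
    return {mirna: 1e6 * n / total for mirna, n in counts.items()}


# Hypothetical assignments of four reads from one experiment.
print(reads_per_million(["hsa-miR-21-5p"] * 3 + ["hsa-let-7a-5p"]))
```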

  7. Online Metacognitive Strategies, Hypermedia Annotations, and Motivation on Hypertext Comprehension

    Science.gov (United States)

    Shang, Hui-Fang

    2016-01-01

    This study examined the effect of online metacognitive strategies, hypermedia annotations, and motivation on reading comprehension in a Taiwanese hypertext environment. A path analysis model was proposed based on the assumption that if English as a foreign language learners frequently use online metacognitive strategies and hypermedia annotations,…

  8. Annotating with Propp's Morphology of the Folktale: Reproducibility and Trainability

    NARCIS (Netherlands)

    Fisseni, B.; Kurji, A.; Löwe, B.

    2014-01-01

    We continue the study of the reproducibility of Propp’s annotations from Bod et al. (2012). We present four experiments in which test subjects were taught Propp’s annotation system; we conclude that Propp’s system needs a significant amount of training, but that with sufficient time investment, it

  9. Annotation of nerve cord transcriptome in earthworm Eisenia fetida

    Directory of Open Access Journals (Sweden)

    Vasanthakumar Ponesakki

    2017-12-01

    Full Text Available In annelid worms, the nerve cord serves as a crucial organ controlling sensory and behavioral physiology. Because genome resources for earthworms are inadequate, comprehensive analysis of their transcriptome datasets has become a priority for monitoring the genes expressed in the nerve cord and predicting their roles in neurotransmission and sensory perception. The present study focuses on identifying the potential transcripts and predicting their functional features by annotating the nerve cord transcriptome dataset prepared by Gong et al., 2010 from the earthworm Eisenia fetida. In total, 9762 transcripts were successfully annotated against the NCBI nr database using the BLASTX algorithm, and 7680 of them were assigned to a total of 44,354 GO terms. Conserved-domain analysis indicated overrepresentation of the P-loop NTPase domain and the calcium-binding EF-hand domain. COG functional annotation classified 5860 transcript sequences into 25 functional categories. Further, 4502 contig sequences were found to map to 124 KEGG pathways. The annotated contig dataset contained 22 crucial neuropeptides with considerable matches to the marine annelid Platynereis dumerilii, suggesting possible roles in neurotransmission and neuromodulation. In addition, 108 human stem cell marker homologs were identified, including crucial epigenetic regulators, transcriptional repressors and cell cycle regulators, which may contribute to neuronal and segmental regeneration. The complete functional annotation of this nerve cord transcriptome can be further used to interpret the genetic and molecular mechanisms associated with neuronal development, nervous system regeneration and nerve cord function.

  10. A topic modeling approach for web service annotation

    Directory of Open Access Journals (Sweden)

    Leandro Ordóñez-Ante

    2014-06-01

    Full Text Available The practical implementation of semantic-based mechanisms for service retrieval has been limited by the resource-intensive procedure involved in the formal specification of services, which generally comprises associating semantic annotations with their documentation sources. Typically, developers perform this procedure by hand, which requires specialized knowledge of models for the semantic description of services (e.g. OWL-S, WSMO, SAWSDL), as well as of formal knowledge specifications. This semantic-based service description procedure thus turns out to be a cumbersome and error-prone task. This paper introduces a proposal for service annotation based on processing web service documentation to extract information about the capabilities it offers. By uncovering the hidden semantic structure of this information through statistical analysis techniques, we are able to associate meaningful annotations with the services' operations/resources, while grouping those operations into non-exclusive, semantically related categories. This research is part of the TelComp 2.0 project, funded jointly by Colciencias and the University of Cauca.
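The abstract does not name the statistical technique used to uncover the hidden semantic structure. Latent Dirichlet allocation (LDA) is a common choice for topic modeling, so the sketch below assumes it. The sketch runs collapsed Gibbs sampling over hypothetical tokenized operation descriptions. It returns per-operation topic mixtures (`theta`) and per-topic word distributions (`phi`). Because each operation receives a mixture rather than a single label, thresholding `theta` yields non-exclusive categories. The corpus, threshold and hyperparameters are illustrative only.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], n_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0
              ) -> Tuple[List[List[float]], List[str], List[List[float]]]:
    """Collapsed Gibbs sampling for LDA; returns (theta, vocab, phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens of each topic per document
    nkw = [[0] * V for _ in range(n_topics)]  # occurrences of each word per topic
    nk = [0] * n_topics                       # tokens assigned to each topic
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]  # random start
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)] for t in topics]
    return theta, vocab, phi


# Hypothetical tokenized descriptions of four service operations.
DOCS = [
    ["get", "weather", "forecast", "city"],
    ["weather", "temperature", "forecast", "city"],
    ["send", "sms", "message", "phone"],
    ["sms", "message", "send", "number"],
]
theta, vocab, phi = lda_gibbs(DOCS, n_topics=2, iters=100, seed=1)
for d, mix in enumerate(theta):
    # Non-exclusive categories: every topic above a (hypothetical) threshold.
    print(d, [t for t, p in enumerate(mix) if p > 0.3])
```

On a corpus this small the topics are unstable. Real documentation would also need stop-word removal and many more iterations.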

  11. Tagging like Humans: Diverse and Distinct Image Annotation

    KAUST Repository

    Wu, Baoyuan

    2018-03-31

    In this work we propose a new automatic image annotation model, dubbed diverse and distinct image annotation (D2IA). The generative model D2IA is inspired by the ensemble of human annotations, which create semantically relevant, yet distinct and diverse tags. In D2IA, we generate a relevant and distinct tag subset, in which the tags are relevant to the image contents and semantically distinct from each other, using sequential sampling from a determinantal point process (DPP) model. Multiple such tag subsets that cover diverse semantic aspects or diverse semantic levels of the image contents are generated by randomly perturbing the DPP sampling process. We leverage a generative adversarial network (GAN) model to train D2IA. Extensive experiments, including quantitative and qualitative comparisons as well as human subject studies, on two benchmark datasets demonstrate that the proposed model can produce more diverse and distinct tags than state-of-the-art methods.
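A determinantal point process (DPP) scores a subset S of tags by det(L_S). This score rewards tags that are individually relevant and penalizes near-duplicates. D2IA samples from a DPP, but the sketch below swaps in the simpler greedy MAP approximation. Its kernel is built as L_ij = q_i S_ij q_j from hypothetical relevance scores `q` and tag similarities `S`. The example shows why "puppy" is skipped once "dog" has been chosen. It is not the paper's model.

```python
from typing import List


def det(m: List[List[float]]) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result  # each row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def greedy_dpp(L: List[List[float]], size: int) -> List[int]:
    """Greedily add the item that maximizes det(L_S), the unnormalized DPP score."""
    chosen: List[int] = []
    for _ in range(size):
        best, best_det = None, 0.0
        for i in range(len(L)):
            if i in chosen:
                continue
            s = chosen + [i]
            d = det([[L[a][b] for b in s] for a in s])
            if d > best_det:
                best, best_det = i, d
        if best is None:  # no remaining item increases the score
            break
        chosen.append(best)
    return chosen


# Hypothetical tags, relevance scores and pairwise similarities.
tags = ["dog", "puppy", "grass", "ball"]
q = [0.9, 0.85, 0.6, 0.5]
S = [[1.0, 0.95, 0.1, 0.1],
     [0.95, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.1],
     [0.1, 0.1, 0.1, 1.0]]
L = [[q[i] * S[i][j] * q[j] for j in range(4)] for i in range(4)]
print([tags[i] for i in greedy_dpp(L, 3)])  # ['dog', 'grass', 'ball']
```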

  12. An Atlas of annotations of Hydra vulgaris transcriptome.

    Science.gov (United States)

    Evangelista, Daniela; Tripathi, Kumar Parijat; Guarracino, Mario Rosario

    2016-09-22

    RNA sequencing takes advantage of Next Generation Sequencing (NGS) technologies to analyze RNA transcript counts with excellent accuracy. Translating this huge amount of data into biological information is still a key issue, which is why web resources for its analysis are highly desirable. Starting from a previous work, Transcriptator, we present the Atlas of Hydra vulgaris, an extensible web tool in which the species' complete transcriptome is annotated. To provide users with a resource that includes the whole functionally annotated transcriptome of the water polyp Hydra vulgaris, we implemented the Atlas web tool, which contains 31,988 accessible and downloadable transcripts of this non-reference model organism. Atlas, as a freely available resource, can be considered a valuable tool to rapidly retrieve functional annotation for transcripts differentially expressed in Hydra vulgaris exposed to distinct experimental treatments. WEB RESOURCE URL: http://www-labgtp.na.icar.cnr.it/Atlas .

  13. GSV Annotated Bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Roberts, Randy S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pope, Paul A. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Jiang, Ming [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Trucano, Timothy G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aragon, Cecilia R. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ni, Kevin [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Wei, Thomas [Argonne National Lab. (ANL), Argonne, IL (United States); Chilton, Lawrence K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Bakel, Alan [Argonne National Lab. (ANL), Argonne, IL (United States)

    2011-06-14

    The following annotated bibliography was developed as part of the Geospatial Algorithm Verification and Validation (GSV) project for the Simulation, Algorithms and Modeling program of NA-22. Verification and validation of geospatial image analysis algorithms covers a wide range of technologies. Papers in the bibliography are thus organized into the following five topic areas: image processing and analysis, usability and validation of geospatial image analysis algorithms, image distance measures, scene modeling and image rendering, and transportation simulation models.

  14. The integration of a metadata generation framework in a music annotation workflow

    OpenAIRE

    Corthaut, Nik; Lippens, Stefaan; Govaerts, Sten; Duval, Erik; Martens, Jean-Pierre

    2009-01-01

    In the MuziK project we try to automate the typically hard task of annotating music files manually. This annotation is used for music recommendation and for automated playlist creation. The music experts of Aristo Music (http://www.aristomusic.com) defined the data fields. High quality annotations are required since the results, playlists, are used in commercial live settings and the cost of a wrong selection is high [1].

  15. Automatic annotation of lecture videos for multimedia driven pedagogical platforms

    Directory of Open Access Journals (Sweden)

    Ali Shariq Imran

    2016-12-01

    Full Text Available Today’s eLearning websites are heavily loaded with multimedia content, which is often unstructured, unedited, unsynchronized, and lacks inter-links among different multimedia components. Hyperlinking different media modalities may provide a solution for quick navigation and easy retrieval of pedagogical content in media-driven eLearning websites. In addition, finding metadata to describe and annotate media content in eLearning platforms is a challenging, laborious, error-prone, and time-consuming task. Thus, annotations for multimedia, especially lecture videos, have become an important part of video learning objects. To address this issue, this paper proposes three major contributions, namely automated video annotation, 3-Dimensional (3D) tag clouds, and the hyper interactive presenter (HIP) eLearning platform. Combining the existing state-of-the-art SIFT with tag clouds, a novel approach for automatic lecture video annotation for the HIP is proposed. New video annotations are implemented automatically, providing the needed random access in lecture videos within the platform, and a 3D tag cloud is proposed as a new user interaction mechanism. A preliminary study of the usefulness of the system has been carried out, and the initial results suggest that 70% of the students opted for HIP as their preferred eLearning platform at Gjøvik University College (GUC).

  16. DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication.

    Science.gov (United States)

    Tanizawa, Yasuhiro; Fujisawa, Takatomo; Nakamura, Yasukazu

    2018-03-15

    We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST originally started as an online annotation server, and to date, over 7000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 min, with rich information such as pseudogenes, translation exceptions and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future. The software is implemented in Python 3 and runs under both Python 2.7 and 3.4, on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license with external binaries bundled in the software distribution. An online version is also available at https://dfast.nig.ac.jp/. Contact: yn@nig.ac.jp. Supplementary data are available at Bioinformatics online.

  17. Combining rules, background knowledge and change patterns to maintain semantic annotations.

    Science.gov (United States)

    Cardoso, Silvio Domingos; Chantal, Reynaud-Delaître; Da Silveira, Marcos; Pruski, Cédric

    2017-01-01

    Knowledge Organization Systems (KOS) play a key role in enriching biomedical information in order to make it machine-understandable and shareable. This is done by annotating medical documents, or more specifically, associating concept labels from KOS with pieces of digital information, e.g., images or texts. However, the dynamic nature of KOS may impact the annotations, thus creating a mismatch between the evolved concept and the associated information. To solve this problem, methods to maintain the quality of the annotations are required. In this paper, we define a framework based on rules, background knowledge and change patterns to drive the annotation adaptation process. We experimentally evaluate the proposed approach on realistic case studies and demonstrate its overall performance on different KOS in terms of the precision, recall, F1-score and AUC value of the system.
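The evaluation metrics named above have standard definitions. The sketch below assumes that annotations are represented as hypothetical string identifiers and compared against a gold-standard set. It also assumes that AUC is computed from per-annotation scores with binary labels. The abstract does not say how the authors defined positives, so this only illustrates the metrics themselves.

```python
from typing import Sequence, Set, Tuple


def prf1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted annotations against a gold standard."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical annotation identifiers after adaptation versus a gold standard.
print(prf1({"a1", "a2", "a3"}, {"a2", "a3", "a4", "a5"}))
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```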

  18. Phylogenetic molecular function annotation

    International Nuclear Information System (INIS)

    Engelhardt, Barbara E; Jordan, Michael I; Repo, Susanna T; Brenner, Steven E

    2009-01-01

    It is now easier to discover thousands of protein sequences in a new microbial genome than it is to biochemically characterize the specific activity of a single protein of unknown function. The molecular functions of protein sequences have typically been predicted using homology-based computational methods, which rely on the principle that homologous proteins share a similar function. However, some protein families include groups of proteins with different molecular functions. A phylogenetic approach for predicting molecular function (sometimes called 'phylogenomics') is an effective means to predict protein molecular function. These methods use the evolutionary history of the protein family to incorporate functional evidence from all family members that have functional characterizations, making robust predictions for the uncharacterized proteins. However, they are often difficult to apply on a genome-wide scale because of the time-consuming step of reconstructing the phylogenies of each protein to be annotated. Our automated approach for function annotation using phylogeny, the SIFTER (Statistical Inference of Function Through Evolutionary Relationships) methodology, uses a statistical graphical model to compute the probabilities of molecular functions for unannotated proteins. Our benchmark tests showed that SIFTER provides accurate functional predictions on various protein families, outperforming other available methods.

  19. Annotation of rule-based models with formal semantics to enable creation, analysis, reuse and visualization

    Science.gov (United States)

    Misirli, Goksel; Cavaliere, Matteo; Waites, William; Pocock, Matthew; Madsen, Curtis; Gilfellon, Owen; Honorato-Zimmer, Ricardo; Zuliani, Paolo; Danos, Vincent; Wipat, Anil

    2016-01-01

    Motivation: Biological systems are complex and challenging to model, and therefore model reuse is highly desirable. To promote model reuse, models should include both information about the specifics of simulations and the underlying biology in the form of metadata. The availability of computationally tractable metadata is especially important for the effective automated interpretation and processing of models. Metadata are typically represented as machine-readable annotations which enhance programmatic access to information about models. Rule-based languages have emerged as a modelling framework to represent the complexity of biological systems. Annotation approaches have been widely used for reaction-based formalisms such as SBML. However, rule-based languages still lack a rich annotation framework to add semantic information, such as machine-readable descriptions, to the components of a model. Results: We present an annotation framework and guidelines for annotating rule-based models, encoded in the commonly used Kappa and BioNetGen languages. We adapt widely adopted annotation approaches to rule-based models. We initially propose a syntax to store machine-readable annotations and describe a mapping between rule-based modelling entities, such as agents and rules, and their annotations. We then describe an ontology to both annotate these models and capture the information contained therein, and demonstrate annotating these models using examples. Finally, we present a proof of concept tool for extracting annotations from a model that can be queried and analyzed in a uniform way. The uniform representation of the annotations can be used to facilitate the creation, analysis, reuse and visualization of rule-based models. Although the examples use specific implementations, the proposed techniques can be applied to rule-based models in general. Availability and implementation: The annotation ontology for rule-based models can be found at http

  20. Protein Annotators' Assistant: A Novel Application of Information Retrieval Techniques.

    Science.gov (United States)

    Wise, Michael J.

    2000-01-01

    Protein Annotators' Assistant (PAA) is a software system which assists protein annotators in assigning functions to newly sequenced proteins. PAA employs a number of information retrieval techniques in a novel setting and is thus related to text categorization, where multiple categories may be suggested, except that in this case none of the…

  1. Crowdsourcing Twitter annotations to identify first-hand experiences of prescription drug use.

    Science.gov (United States)

    Alvaro, Nestor; Conway, Mike; Doan, Son; Lofi, Christoph; Overington, John; Collier, Nigel

    2015-12-01

    Self-reported patient data has been shown to be a valuable knowledge source for post-market pharmacovigilance. In this paper we propose using the popular micro-blogging service Twitter to gather evidence about adverse drug reactions (ADRs) after first identifying micro-blog messages (also known as "tweets") that report first-hand experience. To achieve this goal we explore machine learning with data crowdsourced from lay annotators. With the help of lay annotators recruited from CrowdFlower we manually annotated 1548 tweets containing keywords related to two kinds of drugs: SSRIs (e.g., paroxetine) and cognitive enhancers (e.g., Ritalin). Our results show that inter-annotator agreement (Fleiss' kappa) for crowdsourcing ranks in moderate agreement with a pair of experienced annotators (Spearman's Rho=0.471). We used the gold standard annotations from CrowdFlower to automatically train a range of supervised machine learning models to recognize first-hand experience. F-Score values are reported for 6 of these techniques, with the Bayesian Generalized Linear Model being the best (F-Score=0.64 and Informedness=0.43) when combined with a selected set of features obtained using information gain criteria. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
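    The agreement statistic named above can be illustrated with a minimal Python sketch of Fleiss' kappa, computed from a table in which each row is an item (here, a tweet) and each column counts how many raters chose a category. The table layout and the function name are illustrative assumptions; this is not the authors' code, and it does not reproduce the Spearman correlation they report.

```python
from typing import List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j (same number of raters per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_items * n_raters

    # p_j: share of all assignments that went to category j.
    p_j: List[float] = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # P_i: fraction of rater pairs that agree on item i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]

    p_bar = sum(p_i) / n_items       # mean observed agreement
    p_e = sum(p * p for p in p_j)    # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)
```

    With three raters who agree on every item, e.g. fleiss_kappa([[3, 0], [0, 3]]), the function returns 1.0. The sketch assumes every item has the same number of raters, as Fleiss' kappa requires.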

  2. Semi-supervised learning based probabilistic latent semantic analysis for automatic image annotation

    Institute of Scientific and Technical Information of China (English)

    Tian Dongping

    2017-01-01

    In recent years, the multimedia annotation problem has attracted significant research attention in the multimedia and computer vision communities, especially automatic image annotation, whose purpose is to provide an efficient and effective search environment in which users can query their images more easily. In this paper, a semi-supervised learning based probabilistic latent semantic analysis (PLSA) model for automatic image annotation is presented. Since it is often hard to obtain or create labeled images in large quantities while unlabeled ones are easier to collect, a transductive support vector machine (TSVM) is exploited to enhance the quality of the training image data. Next, because image features with different magnitudes lead to different automatic annotation performance, a Gaussian normalization method is used to normalize the different features extracted from effective image regions, segmented by the normalized cuts algorithm, so as to preserve the intrinsic content of images as completely as possible. Finally, a PLSA model with asymmetric modalities is constructed based on the expectation maximization (EM) algorithm to predict a candidate set of annotations with confidence scores. Extensive experiments on the general-purpose Corel5k dataset demonstrate that the proposed model can significantly improve the performance of traditional PLSA for the task of automatic image annotation.
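    The abstract does not spell out its Gaussian normalization formula. A common variant in image retrieval standardizes each feature by its mean and three standard deviations and then maps the result into [0, 1]; the Python sketch below assumes that variant, so the paper's exact method may differ, and the function names are hypothetical.

```python
import statistics
from typing import List


def gaussian_normalize(values: List[float]) -> List[float]:
    """Map one feature dimension into [0, 1] with 3-sigma Gaussian normalization.

    Each value is standardized as (x - mean) / (3 * std), which places about
    99.7% of normally distributed values in [-1, 1]; the result is shifted to
    [0, 1] and anything outside that interval is clipped.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.5 for _ in values]  # a constant feature carries no information
    out = []
    for x in values:
        z = (x - mean) / (3 * std)
        out.append(min(1.0, max(0.0, (z + 1) / 2)))
    return out


def normalize_columns(rows: List[List[float]]) -> List[List[float]]:
    """Normalize each feature (column) independently so magnitudes are comparable."""
    columns = list(zip(*rows))
    normalized = [gaussian_normalize(list(col)) for col in columns]
    return [list(r) for r in zip(*normalized)]
```

    Because each column is normalized on its own, a feature measured in hundreds and one measured in single units end up on the same [0, 1] scale.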

  3. Multi-Label Classification Based on Low Rank Representation for Image Annotation

    Directory of Open Access Journals (Sweden)

    Qiaoyu Tan

    2017-01-01

    Annotating remote sensing images is a challenging task because the annotation process is labor-intensive and requires expert knowledge, especially when images can be annotated with multiple semantic concepts (or labels). To automatically annotate these multi-label images, we introduce an approach called Multi-Label Classification based on Low Rank Representation (MLC-LRR). MLC-LRR first applies low rank representation in the feature space of images to compute the low-rank constrained coefficient matrix, then adapts the coefficient matrix to define a feature-based graph that captures the global relationships between images. Next, it applies low rank representation in the label space of labeled images to construct a semantic graph. Finally, these two graphs are exploited to train a graph-based multi-label classifier. To validate the performance of MLC-LRR against other related graph-based multi-label methods in annotating images, we conduct experiments on a publicly available multi-label remote sensing image dataset (Land Cover). We perform additional experiments on five real-world multi-label image datasets to further investigate the performance of MLC-LRR. The empirical study demonstrates that MLC-LRR achieves better performance in annotating images than the compared methods across various evaluation criteria; it can also effectively exploit the global structure and label correlations of multi-label images.

  4. Software for computing and annotating genomic ranges.

    Directory of Open Access Journals (Sweden)

    Michael Lawrence

    We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
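    The packages above are R/Bioconductor code. Purely to illustrate what overlap detection between annotated ranges means, here is a small Python sketch over 1-based, closed intervals (the convention GenomicRanges uses). The Range type and find_overlaps function are hypothetical, and the sorted scan shown here is simpler than the nested containment list that IRanges uses internally.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Range:
    seqname: str  # chromosome, e.g. "chr1"
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive


def find_overlaps(queries: List[Range], subjects: List[Range]) -> List[Tuple[int, int]]:
    """Return sorted (query_index, subject_index) pairs for every overlapping pair.

    Subjects are grouped by chromosome and sorted by start, so for each query
    only subjects whose start is <= the query end are examined.
    """
    by_seq: Dict[str, List[Tuple[int, int, int]]] = {}
    for j, s in enumerate(subjects):
        by_seq.setdefault(s.seqname, []).append((s.start, s.end, j))
    starts: Dict[str, List[int]] = {}
    for seq, items in by_seq.items():
        items.sort()
        starts[seq] = [st for st, _, _ in items]

    hits = []
    for i, q in enumerate(queries):
        items = by_seq.get(q.seqname, [])
        # Subjects from position `cut` onward start after the query ends.
        cut = bisect.bisect_right(starts.get(q.seqname, []), q.end)
        for st, en, j in items[:cut]:
            if en >= q.start:  # closed intervals overlap if neither ends before the other starts
                hits.append((i, j))
    return sorted(hits)
```

    Ranges on different chromosomes never overlap, which is why subjects are grouped by seqname before any coordinates are compared.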

  5. Consumer energy research: an annotated bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, C.D.; McDougall, G.H.G.

    1980-01-01

    This document is an updated and expanded version of an earlier annotated bibliography by Dr. C. Dennis Anderson and Carman Cullen (A Review and Annotation of Energy Research on Consumers, March 1978). It is the final draft of the major report that will be published in English and French and made publicly available through the Consumer Research and Evaluation Branch of Consumer and Corporate Affairs, Canada. Two agencies granting permission to include some of their energy abstracts are the Rand Corporation and the DOE Technical Information Center. The bibliography consists mainly of empirical studies, including surveys and experiments. It also includes a number of descriptive and econometric studies that utilize secondary data. Many of the studies provide summaries of research in specific areas, and point out directions for future research efforts. 14 tables.

  6. Public Relations: Selected, Annotated Bibliography.

    Science.gov (United States)

    Demo, Penny

    Designed for students and practitioners of public relations (PR), this annotated bibliography focuses on recent journal articles and ERIC documents. The 34 citations include the following: (1) surveys of public relations professionals on career-related education; (2) literature reviews of research on measurement and evaluation of PR and…

  7. Annotating spatio-temporal datasets for meaningful analysis in the Web

    Science.gov (United States)

    Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

    2014-05-01

    More and more environmental datasets that vary in space and time are available on the Web. This brings the advantage that the data can be used for purposes other than those originally foreseen, but also the danger that users may apply inappropriate analysis procedures because they lack knowledge of important assumptions made during the data collection process. To guide users towards a meaningful (statistical) analysis of spatio-temporal datasets available on the Web, we developed, in our previous work [1], a Higher-Order-Logic formalism that captures some of the relevant assumptions. It allows meaningful spatial prediction and aggregation to be proven in a semi-automated fashion. In this poster presentation, we present a concept for annotating spatio-temporal datasets available on the Web with concepts defined in our formalism. To this end, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It captures the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, which in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. To allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support stronger typing of spatio-temporal datatypes, guiding users towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.

  8. Automatic Annotation Method on Learners' Opinions in Case Method Discussion

    Science.gov (United States)

    Samejima, Masaki; Hisakane, Daichi; Komoda, Norihisa

    2015-01-01

    Purpose: The purpose of this paper is to automatically annotate learners' opinions with one of three attributes (a problem, a solution, or no annotation) in order to support the learners' discussion without a facilitator. The case method aims at discussing problems and solutions in a target case. However, learners sometimes fail to discuss some of the problems and solutions.…

  9. High-performance web services for querying gene and variant annotation.

    Science.gov (United States)

    Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

    2016-05-06

    Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.
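    As a sketch of querying one of these services from Python with only the standard library, the code below builds a MyGene.info query URL and optionally fetches the matching hits. It assumes the public v3 query endpoint with its q, species, fields and size parameters and a JSON response containing a hits list; the function names are hypothetical, and the service documentation should be checked before relying on these details.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://mygene.info/v3/query"  # assumed public v3 query endpoint


def build_query_url(q: str, species: str = "human", fields: str = "symbol,name", size: int = 10) -> str:
    """Build a MyGene.info query URL; q uses the service's query syntax, e.g. "symbol:cdk2"."""
    params = {"q": q, "species": species, "fields": fields, "size": size}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def query_genes(q: str, **kwargs) -> list:
    """Fetch the matching gene hits (requires network access)."""
    with urllib.request.urlopen(build_query_url(q, **kwargs), timeout=30) as resp:
        return json.load(resp).get("hits", [])
```

    Keeping URL construction separate from the network call makes the query easy to inspect or log before any request is sent.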

  10. QTL list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

  11. Modeling multiple time series annotations as noisy distortions of the ground truth: An Expectation-Maximization approach.

    Science.gov (United States)

    Gupta, Rahul; Audhkhasi, Kartik; Jacokes, Zach; Rozga, Agata; Narayanan, Shrikanth

    2018-01-01

    Studies of time-continuous human behavioral phenomena often rely on ratings from multiple annotators. Since the ground truth of the target construct is often latent, the standard practice is to use ad-hoc metrics (such as averaging annotator ratings). Despite being easy to compute, such metrics may not provide accurate representations of the underlying construct. In this paper, we present a novel method for modeling multiple time series annotations over a continuous variable that computes the ground truth by modeling annotator-specific distortions. We condition the ground truth on a set of features extracted from the data and further assume that the annotators provide their ratings as modifications of the ground truth, with each annotator having specific distortion tendencies. We train the model using an Expectation-Maximization based algorithm and evaluate it on a study involving natural interaction between a child and a psychologist, to predict confidence ratings of the children's smiles. We compare and analyze the model against two baselines where: (i) the ground truth is considered to be the framewise mean of ratings from the various annotators, and (ii) each annotator is assumed to bear a distinct time delay in annotation and their annotations are aligned before computing the framewise mean.
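    The paper's EM model is not reproduced here. As a point of reference, the two baselines it is compared against can be sketched in Python: (i) the framewise mean of all annotators and (ii) the framewise mean after undoing a per-annotator delay. How the delay is estimated is an assumption of this sketch (the lag, up to max_lag frames, that minimizes the mean squared error against the unaligned mean), as are the edge padding and all function names.

```python
from typing import List, Sequence


def framewise_mean(series: Sequence[Sequence[float]]) -> List[float]:
    """Baseline (i): average the annotators' ratings frame by frame."""
    return [sum(frame) / len(frame) for frame in zip(*series)]


def best_delay(rating: Sequence[float], reference: Sequence[float], max_lag: int) -> int:
    """Delay (in frames) that best aligns a late-reacting annotator with the reference.

    A delay d assumes rating[t + d] reflects the construct at frame t; the
    delay with the smallest mean squared error over the overlap is returned.
    """
    best, best_err = 0, float("inf")
    for d in range(max_lag + 1):
        n = len(reference) - d
        if n <= 0:
            break
        err = sum((rating[t + d] - reference[t]) ** 2 for t in range(n)) / n
        if err < best_err:
            best, best_err = d, err
    return best


def shift_left(rating: Sequence[float], d: int) -> List[float]:
    """Undo a delay of d frames, repeating the last value to keep the length."""
    return list(rating[d:]) + [rating[-1]] * d


def aligned_mean(series: Sequence[Sequence[float]], max_lag: int = 5) -> List[float]:
    """Baseline (ii): align each annotator by its estimated delay, then average."""
    reference = framewise_mean(series)
    aligned = [shift_left(r, best_delay(r, reference, max_lag)) for r in series]
    return framewise_mean(aligned)
```

    In a toy example where one annotator follows the same trajectory as another but two frames late, the aligned mean ends up closer to the shared trajectory than the plain framewise mean.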

  12. Marker list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

  13. Domain-based small molecule binding site annotation

    Directory of Open Access Journals (Sweden)

    Dumontier Michel

    2006-03-01

    Background: Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but small molecule binding site protein sequence annotation is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly, it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and, unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites. Description: Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at http://smid.blueprint.org. The SMID-BLAST tool provides accurate transitive annotation of small-molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated from the E-value, ligand residue identity and domain entropy to assign a level of confidence to the hits found. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB, of which 472 (60%) of the predicted interactions identically matched the experimental small molecule; of these, 344 had more than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives. Conclusion: By

  14. AUTHOR’S ANNOTATION AS A MANIFESTATION OF THE COMPOSER’S CREATIVE CONCEPTION

    Directory of Open Access Journals (Sweden)

    CIOBANU GHENADIE

    2015-06-01

    Annotation of his own musical works is considered by the author as a form of analysis of these opuses. Designed to provide answers about the works, these comments facilitate the perception of contemporary music by performers and the audience. The composer examines various forms of annotation based on their goals and the context of use, and compares them to other genres with an informative function, such as the interview, analytical essay, memoirs, personal diary, etc. The article illustrates some possible forms of annotation. Besides the purely informative character of the annotation, the author notes in the conclusions the value of genuine professional analysis, which provides a wide circle of listeners and experts with a brief exegetical approach to his musical works.

  15. Annotated Bibliography of MMPI Research Among College Populations: 1962-1970.

    Science.gov (United States)

    Cornish, Richard D.

    The MMPI continues to be the focus of a large quantity of research. This article offers an aid to persons working with college student populations by annotating recent MMPI research relating to college populations. A total of 49 articles (each categorized in terms of content into one of 10 sections or subsections) were annotated. The Validity of…

  16. An annotated checklist of scale insects (Hemiptera: Coccoidea) of Saint Lucia, Lesser Antilles.

    Science.gov (United States)

    Malumphy, Chris

    2014-07-31

    An annotated list of 83 scale insect species (Hemiptera: Sternorrhyncha: Coccoidea) recorded from Saint Lucia is presented, based on data gathered from UK quarantine interceptions, samples collected in an urban coastal habitat in the north-west of the island in 2013, and published records. Thirty-three species (40%) are recorded for the first time for the country, including Dysmicoccus joannesiae (Costa Lima), a South American mealybug, and Poliaspoides formosana (Takahashi), an Asian armoured scale insect pest of bamboo, which are new for the Caribbean region. The economic, environmental and social impacts caused by introduced exotic species of scale insect are discussed. Two predatory midges Diadiplosis ?coccidivora (Felt) and Diadiplosis multifila (Felt) (Diptera: Cecidomyiidae) are recorded for the first time from Saint Lucia. The latter species was observed causing 90% mortality of a large infestation of passion vine mealybug Planococcus minor (Maskell) on soursop fruit.

  17. ePNK Applications and Annotations

    DEFF Research Database (Denmark)

    Kindler, Ekkart

    2017-01-01

    new applications for the ePNK and, in particular, visualizing the result of an application in the graphical editor of the ePNK by using annotations, and interacting with the end user using these annotations. In this paper, we give an overview of the concepts of ePNK applications by discussing the implementation...

  18. Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements

    Directory of Open Access Journals (Sweden)

    Danuta Roszko

    2015-06-01

    In the article, the authors present the experimental Polish-Lithuanian corpus (ECorpPL-LT), created to support Polish-Lithuanian theoretical contrastive studies and the development of a Polish-Lithuanian electronic dictionary, and to help sworn translators. The semantic annotation being added to ECorpPL-LT is extremely useful in Polish-Lithuanian contrastive studies, and also proves helpful in translation work.

  19. A Novel Quality Measure and Correction Procedure for the Annotation of Microbial Translation Initiation Sites.

    Directory of Open Access Journals (Sweden)

    Lex Overmars

    The identification of translation initiation sites (TISs) constitutes an important aspect of sequence-based genome analysis. An erroneous TIS annotation can impair the identification of regulatory elements and N-terminal signal peptides, and may also flaw the determination of descent, for any particular gene. We have formulated a reference-free method to score TIS annotation quality. The method is based on a comparison of the observed and expected distributions of all TISs in a particular genome given prior gene-calling. We assessed the TIS annotations for all available NCBI RefSeq microbial genomes and found that approximately 87% are of appropriate quality, whereas 13% need substantial improvement. We analyzed a number of factors that could affect TIS annotation quality, such as GC-content, taxonomy, the fraction of genes with a Shine-Dalgarno sequence and the year of publication. The analysis showed that only the first factor has a clear effect. We then formulated a straightforward Principal Component Analysis (PCA)-based TIS identification strategy to self-organize and score potential TISs. The strategy is independent of reference data and a priori calculations. A representative set of 277 genomes was subjected to the analysis, and we found a clear increase in TIS annotation quality for the genomes with a low quality score. The PCA-based annotation was also compared with annotation by the current reference tool, Prodigal. The comparison for the model genome of Escherichia coli K12 showed that both methods complement each other and that prediction agreement can be used as an indicator of a correct TIS annotation. Importantly, the data suggest that adding a PCA-based strategy to a Prodigal prediction can be used to 'flag' TIS annotations for re-evaluation and, in addition, to evaluate a given annotation when a Prodigal annotation is lacking.

  20. Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study.

    Science.gov (United States)

    Hochheiser, Harry; Ning, Yifan; Hernandez, Andres; Horn, John R; Jacobson, Rebecca; Boyce, Richard D

    2016-04-11

    Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product label annotation may be a means of lessening the burden of manual curation. Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug information annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction. Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to provide reports on the time required to complete tasks and their perceptions of task difficulty. One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for expert, mean of 3.33 for nonexperts). The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader

  1. SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Directory of Open Access Journals (Sweden)

    Patel Viren

    2010-09-01

    Background: The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results: SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed in a web browser, downloaded as a tab-delimited text file, or uploaded directly in BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion: SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.
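    SeqAnt can upload annotated variants to the UCSC genome browser in BED format. As a minimal sketch of what that conversion involves (not SeqAnt's own code, whose exact columns are not described here), the Python function below writes variants given in 1-based coordinates as BED lines, which use 0-based, half-open coordinates. The Variant type, the track name and the REF>ALT name column are illustrative assumptions.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position of the first reference base
    ref: str
    alt: str


def to_bed_lines(variants: Iterable[Variant], track_name: str = "variants") -> List[str]:
    """Render variants as BED lines, preceded by a custom-track header line.

    BED uses 0-based, half-open coordinates, so a variant whose reference
    allele starts at 1-based position pos and spans len(ref) bases covers
    [pos - 1, pos - 1 + len(ref)).
    """
    lines = [f'track name="{track_name}"']
    for v in variants:
        start = v.pos - 1
        end = start + max(len(v.ref), 1)  # never emit a zero-width feature
        lines.append(f"{v.chrom}\t{start}\t{end}\t{v.ref}>{v.alt}")
    return lines
```

    For example, a single-base change at 1-based position 100 on chr1 becomes the BED interval 99 to 100.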

  2. A multi-ontology approach to annotate scientific documents based on a modularization technique.

    Science.gov (United States)

    Gomes, Priscilla Corrêa E Castro; Moura, Ana Maria de Carvalho; Cavalcanti, Maria Cláudia

    2015-12-01

    Scientific text annotation has become an important task for biomedical scientists. Nowadays, there is an increasing need for the development of intelligent systems to support new scientific findings. Public databases available on the Web provide useful data, but much more useful information is only accessible in scientific texts. Text annotation may help, as it relies on the use of ontologies to maintain annotations based on a uniform vocabulary. However, it is difficult to use an ontology, especially one that covers a large domain. In addition, since scientific texts explore multiple domains, which are covered by distinct ontologies, the task becomes even more difficult. Moreover, there are dozens of ontologies in the biomedical area, and they are usually big in terms of the number of concepts. It is in this context that ontology modularization can be useful. This work presents an approach to annotate scientific documents using modules of different ontologies, which are built according to a module extraction technique. The main idea is to analyze a set of single-ontology annotations on a text to find out the user's interests. Based on these annotations, a set of modules is extracted from a set of distinct ontologies and made available to the user for complementary annotation. The reduced size and focus of the extracted modules tend to facilitate the annotation task. An experiment was conducted to evaluate this approach, with the participation of a bioinformatics specialist from the Laboratory of Peptides and Proteins of the IOC/Fiocruz, who was interested in discovering new drug targets to combat tropical diseases. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Biography. Advisory List of Instructional Media.

    Science.gov (United States)

    North Carolina State Dept. of Public Instruction, Raleigh. Div. of Media Evaluation Service.

    The 65 biographies reviewed in this annotated bibliography are suitable for readers in grades pre-kindergarten through 12. Full bibliographic data, appropriate grade level indication, and annotations are supplied for each entry. The names and addresses of the publishers are also provided. Biographies of women, children, authors, actors, historical…

  4. An Informally Annotated Bibliography of Sociolinguistics.

    Science.gov (United States)

    Tannen, Deborah

    This annotated bibliography of sociolinguistics is divided into the following sections: speech events, ethnography of speaking and anthropological approaches to analysis of conversation; discourse analysis (including analysis of conversation and narrative), ethnomethodology and nonverbal communication; sociolinguistics; pragmatics (including…

  5. The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines

    Directory of Open Access Journals (Sweden)

    Gaston K Mazandu

    2014-08-01

    With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, which consists of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, there is increased heterogeneity in annotations across genomes, due both to the different approaches used by different pipelines to infer these annotations and to the nature of the GO structure itself. This makes a curator's task difficult, even if they adhere to the established guidelines for assessing these protein annotations. Here we develop a genome-scale approach for integrating GO annotations from different pipelines using semantic similarity measures. We used this approach to identify inconsistencies and similarities in functional annotations between orthologs of human and Drosophila melanogaster, to assess the quality of GO annotations derived from InterPro2GO mappings compared to manual GO annotations for the Drosophila melanogaster proteome from a FlyBase dataset and for human, and to filter GO annotation data for these proteomes. The results indicate that an efficient integration of GO annotations reduces redundancy by up to 27.08% and 22.32% in the Drosophila melanogaster and human GO annotation datasets, respectively. Furthermore, we identified missing annotations for some orthologs, and annotation mismatches between InterPro2GO and manual pipelines in these two proteomes, which require further curation. This simplifies and facilitates curators' tasks in assessing protein annotations, reduces redundancy and eliminates inconsistencies in large annotation datasets for ease of comparative functional genomics.

  6. Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

    Directory of Open Access Journals (Sweden)

    Mazo Ilya

    2007-07-01

    Full Text Available Abstract Background Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as often sophisticated data mining and processing tools. Protein functions, often referred to as its annotations, are believed to manifest themselves through the topology of networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins with other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to its general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets. Results We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of the manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we report a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly more densely linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups.
We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology.
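    The two-way match described in this abstract rests on asking whether a GO group is more densely linked than a random set of proteins of the same size. A minimal sketch of one way to ask that question, assuming an undirected, unweighted network and a simple permutation test over random node sets; the function names and the toy network are hypothetical, and this is not necessarily the statistic the authors used:

```python
import random

def internal_edges(edges, nodes):
    """Count edges whose two endpoints both lie in `nodes`."""
    members = set(nodes)
    return sum(1 for u, v in edges if u in members and v in members)

def group_density_test(edges, group, all_nodes, n_perm=1000, seed=0):
    """Return (observed internal edges, empirical p-value).

    The p-value is the fraction of random node sets of the same size
    as `group` whose internal edge count is at least the observed one,
    with an add-one correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = internal_edges(edges, group)
    population = list(all_nodes)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(population, len(group))
        if internal_edges(edges, sample) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy network: a 4-protein clique attached to a long chain.
clique = ["A", "B", "C", "D"]
chain = [f"n{i}" for i in range(30)]
edges = [(u, v) for i, u in enumerate(clique) for v in clique[i + 1:]]
edges += [(chain[i], chain[i + 1]) for i in range(len(chain) - 1)]
edges.append(("A", "n0"))

observed, p = group_density_test(edges, clique, clique + chain)
print(observed, p)
```

    Comparing against random same-size node sets keeps the null model simple; a degree-preserving null model would be stricter but is not shown here.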

  7. Recognition of Learner's Personality Traits through Digital Annotations in Distance Learning

    Science.gov (United States)

    Omheni, Nizar; Kalboussi, Anis; Mazhoud, Omar; Kacem, Ahmed Hadj

    2017-01-01

    Researchers in distance education are interested in observing and modelling learners' personality profiles and adapting their learning experiences accordingly. When learners read and interact with their reading materials, they perform unselfconscious activities such as annotation, which may be a key feature of their personalities. Annotation activity…

  8. Metab2MeSH: annotating compounds with medical subject headings.

    Science.gov (United States)

    Sartor, Maureen A; Ade, Alex; Wright, Zach; States, David; Omenn, Gilbert S; Athey, Brian; Karnovsky, Alla

    2012-05-15

    Progress in high-throughput genomic technologies has led to the development of a variety of resources that link genes to functional information contained in the biomedical literature. However, tools attempting to link small molecules to normal and diseased physiology and to published data relevant to biologists and clinical investigators are still lacking. With metabolomics rapidly emerging as a new omics field, the task of annotating small molecule metabolites becomes highly relevant. Our tool Metab2MeSH uses a statistical approach to reliably and automatically annotate compounds with concepts defined in Medical Subject Headings, the National Library of Medicine's controlled vocabulary for biomedical concepts. These annotations provide links from compounds to biomedical literature and complement existing resources such as PubChem and the Human Metabolome Database.

  9. Annotating breast cancer microarray samples using ontologies

    Science.gov (United States)

    Liu, Hongfang; Li, Xin; Yoon, Victoria; Clarke, Robert

    2008-01-01

    As the most common cancer among women, breast cancer results from the accumulation of mutations in essential genes. Recent advances in high-throughput gene expression microarray technology have inspired researchers to use the technology to assist breast cancer diagnosis, prognosis, and treatment prediction. However, the high dimensionality of microarray experiments and public access to data from many experiments have caused inconsistencies, which initiated the development of controlled terminologies and ontologies for annotating microarray experiments, such as the standard Microarray Gene Expression Data (MGED) ontology (MO). In this paper, we developed BCM-CO, an ontology tailored specifically for indexing clinical annotations of breast cancer microarray samples from the NCI Thesaurus. Our research showed that the coverage of the NCI Thesaurus is very limited with respect to i) terms used by researchers to describe breast cancer histology (covering 22 out of 48 histology terms); ii) breast cancer cell lines (covering one out of 12 cell lines); and iii) classes corresponding to breast cancer grading and staging. By incorporating a wider range of those terms into BCM-CO, we were able to index breast cancer microarray samples from GEO using BCM-CO and the MGED ontology, and we developed a prototype system with a web interface that allows the retrieval of microarray data based on the ontology annotations. PMID:18999108

  10. One hundred prime references on hydrogeochemical and stream sediment surveying for uranium as internationally practiced, including 60 annotated references

    International Nuclear Information System (INIS)

    Sharp, R.R. Jr.; Bolivar, S.L.

    1981-04-01

    The United States Department of Energy (DOE), formerly the US ERDA, has initiated a nationwide Hydrogeochemical and Stream Sediment Reconnaissance (HSSR). This program is part of the US National Uranium Resource Evaluation, designed to provide an improved estimate of the availability and economics of nuclear fuel resources and to make information available to industry for use in exploration and development of uranium resources. The Los Alamos National Laboratory is responsible for completing the HSSR in the Rocky Mountain states of New Mexico, Colorado, Wyoming, and Montana and in the state of Alaska. This report contains a compilation of 100 prime references on uranium hydrogeochemical and stream sediment reconnaissance as internationally practiced prior to 1977. The major emphasis in selecting these references was on constructing an HSSR program for identifying uranium in the Los Alamos National Laboratory area of responsibility. The content of the annotated abstracts reflects the authors' concept of what each article contains relative to uranium geochemistry and hydrogeochemical and stream sediment surveying. Consequently, in many cases, significant portions of the original articles are not discussed. The text consists of two parts. Part I contains 100 prime references, alphabetically arranged. Part II contains 60 select annotated abstracts, listed in chronological order.

  11. Annotated bibliography on artificial recharge of ground water, 1955-67

    Science.gov (United States)

    Signor, Donald C.; Growitz, Douglas J.; Kam, William

    1970-01-01

     Engineering," published by Fuel and Metallurgical Journals, Ltd., London, England; "Journal of Geophysical Research," American Geophysical Union, Washington, D.C.; "American Society of Civil Engineers Transactions," New York; "Selected Bibliography of Hydrology, United Kingdom, for the Years 1955-59," International Association of Scientific Hydrology; "Water Wells, an Annotated Bibliography," California University Water Resources Center Archives Report 13; "Re-use of Effluent in the Future With an Annotated Bibliography," by G. A. Whetstone, Texas Water Development Board Report 8, Austin, Tex.; "Journal of Water Pollution Control Federation," Washington, D.C.; and "A List of Selected Technical References on Artificial Recharge of Ground-Water Reservoirs," compiled by Roy W. Graves, Tulsa University, Information Services Department, Tulsa, Okla. Other notations are self-explanatory, and initials are those of the authors (DCS, DJG, WK). An unpublished compilation of recharge references by Arnon Arad sponsored by the United Nations Educational, Scientific, and Cultural Organization during a training period with the U.S. Geological Survey was also used. The bibliography is arranged alphabetically by author. Where an author has more than one publication, the arrangement is chronological; where an author has more than one publication in a given year, a, b, c, . . . are added. The indexing is by subject and geographic location. Each article was assigned the key words or phrases to best characterize its contents. 
Units of measure are as they were in the original article; abbreviations retained are generally those in common use such as mg/l (milligrams per liter), ppm (parts per million), gpm (gallons per minute), km (kilometers), m (meters), cu m per hr (cubic meters per hour), cfs (cubic feet per second), me/l (milliequivalents per liter), psi (pounds per square inch), BOD (biochemical oxygen demand), sq m (square meters), gpd (gallons per day), and mgd (million gallons per day). The

  12. 76 FR 59835 - Endangered and Threatened Wildlife and Plants; Partial 90-Day Finding on a Petition To List 404...

    Science.gov (United States)

    2011-09-27

    ... distribution, pollution from pesticides and fertilizers, invasive species of introduced crayfish, and the... candidate species until its removal from the candidate list in 1996. In addition to the above species, 24 of... To List 404 Species in the Southeastern United States as Endangered or Threatened With Critical...

  13. NegGOA: negative GO annotations selection using ontology structure.

    Science.gov (United States)

    Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian

    2016-10-01

    Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples: proteins that are known not to carry out a particular function. However, Gene Ontology (GO) almost always provides only the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, negative examples of a protein can greatly help distinguish true positive examples of the protein within such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, the available annotations, and the potential for additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives than they do. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates higher accuracy than the compared algorithms across various evaluation metrics. In addition, NegGOA is less affected by incomplete annotations of proteins than these methods. The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa; contact: gxyu@swu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  14. Learning pathology using collaborative vs. individual annotation of whole slide images: a mixed methods trial.

    Science.gov (United States)

    Sahota, Michael; Leung, Betty; Dowdell, Stephanie; Velan, Gary M

    2016-12-12

    Students in biomedical disciplines require an understanding of normal and abnormal microscopic appearances of human tissues (histology and histopathology). For this purpose, practical classes in these disciplines typically use virtual microscopy, viewing digitised whole slide images in web browsers. To enhance engagement, tools have been developed to enable individual or collaborative annotation of whole slide images within web browsers. To date, there have been no studies that have critically compared the impact on learning of individual and collaborative annotations on whole slide images. Junior and senior students engaged in Pathology practical classes within Medical Science and Medicine programs participated in cross-over trials of individual and collaborative annotation activities. Students' understanding of microscopic morphology was compared using timed online quizzes, while students' perceptions of learning were evaluated using an online questionnaire. For senior medical students, collaborative annotation of whole slide images was superior for understanding key microscopic features when compared to individual annotation, whilst being at least equivalent to individual annotation for junior medical science students. Across cohorts, students agreed that the annotation activities provided a user-friendly learning environment that met their flexible learning needs, improved efficiency, provided useful feedback, and helped them to set learning priorities. Importantly, these activities were also perceived to enhance motivation and improve understanding. Collaborative annotation improves understanding of microscopic morphology for students with sufficient background understanding of the discipline. These findings have implications for the deployment of annotation activities in biomedical curricula, and potentially for postgraduate training in Anatomical Pathology.

  15. Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis

    DEFF Research Database (Denmark)

    Bakke, Peter; Carney, Nick; DeLoache, Will

    2009-01-01

    in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We have examined the output from each service in a variety of ways in order to compare the methodology...

  16. The removable acrylic partial denture in primary care: the experience and satisfaction of dental surgeons

    Directory of Open Access Journals (Sweden)

    Rita de Cássia SILVA

    2017-11-01

    Full Text Available Abstract Introduction The guidelines of the National Oral Health Policy have led to the inclusion of elemental prostheses in the list of Primary Care procedures. Objective This paper aimed to evaluate the performance and satisfaction of dental surgeons with the implementation of Acrylic Partial Dentures. Methodology The sample consisted of 159 dental surgeons (size determined by sample calculation) in Belo Horizonte, MG, Brazil, selected by lottery (simple random sampling). A structured questionnaire with 72 questions on the daily practice of dental surgeons was built using the SurveyMonkey platform. Result The results showed that most dental surgeons considered the inclusion on the list of primary care procedures a positive initiative and have enjoyed the experience of using Acrylic Partial Dentures. Dental surgeons who had graduated from private institutions reported more failures than those who had graduated from public institutions. The better-prepared dental surgeons reported fewer difficulties and failures, and the professionals more satisfied with the performance of Acrylic Partial Dentures had also experienced fewer failures. Regarding indications, most participants followed the institutional protocol (anterior teeth only), but many reported also using the dentures for premolars. Conclusion Acrylic partial dentures were a reality in the Brazilian social context even before their inclusion in the list of Primary Care procedures. Such inclusion indicates their relevance; however, their fabrication in public services needs to be systematized by a protocol.

  17. Feeling Expression Using Avatars and Its Consistency for Subjective Annotation

    Science.gov (United States)

    Ito, Fuyuko; Sasaki, Yasunari; Hiroyasu, Tomoyuki; Miki, Mitsunori

    Consumer Generated Media (CGM) is growing rapidly and the amount of content is increasing. However, it is often difficult for users to extract important content, and users can easily forget the existence of content recording their experiences. As there are no methods or systems to indicate the subjective value of content or ways to reuse it, subjective annotation, which appends subjectivity such as feelings and intentions to content, is needed. Representation of subjectivity depends not only on verbal expression but also on nonverbal expression. Linguistically expressed annotation, typified by collaborative tagging in social bookmarking systems, has come into widespread use, but there is no system of nonverbally expressed annotation on the web. We propose the use of controllable avatars as a means of nonverbal expression of subjectivity, and confirm the consistency of feelings elicited by avatars over time for an individual and within a group. In addition, we compare the expressiveness and ease of subjective annotation between collaborative tagging and controllable avatars. The results indicate that the feelings evoked by avatars are consistent in both cases, and that controllable avatars are easier than collaborative tagging for representing feelings elicited by content that does not express meaning, such as photos.

  18. Chemical annotation of small and peptide-like molecules at the Protein Data Bank

    Science.gov (United States)

    Young, Jasmine Y.; Feng, Zukang; Dimitropoulos, Dimitris; Sala, Raul; Westbrook, John; Zhuravleva, Marina; Shao, Chenghua; Quesada, Martha; Peisach, Ezra; Berman, Helen M.

    2013-01-01

    Over the past decade, the number of polymers and their complexes with small molecules in the Protein Data Bank archive (PDB) has continued to increase significantly. To support scientific advancements and ensure the best quality and completeness of the data files over the next 10 years and beyond, the Worldwide PDB partnership that manages the PDB archive is developing a new deposition and annotation system. This system focuses on efficient data capture across all supported experimental methods. The new deposition and annotation system is composed of four major modules that together support all of the processing requirements for a PDB entry. In this article, we describe one such module called the Chemical Component Annotation Tool. This tool uses information from both the Chemical Component Dictionary and Biologically Interesting molecule Reference Dictionary to aid in annotation. Benchmark studies have shown that the Chemical Component Annotation Tool provides significant improvements in processing efficiency and data quality. Database URL: http://wwpdb.org PMID:24291661

  19. ANNOTATION SUPPORTED OCCLUDED OBJECT TRACKING

    Directory of Open Access Journals (Sweden)

    Devinder Kumar

    2012-08-01

    Full Text Available Tracking occluded objects at different depths has become an extremely important area of study for any video sequence, with wide applications in object tracking, scene recognition, video coding, video editing, and mosaicking. The paper studies the ability of annotation to track an occluded object based on pyramids with variation in depth, further establishing a threshold at which the ability of the system to track the occluded object fails. Image annotation is applied to 3 similar video sequences varying in depth. In the experiment, one bike occludes the other at depths of 60 cm, 80 cm, and 100 cm, respectively. Another experiment is performed on tracking humans at similar depths to validate the results. The paper also computes the frame-by-frame error incurred by the system, supported by detailed simulations. This system can be effectively used to analyze the error in motion tracking and further correct it, leading to flawless tracking. This can be of great interest to computer scientists designing surveillance systems and similar applications.

  20. Instructional Media: Communication Skills. Advisory List.

    Science.gov (United States)

    North Carolina State Dept. of Public Instruction, Raleigh. Media and Technology Services.

    This annotated bibliography of instructional media in communication skills presents annotations of 112 books and videotapes for students from pre-kindergarten through grade 12, and of 38 books and videos for teachers. The material in the bibliography for students consists mostly of poetry collections published in 1990 and 1991. The…

  1. Gene coexpression network analysis as a source of functional annotation for rice genes.

    Directory of Open Access Journals (Sweden)

    Kevin L Childs

    Full Text Available With the existence of large publicly available plant gene expression data sets, many groups have undertaken data analyses to construct gene coexpression networks and functionally annotate genes. Often, a large compendium of unrelated or condition-independent expression data is used to construct gene networks. Condition-dependent expression experiments consisting of well-defined conditions/treatments have also been used to create coexpression networks to help examine particular biological processes. Gene networks derived from either condition-dependent or condition-independent data can be difficult to interpret if a large number of genes and connections are present. However, algorithms exist to identify modules of highly connected and biologically relevant genes within coexpression networks. In this study, we have used publicly available rice (Oryza sativa) gene expression data to create gene coexpression networks using both condition-dependent and condition-independent data and have identified gene modules within these networks using the Weighted Gene Coexpression Network Analysis method. We compared the number of genes assigned to modules and the biological interpretability of gene coexpression modules to assess the utility of condition-dependent and condition-independent gene coexpression networks. For the purpose of providing functional annotation to rice genes, we found gene modules identified by coexpression analysis of condition-dependent gene expression experiments to be more useful than gene modules identified by analysis of a condition-independent data set. We have incorporated our results into the MSU Rice Genome Annotation Project database as additional expression-based annotation for 13,537 genes, 2,980 of which lack a functional annotation description. These results provide two new types of functional annotation for our database. Genes in modules are now associated with groups of genes that constitute a collective functional
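    Weighted Gene Coexpression Network Analysis starts by turning pairwise expression correlations into a soft-thresholded adjacency, commonly the unsigned form a_ij = |cor(x_i, x_j)|^β. The sketch below shows only that first step plus whole-network connectivity; the full method continues with topological overlap and hierarchical clustering to define modules, which is not shown. The gene names, expression values, and the choice β = 6 are hypothetical illustrations, not the study's settings:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta.

    `expr` maps gene id -> expression values across the same samples.
    Returns a dict keyed by (gene_i, gene_j) in both orders.
    """
    genes = sorted(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            a = abs(pearson(expr[g], expr[h])) ** beta
            adj[(g, h)] = adj[(h, g)] = a
    return adj

def connectivity(adj, gene):
    """Whole-network connectivity k_i: sum of a gene's adjacencies."""
    return sum(a for (g, _), a in adj.items() if g == gene)

# Hypothetical expression profiles over four samples.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],   # perfectly correlated with g1
    "g3": [4, 3, 2, 1],   # perfectly anti-correlated with g1
    "g4": [1, 3, 2, 4],   # cor(g1, g4) = 0.8
}
adj = soft_threshold_adjacency(expr, beta=6)
print(round(adj[("g1", "g4")], 6))      # 0.262144
print(round(connectivity(adj, "g1"), 6))  # 2.262144
```

    Raising |cor| to a power keeps strong correlations near 1 while pushing weak ones toward 0, which is why the unsigned form treats g3 as tightly linked to g1 despite the opposite sign.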

  2. Partial differential equation models in macroeconomics.

    Science.gov (United States)

    Achdou, Yves; Buera, Francisco J; Lasry, Jean-Michel; Lions, Pierre-Louis; Moll, Benjamin

    2014-11-13

    The purpose of this article is to get mathematicians interested in studying a number of partial differential equations (PDEs) that naturally arise in macroeconomics. These PDEs come from models designed to study some of the most important questions in economics. At the same time, they are highly interesting for mathematicians because their structure is often quite difficult. We present a number of examples of such PDEs, discuss what is known about their properties, and list some open questions for future research. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  3. Laughter annotations in conversational speech corpora - possibilities and limitations for phonetic analysis

    NARCIS (Netherlands)

    Truong, Khiet Phuong; Trouvain, Jürgen

    Existing laughter annotations provided with several publicly available conversational speech corpora (both multiparty and dyadic conversations) were investigated and compared. We discuss the possibilities and limitations of these rather coarse and shallow laughter annotations. There are definition

  4. BIOCAT: a pattern recognition platform for customizable biological image classification and annotation.

    Science.gov (United States)

    Zhou, Jie; Lamichhane, Santosh; Sterne, Gabriella; Ye, Bing; Peng, Hanchuan

    2013-10-04

    Pattern recognition algorithms are useful in bioimage informatics applications such as quantifying cellular and subcellular objects, annotating gene expressions, and classifying phenotypes. To provide effective and efficient image classification and annotation for the ever-increasing number of microscopic images, it is desirable to have tools that can combine and compare various algorithms, and build customizable solutions for different biological problems. However, current tools often offer a limited solution for generating user-friendly and extensible tools for annotating higher dimensional images that correspond to multiple complicated categories. We developed the BIOimage Classification and Annotation Tool (BIOCAT). It is able to apply pattern recognition algorithms to two- and three-dimensional biological image sets as well as regions of interest (ROIs) in individual images for automatic classification and annotation. We also propose a 3D anisotropic wavelet feature extractor for extracting textural features from 3D images with xy-z resolution disparity. The extractor is one of about 20 built-in algorithms of feature extractors, selectors and classifiers in BIOCAT. The algorithms are modularized so that they can be "chained" in a customizable way to form adaptive solutions for various problems, and the plugin-based extensibility gives the tool an open architecture to incorporate future algorithms. We have applied BIOCAT to classification and annotation of images and ROIs of different properties with applications in cell biology and neuroscience. BIOCAT provides a user-friendly, portable platform for pattern recognition based biological image classification of two- and three-dimensional images and ROIs. We show, via diverse case studies, that different algorithms and their combinations have different suitability for various problems. The customizability of BIOCAT is thus expected to be useful for providing effective and efficient solutions for a variety of biological

  5. PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data.

    Science.gov (United States)

    Hart, Steven N; Moore, Raymond M; Zimmermann, Michael T; Oliver, Gavin R; Egan, Jan B; Bryce, Alan H; Kocher, Jean-Pierre A

    2015-01-01

    Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations, and we are now able to generate data faster than it can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns for the table of annotations). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. Another use case is provided describing how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with the context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.

  6. Application of whole slide image markup and annotation for pathologist knowledge capture.

    Science.gov (United States)

    Campbell, Walter S; Foster, Kirk W; Hinrichs, Steven H

    2013-01-01

    The ability to transfer image markup and annotation data from one scanned image of a slide to a newly acquired image of the same slide within a single vendor platform was investigated. The goal was to study the ability to use image markup and annotation data files as a mechanism to capture and retain pathologist knowledge without retaining the entire whole slide image (WSI) file. Accepted mathematical principles were investigated as a method to overcome variations in scans of the same glass slide and to accurately associate image markup and annotation data across different WSI of the same glass slide. Trilateration was used to link fixed points within the image and slide to the placement of markups and annotations of the image in a metadata file. Variation in markup and annotation placement between WSI of the same glass slide was reduced from over 80 μm to less than 4 μm in the x-axis and from 17 μm to 6 μm in the y-axis (P < 0.025). This methodology allows for the creation of a highly reproducible image library of histopathology images and interpretations for educational and research use.
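    The abstract does not spell out its trilateration procedure, so the following is only a minimal sketch under stated assumptions: each markup is stored as its distances to three non-collinear fiducial points, and on a rescan the fiducials are relocated and the markup position is solved from those distances. Distances are unchanged by shifts and rotations, which is what lets the position carry over between scans; equal scale between scans is assumed. The function names and coordinates are hypothetical:

```python
import math

def distances(point, refs):
    """Distances from an annotation point to each reference (fiducial) point."""
    return [math.dist(point, r) for r in refs]

def trilaterate(refs, radii):
    """Recover a 2D point from its distances to three non-collinear references.

    Subtracting the first circle equation from the other two gives a
    linear 2x2 system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = refs
    r1, r2, r3 = radii
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    d, e = 2 * (x3 - x1), 2 * (y3 - y1)
    f = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a * e - b * d
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear")
    return ((c * e - b * f) / det, (a * f - c * d) / det)

# Hypothetical fiducial positions on the first scan, and one markup.
old_refs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
markup = (5.0, 7.0)
stored = distances(markup, old_refs)   # what a metadata file might keep

# On a rescan, the same fiducials appear shifted by (+3, -2).
new_refs = [(3.0, -2.0), (13.0, -2.0), (3.0, 8.0)]
x, y = trilaterate(new_refs, stored)
print(round(x, 6), round(y, 6))  # 8.0 5.0
```

    With noisy fiducial detections, a least-squares fit over more than three reference points would be the natural extension; the exact three-point solve is shown only for clarity.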

  7. Transcript-level annotation of Affymetrix probesets improves the interpretation of gene expression data

    Directory of Open Access Journals (Sweden)

    Tu Kang

    2007-06-01

    Full Text Available Abstract Background The wide use of Affymetrix microarrays across a broad range of biological research fields has made probeset annotation an important issue. Standard Affymetrix probeset annotation is at the gene level, i.e. a probeset is precisely linked to a gene, and probeset intensity is interpreted as gene expression. The growing knowledge that one gene may have multiple transcript variants makes it necessary to refine this gene-level annotation to the transcript level. Results Through performing rigorous alignments of the Affymetrix probe sequences against a comprehensive pool of currently available transcript sequences, and further linking the probesets to the International Protein Index, we generated transcript-level or protein-level annotation tables for two popular Affymetrix expression arrays, Mouse Genome 430A 2.0 Array and Human Genome U133A Array. Application of our new annotations in re-examining existing expression data sets shows increased expression consistency among synonymous probesets and strengthened expression correlation between interacting proteins. Conclusion By refining the standard Affymetrix annotation of microarray probesets from the gene level to the transcript level and protein level, one can achieve a more reliable interpretation of experimental data, which may lead to the discovery of more profound regulatory mechanisms.

  8. 76 FR 510 - National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List...

    Science.gov (United States)

    2011-01-05

    ..., Intergovernmental relations, Penalties, Reporting and recordkeeping requirements, Superfund, Water pollution control... and Hazardous Substances Pollution Contingency Plan; National Priorities List: Partial Deletion of the... Site is located in Albuquerque, Bernalillo County, New Mexico. After this deletion, this 62 acres will...

  9. Registered plant list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods ...the Plant DB link list in simple search page) Genome analysis methods Presence or... absence of Genome analysis methods information in this DB (link to the Genome analysis methods information ...

  10. Sensor Control And Film Annotation For Long Range, Standoff Reconnaissance

    Science.gov (United States)

    Schmidt, Thomas G.; Peters, Owen L.; Post, Lawrence H.

    1984-12-01

    This paper describes a Reconnaissance Data Annotation System that incorporates off-the-shelf technology and system designs providing a high degree of adaptability and interoperability to satisfy future reconnaissance data requirements. The history of data annotation for reconnaissance is reviewed in order to provide the base from which future developments can be assessed and technical risks minimized. The system described will accommodate new developments in recording head assemblies and the incorporation of advanced cameras of both the film and electro-optical type. Use of microprocessor control and a digital bus interface forms the central design philosophy. For long range, high altitude, standoff missions, the Data Annotation System computes the projected latitude and longitude of the central target position from aircraft position and attitude. This complements the use of longer ranges and higher altitudes for reconnaissance missions.
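    The abstract does not describe how the projected target position is computed. The sketch below illustrates one simple version of that kind of calculation, assuming the sensor's central line of sight has already been reduced to an azimuth and a depression angle (combining aircraft attitude with sensor pointing), using a flat-earth ground-range estimate and then the standard spherical "destination point" formula. It ignores terrain elevation, Earth curvature in the range estimate, and refraction, all of which matter at long standoff ranges; the function name and constant are hypothetical:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)

def project_target(lat_deg, lon_deg, altitude_m, azimuth_deg, depression_deg):
    """Estimate where the sensor's central line of sight meets the ground.

    Ground range uses a flat-earth estimate, altitude / tan(depression);
    the point is then moved that distance along `azimuth_deg` on a sphere.
    """
    if not 0 < depression_deg <= 90:
        raise ValueError("depression angle must be in (0, 90] degrees")
    ground_range = altitude_m / math.tan(math.radians(depression_deg))
    delta = ground_range / EARTH_RADIUS_M  # angular distance in radians
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    theta = math.radians(azimuth_deg)
    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(lat1),
                             math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)

# Hypothetical case: aircraft on the equator at 10 km altitude,
# looking due north at 45 degrees below the horizon (about 10 km ground range).
lat, lon = project_target(0.0, 0.0, 10_000.0, 0.0, 45.0)
print(lat, lon)
```

    At shallow depression angles the flat-earth range estimate grows quickly and becomes the dominant error source, which is one reason a production system would need a curved-earth line-of-sight intersection instead.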

  11. Integrating UIMA annotators in a web-based text processing framework.

    Science.gov (United States)

    Chen, Xiang; Arnold, Corey W

    2013-01-01

    The Unstructured Information Management Architecture (UIMA) [1] framework is a growing platform for natural language processing (NLP) applications. However, such applications may be difficult for non-technical users to deploy. This project presents a web-based framework that wraps UIMA-based annotator systems into a graphical user interface for researchers and clinicians, and a web service for developers. An annotator that extracts data elements from lung cancer radiology reports is presented to illustrate the use of the system. Annotation results from the web system can be exported to multiple formats for users to utilize in other aspects of their research and workflow. This project demonstrates the benefits of a lay-user interface for complex NLP applications. Efforts such as this can lead to increased interest and support for NLP work in the clinical domain.

  12. Processing sequence annotation data using the Lua programming language.

    Science.gov (United States)

    Ueno, Yutaka; Arita, Masanori; Kumagai, Toshitaka; Asai, Kiyoshi

    2003-01-01

    The data processing language in a graphical software tool that manages sequence annotation data from genome databases should provide flexible functions for the tasks in molecular biology research. Among currently available languages we adopted the Lua programming language. It fulfills our requirements to perform computational tasks for sequence map layouts, i.e. the handling of data containers, symbolic reference to data, and a simple programming syntax. Upon importing a foreign file, the original data are first decomposed in the Lua language while maintaining the original data schema. The converted data are parsed by the Lua interpreter and the contents are stored in our data warehouse. Then, portions of annotations are selected and arranged into our catalog format to be depicted on the sequence map. Our sequence visualization program was successfully implemented, embedding the Lua language for processing of annotation data and layout script. The program is available at http://staff.aist.go.jp/yutaka.ueno/guppy/.

  13. A Linked Data-Based Collaborative Annotation System for Increasing Learning Achievements

    Science.gov (United States)

    Zarzour, Hafed; Sellami, Mokhtar

    2017-01-01

    With the emergence of Web 2.0, collaborative annotation practices have matured in the field of learning. In this context, several recent studies have shown the powerful effects of integrating annotation mechanisms into the learning process. However, most of these studies provide poor support for semantically structured resources,…

  14. Assessment of disease named entity recognition on a corpus of annotated sentences.

    Science.gov (United States)

    Jimeno, Antonio; Jimenez-Ruiz, Ernesto; Lee, Vivian; Gaudan, Sylvain; Berlanga, Rafael; Rebholz-Schuhmann, Dietrich

    2008-04-11

    In recent years, the recognition of semantic types from the biomedical scientific literature has focused on named entities such as protein and gene names (PGNs) and Gene Ontology terms (GO terms). Other semantic types, such as diseases, have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit), other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap, provided by the National Library of Medicine (NLM), is the state-of-the-art solution for annotating concepts from the UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far in generating an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions. As part of our research work, we took a corpus that had previously been delivered for identifying associations of genes to diseases based on the UMLS Metathesaurus, and we reprocessed and re-annotated it. We gathered annotations for disease entities from two curators, analyzed their disagreement (kappa statistic of 0.51) and composed a single annotated corpus for public use. Thereafter, three solutions for disease named entity recognition, including MetaMap, were applied to the corpus to automatically annotate it with UMLS Metathesaurus concepts. The resulting annotations were benchmarked to compare their performance. The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases and can serve as a benchmark for other systems. In addition, we found…
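
The inter-curator agreement in this record is reported as a kappa statistic. As a minimal sketch, assuming Cohen's kappa computed over per-mention labels assigned by two curators (the abstract does not state which kappa variant or unit of comparison was used), the statistic compares observed agreement with the agreement expected by chance from each curator's label frequencies. The function name `cohens_kappa` and the label values are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two curators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each curator's marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label]
                   for label in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both curators used one identical label throughout; kappa is undefined
    return (observed - expected) / (1.0 - expected)


# Toy example (not from the study): 3 of 4 items agree, chance agreement is 0.5
curator_1 = ["disease", "disease", "other", "other"]
curator_2 = ["disease", "other", "other", "other"]
print(cohens_kappa(curator_1, curator_2))  # 0.5
```

A kappa of 0.51, as reported above, therefore means the curators agreed on roughly half of the margin that chance agreement leaves open, which is usually read as moderate agreement.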

  15. AIGO: Towards a unified framework for the Analysis and the Inter-comparison of GO functional annotations

    Directory of Open Access Journals (Sweden)

    Defoin-Platel Michael

    2011-11-01

    Full Text Available Abstract Background In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given that there is often no existing gold standard against which to evaluate precision and recall. Results In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats. Conclusions This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way.

  16. Saint: a lightweight integration environment for model annotation.

    Science.gov (United States)

    Lister, Allyson L; Pocock, Matthew; Taschuk, Morgan; Wipat, Anil

    2009-11-15

    Saint is a web application which provides a lightweight annotation integration environment for quantitative biological models. The system enables modellers to rapidly mark up models with biological information derived from a range of data sources. Saint is freely available for use on the web at http://www.cisban.ac.uk/saint. The web application is implemented in Google Web Toolkit and Tomcat, with all major browsers supported. The Java source code is freely available for download at http://saint-annotate.sourceforge.net. The Saint web server requires an installation of libSBML and has been tested on Linux (32-bit Ubuntu 8.10 and 9.04).

  17. Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

    Science.gov (United States)

    Oellrich, Anika; Collier, Nigel; Smedley, Damian; Groza, Tudor

    2015-01-01

    Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent of the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently of the text types, and the strategies for best leveraging individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content of the Sh…
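
The silver-standard approach in this record (pooling several systems' output and scoring it with precision, recall and F1) can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: annotations are taken to be `(start, end, concept_id)` tuples, a silver annotation is any tuple proposed by at least `min_votes` systems, and matching against the gold set is exact. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Annotation = Tuple[int, int, str]  # (start offset, end offset, concept ID); hypothetical representation


def build_silver_standard(system_outputs: Iterable[Set[Annotation]],
                          min_votes: int = 2) -> Set[Annotation]:
    """Keep every annotation proposed by at least `min_votes` systems (voting)."""
    votes: Counter = Counter()
    for output in system_outputs:
        votes.update(set(output))  # each system votes at most once per annotation
    return {ann for ann, count in votes.items() if count >= min_votes}


def precision_recall_f1(predicted: Set[Annotation],
                        gold: Set[Annotation]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of `predicted` against `gold`."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (not from the study): three systems, two of which agree on C2
system_a = {(0, 5, "C1"), (10, 15, "C2")}
system_b = {(0, 5, "C1"), (20, 25, "C3")}
system_c = {(0, 5, "C1"), (10, 15, "C2"), (30, 35, "C4")}
silver = build_silver_standard([system_a, system_b, system_c], min_votes=2)
gold = {(0, 5, "C1"), (20, 25, "C3")}
print(sorted(silver))                     # [(0, 5, 'C1'), (10, 15, 'C2')]
print(precision_recall_f1(silver, gold))  # (0.5, 0.5, 0.5)
```

Raising `min_votes` trades recall for precision, which mirrors the abstract's observation that adding high-precision but low-recall systems can lower the combined F1-measure.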

  18. Generation of silver standard concept annotations from biomedical texts with special relevance to phenotypes.

    Directory of Open Access Journals (Sweden)

    Anika Oellrich

    Full Text Available Electronic health records and scientific articles possess differing linguistic characteristics that may impact the performance of natural language processing tools developed for one or the other. In this paper, we investigate the performance of four extant concept recognition tools: the clinical Text Analysis and Knowledge Extraction System (cTAKES), the National Center for Biomedical Ontology (NCBO) Annotator, the Biomedical Concept Annotation System (BeCAS) and MetaMap. Each of the four concept recognition systems is applied to four different corpora: the i2b2 corpus of clinical documents, a PubMed corpus of Medline abstracts, a clinical trials corpus and the ShARe/CLEF corpus. In addition, we assess the individual system performances with respect to one gold standard annotation set, available for the ShARe/CLEF corpus. Furthermore, we built a silver standard annotation set from the individual systems' output and assess the quality as well as the contribution of individual systems to the quality of the silver standard. Our results demonstrate that mainly the NCBO annotator and cTAKES contribute to the silver standard corpora (F1-measures in the range of 21% to 74%) and their quality (best F1-measure of 33%), independent of the type of text investigated. While BeCAS and MetaMap can contribute to the precision of silver standard annotations (precision of up to 42%), the F1-measure drops when combined with NCBO Annotator and cTAKES due to a low recall. In conclusion, the performances of individual systems need to be improved independently of the text types, and the strategies for best leveraging individual systems' annotations need to be revised. The textual content of the PubMed corpus, accession numbers for the clinical trials corpus, and assigned annotations of the four concept recognition systems as well as the generated silver standard annotation sets are available from http://purl.org/phenotype/resources. The textual content…

  19. How Viral Poems are Annotated: On ‘OCD’ by Neil Hilborn

    NARCIS (Netherlands)

    van der Starre, K.A.

    2015-01-01

    In How Viral Poems are Annotated: On ‘OCD’ by Neil Hilborn, Kila van der Starre explores how, where and by whom viral poems are annotated. The article focuses on the performance of the poem ‘OCD’ by Neil Hilborn that went viral in the summer of 2013 and has been viewed more than 10 million times on…

  20. Current trend of annotating single nucleotide variation in humans--A case study on SNVrap.

    Science.gov (United States)

    Li, Mulin Jun; Wang, Junwen

    2015-06-01

    As high-throughput methods, such as whole-genome genotyping arrays, whole-exome sequencing (WES) and whole-genome sequencing (WGS), have detected huge numbers of genetic variants associated with human diseases, functional annotation of these variants is an indispensable step in understanding disease etiology. Large-scale functional genomics projects, such as the ENCODE Project and the Roadmap Epigenomics Project, provide genome-wide profiling of functional elements across different human cell types and tissues. Given the urgent demand for identifying disease-causal variants, a comprehensive and easy-to-use annotation tool is much needed. Here we review and discuss current progress and trends in the variant annotation field. Furthermore, we introduce a comprehensive web portal for annotating human genetic variants. We use gene-based features and the latest functional genomics datasets to annotate single nucleotide variants (SNVs) in humans at whole-genome scale. We further apply several function prediction algorithms to annotate SNVs that might affect different biological processes, including transcriptional gene regulation, alternative splicing, post-transcriptional regulation, translation and post-translational modifications. The SNVrap web portal is freely available at http://jjwanglab.org/snvrap. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Expression profiling of hypothetical genes in Desulfovibrio vulgaris leads to improved functional annotation

    Energy Technology Data Exchange (ETDEWEB)

    Elias, Dwayne A.; Mukhopadhyay, Aindrila; Joachimiak, Marcin P.; Drury, Elliott C.; Redding, Alyssa M.; Yen, Huei-Che B.; Fields, Matthew W.; Hazen, Terry C.; Arkin, Adam P.; Keasling, Jay D.; Wall, Judy D.

    2008-10-27

    Hypothetical and conserved hypothetical genes account for >30 percent of sequenced bacterial genomes. For the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough, 347 of the 3634 genes were annotated as conserved hypothetical (9.5 percent) along with 887 hypothetical genes (24.4 percent). Given the large fraction of the genome, it is plausible that some of these genes serve critical cellular roles. The study goals were to determine which genes were expressed and provide a more functionally based annotation. To accomplish this, expression profiles of the 1234 hypothetical and conserved hypothetical genes were taken from transcriptomic datasets of 11 environmental stresses, complemented with shotgun LC-MS/MS and AMT tag proteomic data. Genes were divided into putatively polycistronic operons and those predicted to be monocistronic, then classified by basal expression levels and grouped according to changes in expression for one or multiple stresses. Of these genes, 1212 were transcribed, and 786 produced detectable proteins. There was no evidence for expression of 17 predicted genes. Except for the latter, monocistronic gene annotation was expanded using the above criteria along with matching Clusters of Orthologous Groups. Polycistronic genes were annotated in the same manner, with inferences from their proximity to more confidently annotated genes. Two targeted deletion mutants were used as test cases to determine the relevance of the inferred functional annotations.
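
The percentages in this record can be checked against the 3634-gene total; the short calculation below also shows that the two categories together make up about one third of the genome, consistent with the ">30 percent" figure and with the 1234 genes profiled:

```latex
\begin{aligned}
\frac{347}{3634} &\approx 0.095 \;(9.5\%), \qquad
\frac{887}{3634} \approx 0.244 \;(24.4\%),\\
\frac{347 + 887}{3634} &= \frac{1234}{3634} \approx 0.340 \;(34.0\%) > 30\%.
\end{aligned}
```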

  2. Just-in-time : on strategy annotations

    NARCIS (Netherlands)

    J.C. van de Pol (Jaco)

    2001-01-01

    A simple kind of strategy annotation is investigated, giving rise to a class of strategies, including leftmost-innermost. It is shown that, under certain restrictions, an interpreter can be written which computes the normal form of a term in a bottom-up traversal. The main contribution…

  3. Snap: an integrated SNP annotation platform

    DEFF Research Database (Denmark)

    Li, Shengting; Ma, Lijia; Li, Heng

    2007-01-01

    Snap (Single Nucleotide Polymorphism Annotation Platform) is a server designed to comprehensively analyze single genes and relationships between genes based on SNPs in the human genome. The aim of the platform is to facilitate the study of SNP finding and analysis within the framework of medical...

  4. Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae

    DEFF Research Database (Denmark)

    Vongsangnak, Wanwipa; Olsen, Peter; Hansen, Kim

    2008-01-01

    Background: Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and the production of industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released but the number...... to a genome-scale metabolic model of A. oryzae. Results: From our assembled EST sequences, we identified 1,046 newly predicted genes in the A. oryzae genome. Furthermore, it was possible to assign putative protein functions to 398 of the newly predicted genes. Notably, our annotation strategy resulted...... model was validated and shown to correctly describe the phenotypic behavior of A. oryzae grown on different carbon sources. Conclusion: A much enhanced annotation of the A. oryzae genome was performed and a genome-scale metabolic model of A. oryzae was reconstructed. The model accurately predicted...

  5. Supporting Listening Comprehension and Vocabulary Acquisition with Multimedia Annotations: The Students' Voice.

    Science.gov (United States)

    Jones, Linda C.

    2003-01-01

    Extends Mayer's (1997, 2001) generative theory of multimedia learning and investigates under what conditions multimedia annotations can support listening comprehension in a second language. Highlights students' views on the effectiveness of multimedia annotations (visual and verbal) in assisting them in their comprehension and acquisition of…

  6. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation

    DEFF Research Database (Denmark)

    Zhang, Han; Yohe, Tanner; Huang, Le

    2018-01-01

    of plant and plant-associated microbial genomes and metagenomes being sequenced, there is an urgent need for automatic tools for genomic data mining of CAZymes. We developed the dbCAN web server in 2012 to provide a public service for automated CAZyme annotation for newly sequenced genomes. Here, dbCAN2...... (http://cys.bios.niu.edu/dbCAN2) is presented as an updated meta server, which integrates three state-of-the-art tools for CAZome (all CAZymes of a genome) annotation: (i) HMMER search against the dbCAN HMM (hidden Markov model) database; (ii) DIAMOND search against the CAZy pre-annotated CAZyme...

  7. GRADUATE AND PROFESSIONAL EDUCATION, AN ANNOTATED BIBLIOGRAPHY.

    Science.gov (United States)

    HEISS, ANN M.; AND OTHERS

    THIS ANNOTATED BIBLIOGRAPHY CONTAINS REFERENCES TO GENERAL GRADUATE EDUCATION AND TO EDUCATION FOR THE FOLLOWING PROFESSIONAL FIELDS--ARCHITECTURE, BUSINESS, CLINICAL PSYCHOLOGY, DENTISTRY, ENGINEERING, LAW, LIBRARY SCIENCE, MEDICINE, NURSING, SOCIAL WORK, TEACHING, AND THEOLOGY. (HW)

  8. AnnoLnc: a web server for systematically annotating novel human lncRNAs.

    Science.gov (United States)

    Hou, Mei; Tang, Xing; Tian, Feng; Shi, Fangyuan; Liu, Fenglin; Gao, Ge

    2016-11-16

    Long noncoding RNAs (lncRNAs) have been shown to play essential roles in almost every important biological process through multiple mechanisms. Although the repertoire of human lncRNAs has rapidly expanded, their biological function and regulation remain largely elusive, calling for a systematic and integrative annotation tool. Here we present AnnoLnc (http://annolnc.cbi.pku.edu.cn), a one-stop portal for systematically annotating novel human lncRNAs. Based on more than 700 data sources and various tool chains, AnnoLnc enables a systematic annotation covering genomic location, secondary structure, expression patterns, transcriptional regulation, miRNA interaction, protein interaction, genetic association and evolution. An intuitive web interface is available for interactive analysis through both desktops and mobile devices, and programmers can further integrate AnnoLnc into their pipeline through standard JSON-based Web Service APIs. To the best of our knowledge, AnnoLnc is the only web server to provide on-the-fly and systematic annotation for newly identified human lncRNAs. Compared with similar tools, the annotation generated by AnnoLnc covers a much wider spectrum with intuitive visualization. Case studies demonstrate the power of AnnoLnc in not only rediscovering known functions of human lncRNAs but also inspiring novel hypotheses.

  9. Effects of Annotations and Homework on Learning Achievement: An Empirical Study of Scratch Programming Pedagogy

    Science.gov (United States)

    Su, Addison Y. S.; Huang, Chester S. J.; Yang, Stephen J. H.; Ding, T. J.; Hsieh, Y. Z.

    2015-01-01

    In Taiwan elementary schools, Scratch programming has been taught for more than four years. Previous studies have shown that personal annotation is a useful learning method that improves learning performance. An annotation-based Scratch programming (ASP) system provides for the creation, sharing, and review of annotations and homework solutions in…

  10. Crowdsourcing image annotation for nucleus detection and segmentation in computational pathology: evaluating experts, automated methods, and the crowd.

    Science.gov (United States)

    Irshad, H; Montaser-Kouhsari, L; Waltz, G; Bucur, O; Nowak, J A; Dong, F; Knoblauch, N W; Beck, A H

    2015-01-01

    The development of tools in computational pathology to assist physicians and biomedical scientists in the diagnosis of disease requires access to high-quality annotated images for algorithm learning and evaluation. Generating high-quality expert-derived annotations is time-consuming and expensive. We explore the use of crowdsourcing for rapidly obtaining annotations for two core tasks in computational pathology: nucleus detection and nucleus segmentation. We designed and implemented crowdsourcing experiments using the CrowdFlower platform, which provides access to a large set of labor channel partners that access and manage millions of contributors worldwide. We obtained annotations from four types of annotators and compared concordance across these groups. We obtained: crowdsourced annotations for nucleus detection and segmentation on a total of 810 images; annotations using automated methods on 810 images; annotations from research fellows for detection and segmentation on 477 and 455 images, respectively; and expert pathologist-derived annotations for detection and segmentation on 80 and 63 images, respectively. For the crowdsourced annotations, we evaluated performance across a range of contributor skill levels (1, 2, or 3). The crowdsourced annotations (4,860 images in total) were completed in only a fraction of the time and cost required for obtaining annotations using traditional methods. For the nucleus detection task, the research fellow-derived annotations showed the strongest concordance with the expert pathologist-derived annotations (F-M = 93.68%), followed by crowdsourced contributor levels 1, 2, and 3 and the automated method, which showed relatively similar performance (F-M = 87.84%, 88.49%, 87.26%, and 86.99%, respectively). For the nucleus segmentation task, the crowdsourced contributor level 3-derived annotations, research fellow-derived annotations, and automated method showed the strongest concordance with the expert pathologist…
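
This record reports nucleus-detection concordance as an F-measure (F-M) but does not give the matching rule. A common approach, assumed here rather than taken from the paper, is to pair each predicted nucleus centroid with at most one unmatched reference centroid within a fixed pixel radius, count the pairs as true positives, and combine precision and recall. The greedy matching, the `radius` value and the function name `detection_f_measure` are illustrative assumptions; an optimal assignment (for example, Hungarian matching) could give slightly different counts.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # nucleus centroid (x, y) in pixels


def detection_f_measure(predicted: Sequence[Point], reference: Sequence[Point],
                        radius: float) -> float:
    """F-measure of predicted nucleus centroids against reference centroids.

    Each predicted centroid is greedily matched to the nearest still-unmatched
    reference centroid within `radius` pixels; each reference is used at most once.
    """
    unmatched: List[Point] = list(reference)
    true_positives = 0
    for px, py in predicted:
        best_index, best_distance = None, radius
        for i, (rx, ry) in enumerate(unmatched):
            distance = math.hypot(px - rx, py - ry)
            if distance <= best_distance:
                best_index, best_distance = i, distance
        if best_index is not None:
            unmatched.pop(best_index)
            true_positives += 1
    false_positives = len(predicted) - true_positives
    false_negatives = len(reference) - true_positives
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up coordinates): 2 matches, 1 false positive, 2 missed nuclei
predicted = [(10, 10), (50, 50), (90, 90)]
reference = [(12, 11), (48, 52), (200, 200), (300, 300)]
print(detection_f_measure(predicted, reference, radius=5.0))  # ≈ 0.571 (= 4/7)
```

Because the F-measure equals 2TP / (2TP + FP + FN), the toy example gives 4 / 7; the choice of radius directly controls how forgiving the comparison is toward slightly offset centroids.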

  11. nGASP - the nematode genome annotation assessment project

    Energy Technology Data Exchange (ETDEWEB)

    Coghlan, A; Fiedler, T J; McKay, S J; Flicek, P; Harris, T W; Blasiar, D; Allen, J; Stein, L D

    2008-12-19

    While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.

  12. Partial report and other sampling procedures overestimate the duration of iconic memory.

    Science.gov (United States)

    Appelman, I B

    1980-03-01

    In three experiments, subjects estimated the duration of a brief visual image (iconic memory) either directly by adjusting onset of a click to offset of the visual image, or indirectly with a Sperling partial report (sampling) procedure. The results indicated that partial report and other sampling procedures may reflect other brief phenomena along with iconic memory. First, the partial report procedure yields a greater estimate of the duration of iconic memory than the more direct click method. Second, the partial report estimate of the duration of iconic memory is affected if the subject is required to simultaneously retain a list of distractor items (memory load), while the click method estimate of the duration of iconic memory is not affected by a memory load. Finally, another sampling procedure based on visual cuing yields different estimates of the duration of iconic memory depending on how many items are cued. It was concluded that partial report and other sampling procedures overestimate the duration of iconic memory.

  13. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

    Science.gov (United States)

    Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

    2015-01-01

    The DOE-JGI Microbial Genome Annotation Pipeline (MGAP) performs structural and functional annotation of microbial genomes that are subsequently included in the Integrated Microbial Genome (IMG) comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Submitting a dataset for annotation first requires a description of the project and its associated metadata in GOLD. MGAP sequence data processing consists of feature prediction, including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.

  14. 77 FR 31215 - National Oil and Hazardous Substances Pollution Contingency Plan; National Priorities List...

    Science.gov (United States)

    2012-05-25

    ... and Hazardous Substances Pollution Contingency Plan; National Priorities List: Partial Deletion of the... the surface soil, unsaturated subsurface soil, surface water and sediments of Operable Unit (OU) 1...: The Environmental Protection Agency (EPA) Region 8 announces the deletion of Operable Unit (OU) 1--the...

  15. Skin Cancer Education Materials: Selected Annotations.

    Science.gov (United States)

    National Cancer Inst. (NIH), Bethesda, MD.

    This annotated bibliography presents 85 entries on a variety of approaches to cancer education. The entries are grouped under three broad headings, two of which contain smaller sub-divisions. The first heading, Public Education, contains prevention and general information, and non-print materials. The second heading, Professional Education,…

  16. Ten steps to get started in Genome Assembly and Annotation [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Victoria Dominguez Del Angel

    2018-02-01

    Full Text Available As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to help researchers get started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high-quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).

  17. Three-dimensional renal CT angiography for guiding segmental renal artery clamping during laparoscopic partial nephrectomy

    International Nuclear Information System (INIS)

    Xu, Yi; Shao, Pengfei; Zhu, Xiaomei; Lv, Qiang; Liu, Wangyan; Xu, Hai; Zhu, Yinsu; Yang, Guangyu; Tang, Lijun; Yin, Changjun

    2013-01-01

    Aim: To evaluate the effectiveness of three-dimensional (3D) renal computed tomography angiography (CTA) in guiding segmental renal artery clamping during laparoscopic partial nephrectomy (LPN). Materials and methods: Forty-three patients with renal tumours undergoing renal CTA before LPN were retrospectively enrolled in this study. 3D arteriogram reconstructed images were created to identify the renal tumour-supplying arteries. The number and location of these targeted vessels were annotated on 3D images preoperatively and compared with the vessels clamped during LPN. The consistency between target vessels annotated at CTA and arteries clamped at LPN was compared using both a patient-based analysis and a vessel-based analysis. The χ² test was applied to analyse the influence of tumour size, location, and growth pattern on the number of clamped segmental renal branches. Results: On patient-based analysis, the number of targeted vessels was consistent with the clamped vessels during LPN in 33 of 43 patients. On vessel-based analysis, 56 of 65 target vessels annotated at CTA were clamped during LPN. More segmental renal branches (p = 0.04) were clamped in patients with larger tumours. Tumour location and growth pattern had no association with the number of clamped segmental branches during LPN. Conclusion: High-quality CTA images and 3D reconstruction images can provide detailed information on the arteries supplying renal tumours. 3D renal CTA is an effective way to guide segmental renal artery clamping during LPN.
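
This record reports a chi-square (χ²) test relating tumour characteristics to the number of clamped segmental branches, but gives neither the contingency table nor the grouping. As a hedged illustration of the test itself, the sketch below computes Pearson's χ² statistic for any r × c table and, for the 2 × 2 case only, the p-value via the one-degree-of-freedom identity P(χ² > x) = erfc(√(x/2)). The counts in the usage example are made up, no Yates continuity correction is applied, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_chi_square(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Return Pearson's chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_one_dof(statistic: float) -> float:
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts: rows = smaller / larger tumours, columns = one / several branches clamped
counts = [[10, 20], [20, 10]]
chi2, dof = pearson_chi_square(counts)
print(round(chi2, 3), dof, round(p_value_one_dof(chi2), 4))
```

The erfc shortcut holds only for one degree of freedom (a squared standard normal); larger tables need the general chi-square distribution, for example from a statistics library.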

  18. Small molecule annotation for the Protein Data Bank.

    Science.gov (United States)

    Sen, Sanchayita; Young, Jasmine; Berrisford, John M; Chen, Minyu; Conroy, Matthew J; Dutta, Shuchismita; Di Costanzo, Luigi; Gao, Guanghua; Ghosh, Sutapa; Hudson, Brian P; Igarashi, Reiko; Kengaku, Yumiko; Liang, Yuhe; Peisach, Ezra; Persikova, Irina; Mukhopadhyay, Abhik; Narayanan, Buvaneswari Coimbatore; Sahni, Gaurav; Sato, Junko; Sekharan, Monica; Shao, Chenghua; Tan, Lihua; Zhuravleva, Marina A

    2014-01-01

    The Protein Data Bank (PDB) is the single global repository for three-dimensional structures of biological macromolecules and their complexes, and its more than 100,000 structures contain more than 20,000 distinct ligands or small molecules bound to proteins and nucleic acids. Information about these small molecules and their interactions with proteins and nucleic acids is crucial for our understanding of biochemical processes and vital for structure-based drug design. Small molecules present in a deposited structure may be attached to a polymer or may occur as a separate, non-covalently linked ligand. During curation of a newly deposited structure by wwPDB annotation staff, each molecule is cross-referenced to the PDB Chemical Component Dictionary (CCD). If the molecule is new to the PDB, a dictionary description is created for it. The information about all small molecule components found in the PDB is distributed via the ftp archive as an external reference file. Small molecule annotation in the PDB also includes information about ligand-binding sites and about covalent and other linkages between ligands and macromolecules. During the remediation of the peptide-like antibiotics and inhibitors present in the PDB archive in 2011, it became clear that additional annotation was required for consistent representation of these molecules, which are quite often composed of several sequential subcomponents including modified amino acids and other chemical groups. The connectivity information of the modified amino acids is necessary for correct representation of these biologically interesting molecules. The combined information is made available via a new resource called the Biologically Interesting molecules Reference Dictionary, which is complementary to the CCD and is now routinely used for annotation of peptide-like antibiotics and inhibitors. © The Author(s) 2014. Published by Oxford University Press.

  19. Nuclear criticality experiments from 1943 to 1978: an annotated bibliography. Volume 1. Main listing

    Energy Technology Data Exchange (ETDEWEB)

    Koponen, B.L.; Wilcox, T.P.; Hampel, V.E.

    1979-04-24

    The bibliography contains 1067 citations from the literature of critical and near-critical nuclear experiments. It provides an up-to-date index to reports containing useful data for many types of criticality studies. Most of the reports can provide specifications for relatively simple critical configurations necessary for validating nuclear constants and calculational techniques. The reports of more than 1143 experimenters at 38 international facilities since 1943 are cross-referenced. The collection contains the prototypes of many different designs of nuclear reactors and studies performed to ensure the safe use of fissile materials in chemical processing plants, storage facilities, and transportation containers. The bibliography has three volumes. Volume 1 contains the main listing of citations with abstracts. Volume 2 is a set of indexes organized by report number, publication date, experimental facility, and author name. Volume 3 provides a subject index, concorded on the significant keyphrases derived from titles, and an index of keyterms extracted from titles and abstracts. The bibliography was printed by computer as a selection from a computerized system at Lawrence Livermore Laboratory containing information and data on criticality experiments.

  20. Nuclear criticality experiments from 1943 to 1978. An annotated bibliography: Volume 1, main listing

    International Nuclear Information System (INIS)

    Koponen, B.L.; Wilcox, T.P.; Hampel, V.E.

    1979-05-01

    This report only describes the bibliography, which contains 1067 citations from the literature of critical and near-critical nuclear experiments. The bibliography provides an up-to-date index to reports containing useful data for many types of criticality studies. Most of the reports can provide specifications for relatively simple critical configurations necessary for validating nuclear constants and calculational techniques. The reports of more than 1143 experimenters at 38 international facilities since 1943 are cross-referenced. This collection contains the prototypes of many different designs of nuclear reactors and studies performed to ensure the safe use of fissile materials in chemical processing plants, storage facilities, and transportation containers. The bibliography has three volumes. Volume 1 contains the main listing of citations with abstracts. Volume 2 is a set of indexes organized by report number, publication date, experimental facility, and author name. Volume 3 provides a subject index, concorded on the significant keyphrases derived from titles, and an index of key terms extracted from titles and abstracts. The bibliography was printed by computer as a selection from a computerized system at Lawrence Livermore Laboratory containing information and data on criticality experiments.

  1. Functional Annotation of Ion Channel Structures by Molecular Simulation.

    Science.gov (United States)

    Trick, Jemma L; Chelvaniththilan, Sivapalan; Klesse, Gianni; Aryal, Prafulla; Wallace, E Jayne; Tucker, Stephen J; Sansom, Mark S P

    2016-12-06

    Ion channels play key roles in cell membranes, and recent advances are yielding an increasing number of structures. However, their functional relevance is often unclear and better tools are required for their functional annotation. In sub-nanometer pores such as ion channels, hydrophobic gating has been shown to promote dewetting to produce a functionally closed (i.e., non-conductive) state. Using the serotonin receptor (5-HT3R) structure as an example, we demonstrate the use of molecular dynamics to aid the functional annotation of channel structures via simulation of the behavior of water within the pore. Three increasingly complex simulation analyses are described: water equilibrium densities; single-ion free-energy profiles; and computational electrophysiology. All three approaches correctly predict the 5-HT3R crystal structure to represent a functionally closed (i.e., non-conductive) state. We also illustrate the application of water equilibrium density simulations to annotate different conformational states of a glycine receptor. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  2. iPad: Semantic annotation and markup of radiological images.

    Science.gov (United States)

    Rubin, Daniel L; Rodriguez, Cesar; Shah, Priyanka; Beaulieu, Chris

    2008-11-06

    Radiological images contain a wealth of information, such as anatomy and pathology, which is often not explicit and computationally accessible. Information schemes are being developed to describe the semantic content of images, but such schemes can be unwieldy to operationalize because there are few tools to enable users to capture structured information easily as part of the routine research workflow. We have created iPad, an open source tool enabling researchers and clinicians to create semantic annotations on radiological images. iPad hides the complexity of the underlying image annotation information model from users, permitting them to describe images and image regions using a graphical interface that maps their descriptions to structured ontologies semi-automatically. Image annotations are saved in a variety of formats, enabling interoperability among medical records systems, image archives in hospitals, and the Semantic Web. Tools such as iPad can help reduce the burden of collecting structured information from images, and could ultimately enable researchers and physicians to exploit images on a very large scale and glean the biological and physiological significance of image content.

  3. A Machine Learning Based Analytical Framework for Semantic Annotation Requirements

    OpenAIRE

    Hamed Hassanzadeh; MohammadReza Keyvanpour

    2011-01-01

    The Semantic Web is an extension of the current web in which information is given well-defined meaning. The aim of the Semantic Web is to improve the quality and intelligence of the current web by changing its contents into a machine-understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as ...

  4. The influence of annotation in graphical organizers

    NARCIS (Netherlands)

    Bezdan, Eniko; Kester, Liesbeth; Kirschner, Paul A.

    2013-01-01

    Bezdan, E., Kester, L., & Kirschner, P. A. (2012, 29-31 August). The influence of annotation in graphical organizers. Poster presented at the biannual meeting of the EARLI Special Interest Group Comprehension of Text and Graphics, Grenoble, France.

  5. Annotation of mammalian primary microRNAs

    Directory of Open Access Journals (Sweden)

    Enright Anton J

    2008-11-01

    Full Text Available Abstract Background MicroRNAs (miRNAs) are important regulators of gene expression and have been implicated in development, differentiation and pathogenesis. Hundreds of miRNAs have been discovered in mammalian genomes. Approximately 50% of mammalian miRNAs are expressed from introns of protein-coding genes; the primary transcript (pri-miRNA) is therefore assumed to be the host transcript. However, very little is known about the structure of pri-miRNAs expressed from intergenic regions. Here we annotate transcript boundaries of miRNAs in human, mouse and rat genomes using various transcription features. The 5' end of the pri-miRNA is predicted from transcription start sites, CpG islands and 5' CAGE tags mapped in the upstream flanking region surrounding the precursor miRNA (pre-miRNA). The 3' end of the pri-miRNA is predicted based on the mapping of polyA signals, and supported by cDNA/EST and ditags data. The predicted pri-miRNAs are also analyzed for promoter and insulator-associated regulatory regions. Results We define sets of conserved and non-conserved human, mouse and rat pre-miRNAs using bidirectional BLAST and synteny analysis. Transcription features in their flanking regions are used to demarcate the 5' and 3' boundaries of the pri-miRNAs. The lengths and boundaries of primary transcripts are highly conserved between orthologous miRNAs. A significant fraction of pri-miRNAs have lengths between 1 and 10 kb, with very few introns. We annotate a total of 59 pri-miRNA structures, which include 82 pre-miRNAs. 36 pri-miRNAs are conserved in all 3 species. In total, 18 of the confidently annotated transcripts express more than one pre-miRNA. The upstream regions of 54% of the predicted pri-miRNAs are found to be associated with promoter and insulator regulatory sequences. Conclusion Little is known about the primary transcripts of intergenic miRNAs. Using comparative data, we are able to identify the boundaries of a significant proportion of

  6. SAS- Semantic Annotation Service for Geoscience resources on the web

    Science.gov (United States)

    Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.

    2015-12-01

    There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision for allowing pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework, as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems in order to enable queries and analysis over annotation from a single environment (web). SAS is one of the services that are provided by the Geosemantic framework, which is a decentralized semantic framework to support the integration between models and data and allow semantically heterogeneous resources to interact with minimum human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested in the knowledge-base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have a web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models that are published on the web and allow users to better search, access, and use the existing resources based on standard vocabularies that are encoded and published using semantic technologies.

  7. Argumentation Theory. [A Selected Annotated Bibliography].

    Science.gov (United States)

    Benoit, William L.

    Materials dealing with aspects of argumentation theory are cited in this annotated bibliography. The 50 citations are organized by topic as follows: (1) argumentation; (2) the nature of argument; (3) traditional perspectives on argument; (4) argument diagrams; (5) Chaim Perelman's theory of rhetoric; (6) the evaluation of argument; (7) argument…

  8. Draft Genome Sequence and Gene Annotation of the Entomopathogenic Fungus Verticillium hemipterigenum

    OpenAIRE

    Horn, Fabian; Habel, Andreas; Scharf, Daniel H.; Dworschak, Jan; Brakhage, Axel A.; Guthke, Reinhard; Hertweck, Christian; Linde, Jörg

    2015-01-01

    Verticillium hemipterigenum (anamorph Torrubiella hemipterigena) is an entomopathogenic fungus and produces a broad range of secondary metabolites. Here, we present the draft genome sequence of the fungus, including gene structure and functional annotation. Genes were predicted incorporating RNA-Seq data and functionally annotated to provide the basis for further genome studies.

  9. Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus.

    Science.gov (United States)

    Savkov, Aleksandar; Carroll, John; Koeling, Rob; Cassell, Jackie

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

  10. CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures

    Directory of Open Access Journals (Sweden)

    Drefahl Axel

    2011-01-01

    Full Text Available Abstract CurlySMILES is a chemical line notation which extends SMILES with annotations for storage, retrieval and modeling of interlinked, coordinated, assembled and adsorbed molecules in supramolecular structures and nanodevices. Annotations are enclosed in curly braces and anchored to an atomic node or at the end of the molecular graph depending on the annotation type. CurlySMILES includes predefined annotations for stereogenicity, electron delocalization charges, extra-molecular interactions and connectivity, surface attachment, solutions, and crystal structures and allows extensions for domain-specific annotations. CurlySMILES provides a shorthand format to encode molecules with repetitive substructural parts or motifs such as monomer units in macromolecules and amino acids in peptide chains. CurlySMILES further accommodates special formats for non-molecular materials that are commonly denoted by composition of atoms or substructures rather than complete atom connectivity.

  11. A Flexible Object-of-Interest Annotation Framework for Online Video Portals

    Directory of Open Access Journals (Sweden)

    Robert Sorschag

    2012-02-01

    Full Text Available In this work, we address the use of object recognition techniques to annotate what is shown where in online video collections. These annotations are suitable for retrieving specific video scenes for object-related text queries, which is not possible with the manually generated metadata that is used by current portals. We are not the first to present object annotations that are generated with content-based analysis methods. However, the proposed framework possesses some outstanding features that offer good prospects for its application in real video portals. Firstly, it can be easily used as a background module in any video environment. Secondly, it is not based on a fixed analysis chain but on an extensive recognition infrastructure that can be used with all kinds of visual features, matching and machine learning techniques. New recognition approaches can be integrated into this infrastructure with low development costs and a configuration of the used recognition approaches can be performed even on a running system. Thus, this framework might also benefit from future advances in computer vision. Thirdly, we present an automatic selection approach to support the use of different recognition strategies for different objects. Last but not least, visual analysis can be performed efficiently on distributed, multi-processor environments and a database schema is presented to store the resulting video annotations as well as the off-line generated low-level features in a compact form. We achieve promising results in an annotation case study and the instance search task of the TRECVID 2011 challenge.

  12. BLAST-based structural annotation of protein residues using Protein Data Bank.

    Science.gov (United States)

    Singh, Harinder; Raghava, Gajendra P S

    2016-01-25

    In the era of next-generation sequencing, in which thousands of genomes have already been sequenced, the size of protein databases is growing at an exponential rate. Structural annotation of these proteins is one of the biggest challenges for computational biologists. Although it is easy to perform a BLAST search against the Protein Data Bank (PDB), it is difficult for a biologist to annotate protein residues from a BLAST search. A web server, StarPDB, has been developed for structural annotation of a protein based on its similarity with known protein structures. It uses standard BLAST software to perform a similarity search of a query protein against protein structures in the PDB. This server integrates a wide range of modules for assigning different types of annotation, including Secondary-structure, Accessible surface area, Tight-turns, DNA-RNA and Ligand modules. The Secondary structure module allows users to predict the regular secondary structure state of each residue in a protein. The Accessible surface area module predicts the exposed or buried residues in a protein. The Tight-turns module is designed to predict tight turns such as beta-turns in a protein. The DNA-RNA module was developed for predicting DNA- and RNA-interacting residues in a protein. Similarly, the Ligand module of the server allows one to predict ligand-, metal- and nucleotide-interacting residues in a protein. In summary, this manuscript presents a web server for comprehensive annotation of a protein based on similarity search. It integrates a number of visualization tools that help users to understand the structure and function of protein residues. This web server is freely available to the scientific community at http://crdd.osdd.net/raghava/starpdb .

  13. DFAST and DAGA: web-based integrated genome annotation tools and resources.

    Science.gov (United States)

    Tanizawa, Yasuhiro; Fujisawa, Takatomo; Kaminuma, Eli; Nakamura, Yasukazu; Arita, Masanori

    2016-01-01

    Quality assurance and correct taxonomic affiliation of data submitted to public sequence databases have been a persistent problem. The DDBJ Fast Annotation and Submission Tool (DFAST) is a newly developed genome annotation pipeline with quality and taxonomy assessment tools. To enable annotation of ready-to-submit quality, we also constructed curated reference protein databases tailored for lactic acid bacteria. DFAST was developed so that all the procedures required for DDBJ submission could be done seamlessly online. The online workspace would be especially useful for users without bioinformatics skills. In addition, we have developed a genome repository, DFAST Archive of Genome Annotation (DAGA), which currently includes 1,421 genomes covering 179 species and 18 subspecies of two genera, Lactobacillus and Pediococcus, obtained from both DDBJ/ENA/GenBank and Sequence Read Archive (SRA). All the genomes deposited in DAGA were annotated consistently and assessed using DFAST. To assess the taxonomic position based on genomic sequence information, we used the average nucleotide identity (ANI), which showed high discriminative power to determine whether two given genomes belong to the same species. We corrected mislabeled or misidentified genomes in the public database and deposited the curated information in DAGA. The repository will improve the accessibility and reusability of genome resources for lactic acid bacteria. By exploiting the data deposited in DAGA, we found intraspecific subgroups in Lactobacillus gasseri and Lactobacillus jensenii, whose variation between subgroups is larger than the well-accepted ANI threshold of 95% to differentiate species. DFAST and DAGA are freely accessible at https://dfast.nig.ac.jp.

  14. Annotation of selection strengths in viral genomes

    DEFF Research Database (Denmark)

    McCauley, Stephen; de Groot, Saskia; Mailund, Thomas

    2007-01-01

    Motivation: Viral genomes tend to code in overlapping reading frames to maximize information content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra......- and intergenomic regions. The presence of multiple coding regions complicates the concept of the Ka/Ks ratio, and thus calls for an alternative approach when investigating selection strengths. Building on the paper by McCauley & Hein (2006), we develop a method for annotating a viral genome coding in overlapping...... may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. Results: We illustrate our method by applying it to a multiple alignment of four HIV-2 sequences, as well as four Hepatitis B sequences. We...

  15. Annotating functional RNAs in genomes using Infernal.

    Science.gov (United States)

    Nawrocki, Eric P

    2014-01-01

    Many different types of functional non-coding RNAs participate in a wide range of important cellular functions, but the large majority of these RNAs are not routinely annotated in published genomes. Several programs have been developed for identifying RNAs, including specific tools tailored to a particular RNA family as well as more general ones designed to work for any family. Many of these tools use covariance models (CMs), statistical models of the conserved sequence and structure of an RNA family. In this chapter, as an illustrative example, the Infernal software package and CMs from the Rfam database are used to identify RNAs in the genome of the archaeon Methanobrevibacter ruminantium, uncovering some additional RNAs not present in the genome's initial annotation. Analysis of the results and comparison with family-specific methods demonstrate some important strengths and weaknesses of this general approach.

  16. Discrepancy between severity of lung impairment and seniority on the lung transplantation list.

    Science.gov (United States)

    Travaline, J M; Cordova, F C; Furukawa, S; Criner, G J

    2004-12-01

    Organ allocation for lung transplantation, based mainly on accrued time on a waiting list, may not be an equitable system of organ allocation. To provide an objective view of the current practice concerning lung allocation, and timing for transplantation, we examined illness severity and list seniority in patients on a lung transplantation waiting list. Adult patients awaiting lung transplantation underwent testing for mean pulmonary artery pressure (mPpa), maximum oxygen consumption (VO2 max), 6-minute walk distance (6MWD), forced expiratory volume in 1 second, mean partial pressure of carbon dioxide, partial pressure of oxygen/fractional concentration of inspired oxygen, and diffusing capacity of the lung for carbon monoxide. Relationships between physiological variables and waiting list rankings were then determined. Thirty-four patients were tested and there was no correlation between time spent waiting on the list and mPpa (r=0.01; P=.94), VO2 max percentage predicted (r=0.07; P=.71), or 6MWD (r=0.15; P=.42). Many patients with functional impairments as indicated by low maximum VO2 or by short 6MWD are scheduled to receive their transplant after patients with levels that indicate a lower degree of risk. When compared with a hypothetical reranking based on mean Ppa, 24 of the 34 patients (71%) on our current waiting list were found to be 5 positions higher or lower than this new risk-based ranking. Sixteen patients (47%) were 10 or more positions away from their hypothetical severity-based ranking, and 9 (26%) were at least 15 positions out of place. Sixteen of the 34 patients were ranked lower than they would be based on a severity of illness using the pulmonary artery pressure alone, 17 were ranked higher than "should be" based on pulmonary artery mean, and only 1 patient (ranked in position 15) was appropriately positioned based on seniority and severity of disease based on PA mean. Rank order for lung transplantation has no relationship with illness

  17. Annotated Bibliographies.

    Science.gov (United States)

    Totten, Sam; Alexander, Susan

    1985-01-01

    Intended for elementary, secondary, and college teachers, this listing cites print materials dealing with nuclear warfare. Included are nonfiction, fiction, journals, newsletters, curriculum materials, and organizations. (RM)

  18. Linking human diseases to animal models using ontology-based phenotype annotation.

    Directory of Open Access Journals (Sweden)

    Nicole L Washington

    2009-11-01

    Full Text Available Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify

  19. An annotated check list of the land mammal fauna of the West Coast National Park

    Directory of Open Access Journals (Sweden)

    D.M. Avery

    1990-10-01

    Full Text Available Some 4 000 Barn Owl pellets with small mammal remains have been collected over a period of nine years from two locations at the south end of the Langebaan lagoon. Two small samples of bones from archaeological sites on the Churchhaven peninsula provide evidence for past mammal occurrences. The remains of small mammals from the owl pellet collections provide an initial list of 18 species that occur within the West Coast National Park. Subsequent conventional censusing by means of trapping and observational techniques, to assess the small and large mammal species diversity of the area, was conducted during 1989. This study documents the definite occurrence of 63 mammal species in the park, seven of which are exotics. The presence of a further five species requires confirmation. Interesting insight is gained into how direct censusing and owl pellet analyses complement each other in establishing the presence of small mammal taxa in an area.

  20. The Community Junior College: An Annotated Bibliography.

    Science.gov (United States)

    Rarig, Emory W., Jr., Ed.

    This annotated bibliography on the junior college is arranged by topic: research tools, history, functions and purposes, organization and administration, students, programs, personnel, facilities, and research. It covers publications through the fall of 1965 and has an author index. (HH)

  1. A Resource of Quantitative Functional Annotation for Homo sapiens Genes.

    Science.gov (United States)

    Taşan, Murat; Drabkin, Harold J; Beaver, John E; Chua, Hon Nian; Dunham, Julie; Tian, Weidong; Blake, Judith A; Roth, Frederick P

    2012-02-01

    The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and make quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, prospective validation, and by manual evaluation with the biological literature. Functional-linkage networks were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented, alongside existing validated annotations, in a publicly accessible and searchable web interface.

  2. Annotating and Interpreting Linear and Cyclic Peptide Tandem Mass Spectra.

    Science.gov (United States)

    Niedermeyer, Timo Horst Johannes

    2016-01-01

    Nonribosomal peptides often possess pronounced bioactivity and are therefore interesting hit compounds in natural product-based drug discovery programs. Their mass spectrometric characterization is difficult due to the predominance of non-proteinogenic monomers and, especially in the case of cyclic peptides, the complex fragmentation patterns observed. This makes the annotation of nonribosomal peptide tandem mass spectra challenging and time-consuming. To meet this challenge, software tools for this task have been developed. In this chapter, the workflow for using the software mMass to annotate experimentally obtained peptide tandem mass spectra is described. mMass is freely available (http://www.mmass.org), open-source, and the most advanced and user-friendly software tool for this purpose. The software enables the analyst to concisely annotate and interpret tandem mass spectra of linear and cyclic peptides. Thus, it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.
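
    To show what annotating a peptide tandem mass spectrum involves, the sketch below computes theoretical singly charged b- and y-ion m/z values for a linear peptide and, for a cyclic peptide, collects the b-type ions obtained by opening the ring at each peptide bond. It rests on simplifying assumptions (standard monoisotopic residue masses for a subset of proteinogenic amino acids, charge 1+, no neutral losses) and does not reproduce mMass's functionality; the non-proteinogenic monomers common in nonribosomal peptides would need their own entries in the hypothetical `RESIDUE_MASS` table.

```python
# Monoisotopic residue masses (Da) for a subset of standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "K": 128.09496, "E": 129.04259,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def linear_fragments(seq):
    """Singly charged b- and y-ion m/z values for a linear peptide."""
    masses = [RESIDUE_MASS[aa] for aa in seq]
    # b_i: first i residues plus a proton.
    b = [sum(masses[:i]) + PROTON for i in range(1, len(seq))]
    # y_i: last i residues plus water plus a proton.
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(seq))]
    return b, y


def cyclic_b_ions(seq):
    """b-type ions for a cyclic peptide: the ring can open at any peptide
    bond, so every rotation of the sequence contributes its own b series."""
    ions = set()
    for start in range(len(seq)):
        rotated = seq[start:] + seq[:start]
        b, _ = linear_fragments(rotated)
        ions.update(round(m, 4) for m in b)
    return sorted(ions)
```

    Matching observed peaks against these theoretical lists within a mass tolerance is the core of the annotation step; the rotations show why cyclic peptides produce many overlapping fragment series.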

  3. Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules'

    Directory of Open Access Journals (Sweden)

    Snowdon Stuart

    2009-07-01

    Abstract Background Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass to charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high dimensional metabolite 'fingerprint' or metabolite 'profile' data. High resolution MS instruments perform routinely with a mass accuracy of ... Results Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and to calculate, on the fly, the exact molecular weight of every potential ionisation product to provide targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data. Conclusion We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample greatly improves both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.
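
    The idea of searching on predicted ionisation products rather than on the neutral mass alone can be sketched as follows. The adduct table covers only a handful of common ESI adducts and one neutral loss, the 5 ppm default tolerance is an arbitrary choice, and the function names are hypothetical; MZedDB's actual rule set is far more extensive.

```python
# Mass shifts (Da) for a few common ESI adducts and one neutral loss.
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M-H]-": -1.007276,
    "[M+H-H2O]+": 1.007276 - 18.010565,
}


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def annotate(observed_mz, metabolites, tol_ppm=5.0):
    """Return (name, adduct, ppm) for every predicted ionisation product
    of every metabolite lying within tol_ppm of the observed m/z."""
    hits = []
    for name, neutral_mass in metabolites.items():
        for adduct, shift in ADDUCTS.items():
            target = neutral_mass + shift
            err = ppm_error(observed_mz, target)
            if abs(err) <= tol_ppm:
                hits.append((name, adduct, round(err, 2)))
    return hits


# Monoisotopic neutral masses of glucose (C6H12O6) and citric acid (C6H8O7).
metabolites = {"glucose": 180.063388, "citrate": 192.027003}
hits = annotate(203.0526, metabolites)
```

    A query on [M+H]+ alone would miss this signal; enumerating adducts recovers it as the sodium adduct of glucose.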

  4. Great Basin Experimental Range: Annotated bibliography

    Science.gov (United States)

    E. Durant McArthur; Bryce A. Richardson; Stanley G. Kitchen

    2013-01-01

    This annotated bibliography documents the research that has been conducted on the Great Basin Experimental Range (GBER, also known as the Utah Experiment Station, Great Basin Station, the Great Basin Branch Experiment Station, Great Basin Experimental Center, and other similar name variants) over the 102 years of its existence. Entries were drawn from the original...

  5. FIGENIX: Intelligent automation of genomic annotation: expertise integration in a new software platform

    Directory of Open Access Journals (Sweden)

    Pontarotti Pierre

    2005-08-01

    Abstract Background Two of the main objectives of the genomic and post-genomic era are to structurally and functionally annotate genomes, which consists of detecting genes' position and structure and inferring their function (as well as that of other features of genomes). Structural and functional annotation both require the complex chaining of numerous different software tools, algorithms and methods under the supervision of a biologist. The automation of these pipelines is necessary to manage the huge amounts of data released by sequencing projects. Several pipelines already automate some of these complex chains but still require substantial input from biologists to supervise and control the results at various steps. Results Here we propose an innovative automated platform, FIGENIX, which includes an expert system capable of substituting for human expertise at several key steps. FIGENIX currently automates complex pipelines of structural and functional annotation under the supervision of the expert system (which allows, for example, key decisions to be made, intermediate results to be checked, or the dataset to be refined). The quality of the results produced by FIGENIX is comparable to that obtained by expert biologists, with a drastic gain in time and the avoidance of errors due to human manipulation of the data. Conclusion The core engine and expert system of the FIGENIX platform currently handle complex annotation processes of broad interest for the genomic community. They could easily be adapted to new or more specialized pipelines, such as the annotation of miRNAs, the classification of complex multigenic families, and the annotation of regulatory elements and other genomic features of interest.

  6. SplicingTypesAnno: annotating and quantifying alternative splicing events for RNA-Seq data.

    Science.gov (United States)

    Sun, Xiaoyong; Zuo, Fenghua; Ru, Yuanbin; Guo, Jiqiang; Yan, Xiaoyan; Sablok, Gaurav

    2015-04-01

    Alternative splicing plays a key role in the regulation of the central dogma. Four major types of alternative splicing have been classified: intron retention, exon skipping, alternative 5' splice sites (alternative donor sites), and alternative 3' splice sites (alternative acceptor sites). A few algorithms have been developed to detect splice junctions from RNA-Seq reads. However, few tools target the major alternative splicing types at the exon/intron level. This type of analysis may reveal subtle yet important alternative splicing events, and thus help gain a deeper understanding of the mechanism of alternative splicing. This paper describes a user-friendly R package for extracting, annotating and analyzing alternative splicing types for sequence alignment files from RNA-Seq. SplicingTypesAnno can: (1) provide annotation for the major alternative splicing types at the exon/intron level and, by comparison with the annotation in a GTF/GFF file, identify novel alternative splicing sites; (2) offer a convenient two-level analysis: genome-scale annotation for users with a high-performance computing environment, and gene-scale annotation for users with personal computers; (3) generate a user-friendly web report and additional BED files for IGV visualization. SplicingTypesAnno is a user-friendly R package for extracting, annotating and analyzing alternative splicing types at the exon/intron level for sequence alignment files from RNA-Seq. It is publicly available at https://sourceforge.net/projects/splicingtypes/files/ or http://genome.sdau.edu.cn/research/software/SplicingTypesAnno.html. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
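
    A simplified version of exon/intron-level event typing can be sketched by comparing the two ends of an observed junction with annotated exon boundaries. The sketch assumes a plus-strand gene, junction coordinates expressed in the same system as the exon boundaries, a sorted exon list such as might be read from a GTF/GFF file, and a hypothetical function name. Intron retention cannot be called from junctions alone (it needs read coverage inside the intron), and this is not SplicingTypesAnno's own algorithm.

```python
def classify_junction(junction, exons):
    """Classify a split-read junction against annotated exons.

    junction = (donor, acceptor): donor is the last base of the upstream
    exon, acceptor the first base of the downstream exon (plus strand).
    exons = sorted list of (start, end) tuples for one transcript model."""
    donor, acceptor = junction
    ends = {end: i for i, (_, end) in enumerate(exons)}
    starts = {start: i for i, (start, _) in enumerate(exons)}
    if donor in ends and acceptor in starts:
        # Number of annotated exons jumped over by this junction.
        skipped = starts[acceptor] - ends[donor] - 1
        if skipped == 0:
            return "annotated junction"
        if skipped > 0:
            return "exon skipping"
        return "novel junction"
    if donor in ends:
        return "alternative 3' splice site"   # annotated donor, novel acceptor
    if acceptor in starts:
        return "alternative 5' splice site"   # novel donor, annotated acceptor
    return "novel junction"


exons = [(100, 200), (300, 400), (500, 600)]
```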

  7. Annotation of Regular Polysemy

    DEFF Research Database (Denmark)

    Martinez Alonso, Hector

    Regular polysemy has received a lot of attention from the theory of lexical semantics and from computational linguistics. However, there is no consensus on how to represent the sense of underspecified examples at the token level, namely when annotating or disambiguating senses of metonymic words...... and metonymic. We have conducted an analysis in English, Danish and Spanish. Later on, we have tried to replicate the human judgments by means of unsupervised and semi-supervised sense prediction. The automatic sense-prediction systems have been unable to find empirical evidence for the underspecified sense, even...

  8. An Annotated Bibliography of Education for Medical Librarianship, 1940-1968

    Science.gov (United States)

    Shirley, Sherrilynne

    1969-01-01

    An attempt has been made in this bibliography to represent the various viewpoints concerning education for medical librarianship equally. The topics covered include: general background reading and readings for those interested in establishing courses in medical librarianship. The former includes annotations on the history and international aspects of the subject. The latter consists of annotations of articles on early courses and present courses in medical librarianship. A final area discussed is the Medical Library Association's Code for the Training and Certification of Medical Librarians. PMID:4898629

  9. Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics.

    Science.gov (United States)

    Chepelev, Leonid L; Riazanov, Alexandre; Kouznetsov, Alexandre; Low, Hong Sang; Dumontier, Michel; Baker, Christopher J O

    2011-07-26

    The development of high-throughput experimentation has led to astronomical growth in the number of biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality. As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI) framework. Structure-based lipid classification is enacted by two core services. Firstly, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. 
We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of our integrative methodology in the context of
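
    The two-step classification described in this record (detect structural attributes, then reason over class descriptions) can be mimicked in a few lines: each class lists the attributes it requires, and the most specific class whose requirements are all met is returned. The class names and attribute labels are a heavily simplified, hypothetical stand-in for the LiPrO/LIPID MAPS-derived ontology, a subset test replaces a real OWL reasoner, and no SADI services are involved.

```python
# Hypothetical, heavily simplified class descriptions: each class lists
# the attributes a structure must have, plus its parent class.
CLASSES = {
    "fatty acyl": {"parent": None, "requires": {"carboxylic acid"}},
    "eicosanoid": {"parent": "fatty acyl",
                   "requires": {"carboxylic acid", "C20 chain"}},
    "prostaglandin": {"parent": "eicosanoid",
                      "requires": {"carboxylic acid", "C20 chain",
                                   "cyclopentane ring"}},
}


def depth(cls):
    """Distance from the root of the class hierarchy."""
    d = 0
    while CLASSES[cls]["parent"] is not None:
        cls = CLASSES[cls]["parent"]
        d += 1
    return d


def classify(attributes):
    """Return the most specific class whose required attributes are all
    present in the detected attributes (None if nothing matches)."""
    matches = [c for c, desc in CLASSES.items()
               if desc["requires"] <= set(attributes)]
    return max(matches, key=depth) if matches else None
```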

  10. Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics

    Directory of Open Access Journals (Sweden)

    Dumontier Michel

    2011-07-01

    Abstract Background The development of high-throughput experimentation has led to astronomical growth in the number of biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality. Results As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI) framework. Structure-based lipid classification is enacted by two core services. Firstly, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. 
We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of

  11. MUTAGEN: Multi-user tool for annotating GENomes

    DEFF Research Database (Denmark)

    Brugger, K.; Redder, P.; Skovgaard, Marie

    2003-01-01

    MUTAGEN is a free prokaryotic annotation system. It offers the advantages of genome comparison, graphical sequence browsers, search facilities and open-source for user-specific adjustments. The web-interface allows several users to access the system from standard desktop computers. The Sulfolobus...

  12. Olomouc Corpus of Spoken Czech: characterization and main features of the project

    Directory of Open Access Journals (Sweden)

    Pořízka, Petr

    2009-01-01

    This study presents the results of the author's research project, the Olomouc Corpus of Spoken Czech (OCSC). The paper focuses on the current state and individual phases of constructing the corpus, its methodology and annotation. Within the OCSC we use a so-called dual system of transcription, which means (1) an orthographic transcript intended for linguistic (morphological) analysis and tagging and (2) a phonetic version of the transcript, which consists of three layers of text: first the actual transcription, and then various types of metatext as the second and third layers, including communicative aspects of the texts. The criteria for the selection of speakers are also listed, and the highly important statistical analysis of the sociolinguistic categories (gender, age, type of education, types of recordings) is presented as well. This analysis can serve as a basis for a partial correction of possible imbalances among those sociolinguistic parameters. The annotation rules and principles are mentioned at the end of this study.

  13. CommWalker: correctly evaluating modules in molecular networks in light of annotation bias.

    Science.gov (United States)

    Luecken, M D; Page, M J T; Crosby, A J; Mason, S; Reinert, G; Deane, C M

    2018-03-15

    Detecting novel functional modules in molecular networks is an important step in biological research. In the absence of gold standard functional modules, functional annotations are often used to verify whether detected modules/communities have biological meaning. However, as we show, the uneven distribution of functional annotations means that such evaluation methods favor communities of well-studied proteins. We propose a novel framework for the evaluation of communities as functional modules. Our proposed framework, CommWalker, takes communities as inputs and evaluates them in their local network environment by performing short random walks. We test CommWalker's ability to overcome annotation bias using input communities from four community detection methods on two protein interaction networks. We find that modules accepted by CommWalker show levels of co-expression similar to those of modules accepted by current methods. Crucially, CommWalker performs well not only in well-annotated regions, but also in regions otherwise obscured by poor annotation. CommWalker community prioritization both faithfully captures well-validated communities and identifies functional modules that may correspond to more novel biology. The CommWalker algorithm is freely available at opig.stats.ox.ac.uk/resources or as a docker image on the Docker Hub at hub.docker.com/r/lueckenmd/commwalker/. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online.
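
    The idea of judging a community against its local network environment can be sketched as follows: functional coherence (mean pairwise Jaccard similarity of annotation sets) inside the community is compared with the coherence of the node set reached by short random walks started from its members. This is a simplified illustration with hypothetical function names, walk length and walk count; it is not CommWalker's published scoring.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two annotation sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coherence(nodes, annotations):
    """Mean pairwise Jaccard similarity of the nodes' annotation sets."""
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0
    total = sum(jaccard(annotations.get(u, set()), annotations.get(v, set()))
                for u, v in pairs)
    return total / len(pairs)


def local_environment(graph, community, walk_length=3, n_walks=50, seed=0):
    """Collect the nodes reached by short random walks from community members."""
    rng = random.Random(seed)
    starts = sorted(community)
    visited = set(community)
    for _ in range(n_walks):
        node = rng.choice(starts)
        for _ in range(walk_length):
            neighbours = sorted(graph[node])
            if not neighbours:
                break
            node = rng.choice(neighbours)
            visited.add(node)
    return visited


def relative_coherence(graph, community, annotations, **walk_options):
    """Positive values mean the community is more functionally coherent
    than its random-walk neighbourhood."""
    environment = local_environment(graph, community, **walk_options)
    return coherence(community, annotations) - coherence(environment, annotations)


graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b", "e"},
         "d": {"a", "e"}, "e": {"c", "d"}}
ann = {"a": {"X"}, "b": {"X"}, "c": {"X"}, "d": {"Y"}, "e": {"Y"}}
```

    Because the comparison is local, a community in a poorly annotated region is measured against an equally poorly annotated background rather than against the whole network.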

  14. The AnnoLite and AnnoLyze programs for comparative annotation of protein structures

    Directory of Open Access Journals (Sweden)

    Dopazo Joaquín

    2007-05-01

    Abstract Background Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. Description AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of ~90% and average precision of ~80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of ~70% and average precision of ~30%, correctly localizing binding sites for small molecules in ~95% of its predictions. Conclusion The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at http://salilab.org/DBAli/.
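
    For reference, the sensitivity and precision figures quoted for AnnoLite and AnnoLyze follow the standard definitions TP/(TP+FN) and TP/(TP+FP). A minimal sketch for one structure's predicted versus known terms is given below; how the paper averaged these values across structures is not stated in this abstract, and the function name and term labels are hypothetical.

```python
def sensitivity_precision(predicted, true):
    """Sensitivity (recall) = TP / (TP + FN); precision = TP / (TP + FP)."""
    predicted, true = set(predicted), set(true)
    tp = len(predicted & true)   # correctly predicted terms
    fn = len(true - predicted)   # known terms that were missed
    fp = len(predicted - true)   # predicted terms that are wrong
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    return sens, prec


sens, prec = sensitivity_precision({"t1", "t2", "t3", "t4"}, {"t1", "t2"})
```

    Predicting every known term plus two spurious ones gives perfect sensitivity but only 50% precision, which is why both numbers are reported.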

  15. Retrieval-based Face Annotation by Weak Label Regularized Local Coordinate Coding.

    Science.gov (United States)

    Wang, Dayong; Hoi, Steven C H; He, Ying; Zhu, Jianke; Mei, Tao; Luo, Jiebo

    2013-08-02

    Retrieval-based face annotation is a promising paradigm for mining massive web facial images for automated face annotation. This paper addresses a critical problem of this paradigm, i.e., how to effectively perform annotation by exploiting similar facial images and their weak labels, which are often noisy and incomplete. In particular, we propose an effective Weak Label Regularized Local Coordinate Coding (WLRLCC) technique, which exploits the principle of local coordinate coding in learning sparse features, and employs the idea of graph-based weak label regularization to enhance the weak labels of the similar facial images. We present an efficient optimization algorithm to solve the WLRLCC task. We conduct extensive empirical studies on two large-scale web facial image databases: (i) a Western celebrity database with a total of 6,025 persons and 714,454 web facial images, and (ii) an Asian celebrity database with 1,200 persons and 126,070 web facial images. The encouraging results validate the efficacy of the proposed WLRLCC algorithm. To further improve efficiency and scalability, we also propose a PCA-based approximation scheme and an offline approximation scheme (AWLRLCC), which generally maintain comparable results while significantly reducing the time cost. Finally, we show that WLRLCC can also tackle two existing face annotation tasks with promising performance.
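
    The basic retrieval-based annotation paradigm can be sketched as similarity-weighted voting over the weak labels of the k nearest faces in a feature database. This is the plain nearest-neighbour baseline that WLRLCC sets out to improve, not WLRLCC itself (which learns sparse codes via local coordinate coding and regularizes the weak labels over a graph); the feature vectors, names and weighting function are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
from math import dist


def retrieval_annotate(query, database, k=3):
    """Similarity-weighted vote over the weak labels of the k nearest faces.

    database is a list of (feature_vector, weak_label) pairs."""
    nearest = sorted(database, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter()
    for features, label in nearest:
        # Closer faces get larger weights; 1 / (1 + d) avoids division by zero.
        votes[label] += 1.0 / (1.0 + dist(query, features))
    return votes.most_common(1)[0][0]


database = [((0.0, 0.0), "Alice"), ((0.0, 1.0), "Alice"),
            ((5.0, 5.0), "Bob"), ((0.5, 0.0), "Bob")]
```

    A single noisy label close to the query can dominate this vote, which is the weakness that label regularization is meant to address.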

  16. Similarity maps and hierarchical clustering for annotating FT-IR spectral images.

    Science.gov (United States)

    Zhong, Qiaoyong; Yang, Chen; Großerüschkamp, Frederik; Kallenbach-Thieltges, Angela; Serocka, Peter; Gerwert, Klaus; Mosig, Axel

    2013-11-20

    Unsupervised segmentation of multi-spectral images plays an important role in annotating infrared microscopic images and is an essential step in label-free spectral histopathology. In this context, diverse clustering approaches have been utilized and evaluated in order to achieve segmentations of Fourier Transform Infrared (FT-IR) microscopic images that agree with histopathological characterization. We introduce so-called interactive similarity maps as an alternative strategy for annotating infrared microscopic images. We demonstrate that segmentations obtained from interactive similarity maps are similarly accurate to segmentations obtained from conventionally used hierarchical clustering approaches. In order to perform this comparison on quantitative grounds, we provide a scheme that allows non-horizontal cuts in dendrograms to be identified. This yields a validation scheme for hierarchical clustering approaches commonly used in infrared microscopy. We demonstrate that interactive similarity maps may identify more accurate segmentations than hierarchical clustering based approaches, and thus are a viable alternative to hierarchical clustering that is attractive because of its interactive nature. Our validation scheme furthermore shows that the performance of hierarchical two-means is comparable to that of the traditionally used Ward's clustering. As the former is much more efficient in time and memory, our results suggest another, less resource-demanding alternative for annotating large spectral images.
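
    An interactive similarity map can be sketched as follows: the user picks a reference pixel, every pixel's spectrum is compared with the reference spectrum, and thresholding the resulting map yields a segment. Pearson correlation is used here as an assumed similarity measure, and the data layout (a dict from pixel coordinates to absorbance lists), the threshold and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length spectra (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def similarity_map(image, ref_pixel):
    """Correlate every pixel's spectrum with the spectrum at ref_pixel.

    image maps (row, col) -> list of absorbance values."""
    reference = image[ref_pixel]
    return {pixel: pearson(spectrum, reference)
            for pixel, spectrum in image.items()}


def segment(sim_map, threshold=0.9):
    """Pixels whose correlation with the reference meets the threshold."""
    return {pixel for pixel, r in sim_map.items() if r >= threshold}


image = {(0, 0): [1.0, 2.0, 3.0], (0, 1): [2.0, 4.0, 6.0], (1, 0): [3.0, 2.0, 1.0]}
sim = similarity_map(image, (0, 0))
```

    Because the map is recomputed for whichever pixel the user selects, the analyst can probe tissue regions one at a time instead of choosing a single global clustering cut.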

  17. Book Reviews, Annotation, and Web Technology.

    Science.gov (United States)

    Schulze, Patricia

    From reading texts to annotating web pages, students in grades 6-8 rely on group cooperation and individual reading and writing skills in this research project that spans six 50-minute lessons. Student objectives for this project are that they will: read, discuss, and keep a journal on a book in literature circles; understand the elements of and…

  18. H2DB: a heritability database across multiple species by annotating trait-associated genomic loci.

    Science.gov (United States)

    Kaminuma, Eli; Fujisawa, Takatomo; Tanizawa, Yasuhiro; Sakamoto, Naoko; Kurata, Nori; Shimizu, Tokurou; Nakamura, Yasukazu

    2013-01-01

    H2DB (http://tga.nig.ac.jp/h2db/), an annotation database of genetic heritability estimates for humans and other species, has been developed as a knowledge database to connect trait-associated genomic loci. Heritability estimates have been investigated for individual species, particularly in human twin studies and plant/animal breeding studies. However, there appears to be no comprehensive heritability database for both humans and other species. Here, we introduce an annotation database for genetic heritabilities of various species that was annotated by manually curating online public resources in PubMed abstracts and journal contents. The proposed heritability database contains attribute information for trait descriptions, experimental conditions, trait-associated genomic loci and broad- and narrow-sense heritability specifications. Annotated trait-associated genomic loci, most of which are single-nucleotide polymorphisms derived from genome-wide association studies, may be valuable resources for experimental scientists. In addition, we assigned phenotype ontologies to the annotated traits for the purposes of discussing heritability distributions based on phenotypic classifications.
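
    For readers unfamiliar with the two heritability specifications stored in H2DB, the standard quantitative-genetics definitions are summarised below. Phenotypic variance V_P is partitioned into genetic (V_G) and environmental (V_E) components, ignoring gene-environment covariance and interaction, and V_G is further split into additive (V_A), dominance (V_D) and epistatic (V_I) parts. The last line is Falconer's approximation used in classical twin studies, which assumes additive genetic effects and that MZ and DZ twin pairs share environments to the same degree; for example, r_MZ = 0.8 and r_DZ = 0.5 give h^2 ≈ 2(0.8 - 0.5) = 0.6.

```latex
\begin{aligned}
V_P &= V_G + V_E, \qquad V_G = V_A + V_D + V_I \\
H^2 &= \frac{V_G}{V_P} \quad \text{(broad sense)}, \qquad
h^2 = \frac{V_A}{V_P} \quad \text{(narrow sense)} \\
h^2 &\approx 2\,(r_{MZ} - r_{DZ}) \quad \text{(Falconer's formula)}
\end{aligned}
```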

  19. BG7: A New Approach for Bacterial Genome Annotation Designed for Next Generation Sequencing Data

    Science.gov (United States)

    Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Pareja, Eduardo; Tobes, Raquel

    2012-01-01

    BG7 is a new system for de novo bacterial, archaeal and viral genome annotation based on a new approach specifically designed for annotating genomes sequenced with next generation sequencing technologies. The system is versatile and able to annotate genes even at the preliminary assembly stage of the genome. It is especially efficient at detecting unexpected genes horizontally acquired from distant bacterial or archaeal genomes, phages, plasmids, and mobile elements. From the initial phases of the gene annotation process, BG7 exploits the massive availability of annotated protein sequences in databases. BG7 predicts ORFs and infers their function based on protein similarity with a wide set of reference proteins, integrating the ORF prediction and functional annotation phases in just one step. BG7 is especially tolerant to sequencing errors in start and stop codons, to frameshifts, and to assembly or scaffolding errors. The system is also tolerant to the high level of gene fragmentation frequently found in incompletely assembled genomes. The current version of BG7, which is developed in Java, takes advantage of Amazon Web Services (AWS) cloud computing features, but it can also be run locally on any operating system. BG7 is a fast, automated and scalable system that can cope with the challenge of analyzing the huge number of genomes being sequenced with NGS technologies. Its capabilities and efficiency were demonstrated in the 2011 EHEC outbreak in Germany, in which BG7 was used to obtain the first annotations the day after the first entero-hemorrhagic E. coli genome sequences were made publicly available. The suitability of BG7 for genome annotation has been demonstrated for Illumina, 454, Ion Torrent, and PacBio sequencing technologies. In addition, thanks to its plasticity, our system could easily be adapted to work with new technologies in the future. PMID:23185310
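
    For contrast with BG7's integrated approach, a conventional standalone ORF-prediction step can be sketched as a naive scan of all six reading frames for ATG-to-stop stretches. This kind of scan is exactly what sequencing errors in start/stop codons and frameshifts disrupt, and BG7 is designed to tolerate them. The sketch assumes an uppercase ACGT sequence, the standard genetic code's stop codons, an arbitrary 30-codon default minimum and hypothetical function names; it is not part of BG7.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=30):
    """Scan all six reading frames for ATG...stop ORFs of at least
    min_codons codons (stop codon included). Returns (strand, start, end)
    as 0-based, end-exclusive coordinates on that strand's own string."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None                   # close it at the stop
    return orfs
```

    A single-base insertion inside a real gene shifts every downstream codon, so this scan would report a truncated ORF or none at all; similarity-driven prediction sidesteps that failure mode.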

  20. New in protein structure and function annotation: hotspots, single nucleotide polymorphisms and the 'Deep Web'.

    Science.gov (United States)

    Bromberg, Yana; Yachdav, Guy; Ofran, Yanay; Schneider, Reinhard; Rost, Burkhard

    2009-05-01

    The rapidly increasing quantity of protein sequence data continues to widen the gap between available sequences and annotations. Comparative modeling suggests some aspects of the 3D structures of approximately half of all known proteins; homology- and network-based inferences annotate some aspect of function for a similar fraction of the proteome. For most known protein sequences, however, there is detailed knowledge about neither their function nor their structure. Comprehensive efforts towards the expert curation of sequence annotations have failed to meet the demand of the rapidly increasing number of available sequences. Only the automated prediction of protein function in the absence of homology can close the gap between available sequences and annotations in the foreseeable future. This review focuses on two novel methods for automated annotation, and briefly presents an outlook on how modern web software may revolutionize the field of protein sequence annotation. First, predictions of protein binding sites and functional hotspots, and the evolution of these into the most successful type of prediction of protein function from sequence, will be discussed. Second, a new tool, comprehensive in silico mutagenesis, which contributes important novel predictions of function and at the same time prepares for the onset of the next sequencing revolution, will be described. While these two new sub-fields of protein prediction represent the breakthroughs that have been achieved methodologically, it will then be argued that a different development might further change the way biomedical researchers benefit from annotations: modern web software can connect the worldwide web in any browser with the 'Deep Web' (i.e., proprietary data resources). The availability of this direct connection, and the resulting access to a wealth of data, may impact drug discovery and development more than any existing method that contributes to protein annotation.
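
    Comprehensive in silico mutagenesis starts by enumerating every possible single amino-acid substitution in a protein, each of which is then scored by a function predictor. The enumeration step is sketched below; the function name is hypothetical, and the scoring step is deliberately left out because it depends on the particular predictor used.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_substitutions(protein):
    """Yield (position, wild_type, mutant, mutant_sequence) for every
    possible single amino-acid substitution (1-based positions)."""
    for i, wt in enumerate(protein):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield i + 1, wt, aa, protein[:i] + aa + protein[i + 1:]


variants = list(all_single_substitutions("MKV"))
```

    For a protein of length L built from standard residues this yields 19 × L variants, so even a 300-residue protein produces 5,700 sequences to score.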