WorldWideScience

Sample records for event-level metadata service

  1. A programmatic view of metadata, metadata services, and metadata flow in ATLAS

    Science.gov (United States)

    Malon, D.; Albrand, S.; Gallas, E.; Stewart, G.

    2012-12-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS are considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and integrated view to physicists, and support both human use and programmatic access. In this paper we consider ATLAS metadata, metadata services, and metadata flow principally from the illustrative perspective of how disparate metadata are made available to executing jobs and, conversely, how metadata generated by such jobs are returned. We describe how metadata are read, how metadata are cached, and how metadata generated by jobs and the tasks of which they are a part are communicated, associated with data products, and preserved. We also discuss the principles that guide decision-making about metadata storage, replication, and access.

  2. A Programmatic View of Metadata, Metadata Services, and Metadata Flow in ATLAS

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS are considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. Trigger information and data from the Large Hadron Collider itself provide cases in point, but examples abound. Metadata about logical or physics constructs, such as data-taking periods and runs and luminosity blocks and events and algorithms, often need to be mapped to deployment and production constructs, such as datasets and jobs and files and software versions, and vice versa. Metadata at one level of granularity may have implications at another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and ...

  3. A Programmatic View of Metadata, Metadata Services, and Metadata Flow in ATLAS

    CERN Document Server

    Malon, D; The ATLAS collaboration; Gallas, E; Stewart, G

    2012-01-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS are considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. Trigger information and data from the Large Hadron Collider itself provide cases in point, but examples abound. Metadata about logical or physics constructs, such as data-taking periods and runs and luminosity blocks and events and algorithms, often need to be mapped to deployment and production constructs, such as datasets and jobs and files and software versions, and vice versa. Metadata at one level of granularity may have implications at another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and integr...

  4. A programmatic view of metadata, metadata services, and metadata flow in ATLAS

    CERN Document Server

    Malon, D; The ATLAS collaboration; Albrand, S; Gallas, E; Stewart, G

    2012-01-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS are considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. Trigger information and data from the Large Hadron Collider itself provide cases in point, but examples abound. Metadata about logical or physics constructs, such as data-taking periods and runs and luminosity blocks and events and algorithms, often need to be mapped to deployment and production constructs, such as datasets and jobs and files and software versions, and vice versa. Metadata at one level of granularity may have implications at another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and integr...

  5. Geospatial metadata retrieval from web services

    Directory of Open Access Journals (Sweden)

    Ivanildo Barbosa

    Nowadays, producers of geospatial data in either raster or vector formats are able to make them available on the World Wide Web by deploying web services that enable users to access and query those contents even without specific software for geoprocessing. Several providers around the world have deployed instances of WMS (Web Map Service), WFS (Web Feature Service) and WCS (Web Coverage Service), all of them specified by the Open Geospatial Consortium (OGC). In consequence, metadata about the available contents can be retrieved to be compared with similar offline datasets from other sources. This paper presents a brief summary and describes the matching process between the specifications for OGC web services (WMS, WFS and WCS) and the specifications for metadata required by ISO 19115, adopted as reference for several national metadata profiles, including the Brazilian one. This process focuses on retrieving metadata about the identification and data quality packages and indicates the directions to retrieve metadata related to other packages. Therefore, users are able to assess whether the provided contents fit their purposes.

  6. U.S. EPA's Public Geospatial Metadata Service

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA's public geospatial metadata service provides external parties (Data.gov, GeoPlatform.gov, and the general public) with access to EPA's geospatial metadata...

  7. Testing Metadata Existence of Web Map Services

    Directory of Open Access Journals (Sweden)

    Jan Růžička

    2011-05-01

    For a general user it is quite common to use data sources available on the WWW. Almost all GIS software allows the use of data sources available via Web Map Service (an ISO/OGC standard interface). The opportunity to use different sources and combine them brings many problems that have been discussed many times at conferences and in journal papers. One of the problems is the non-existence of metadata for published sources. The question was: were the discussions effective? The article is partly based on a comparison of the metadata situation between the years 2007 and 2010. The second part of the article focuses only on the situation in 2010. The paper was created in the context of research on intelligent map systems that can be used for automatic or semi-automatic map creation or map evaluation.

  8. Metadata

    CERN Document Server

    Zeng, Marcia Lei

    2016-01-01

    Metadata remains the solution for describing the explosively growing, complex world of digital information, and continues to be of paramount importance for information professionals. Providing a solid grounding in the variety and interrelationships among different metadata types, Zeng and Qin's thorough revision of their benchmark text offers a comprehensive look at the metadata schemas that exist in the world of library and information science and beyond, as well as the contexts in which they operate. Cementing its value as both an LIS text and a handy reference for professionals already in the field, this book: * Lays out the fundamentals of metadata, including principles of metadata, structures of metadata vocabularies, and metadata descriptions * Surveys metadata standards and their applications in distinct domains and for various communities of metadata practice * Examines metadata building blocks, from modelling to defining properties, and from designing application profiles to implementing value vocabu...

  9. Leveraging Metadata to Create Better Web Services

    Science.gov (United States)

    Mitchell, Erik

    2012-01-01

    Libraries have been increasingly concerned with data creation, management, and publication. This increase is partly driven by shifting metadata standards in libraries and partly by the growth of data and metadata repositories being managed by libraries. In order to manage these data sets, libraries are looking for new preservation and discovery…

  10. Developing Cyberinfrastructure Tools and Services for Metadata Quality Evaluation

    Science.gov (United States)

    Mecum, B.; Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Powers, L. A.; Slaughter, P.

    2016-12-01

    Metadata and data quality are at the core of reusable and reproducible science. While great progress has been made over the years, much of the metadata collected only addresses data discovery, covering concepts such as titles and keywords. Improving metadata beyond the discoverability plateau means documenting detailed concepts within the data such as sampling protocols, instrumentation used, and variables measured. Given that metadata commonly do not describe their data at this level, how might we improve the state of things? Giving scientists and data managers easy to use tools to evaluate metadata quality that utilize community-driven recommendations is the key to producing high-quality metadata. To achieve this goal, we created a set of cyberinfrastructure tools and services that integrate with existing metadata and data curation workflows which can be used to improve metadata and data quality across the sciences. These tools work across metadata dialects (e.g., ISO19115, FGDC, EML, etc.) and can be used to assess aspects of quality beyond what is internal to the metadata such as the congruence between the metadata and the data it describes. The system makes use of a user-friendly mechanism for expressing a suite of checks as code in popular data science programming languages such as Python and R. This reduces the burden on scientists and data managers to learn yet another language. We demonstrated these services and tools in three ways. First, we evaluated a large corpus of datasets in the DataONE federation of data repositories against a metadata recommendation modeled after existing recommendations such as the LTER best practices and the Attribute Convention for Dataset Discovery (ACDD). Second, we showed how this service can be used to display metadata and data quality information to data producers during the data submission and metadata creation process, and to data consumers through data catalog search and access tools. 
    Third, we showed how the centrally

  11. Metadata

    CERN Document Server

    Pomerantz, Jeffrey

    2015-01-01

    When "metadata" became breaking news, appearing in stories about surveillance by the National Security Agency, many members of the public encountered this once-obscure term from information science for the first time. Should people be reassured that the NSA was "only" collecting metadata about phone calls -- information about the caller, the recipient, the time, the duration, the location -- and not recordings of the conversations themselves? Or does phone call metadata reveal more than it seems? In this book, Jeffrey Pomerantz offers an accessible and concise introduction to metadata. In the era of ubiquitous computing, metadata has become infrastructural, like the electrical grid or the highway system. We interact with it or generate it every day. It is not, Pomerantz tells us, just "data about data." It is a means by which the complexity of an object is represented in a simpler form. For example, the title, the author, and the cover art are metadata about a book. When metadata does its job well, it fades i...

  12. The XML Metadata Editor of GFZ Data Services

    Science.gov (United States)

    Ulbricht, Damian; Elger, Kirsten; Tesei, Telemaco; Trippanera, Daniele

    2017-04-01

    Following the FAIR data principles, research data should be Findable, Accessible, Interoperable and Reusable. Publishing data under these principles requires assigning persistent identifiers to the data and generating rich machine-actionable metadata. To increase interoperability, metadata should include shared vocabularies and crosslink the newly published (meta)data and related material. However, structured metadata formats tend to be complex and are not intended to be generated by individual scientists. Software solutions are needed that support scientists in providing metadata describing their data. To facilitate the data publication activities of 'GFZ Data Services', we programmed an XML metadata editor that helps scientists create metadata in different schemata popular in the earth sciences (ISO19115, DIF, DataCite), while being at the same time usable by and understandable for scientists. Emphasis is placed on removing barriers: in particular, the editor is publicly available on the internet without registration [1], and scientists are not asked to provide information that may be generated automatically (e.g. the URL of a specific licence or the contact information of the metadata distributor). Metadata are stored in browser cookies and a copy can be saved to the local hard disk. To improve usability, form fields are translated into the scientific language, e.g. 'creators' of the DataCite schema are called 'authors'. To assist filling in the form, we make use of drop-down menus for small vocabulary lists and offer a search facility for large thesauri. Explanations of form fields and definitions of vocabulary terms are provided in pop-up windows, and full documentation is available for download via the help menu. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems with different conventions for providing latitudes and longitudes. Currently, we are extending the metadata editor

  13. Metadata and Service at the GFZ ISDC Portal

    Science.gov (United States)

    Ritschel, B.

    2008-05-01

    The online service portal of the GFZ Potsdam Information System and Data Center (ISDC) is an access point for all manner of geoscientific geodata, their corresponding metadata, scientific documentation and software tools. At present almost 2000 national and international users and user groups have the opportunity to request Earth science data from a portfolio of 275 different product types and more than 20 million single data files with an added volume of approximately 12 TByte. The majority of the data and information the portal currently offers to the public are global geomonitoring products such as satellite orbit and Earth gravity field data as well as geomagnetic and atmospheric data for the exploration. These products for Earth's changing system are provided via state-of-the-art retrieval techniques. The data product catalog system behind these techniques is based on the extensive usage of standardized metadata, which describe the different geoscientific product types and data products in a uniform way. Whereas all ISDC product types are specified by NASA's Directory Interchange Format (DIF), Version 9.0 Parent XML DIF metadata files, the individual data files are described by extended DIF metadata documents. Depending on the beginning of the scientific project, one part of the data files is described by extended DIF, Version 6 metadata documents and the other part is specified by data Child XML DIF metadata documents. Both the product-type-dependent parent DIF metadata documents and the data-file-dependent child DIF metadata documents are derived from a base-DIF.xsd XML schema file. The ISDC metadata philosophy defines a geoscientific product as a package consisting of mostly one or sometimes more than one data file plus one extended DIF metadata file. Because NASA's DIF metadata standard has been developed in order to specify a collection of data only, the extension of the DIF standard consists of new and specific attributes, which are necessary for

  14. The National Digital Information Infrastructure Preservation Program; Metadata Principles and Practicalities; Challenges for Service Providers when Importing Metadata in Digital Libraries; Integrated and Aggregated Reference Services.

    Science.gov (United States)

    Friedlander, Amy; Duval, Erik; Hodgins, Wayne; Sutton, Stuart; Weibel, Stuart L.; McClelland, Marilyn; McArthur, David; Giersch, Sarah; Geisler, Gary; Hodgkin, Adam

    2002-01-01

    Includes 6 articles that discuss the National Digital Information Infrastructure Preservation Program at the Library of Congress; metadata in digital libraries; integrated reference services on the Web. (LRW)

  15. An integrated overview of metadata in ATLAS

    International Nuclear Information System (INIS)

    Gallas, E J; Malon, D; Hawkings, R J; Albrand, S; Torrence, E

    2010-01-01

    Metadata (data about data) arise in many contexts, from many diverse sources, and at many levels in ATLAS. Familiar examples include run-level, luminosity-block-level, and event-level metadata, and, related to processing and organization, dataset-level and file-level metadata, but these categories are neither exhaustive nor orthogonal. Some metadata are known a priori, in advance of data taking or simulation; other metadata are known only after processing, and occasionally, quite late (e.g., detector status or quality updates that may appear after initial reconstruction is complete). Metadata that may seem relevant only internally to the distributed computing infrastructure under ordinary conditions may become relevant to physics analysis under error conditions ('What can I discover about data I failed to process?'). This talk provides an overview of metadata and metadata handling in ATLAS, and describes ongoing work to deliver integrated metadata services in support of physics analysis.

  16. Data Bookkeeping Service 3 - Providing event metadata in CMS

    CERN Document Server

    Giffels, Manuel; Riley, Daniel

    2014-01-01

    The Data Bookkeeping Service 3 provides a catalog of event metadata for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It comprises all necessary information for tracking datasets, their processing history and associations between runs, files and datasets, on a large scale of about 200,000 datasets and more than 40 million files, which adds up to around 700 GB of metadata. The DBS is an essential part of the CMS Data Management and Workload Management (DMWM) systems: all kinds of data processing, such as Monte Carlo production, processing of recorded event data, and physics analysis done by users, rely heavily on the information stored in DBS.

  17. DESIGN AND PRACTICE ON METADATA SERVICE SYSTEM OF SURVEYING AND MAPPING RESULTS BASED ON GEONETWORK

    Directory of Open Access Journals (Sweden)

    Z. Zha

    2012-08-01

    Based on analysis of and research on current geographic information sharing and metadata services, we design, develop and deploy a distributed metadata service system based on GeoNetwork covering more than 30 nodes in provincial units of China. By identifying the advantages of GeoNetwork, we design a distributed metadata service system for national surveying and mapping results. It consists of 31 network nodes, a central node and a portal. Network nodes are the direct system metadata source and are distributed around the country. Each network node maintains a metadata service system, responsible for metadata uploading and management. The central node harvests metadata from network nodes using the OGC CSW 2.0.2 standard interface. The portal shows all metadata in the central node and provides users with a variety of methods and interfaces for metadata search or querying. It also provides management capabilities for connecting the central node and the network nodes together. GeoNetwork has defects too. Accordingly, we improved and optimized large-volume metadata uploading, synchronization and concurrent access. For metadata uploading and synchronization, by carefully analyzing the database and index operation logs, we successfully avoid the performance bottlenecks, and with a batch operation and dynamic memory management solution, data throughput and system performance are significantly improved. For concurrent access, through a request coding and results cache solution, query performance is greatly improved. To smoothly respond to huge concurrent requests, a web cluster solution is deployed. This paper also gives an experimental analysis and compares the system performance before and after improvement and optimization. Design and practical results have been applied in the national metadata service system of surveying and mapping results. It proved that the improved GeoNetwork service architecture can effectively adaptive for

  18. Evolution of Web Services in EOSDIS: Search and Order Metadata Registry (ECHO)

    Science.gov (United States)

    Mitchell, Andrew; Ramapriyan, Hampapuram; Lowe, Dawn

    2009-01-01

    During 2005 through 2008, NASA defined and implemented a major evolutionary change in its Earth Observing System Data and Information System (EOSDIS) to modernize its capabilities. This implementation was based on a vision for 2015 developed during 2005. The EOSDIS 2015 Vision emphasizes increased end-to-end data system efficiency and operability; increased data usability; improved support for end users; and decreased operations costs. One key feature of the Evolution plan was achieving higher operational maturity (ingest, reconciliation, search and order, performance, error handling) for NASA's Earth Observing System Clearinghouse (ECHO). The ECHO system is an operational metadata registry through which the scientific community can easily discover and exchange NASA's Earth science data and services. ECHO contains metadata for 2,726 data collections comprising over 87 million individual data granules and 34 million browse images, consisting of NASA's EOSDIS Data Centers and the United States Geological Survey's Landsat Project holdings. ECHO is a middleware component based on a Service Oriented Architecture (SOA). The system is comprised of a set of infrastructure services that enable the fundamental SOA functions: publish, discover, and access Earth science resources. It also provides additional services such as user management, data access control, and order management. The ECHO system has a data registry and a services registry. The data registry enables organizations to publish EOS and other Earth-science related data holdings to a common metadata model. These holdings are described through metadata in terms of datasets (types of data) and granules (specific data items of those types). ECHO also supports browse images, which provide a visual representation of the data. The published metadata can be mapped to and from existing standards (e.g., FGDC, ISO 19115). With ECHO, users can find the metadata stored in the data registry and then access the data either

  19. The European Database of Seismogenic Faults (EDSF) for EPOS: implementation of OGC services and metadata publication

    Science.gov (United States)

    Vallone, Roberto; Basili, Roberto; Tarabusi, Gabriele; Burrato, Pierfrancesco; Valensise, Gianluca

    2017-04-01

    The European Database of Seismogenic Faults (EDSF; http://diss.rm.ingv.it/share-edsf/; doi: 10.6092/INGV.IT-SHARE-EDSF) is part of the Hazard & Risk pillar of EPOS-Implementation Phase (WP8, Seismology). Its tables contain faults that are deemed capable of generating earthquakes of magnitude 5.5 and larger, and it aims at making available a homogeneous input dataset for use in the assessment of ground-shaking hazard in the extended Euro-Mediterranean area or for developing regional tectonic and geodynamic models. In keeping with the goals set forth by EPOS, EDSF data are currently distributed through the Open Geospatial Consortium (OGC) service standards known as WFS (Web Feature Service) and WMS (Web Map Service), both complying with ISO standards. We present the software infrastructure implemented for the publication of the EDSF-OGC services and of the related metadata. The infrastructure was entirely built using free and open source software. The metadata were published as web services following the recommendations of the EPOS metadata model reference.

  20. GEO Label Web Services for Dynamic and Effective Communication of Geospatial Metadata Quality

    Science.gov (United States)

    Lush, Victoria; Nüst, Daniel; Bastin, Lucy; Masó, Joan; Lumsden, Jo

    2014-05-01

    We present demonstrations of the GEO label Web services and their integration into a prototype extension of the GEOSS portal (http://scgeoviqua.sapienzaconsulting.com/web/guest/geo_home), the GMU portal (http://gis.csiss.gmu.edu/GADMFS/) and a GeoNetwork catalog application (http://uncertdata.aston.ac.uk:8080/geonetwork/srv/eng/main.home). The GEO label is designed to communicate, and facilitate interrogation of, geospatial quality information with a view to supporting efficient and effective dataset selection on the basis of quality, trustworthiness and fitness for use. The GEO label which we propose was developed and evaluated according to a user-centred design (UCD) approach in order to maximise the likelihood of user acceptance once deployed. The resulting label is dynamically generated from producer metadata in ISO or FGDC format, and incorporates user feedback on dataset usage, ratings and discovered issues, in order to supply a highly informative summary of metadata completeness and quality. The label was easily incorporated into a community portal as part of the GEO Architecture Implementation Programme (AIP-6) and has been successfully integrated into a prototype extension of the GEOSS portal, as well as the popular metadata catalog and editor, GeoNetwork. The design of the GEO label was based on 4 user studies conducted to: (1) elicit initial user requirements; (2) investigate initial user views on the concept of a GEO label and its potential role; (3) evaluate prototype label visualizations; and (4) evaluate and validate physical GEO label prototypes. The results of these studies indicated that users and producers support the concept of a label with drill-down interrogation facility, combining eight geospatial data informational aspects, namely: producer profile, producer comments, lineage information, standards compliance, quality information, user feedback, expert reviews, and citations information. These are delivered as eight facets of a wheel

  1. Metadata aided run selection at ATLAS

    International Nuclear Information System (INIS)

    Buckingham, R M; Gallas, E J; Tseng, J C-L; Viegas, F; Vinek, E

    2011-01-01

    Management of the large volume of data collected by any large scale scientific experiment requires the collection of coherent metadata quantities, which can be used by reconstruction or analysis programs and/or user interfaces, to pinpoint collections of data needed for specific purposes. In the ATLAS experiment at the LHC, we have collected metadata from systems storing non-event-wise data (Conditions) into a relational database. The Conditions metadata (COMA) database tables not only contain conditions known at the time of event recording, but also allow for the addition of conditions data collected as a result of later analysis of the data (such as improved measurements of beam conditions or assessments of data quality). A new web based interface called 'runBrowser' makes these Conditions Metadata available as a Run based selection service. runBrowser, based on PHP and JavaScript, uses jQuery to present selection criteria and report results. It not only facilitates data selection by conditions attributes, but also gives the user information at each stage about the relationship between the conditions chosen and the remaining conditions criteria available. When a set of COMA selections is complete, runBrowser produces a human readable report as well as an XML file in a standardized ATLAS format. This XML can be saved for later use or refinement in a future runBrowser session, shared with physics/detector groups, or used as input to ELSSI (event level Metadata browser) or other ATLAS run or event processing services.

  2. A Transparently-Scalable Metadata Service for the Ursa Minor Storage System

    Science.gov (United States)

    2010-06-25

    Software Engineering Institute, Pittsburgh, PA 15213 ... metadata server for metadata activity and to discover the locations of data. They then access data directly at the appropriate data servers. Metadata ... necessary for Section 4.4. Second, we doubled the warmup time for each run to 10 minutes to ensure the measured portion of the run did not benefit from

  3. BASINS Metadata

    Science.gov (United States)

    Metadata, or data about data, describe the content, quality, condition, and other characteristics of data. Geospatial metadata are critical to data discovery and serve as the fuel for the Geospatial One-Stop data portal.

  4. USGIN ISO metadata profile

    Science.gov (United States)

    Richard, S. M.

    2011-12-01

    The USGIN project has drafted and is using a specification for use of ISO 19115/19/39 metadata, recommendations for simple metadata content, and a proposal for a URI scheme to identify resources using resolvable HTTP URIs (see http://lab.usgin.org/usgin-profiles). The principal target use case is a catalog in which resources can be registered and described by data providers for discovery by users. We are currently using the ESRI Geoportal (Open Source), with configuration files for the USGIN profile. The metadata offered by the catalog must provide sufficient content to guide search engines to locate requested resources, to describe the resource content, provenance, and quality so users can determine if the resource will serve for intended usage, and finally to enable human users and software clients to obtain or access the resource. In order to achieve an operational federated catalog system, provisions in the ISO specification must be restricted and usage clarified to reduce the heterogeneity of 'standard' metadata and service implementations such that a single client can search against different catalogs, and the metadata returned by catalogs can be parsed reliably to locate required information. Usage of the complex ISO 19139 XML schema allows for a great deal of structured metadata content, but the heterogeneity in approaches to content encoding has hampered development of sophisticated client software that can take advantage of the rich metadata; the lack of such clients in turn reduces motivation for metadata producers to produce content-rich metadata. If the only significant use of the detailed, structured metadata is to format into text for people to read, then the detailed information could be put in free text elements and be just as useful. In order for complex metadata encoding and content to be useful, there must be clear and unambiguous conventions on the encoding that are utilized by the community that wishes to take advantage of advanced metadata

  5. NAIP National Metadata

    Data.gov (United States)

    Farm Service Agency, Department of Agriculture — The NAIP National Metadata Map contains USGS Quarter Quad and NAIP Seamline boundaries for every year NAIP imagery has been collected. Clicking on the map also makes...

  6. An Intelligent Web Digital Image Metadata Service Platform for Social Curation Commerce Environment

    Directory of Open Access Journals (Sweden)

    Seong-Yong Hong

    2015-01-01

    Full Text Available Information management includes multimedia data management, knowledge management, collaboration, and agents, all of which are supporting technologies for XML. XML technologies have an impact on multimedia databases as well as collaborative technologies and knowledge management. That is, e-commerce documents are encoded in XML and are gaining much popularity for business-to-business or business-to-consumer transactions. Recently, the internet sites, such as e-commerce sites and shopping mall sites, deal with a lot of image and multimedia information. This paper proposes an intelligent web digital image information retrieval platform, which adopts XML technology for social curation commerce environment. To support object-based content retrieval on product catalog images containing multiple objects, we describe multilevel metadata structures representing the local features, global features, and semantics of image data. To enable semantic-based and content-based retrieval on such image data, we design an XML-Schema for the proposed metadata. We also describe how to automatically transform the retrieval results into the forms suitable for the various user environments, such as web browser or mobile device, using XSLT. The proposed scheme can be utilized to enable efficient e-catalog metadata sharing between systems, and it will contribute to the improvement of the retrieval correctness and the user’s satisfaction on semantic-based web digital image information retrieval.

  7. The RBV metadata catalog

    Science.gov (United States)

    Andre, Francois; Fleury, Laurence; Gaillardet, Jerome; Nord, Guillaume

    2015-04-01

    RBV (Réseau des Bassins Versants) is a French initiative to consolidate the national efforts made by more than 15 elementary observatories funded by various research institutions (CNRS, INRA, IRD, IRSTEA, universities) that study river and drainage basins. The RBV Metadata Catalogue aims to give a unified view of the work produced by every observatory to both the members of the RBV network and any external person interested in this domain of research. Another goal is to share this information with other existing metadata portals. Metadata management is heterogeneous among observatories, ranging from absence to mature harvestable catalogues. Here, we explain the strategy used to design a state-of-the-art catalogue in this situation. The main features are as follows:
    - Multiple input methods: Metadata records in the catalogue can be entered with the graphical user interface, harvested from an existing catalogue, or imported from an information system through simplified web services.
    - Hierarchical levels: Metadata records may describe an observatory, one of its experimental sites, or a single dataset produced by one instrument.
    - Multilingualism: Metadata can be easily entered in several configurable languages.
    - Compliance with standards: The back-office part of the catalogue is based on a CSW metadata server (Geosource), which ensures ISO19115 compatibility and the ability to be harvested (globally or partially). Ongoing tasks focus on the use of SKOS thesauri and SensorML descriptions of the sensors.
    - Ergonomics: The user interface is built with the GWT Framework to offer a rich client application with fully ajaxified navigation.
    - Source code sharing: The work has led to the development of reusable components that can be used to quickly create new metadata forms in other GWT applications.
    You can visit the catalogue (http://portailrbv.sedoo.fr/) or contact us by email at rbv@sedoo.fr.

  8. Log-Less Metadata Management on Metadata Server for Parallel File Systems

    Directory of Open Access Journals (Sweden)

    Jianwei Liao

    2014-01-01

    Full Text Available This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the sent metadata requests, which have been handled by the metadata server, so that the MDS does not need to log metadata changes to nonvolatile storage to achieve a highly available metadata service, and metadata processing performance improves as well. As the client file system backs up certain sent metadata requests in its memory, the overhead for handling these backup requests is much smaller than that incurred by the metadata server when it adopts logging or journaling to provide a highly available metadata service. The experimental results show that the proposed mechanism can significantly improve the speed of metadata processing and deliver better I/O data throughput than conventional metadata management schemes, that is, logging or journaling on the MDS. In addition, complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients when the metadata server has crashed or entered a nonoperational state.
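
    The recovery idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the paper's implementation: the class names, the two operations ("set" and "unlink"), and the use of a server-assigned sequence number to order the replay are all hypothetical choices made for illustration.

```python
class MetadataServer:
    """Toy in-memory MDS that keeps no journal on stable storage."""

    def __init__(self):
        self.table = {}    # path -> attribute dict
        self.next_seq = 0  # sequence number handed back to clients

    def execute(self, op, path, attrs=None):
        if op == "set":
            self.table[path] = dict(attrs)
        elif op == "unlink":
            self.table.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")

    def apply(self, op, path, attrs=None):
        """Handle one request and return its sequence number."""
        self.execute(op, path, attrs)
        seq = self.next_seq
        self.next_seq += 1
        return seq


class Client:
    """Client that backs up, in memory, every request the MDS has handled."""

    def __init__(self, mds):
        self.mds = mds
        self.backup = []  # list of (seq, op, path, attrs)

    def request(self, op, path, attrs=None):
        seq = self.mds.apply(op, path, attrs)
        self.backup.append((seq, op, path, attrs))


def recover(clients):
    """Rebuild a crashed MDS by replaying all client backups in order."""
    entries = sorted(
        (entry for client in clients for entry in client.backup),
        key=lambda entry: entry[0],
    )
    mds = MetadataServer()
    for seq, op, path, attrs in entries:
        mds.execute(op, path, attrs)
        mds.next_seq = seq + 1
    return mds
```

    Because the ordering information lives with the backed-up requests, the server can skip synchronous logging on the request path; the cost is that recovery needs the backups from every involved client.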

  9. Development of RESTful services and map-based user interface tools for access and delivery of data and metadata from the Marine-Geo Digital Library

    Science.gov (United States)

    Morton, J. J.; Ferrini, V. L.

    2015-12-01

    The Marine Geoscience Data System (MGDS, www.marine-geo.org) operates an interactive digital data repository and metadata catalog that provides access to a variety of marine geology and geophysical data from throughout the global oceans. Its Marine-Geo Digital Library includes common marine geophysical data types and supporting data and metadata, as well as complementary long-tail data. The Digital Library also includes community data collections and custom data portals for the GeoPRISMS, MARGINS and Ridge2000 programs, for active source reflection data (Academic Seismic Portal), and for marine data acquired by the US Antarctic Program (Antarctic and Southern Ocean Data Portal). Ensuring that these data are discoverable not only through our own interfaces but also through standards-compliant web services is critical for enabling investigators to find data of interest. Over the past two years, MGDS has developed several new RESTful web services that enable programmatic access to metadata and data holdings. These web services are compliant with the EarthCube GeoWS Building Blocks specifications and are currently used to drive our own user interfaces. New web applications have also been deployed to provide a more intuitive user experience for searching, accessing and browsing metadata and data. Our new map-based search interface combines components of the Google Maps API with our web services for dynamic searching and exploration of geospatially constrained data sets. Direct introspection of nearly all data formats for hundreds of thousands of data files curated in the Marine-Geo Digital Library has allowed for precise geographic bounds, which allow geographic searches to an extent not previously possible. All MGDS map interfaces utilize the web services of the Global Multi-Resolution Topography (GMRT) synthesis for displaying global basemap imagery and for dynamically providing depth values at the cursor location.

  10. ATLAS Metadata Interface (AMI), a generic metadata framework

    CERN Document Server

    The ATLAS collaboration; Odier, Jerome; Lambert, Fabian

    2017-01-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, JavaScript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  11. ATLAS Metadata Interface (AMI), a generic metadata framework

    Science.gov (United States)

    Fulachier, J.; Odier, J.; Lambert, F.; ATLAS Collaboration

    2017-10-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, JavaScript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  12. ATLAS Metadata Interface (AMI), a generic metadata framework

    CERN Document Server

    Fulachier, Jerome; The ATLAS collaboration

    2016-01-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, Javascript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  13. Federating Metadata Catalogs

    Science.gov (United States)

    Baru, C.; Lin, K.

    2009-04-01

    The Geosciences Network project (www.geongrid.org) has been developing cyberinfrastructure for data sharing in the Earth Science community based on a service-oriented architecture. The project defines a standard "software stack", which includes a standardized set of software modules and corresponding service interfaces. The system employs Grid certificates for distributed user authentication. The GEON Portal provides online access to these services via a set of portlets. This service-oriented approach has enabled the GEON network to easily expand to new sites and deploy the same infrastructure in new projects. To facilitate interoperation with other distributed geoinformatics environments, service standards are being defined and implemented for catalog services and federated search across distributed catalogs. The need arises because there may be multiple metadata catalogs in a distributed system, for example, for each institution, agency, geographic region, and/or country. Ideally, a geoinformatics user should be able to search across all such catalogs by making a single search request. In this paper, we describe our implementation for such a search capability across federated metadata catalogs in the GEON service-oriented architecture. The GEON catalog can be searched using spatial, temporal, and other metadata-based search criteria. The search can be invoked as a Web service and, thus, can be imbedded in any software application. 
    The need for federated catalogs in GEON arises because (i) GEON collaborators at the University of Hyderabad, India have deployed their own catalog, as part of the iGEON-India effort, to register information about local resources for broader access across the network, (ii) GEON collaborators in the GEO Grid (Global Earth Observations Grid) project at AIST, Japan have implemented a catalog for their ASTER data products, and (iii) we have recently deployed a search service to access all data products from the EarthScope project in the US
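
    The single-request search across several catalogs described in this abstract can be sketched as a simple fan-out. The Python below is an illustrative sketch only, not the GEON implementation: the in-memory Catalog class, the Record fields, and the keyword-plus-bounding-box query are hypothetical stand-ins for remote catalog services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    title: str
    bbox: tuple  # (west, south, east, north) in degrees


def intersects(a, b):
    """True if two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Catalog:
    """Hypothetical stand-in for one member catalog of the federation."""

    def __init__(self, name, records):
        self.name = name
        self.records = list(records)

    def search(self, keyword, bbox):
        needle = keyword.lower()
        return [
            r for r in self.records
            if needle in r.title.lower() and intersects(r.bbox, bbox)
        ]


def federated_search(catalogs, keyword, bbox):
    """Send one query to every catalog and tag each hit with its source."""
    hits = []
    for catalog in catalogs:
        for record in catalog.search(keyword, bbox):
            hits.append((catalog.name, record))
    return hits
```

    Tagging each hit with the catalog it came from keeps provenance visible to the user, which matters once the same resource is registered in more than one catalog.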

  14. How libraries use publisher metadata

    Directory of Open Access Journals (Sweden)

    Steve Shadle

    2013-11-01

    Full Text Available With the proliferation of electronic publishing, libraries are increasingly relying on publisher-supplied metadata to meet user needs for discovery in library systems. However, many publisher/content provider staff creating metadata are unaware of the end-user environment and how libraries use their metadata. This article provides an overview of the three primary discovery systems that are used by academic libraries, with examples illustrating how publisher-supplied metadata directly feeds into these systems and is used to support end-user discovery and access. Commonly seen metadata problems are discussed, with recommendations suggested. Based on a series of presentations given in Autumn 2012 to the staff of a large publisher, this article uses the University of Washington Libraries systems and services as illustrative examples. Judging by the feedback received from these presentations, publishers (specifically staff not familiar with the big picture of metadata standards work) would benefit from a better understanding of the systems and services libraries provide using the data that is created and managed by publishers.

  15. Big Metadata, Smart Metadata, and Metadata Capital: Toward Greater Synergy Between Data Science and Metadata

    Directory of Open Access Journals (Sweden)

    Jane Greenberg

    2017-08-01

    Full Text Available Purpose: The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data science. Design/methodology/approach: This paper identifies factors that challenge metadata research in the digital ecosystem, defines metadata and data science, and presents the concepts big metadata, smart metadata, and metadata capital as part of a metadata lingua franca connecting to data science. Findings: The “utilitarian nature” and “historical and traditional views” of metadata are identified as two intersecting factors that have inhibited metadata research. Big metadata, smart metadata, and metadata capital are presented as part of a metadata lingua franca to help frame research in the data science research space. Research limitations: There are additional, intersecting factors to consider that likely inhibit metadata research, and other significant metadata concepts to explore. Practical implications: The immediate contribution of this work is that it may elicit response, critique, revision, or, more significantly, motivate research. The work presented can encourage more researchers to consider the significance of metadata as a research worthy topic within data science and the larger digital ecosystem. Originality/value: Although metadata research has not kept pace with other data science topics, there is little attention directed to this problem. This is surprising, given that metadata is essential for data science endeavors. This examination synthesizes original and prior scholarship to provide new grounding for metadata research in data science.

  16. Drivers of flood damage on event level

    DEFF Research Database (Denmark)

    Kreibich, H.; Aerts, J. C. J. H.; Apel, H.

    2016-01-01

    Flood risk is dynamic and influenced by many processes related to hazard, exposure and vulnerability. Flood damage increased significantly over the past decades, however, resulting overall economic loss per event is an aggregated indicator and it is difficult to attribute causes to this increasing...... trend. Much has been learned about damaging processes during floods at the micro-scale, e.g. building level. However, little is known about the main factors determining the amount of flood damage on event level. Thus, we analyse and compare paired flood events, i.e. consecutive, similar damaging floods...... that occurred in the same area. In analogy to ’Paired catchment studies’ - a well-established method in hydrology to understand how changes in land use affect streamflow – we will investigate how and why resulting flood damage in a region differed between the first and second consecutive flood events. One...

  17. Drivers of flood damage on event level

    DEFF Research Database (Denmark)

    Kreibich, H.; Aerts, J. C. J. H.; Apel, H.

    2016-01-01

    -level mitigation measures, 3) more effective early warning and improved coordination of disaster response and 4) a more targeted maintenance of flood defence systems and their deliberate relocation. Thus, despite higher hydrological severity damage due to the 2013 flood was significantly lower than in 2002. In our......Flood risk is dynamic and influenced by many processes related to hazard, exposure and vulnerability. Flood damage increased significantly over the past decades, however, resulting overall economic loss per event is an aggregated indicator and it is difficult to attribute causes to this increasing...... trend. Much has been learned about damaging processes during floods at the micro-scale, e.g. building level. However, little is known about the main factors determining the amount of flood damage on event level. Thus, we analyse and compare paired flood events, i.e. consecutive, similar damaging floods...

  18. Handbook of metadata, semantics and ontologies

    CERN Document Server

    Sicilia, Miguel-Angel

    2013-01-01

    Metadata research has emerged as a discipline cross-cutting many domains, focused on the provision of distributed descriptions (often called annotations) to Web resources or applications. Such associated descriptions are supposed to serve as a foundation for advanced services in many application areas, including search and location, personalization, federation of repositories and automated delivery of information. Indeed, the Semantic Web is in itself a concrete technological framework for ontology-based metadata. For example, Web-based social networking requires metadata describing people and

  19. Finding Atmospheric Composition (AC) Metadata

    Science.gov (United States)

    Strub, Richard F.; Falke, Stefan; Fiakowski, Ed; Kempler, Steve; Lynnes, Chris; Goussev, Oleg

    2015-01-01

    The Atmospheric Composition Portal (ACP) is an aggregator and curator of information related to remotely sensed atmospheric composition data and analysis. It uses existing tools and technologies and, where needed, enhances those capabilities to provide interoperable access, tools, and contextual guidance for scientists and value-adding organizations using remotely sensed atmospheric composition data. The initial focus is on Essential Climate Variables identified by the Global Climate Observing System: CH4, CO, CO2, NO2, O3, SO2 and aerosols. This poster addresses our efforts in building the ACP Data Table, an interface to help discover and understand remotely sensed data that are related to atmospheric composition science and applications. We harvested the GCMD, CWIC, and GEOSS metadata catalogs using machine-to-machine technologies (OpenSearch, Web Services). We also manually investigated the plethora of CEOS data provider portals and other catalogs where those data might be aggregated. This poster describes the excellence, variety, and challenges we encountered.
    Conclusions:
    1. The significant benefits that the major catalogs provide are their machine-to-machine tools like OpenSearch and Web Services rather than any GUI usability improvements, due to the large amount of data in their catalogs.
    2. There is a trend at the large catalogs towards simulating small data provider portals through advanced services.
    3. Populating metadata catalogs using ISO19115 is too complex for users to do in a consistent way, difficult to parse visually or with XML libraries, and too complex for Java XML binders like CASTOR.
    4. The ability to search for IDs first and then for data (GCMD and ECHO) is better for machine-to-machine operations than returning the entire metadata entry at once, which led to timeouts.
    5. Metadata harvest and export activities between the major catalogs have led to a significant amount of duplication. (This is currently being addressed.)
    6. Most (if not all

  20. Metadata management staging system

    Energy Technology Data Exchange (ETDEWEB)

    2013-08-01

    A Django application providing a user interface for building a file and metadata management system. It is an evolution of our Node.js and CouchDB metadata management system. This version focuses on server functionality and uses a well-documented, rational, RESTful API for data access.

  1. Visualization of JPEG Metadata

    Science.gov (United States)

    Malik Mohamad, Kamaruddin; Deris, Mustafa Mat

    There is far more information embedded in a JPEG image than just graphics. Visualizing its metadata would help digital forensic investigators view embedded data, including in corrupted images where no graphics can be displayed, in order to assist in evidence collection for cases such as child pornography or steganography. Tools such as metadata readers, editors and extraction tools are already available, but they mostly focus on visualizing the attribute information of JPEG Exif. None, however, visualizes metadata by consolidating the markers summary, header structure, Huffman table and quantization table in a single program. In this paper, metadata visualization is done by developing a program that is able to summarize all existing markers, the header structure, the Huffman table and the quantization table in a JPEG. The results show that visualization of metadata makes it easier to view the hidden information within a JPEG.
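
    A markers summary of the kind described can be sketched by walking the segment headers of a JPEG stream. The Python below is a minimal sketch, not the authors' program: it assumes the usual layout of a 0xFF byte, a marker code, and a two-byte big-endian length that counts itself, it does not handle fill bytes or markers without a length field, and it stops at the start-of-scan marker rather than decoding the entropy-coded data. The function name and the short marker table are illustrative choices.

```python
import struct

# Names for a few common JPEG marker codes (the byte that follows 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
    0xC0: "SOF0", 0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI",
}


def list_markers(data: bytes):
    """Return (name, offset, payload_length) for each segment up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    markers = [("SOI", 0, 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # corrupted or unexpected byte; stop rather than guess
        code = data[pos + 1]
        # The length field is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        name = MARKER_NAMES.get(code, f"0x{code:02X}")
        markers.append((name, pos, length - 2))
        if code == 0xDA:
            break  # entropy-coded scan data follows; not parsed here
        pos += 2 + length
    return markers
```

    Because the walk relies only on segment headers, it can still report the DQT (quantization table) and DHT (Huffman table) segments of a file whose image data is too damaged to display.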

  2. Harvesting NASA's Common Metadata Repository

    Science.gov (United States)

    Shum, D.; Mitchell, A. E.; Durbin, C.; Norton, J.

    2017-12-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.
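
    One way to keep a partner repository in step with a harvested source is to merge each harvest by record identifier and revision number. The Python below is a sketch of that general idea only; it does not call the CMR API, and the record keys ("id", "revision", "deleted") are hypothetical field names chosen for illustration rather than the CMR's actual response format.

```python
def merge_harvest(local, harvested):
    """Upsert harvested records into a local store and report what changed.

    local:     dict mapping record id -> record dict (modified in place)
    harvested: iterable of record dicts with hypothetical keys
               "id", "revision" (int) and "deleted" (bool)
    """
    changed = []
    for record in harvested:
        key = record["id"]
        current = local.get(key)
        # Accept only records that are new or strictly newer than ours,
        # so a stale or repeated harvest cannot roll a record back.
        if current is None or record["revision"] > current["revision"]:
            local[key] = record
            changed.append(key)
    return changed


def live_records(local):
    """Ids of records that have not been marked deleted upstream."""
    return sorted(key for key, rec in local.items() if not rec["deleted"])
```

    Keeping deleted records as tombstones instead of dropping them means a later, out-of-order harvest of an older revision cannot resurrect a record that the source has removed.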

  3. Transforming Dermatologic Imaging for the Digital Era: Metadata and Standards.

    Science.gov (United States)

    Caffery, Liam J; Clunie, David; Curiel-Lewandrowski, Clara; Malvehy, Josep; Soyer, H Peter; Halpern, Allan C

    2018-01-17

    Imaging is increasingly being used in dermatology for documentation, diagnosis, and management of cutaneous disease. The lack of standards for dermatologic imaging is an impediment to clinical uptake. Standardization can occur in image acquisition, terminology, interoperability, and metadata. This paper presents the International Skin Imaging Collaboration position on standardization of metadata for dermatologic imaging. Metadata is essential to ensure that dermatologic images are properly managed and interpreted. There are two standards-based approaches to recording and storing metadata in dermatologic imaging. The first uses standard consumer image file formats, and the second is the file format and metadata model developed for the Digital Imaging and Communication in Medicine (DICOM) standard. DICOM would appear to provide an advantage over using consumer image file formats for metadata as it includes all the patient, study, and technical metadata necessary to use images clinically, whereas consumer image file formats only include technical metadata and need to be used in conjunction with another actor (for example, an electronic medical record) to supply the patient and study metadata. The use of DICOM may have some ancillary benefits in dermatologic imaging including leveraging DICOM network and workflow services, interoperability of images and metadata, leveraging existing enterprise imaging infrastructure, greater patient safety, and better compliance with legislative requirements for image retention.

  4. Science friction: data, metadata, and collaboration.

    Science.gov (United States)

    Edwards, Paul N; Mayernik, Matthew S; Batcheller, Archer L; Bowker, Geoffrey C; Borgman, Christine L

    2011-10-01

    When scientists from two or more disciplines work together on related problems, they often face what we call 'science friction'. As science becomes more data-driven, collaborative, and interdisciplinary, demand increases for interoperability among data, tools, and services. Metadata--usually viewed simply as 'data about data', describing objects such as books, journal articles, or datasets--serve key roles in interoperability. Yet we find that metadata may be a source of friction between scientific collaborators, impeding data sharing. We propose an alternative view of metadata, focusing on its role in an ephemeral process of scientific communication, rather than as an enduring outcome or product. We report examples of highly useful, yet ad hoc, incomplete, loosely structured, and mutable, descriptions of data found in our ethnographic studies of several large projects in the environmental sciences. Based on this evidence, we argue that while metadata products can be powerful resources, usually they must be supplemented with metadata processes. Metadata-as-process suggests the very large role of the ad hoc, the incomplete, and the unfinished in everyday scientific work.

  5. Study on high-level waste geological disposal metadata model

    International Nuclear Information System (INIS)

    Ding Xiaobin; Wang Changhong; Zhu Hehua; Li Xiaojun

    2008-01-01

    This paper reviews the concept of metadata and related research in China and abroad, then explains the motivation for studying a metadata model for the high-level nuclear waste deep geological disposal project. With reference to GML, the authors first set up DML under the framework of digital underground space engineering. Based on DML, a standardized metadata scheme for the high-level nuclear waste deep geological disposal project is presented. Then, an Internet-based metadata model is put forward. With the standardized data and CSW services, this model may solve the problem of sharing and exchanging data in different forms. A metadata editor is built to search and maintain metadata based on this model. (authors)

  6. ATLAS Metadata Task Force

    Energy Technology Data Exchange (ETDEWEB)

    ATLAS Collaboration; Costanzo, D.; Cranshaw, J.; Gadomski, S.; Jezequel, S.; Klimentov, A.; Lehmann Miotto, G.; Malon, D.; Mornacchi, G.; Nemethy, P.; Pauly, T.; von der Schmitt, H.; Barberis, D.; Gianotti, F.; Hinchliffe, I.; Mapelli, L.; Quarrie, D.; Stapnes, S.

    2007-04-04

    This document provides an overview of the metadata that are needed to characterize ATLAS event data at different levels (a complete run, data streams within a run, luminosity blocks within a run, individual events).

  7. Data, Metadata - Who Cares?

    Science.gov (United States)

    Baumann, Peter

    2013-04-01

    There is a traditional saying that metadata are understandable, semantic-rich, and searchable. Data, on the other hand, are big, with no accessible semantics, and just downloadable. Not only has this led to an imbalance of search support from a user perspective, but also underneath to a deep technology divide, often using relational databases for metadata and bespoke archive solutions for data. Our vision is that this barrier will be overcome, and data and metadata become searchable alike, leveraging the potential of semantic technologies in combination with scalability technologies. Ultimately, in this vision ad-hoc processing and filtering will not distinguish any longer, forming a uniformly accessible data universe. In the European EarthServer initiative, we work towards this vision by federating database-style raster query languages with metadata search and geo broker technology. We present the approach taken, how it can leverage OGC standards, the benefits envisaged, and first results.

  8. Visual exploration of the attribute space of DANS EASY metadata

    NARCIS (Netherlands)

    ten Bosch, Olav; Scharnhorst, A.M.; Doorn, P.K.; Koning, Henk

    2012-01-01

    Study of the metadata of the Electronic Archiving System (EASY) of Data Archiving and Networked Services (DANS) for the purpose of getting insight in the internal structure of the collection. The visualization contains a dump of the EASY metadata set and all important data files that were generated

  9. Metadata Authoring with Versatility and Extensibility

    Science.gov (United States)

    Pollack, Janine; Olsen, Lola

    2004-01-01

    NASA's Global Change Master Directory (GCMD) assists the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 13,800 data set descriptions in Directory Interchange Format (DIF) and 700 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information and direct links to the data, thus allowing researchers to discover data pertaining to a geographic location of interest, then quickly acquire those data. The GCMD strives to be the preferred data locator for world-wide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are attracting widespread usage; however, a need for tools that are portable, customizable and versatile still exists. With tool usage directly influencing metadata population, it has become apparent that new tools are needed to fill these voids. As a result, the GCMD has released a new authoring tool allowing for both web-based and stand-alone authoring of descriptions. Furthermore, this tool incorporates the ability to plug-and-play the metadata format of choice, offering users options of DIF, SERF, FGDC, ISO or any other defined standard. Allowing data holders to work with their preferred format, as well as an option of a stand-alone application or web-based environment, docBUILDER will assist the scientific community in efficiently creating quality data and services metadata.

  10. ASDC Collaborations and Processes to Ensure Quality Metadata and Consistent Data Availability

    Science.gov (United States)

    Trapasso, T. J.

    2017-12-01

    With the introduction of new tools, faster computing, and less expensive storage, increased volumes of data are expected to be managed with existing or fewer resources. Metadata management is becoming a heightened challenge from the increase in data volume, resulting in more metadata records needed to be curated for each product. To address metadata availability and completeness, NASA ESDIS has taken significant strides with the creation of the Unified Metadata Model (UMM) and Common Metadata Repository (CMR). The UMM helps address hurdles caused by the increasing number of metadata dialects, and the CMR provides a primary repository for metadata so that required metadata fields can be served through a growing number of tools and services. However, metadata quality remains an issue as metadata is not always inherent to the end-user. In response to these challenges, the NASA Atmospheric Science Data Center (ASDC) created the Collaboratory for quAlity Metadata Preservation (CAMP) and defined the Product Lifecycle Process (PLP) to work congruently. CAMP is unique in that it provides science team members a UI to directly supply metadata that is complete, compliant, and accurate for their data products. This replaces back-and-forth communication that often results in misinterpreted metadata. Upon review by ASDC staff, metadata is submitted to CMR for broader distribution through Earthdata. Further, approval of science team metadata in CAMP automatically triggers the ASDC PLP workflow to ensure appropriate services are applied throughout the product lifecycle. This presentation will review the design elements of CAMP and PLP as well as demonstrate interfaces to each. It will show the benefits that CAMP and PLP provide to the ASDC that could potentially benefit additional NASA Earth Science Data and Information System (ESDIS) Distributed Active Archive Centers (DAACs).

  11. Populating and harvesting metadata in a virtual observatory

    Science.gov (United States)

    Walker, Raymond; King, Todd; Joy, Steven; Bargatze, Lee; Chi, Peter; Weygand, James

    Founded in 2007, the Virtual Magnetospheric Observatory (VMO) provides one-stop shopping for data and services useful in magnetospheric research. The VMO's purview includes ground-based observations as well as observations from spacecraft. The data and services for using and analyzing these data are found at laboratories distributed around the world. The VMO is itself a federated data system with branches at UCLA and the Goddard Space Flight Center (GSFC). These data can be connected by using a common data model. The VMO has selected the Space Physics Archive Search and Extract (SPASE) metadata standard for this purpose. SPASE metadata are collected and stored in distributed registries that are maintained along with the data at the location of the data provider. Populating the registries and extracting the metadata requested for a given study remain major challenges. In general there is little or no money available to data providers to create the metadata and populate the registries. We have taken a two-pronged approach to minimize the effort required to create the metadata and maintain the registries. The first part of the approach is human. We have appointed a group of domain experts called "X-Men". X-Men are expert in both magnetospheric physics and data management. They work closely with data providers to help them prepare the metadata and populate the registries. The second part of our approach is to develop a series of tools to populate and harvest information from the registries. We have developed SPASE editors for high-level metadata and adopted the NASA Planetary Data System's Rule Set approach, in which the science data are used to generate detailed-level SPASE metadata. Finally, we have developed a unique harvesting system to retrieve metadata from distributed registries in response to user queries.

  12. Mercury- Distributed Metadata Management, Data Discovery and Access System

    Science.gov (United States)

    Palanisamy, Giri; Wilson, Bruce E.; Devarakonda, Ranjeet; Green, James M.

    2007-12-01

    Mercury is a federated metadata harvesting, search and retrieval tool based on both open source and ORNL-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports various metadata standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115 (under development). Mercury provides a single portal to information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury supports various projects including: ORNL DAAC, NBII, DADDI, LBA, NARSTO, CDIAC, OCEAN, I3N, IAI, ESIP and ARM. The new Mercury system is based on a Service Oriented Architecture and supports various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. This system also provides various search services including: RSS, Geo-RSS, OpenSearch, Web Services and Portlets. Other features include: Filtering and dynamic sorting of search results, book-markable search results, save, retrieve, and modify search criteria.

  13. Achieving interoperability for metadata registries using comparative object modeling.

    Science.gov (United States)

    Park, Yu Rang; Kim, Ju Han

    2010-01-01

    Achieving data interoperability between organizations relies upon agreed meaning and representation (metadata) of data. For managing and registering metadata, many organizations have built metadata registries (MDRs) in various domains based on the international standard for the MDR framework, ISO/IEC 11179. Following this trend, two public MDRs in the biomedical domain have been created, United States Health Information Knowledgebase (USHIK) and cancer Data Standards Registry and Repository (caDSR), from U.S. Department of Health & Human Services and National Cancer Institute (NCI), respectively. Most MDRs are implemented with indiscriminate extensions to satisfy organization-specific needs and to work around semantic and structural limitations of ISO/IEC 11179. As a result, it is difficult to address interoperability among multiple MDRs. In this paper, we propose an integrated metadata object model for achieving interoperability among multiple MDRs. To evaluate this model, we developed an XML Schema Definition (XSD)-based metadata exchange format. We created an XSD-based metadata exporter, supporting both the integrated metadata object model and organization-specific MDR formats.

  14. Cytometry metadata in XML

    Science.gov (United States)

    Leif, Robert C.; Leif, Stephanie H.

    2016-04-01

    Introduction: The International Society for Advancement of Cytometry (ISAC) has created a standard for the Minimum Information about a Flow Cytometry Experiment (MIFlowCyt 1.0). CytometryML will serve as a common metadata standard for flow and image cytometry (digital microscopy). Methods: The MIFlowCyt data-types were created, as is the rest of CytometryML, in the XML Schema Definition Language (XSD1.1). The datatypes are primarily based on the Flow Cytometry and the Digital Imaging and Communication (DICOM) standards. A small section of the code was formatted with standard HTML formatting elements (p, h1, h2, etc.). Results: 1) The part of MIFlowCyt that describes the Experimental Overview including the specimen and substantial parts of several other major elements has been implemented as CytometryML XML schemas (www.cytometryml.org). 2) The feasibility of using MIFlowCyt to provide the combination of an overview, table of contents, and/or an index of a scientific paper or a report has been demonstrated. Previously, a sample electronic publication, EPUB, was created that could contain both MIFlowCyt metadata as well as the binary data. Conclusions: The use of CytometryML technology together with XHTML5 and CSS permits the metadata to be directly formatted and together with the binary data to be stored in an EPUB container. This will facilitate: formatting, data mining, presentation, data verification, and inclusion in structured research, clinical, and regulatory documents, as well as demonstrate a publication's adherence to the MIFlowCyt standard, promote interoperability and should also result in the textual and numeric data being published using web technology without any change in composition.

  15. Flexible Community-driven Metadata with the Component Metadata Infrastructure

    NARCIS (Netherlands)

    Windhouwer, M.; Goosen, Twan; Mosutka, Jozef; Van Uytvanck, D.; Broeder, D.

    Many researchers, from the humanities and other domains, have a strong need to study resources in close detail. Nowadays more and more of these resources are available online. To make these resources discoverable, they are described with metadata. These metadata records are collected and made

  16. Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators

    Science.gov (United States)

    Mayernik, Matthew Stephen

    2011-01-01

    As digital data creation technologies become more prevalent, data and metadata management are necessary to make data available, usable, sharable, and storable. Researchers in many scientific settings, however, have little experience or expertise in data and metadata management. In this dissertation, I explore the everyday data and metadata…

  17. Toward element-level interoperability in bibliographic metadata

    Directory of Open Access Journals (Sweden)

    Eric Childress

    2008-03-01

    Full Text Available This paper discusses an approach and set of tools for translating bibliographic metadata from one format to another. A computational model is proposed to formalize the notion of a 'crosswalk'. The translation process separates semantics from syntax, and specifies a crosswalk as machine executable translation files which are focused on assertions of element equivalence and are closely associated with the underlying intellectual analysis of metadata translation. A data model developed by the authors called Morfrom serves as an internal generic metadata format. Translation logic is written in an XML scripting language designed by the authors called the Semantic Equivalence Expression Language (Seel). These techniques have been built into an OCLC software toolkit to manage large and diverse collections of metadata records, called the Crosswalk Web Service.
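
    The crosswalk idea in this abstract can be illustrated with a minimal Python sketch. It treats a crosswalk as a table of element-equivalence assertions and routes every translation through a generic internal record, so syntax handling stays separate from the mapping itself. The element names, both mapping tables, and the function names are hypothetical; this is not Morfrom, Seel, or the Crosswalk Web Service.

```python
from typing import Dict, List

# A record is a mapping from element name to a list of values.
Record = Dict[str, List[str]]

# Hypothetical equivalence assertions: source element -> internal element.
SOURCE_TO_INTERNAL: Dict[str, str] = {
    "245a": "title",
    "100a": "creator",
    "260c": "date",
}

# Hypothetical equivalence assertions: internal element -> target element.
INTERNAL_TO_TARGET: Dict[str, str] = {
    "title": "dc:title",
    "creator": "dc:creator",
    "date": "dc:date",
}


def apply_crosswalk(record: Record, mapping: Dict[str, str]) -> Record:
    """Rename elements according to the mapping; unmapped elements are dropped."""
    out: Record = {}
    for element, values in record.items():
        target = mapping.get(element)
        if target is not None:
            out.setdefault(target, []).extend(values)
    return out


def translate(record: Record) -> Record:
    """Translate source -> internal generic format -> target."""
    internal = apply_crosswalk(record, SOURCE_TO_INTERNAL)
    return apply_crosswalk(internal, INTERNAL_TO_TARGET)


source = {"245a": ["On metadata"], "100a": ["Doe, J."], "999x": ["local note"]}
result = translate(source)
```

    Routing through one internal format means that adding a new format needs two mapping tables rather than one per format pair. The sketch silently drops unmapped elements such as "999x"; a real tool would more likely report them.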

  18. Creating preservation metadata from XML-metadata profiles

    Science.gov (United States)

    Ulbricht, Damian; Bertelmann, Roland; Gebauer, Petra; Hasler, Tim; Klump, Jens; Kirchner, Ingo; Peters-Kottig, Wolfgang; Mettig, Nora; Rusch, Beate

    2014-05-01

    Registration of dataset DOIs at DataCite makes research data citable and comes with the obligation to keep data accessible in the future. In addition, many universities and research institutions measure data that is unique and not repeatable, like the data produced by an observational network, and they want to keep these data for future generations. In consequence, such data should be ingested in preservation systems that automatically care for file format changes. Open source preservation software that is developed along the definitions of the ISO OAIS reference model is available, but during ingest of data and metadata there are still problems to be solved. File format validation is difficult because format validators are not only remarkably slow; due to the variety in file formats, different validators also return conflicting identification profiles for identical data. These conflicts are hard to resolve. Preservation systems have a deficit in the support of custom metadata. Furthermore, data producers are sometimes not aware that quality metadata is a key issue for the re-use of data. In the project EWIG a university institute and a research institute work together with Zuse-Institute Berlin, which is acting as an infrastructure facility, to generate exemplary workflows for research data into OAIS compliant archives with emphasis on the geosciences. The Institute for Meteorology provides timeseries data from an urban monitoring network whereas GFZ Potsdam delivers file based data from research projects. To identify problems in existing preservation workflows the technical work is complemented by interviews with data practitioners. Policies for handling data and metadata are developed. Furthermore, university teaching material is created to raise the future scientists' awareness of research data management. As a testbed for ingest workflows the digital preservation system Archivematica [1] is used. During the ingest process metadata is generated that is compliant to the

  19. Mercury Toolset for Spatiotemporal Metadata

    Science.gov (United States)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris

    2010-06-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
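
    The harvest-then-index pattern described in this abstract can be sketched in a few lines of Python. Records from several servers are copied into one central in-memory index, which is then filtered by field, bounding box, and time period. All class, function, and field names are hypothetical and the sample records are invented; this is not Mercury's code or API, and a linear scan stands in for a real search engine.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass
class MetadataRecord:
    record_id: str
    source: str                # contributing server the record was harvested from
    fields: Dict[str, str]     # fielded text, e.g. {"title": "..."}
    bbox: BBox
    start: date
    end: date


def boxes_intersect(a: BBox, b: BBox) -> bool:
    """True if two boxes overlap (boxes crossing the antimeridian are not handled)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class CentralIndex:
    def __init__(self) -> None:
        self._records: Dict[str, MetadataRecord] = {}

    def harvest(self, records: List[MetadataRecord]) -> None:
        """Index a harvested batch; a later harvest overwrites earlier copies."""
        for rec in records:
            self._records[rec.record_id] = rec

    def search(
        self,
        field: Optional[Tuple[str, str]] = None,
        bbox: Optional[BBox] = None,
        period: Optional[Tuple[date, date]] = None,
    ) -> List[str]:
        """Return sorted ids of records matching every filter that was supplied."""
        hits: List[str] = []
        for rec in self._records.values():
            if field is not None:
                name, term = field
                if term.lower() not in rec.fields.get(name, "").lower():
                    continue
            if bbox is not None and not boxes_intersect(rec.bbox, bbox):
                continue
            if period is not None and (rec.end < period[0] or rec.start > period[1]):
                continue
            hits.append(rec.record_id)
        return sorted(hits)


index = CentralIndex()
index.harvest([
    MetadataRecord("a1", "server-a", {"title": "Soil moisture, Amazon basin"},
                   (-70.0, -10.0, -50.0, 0.0), date(2001, 1, 1), date(2003, 12, 31)),
    MetadataRecord("b1", "server-b", {"title": "Arctic sea ice extent"},
                   (-180.0, 60.0, 180.0, 90.0), date(1990, 1, 1), date(2009, 12, 31)),
])
hits = index.search(field=("title", "soil"),
                    bbox=(-60.0, -5.0, -55.0, 5.0),
                    period=(date(2002, 1, 1), date(2002, 12, 31)))
```

    Only the metadata is centralized here; the data themselves stay with the providers, which is the division of ownership the abstract emphasizes.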

  20. Mercury Toolset for Spatiotemporal Metadata

    Science.gov (United States)

    Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James

    2010-01-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  1. Developing the CUAHSI Metadata Profile

    Science.gov (United States)

    Piasecki, M.; Bermudez, L.; Islam, S.; Beran, B.

    2004-12-01

    The Hydrologic Information System (HIS), of the Consortium of Universities for the Advancement of Hydrologic Science Inc., (CUAHSI), has as one of its goals to improve access to large volume, high quality, and heterogeneous hydrologic data sets. This will be attained in part by adopting a community metadata profile to achieve consistent descriptions that will facilitate data discovery. However, common standards are quite general in nature and typically lack domain specific vocabularies, complicating the adoption of standards for specific communities. We will show and demonstrate the problems encountered in the process of adopting ISO standards to create a CUAHSI metadata profile. The final schema is expressed in a simple metadata format, Metadata Template File (MTF), to leverage metadata annotations/viewer tools already developed by the San Diego Super Computer Center. The steps performed to create an MTF starting from ISO 19115:2003 are the following: 1) creation of ontologies using the Web Ontology Language (OWL) for ISO 19115:2003 and related ISO/TC 211 documents; 2) conceptualization in OWL of related hydrologic vocabularies such as NASA's Global Change Master Directory and units from the Hydrologic Handbook; 3) definition of CUAHSI profile by importing and extending the previous ontologies; 4) explicit creation of CUAHSI core set; 5) export of the core set to MTF; 6) definition of metadata blocks for arbitrary digital objects (e.g. time series vs static-spatial data) using ISO's methodology for feature cataloguing; and 7) export of metadata blocks to MTF.

  2. Metadata Dictionary Database: A Proposed Tool for Academic Library Metadata Management

    Science.gov (United States)

    Southwick, Silvia B.; Lampert, Cory

    2011-01-01

    This article proposes a metadata dictionary (MDD) be used as a tool for metadata management. The MDD is a repository of critical data necessary for managing metadata to create "shareable" digital collections. An operational definition of metadata management is provided. The authors explore activities involved in metadata management in…

  3. An Interactive, Web-Based Approach to Metadata Authoring

    Science.gov (United States)

    Pollack, Janine; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    NASA's Global Change Master Directory (GCMD) serves a growing number of users by assisting the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 8000 data set descriptions in Directory Interchange Format (DIF) and 200 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information, thus allowing researchers to discover data pertaining to a particular geographic location, as well as subject of interest. The GCMD strives to be the preeminent data locator for world-wide directory level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are not currently attracting widespread usage. With usage being the prime indicator of utility, it has become apparent that current tools must be improved. As a result, the GCMD has released a new suite of web-based authoring tools that enable a user to create new data and service entries, as well as modify existing data entries. With these tools, a more interactive approach to metadata authoring is taken, as they feature a visual "checklist" of data/service fields that automatically update when a field is completed. In this way, the user can quickly gauge which of the required and optional fields have not been populated. With the release of these tools, the Earth science community will be further assisted in efficiently creating quality data and services metadata. Keywords: metadata, Earth science, metadata authoring tools
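
    The "checklist" behaviour described in this abstract amounts to recomputing, after each edit, which required and optional fields are still empty. A minimal Python sketch follows. The field names and the required/optional split are illustrative assumptions, not the actual DIF or SERF field sets, and the function is not part of any GCMD tool.

```python
from typing import Dict, List

# Hypothetical field lists for illustration only.
REQUIRED_FIELDS: List[str] = ["entry_id", "entry_title", "summary"]
OPTIONAL_FIELDS: List[str] = ["keywords", "spatial_coverage"]


def checklist(entry: Dict[str, str],
              required: List[str],
              optional: List[str]) -> Dict[str, List[str]]:
    """Report which required and optional fields are still unpopulated."""
    def missing(names: List[str]) -> List[str]:
        # A field counts as populated only if it holds non-whitespace text.
        return [name for name in names if not entry.get(name, "").strip()]

    return {
        "missing_required": missing(required),
        "missing_optional": missing(optional),
    }


draft = {"entry_id": "DEMO-001", "entry_title": "  ", "keywords": "hydrology"}
status = checklist(draft, REQUIRED_FIELDS, OPTIONAL_FIELDS)

# Completing a field and re-running the check updates the checklist.
draft["entry_title"] = "Demo data set"
updated = checklist(draft, REQUIRED_FIELDS, OPTIONAL_FIELDS)
```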

  4. The metadata manual a practical workbook

    CERN Document Server

    Lubas, Rebecca; Schneider, Ingrid

    2013-01-01

    Cultural heritage professionals have high levels of training in metadata. However, the institutions in which they practice often depend on support staff, volunteers, and students in order to function. With limited time and funding for training in metadata creation for digital collections, there are often many questions about metadata without a reliable, direct source for answers. The Metadata Manual provides such a resource, answering basic metadata questions that may appear, and exploring metadata from a beginner's perspective. This title covers metadata basics, XML basics, Dublin Core, VRA C

  5. FSA 2002 Digital Orthophoto Metadata

    Data.gov (United States)

    Minnesota Department of Natural Resources — Metadata for the 2002 FSA Color Orthophotos Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the quarter-quad...

  6. phosphorus retention data and metadata

    Data.gov (United States)

    U.S. Environmental Protection Agency — phosphorus retention in wetlands data and metadata. This dataset is associated with the following publication: Lane , C., and B. Autrey. Phosphorus retention of...

  7. Pragmatic Metadata Management for Integration into Multiple Spatial Data Infrastructure Systems and Platforms

    Science.gov (United States)

    Benedict, K. K.; Scott, S.

    2013-12-01

    While there has been a convergence towards a limited number of standards for representing knowledge (metadata) about geospatial (and other) data objects and collections, there exist a variety of community conventions around the specific use of those standards and within specific data discovery and access systems. This combination of limited (but multiple) standards and conventions creates a challenge for system developers that aspire to participate in multiple data infrastructures, each of which may use a different combination of standards and conventions. While Extensible Markup Language (XML) is a shared standard for encoding most metadata, traditional direct XML transformations (XSLT) from one standard to another often result in an imperfect transfer of information due to incomplete mapping from one standard's content model to another. This paper presents the work at the University of New Mexico's Earth Data Analysis Center (EDAC) in which a unified data and metadata management system has been developed in support of the storage, discovery and access of heterogeneous data products. This system, the Geographic Storage, Transformation and Retrieval Engine (GSTORE) platform, has adopted a polyglot database model in which a combination of relational and document-based databases are used to store both data and metadata, with some metadata stored in a custom XML schema designed as a superset of the requirements for multiple target metadata standards: ISO 19115-2/19139/19110/19119, FGDC CSDGM (both with and without remote sensing extensions) and Dublin Core. Metadata stored within this schema is complemented by additional service, format and publisher information that is dynamically "injected" into produced metadata documents when they are requested from the system. While mapping from the underlying common metadata schema is relatively straightforward, the generation of valid metadata within each target standard is necessary but not sufficient for integration into

  8. A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs

    Science.gov (United States)

    Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.

    2013-12-01

    The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata is difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs and thus it is desirable to have federated search interfaces capable of unified search queries across multiple sources. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and ACADIS Arctic Data Explorer (ADE) use a common infrastructure which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an Open Source search platform. Solr applies indexes to specific fields of the metadata, which in this instance optimizes queries containing keywords, spatial bounds and temporal ranges. NSIDC metadata is reused by both search interfaces, but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort, which ultimately lowers costs for research. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and will identify creation and use of Open Source libraries.

  9. Evolving Metadata in NASA Earth Science Data Systems

    Science.gov (United States)

    Mitchell, A.; Cechini, M. F.; Walter, J.

    2011-12-01

    NASA's Earth Observing System (EOS) is a coordinated series of satellites for long term global observations. NASA's Earth Observing System Data and Information System (EOSDIS) is a petabyte-scale archive of environmental data that supports global climate change research by providing end-to-end services from EOS instrument data collection to science data processing to full access to EOS and other earth science data. On a daily basis, the EOSDIS ingests, processes, archives and distributes over 3 terabytes of data from NASA's Earth Science missions representing over 3500 data products ranging from various types of science disciplines. EOSDIS is currently comprised of 12 discipline specific data centers that are collocated with centers of science discipline expertise. Metadata is used in all aspects of NASA's Earth Science data lifecycle from the initial measurement gathering to the accessing of data products. Missions use metadata in their science data products when describing information such as the instrument/sensor, operational plan, and geographic region. Acting as the curator of the data products, data centers employ metadata for preservation, access and manipulation of data. EOSDIS provides a centralized metadata repository called the Earth Observing System (EOS) ClearingHouse (ECHO) for data discovery and access via a service-oriented-architecture (SOA) between data centers and science data users. ECHO receives inventory metadata from data centers, which generate metadata files that comply with the ECHO Metadata Model. NASA's Earth Science Data and Information System (ESDIS) Project established a Tiger Team to study and make recommendations regarding the adoption of the international metadata standard ISO 19115 in EOSDIS. The result was a technical report recommending an evolution of NASA data systems towards a consistent application of ISO 19115 and related standards including the creation of a NASA-specific convention for core ISO 19115 elements.
Part of

  10. openPDS: protecting the privacy of metadata through SafeAnswers.

    Directory of Open Access Journals (Sweden)

    Yves-Alexandre de Montjoye

    Full Text Available The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' location, phone call logs, or web-searches, is collected and used intensively by organizations and big data researchers. Metadata has however yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties. It has been implemented in two field studies; (2) we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one. It allows services to ask questions whose answers are calculated against the metadata instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be directly shared individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research.
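
    The question-answering idea in this abstract can be illustrated with a small Python sketch: the raw location log never leaves the store, a service may only invoke questions the individual has approved, and what comes back is a low-dimensional answer (here a single boolean). The class, the question name, and the sample coordinates are hypothetical; this is not the openPDS or SafeAnswers API, and it omits the authentication and sandboxing a real system would need.

```python
from typing import Callable, Dict, List, Tuple

LocationPoint = Dict[str, float]                 # {"lat": ..., "lon": ...}
Question = Callable[[List[LocationPoint]], object]
BBox = Tuple[float, float, float, float]         # (west, south, east, north)


class PersonalDataStore:
    """Holds raw metadata privately and exposes only approved answers."""

    def __init__(self, location_log: List[LocationPoint],
                 approved: Dict[str, Question]) -> None:
        self._location_log = location_log
        self._approved = approved

    def answer(self, question_name: str) -> object:
        """Run an approved question against the raw log and return its answer."""
        if question_name not in self._approved:
            raise PermissionError(f"question not approved: {question_name}")
        return self._approved[question_name](self._location_log)


def visited_area(box: BBox) -> Question:
    """Build a question: did the individual ever appear inside the box?"""
    west, south, east, north = box

    def question(log: List[LocationPoint]) -> bool:
        return any(west <= p["lon"] <= east and south <= p["lat"] <= north
                   for p in log)

    return question


log = [{"lat": 42.36, "lon": -71.09}, {"lat": 42.37, "lon": -71.11}]
store = PersonalDataStore(
    log,
    approved={"was_in_demo_area": visited_area((-71.2, 42.3, -71.0, 42.4))},
)
in_area = store.answer("was_in_demo_area")
```

    The service learns one bit rather than a trajectory, which is the dimensionality reduction the abstract describes. Repeated or adaptive questions can still leak information, so a deployment would also have to limit which questions are approved and how often they run.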

  11. CyberSKA Radio Imaging Metadata and VO Compliance Engineering

    Science.gov (United States)

    Anderson, K. R.; Rosolowsky, E.; Dowler, P.

    2013-10-01

    The CyberSKA project has written a specification for the metadata encapsulation of radio astronomy data products pursuant to insertion into the VO-compliant Common Archive Observation Model (CAOM) database hosted by the Canadian Astronomy Data Centre (CADC). This specification accommodates radio FITS Image and UV Visibility data, as well as pure CASA Tables Imaging and Visibility Measurement Sets. To extract and engineer radio metadata, we have authored two software packages: metaData (v0.5.0) and mddb (v1.3). Together, these Python packages can convert all the above stated data format types into concise FITS-like header files, engineer the metadata to conform to the CAOM data model, and then insert these engineered data into the CADC database, which subsequently becomes published through the Canadian Virtual Observatory. The metaData and mddb packages have, for the first time, published ALMA imaging data on VO services. Our ongoing work aims to integrate visibility data from ALMA and the SKA into VO services and to enable user-submitted radio data to move seamlessly into the Virtual Observatory.

  12. OAI-PMH repositories : quality issues regarding metadata and protocol compliance, tutorial 1

    CERN Document Server

    CERN. Geneva; Cole, Tim

    2005-01-01

    This tutorial will provide an overview of emerging guidelines and best practices for OAI data providers and how they relate to expectations and needs of service providers. The audience should already be familiar with OAI protocol basics and have at least some experience with either data provider or service provider implementations. The speakers will present both protocol compliance best practices and general recommendations for creating and disseminating high-quality "shareable metadata". Protocol best practices discussion will include coverage of OAI identifiers, date-stamps, deleted records, sets, resumption tokens, about containers, branding, errors conditions, HTTP server issues, and repository lifecycle issues. Discussion of what makes for good, shareable metadata will cover topics including character encoding, namespace and XML schema issues, metadata crosswalk issues, support of multiple metadata formats, general metadata authoring recommendations, specific recommendations for use of Dublin Core elemen...

  13. Digital Forensics Tool Testing - Image Metadata in the Cloud

    OpenAIRE

    Clark, Philip

    2011-01-01

    As cloud based services are becoming a common way for users to store and share images on the internet, this adds a new layer to the traditional digital forensics examination, which could cause additional potential errors in the investigation. Courtroom forensics evidence has historically been criticised for lacking a scientific basis. This thesis aims to present an approach for testing to what extent cloud based services alter or remove metadata in the images stored through such services. ...

  14. On the Origin of Metadata

    Directory of Open Access Journals (Sweden)

    Sam Coppens

    2012-12-01

    Metadata has been around and has evolved for centuries, albeit not recognized as such. Medieval manuscripts typically had illuminations at the start of each chapter, serving both as a kind of signature for the author writing the script and as a pictorial chapter anchor for those who were illiterate at the time. Nowadays, there is so much fragmented information on the Internet that users sometimes fail to distinguish the real facts from some bent truth, let alone being able to interconnect different facts. Here, metadata can act both as a noise reducer for detailed recommendations to end users and as the catalyst to interconnect related information. Over time, metadata thus not only has had different modes of information, but furthermore, metadata’s relation of information to meaning, i.e., “semantics”, evolved. Darwin’s evolutionary propositions, from “species have an unlimited reproductive capacity”, over “natural selection”, to “the cooperation of mutations leads to adaptation to the environment” show remarkable parallels to both metadata’s different modes of information and to its relation of information to meaning over time. In this paper, we will show that the evolution of the use of (meta)data can be mapped to Darwin’s nine evolutionary propositions. As mankind and its behavior are products of an evolutionary process, the evolutionary process of metadata with its different modes of information is on the verge of a new semantic era.

  15. THE NEW ONLINE METADATA EDITOR FOR GENERATING STRUCTURED METADATA

    Energy Technology Data Exchange (ETDEWEB)

    Devarakonda, Ranjeet [ORNL; Shrestha, Biva [ORNL; Palanisamy, Giri [ORNL; Hook, Leslie A [ORNL; Killeffer, Terri S [ORNL; Boden, Thomas A [ORNL; Cook, Robert B [ORNL; Zolly, Lisa [United States Geological Service (USGS); Hutchison, Viv [United States Geological Service (USGS); Frame, Mike [United States Geological Service (USGS); Cialella, Alice [Brookhaven National Laboratory (BNL); Lazer, Kathy [Brookhaven National Laboratory (BNL)

    2014-01-01

    Nobody is better suited to describe data than the scientist who created it. This description of the data is called metadata. In general terms, metadata represents the who, what, when, where, why and how of the dataset [1]. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes it portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about the research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [2][4]. OME is part of ORNL's Mercury software fleet [2][3]. It was jointly developed to support projects funded by the United States Geological Survey (USGS), U.S. Department of Energy (DOE), National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). OME's architecture provides a customizable interface to support project-specific requirements. Using this new architecture, the ORNL team developed OME instances for USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L), DOE's Next Generation Ecosystem Experiments (NGEE) and Atmospheric Radiation Measurement (ARM) Program, and the international Surface Ocean Carbon Dioxide ATlas (SOCAT). Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. From the information on the form, the Metadata Editor can create an XML file on the server where the editor is installed or on the user's personal computer. Researchers can also use the ORNL Metadata Editor to modify existing XML metadata files. As an example, an NGEE Arctic scientist uses OME to register

  16. The Machinic Temporality of Metadata

    Directory of Open Access Journals (Sweden)

    Claudio Celis

    2015-03-01

    In 1990 Deleuze introduced the hypothesis that disciplinary societies are gradually being replaced by a new logic of power: control. Accordingly, Matteo Pasquinelli has recently argued that we are moving towards societies of metadata, which correspond to a new stage of what Deleuze called control societies. Societies of metadata are characterised by the central role that meta-information acquires both as a source of surplus value and as an apparatus of social control. The aim of this article is to develop Pasquinelli’s thesis by examining the temporal scope of these emerging societies of metadata. In particular, this article employs Guattari’s distinction between human and machinic times. Through these two concepts, this article attempts to show how societies of metadata combine the two poles of capitalist power formations as identified by Deleuze and Guattari, i.e. social subjection and machinic enslavement. It begins by presenting the notion of metadata in order to identify some of the defining traits of contemporary capitalism. It then examines Berardi’s account of the temporality of the attention economy from the perspective of the asymmetric relation between cyber-time and human time. The third section challenges Berardi’s definition of the temporality of the attention economy by using Guattari’s notions of human and machinic times. Parts four and five fall back upon Deleuze and Guattari’s notions of machinic surplus labour and machinic enslavement, respectively. The concluding section tries to show that machinic and human times constitute two poles of contemporary power formations that articulate the temporal dimension of societies of metadata.

  17. Improving Scientific Metadata Interoperability And Data Discoverability using OAI-PMH

    Science.gov (United States)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James M.; Wilson, Bruce E.

    2010-12-01

    While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth Science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow and the comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. However, there are a number of different protocols for harvesting metadata, with some challenges for ensuring that updates are propagated and for collaborations with repositories using differing metadata standards. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a standard that is seeing increased use as a means for exchanging structured metadata. OAI-PMH implementations must support Dublin Core as a metadata standard, with other metadata formats as optional. We have developed tools which enable our structured search tool (Mercury; http://mercury.ornl.gov) to consume metadata from OAI-PMH services in any of the metadata formats we support (Dublin Core, Darwin Core, FGDC CSDGM, GCMD DIF, EML, and ISO 19115/19139). We are also making ORNL DAAC metadata available through OAI-PMH for other metadata tools to utilize, such as the NASA Global Change Master Directory (GCMD). This paper describes Mercury capabilities with multiple metadata formats, in general, and, more specifically, the results of our OAI-PMH implementations and
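
    The harvesting step described in this abstract can be illustrated with a minimal sketch. It is not Mercury's implementation: the function names, the example.org base URL, and the sample response are assumptions made for illustration. Only the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, the `deleted` header status, and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications. The sketch builds a request URL and parses one page of a Dublin Core response without touching the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    A resumptionToken is an exclusive argument in OAI-PMH: when it is
    present, metadataPrefix must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) for one page of a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        header = rec.find("oai:header", OAI_NS)
        if header.get("status") == "deleted":
            continue  # deleted records carry a header but no metadata
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", OAI_NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return records, (token or None)  # an empty token marks the last page


# Invented sample response, used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier><datestamp>2010-01-01</datestamp></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier><datestamp>2010-01-02</datestamp></header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

    A harvester would repeat the request with each returned token until the response carries none, which is how large repositories are paged through.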

  18. Enriching The Metadata On CDS

    CERN Document Server

    Chhibber, Nalin

    2014-01-01

    The project report revolves around the open source software package called Invenio. It provides the tools for management of digital assets in a repository and drives the CERN Document Server (CDS). The primary objective is to enhance the existing metadata in CDS with data from other libraries. An implicit part of this task is to manage disambiguation (within incoming data), remove multiple entries, and handle replications between new and existing records. All such elements and their corresponding changes are integrated within Invenio to make the upgraded metadata available on the CDS. The latter part of the report discusses some changes related to the Invenio code-base itself.

  19. U.S. EPA Metadata Editor (EME)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Metadata Editor (EME) allows users to create geospatial metadata that meets EPA's requirements. The tool has been developed as a desktop application that...

  20. Why can't I manage my digital images like MP3s? The evolution and intent of multimedia metadata

    Science.gov (United States)

    Goodrum, Abby; Howison, James

    2005-01-01

    This paper considers the deceptively simple question: Why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer is different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists we examine the divergent evolution of metadata standards for digital music and digital images and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus therefore is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard-drives. This bottom-up approach to file management combined with p2p distribution radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images (doing image metadata and file management the MP3 way) and considers the likelihood of success.

  1. The essential guide to metadata for books

    CERN Document Server

    Register, Renee

    2013-01-01

    In The Essential Guide to Metadata for Books, you will learn exactly what you need to know to effectively generate, handle and disseminate metadata for books and ebooks. This comprehensive but digestible document will explain the life-cycle of book metadata, industry standards, XML, ONIX and the essential elements of metadata. It will also show you how effective, well-organized metadata can improve your efforts to sell a book, especially when it comes to marketing, discoverability and converting at the point of sale. This information-packed document also includes a glossary of terms

  2. Using URIs to effectively transmit sensor data and metadata

    Science.gov (United States)

    Kokkinaki, Alexandra; Buck, Justin; Darroch, Louise; Gardner, Thomas

    2017-04-01

    Autonomous ocean observation is massively increasing the number of sensors in the ocean. Accordingly, the continuing increase in datasets produced makes selecting sensors that are fit for purpose a growing challenge. Decision making on selecting quality sensor data is based on the sensor's metadata, i.e. manufacturer specifications, history of calibrations etc. The Open Geospatial Consortium (OGC) has developed the Sensor Web Enablement (SWE) standards to facilitate integration and interoperability of sensor data and metadata. The World Wide Web Consortium (W3C) Semantic Web technologies enable machine comprehensibility, promoting sophisticated linking and processing of data published on the web. Linking the sensor's data and metadata according to the above-mentioned standards can yield practical difficulties, because of internal hardware bandwidth restrictions and a requirement to constrain data transmission costs. Our approach addresses these practical difficulties by uniquely identifying sensor and platform models and instances through URIs, which resolve via content negotiation to either OGC's Sensor Model Language (SensorML) or W3C's Linked Data. Data transmitted by a sensor incorporate the sensor's unique URI to refer to its metadata. Sensor and platform model URIs and descriptions are created and hosted by the British Oceanographic Data Centre (BODC) linked systems service. The sensor owner creates the sensor and platform instance URIs prior to and during sensor deployment, through an updatable web form, the Sensor Instance Form (SIF). SIF enables model and instance URI association but also platform and sensor linking. The use of URIs, which are dynamically generated through the SIF, offers both practical and economical benefits to the implementation of SWE and Linked Data standards in near real time systems. Data can be linked to metadata dynamically in-situ while saving on the costs associated with the transmission of long metadata descriptions. The transmission
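
    The content negotiation mentioned in this abstract can be sketched as follows. This is an illustrative assumption, not the BODC service: the media types mapped to each representation (generic `application/xml` for SensorML; `text/turtle` and `application/rdf+xml` for Linked Data) and the function names are hypothetical choices. The sketch shows how a single sensor URI could return different representations depending on the client's HTTP `Accept` header.

```python
def parse_accept(header):
    """Return acceptable media types from an Accept header, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality value when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            prefs.append((-q, position, media_type))
    # Sort by descending q, then by original order in the header.
    return [media_type for _, _, media_type in sorted(prefs)]


# Assumed mapping from media type to the representation served for a sensor URI.
REPRESENTATIONS = {
    "application/xml": "sensorml",
    "text/turtle": "linked-data",
    "application/rdf+xml": "linked-data",
}
DEFAULT_TYPE = "application/xml"


def negotiate(accept_header):
    """Pick (media_type, representation) for a request, or None if nothing fits."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
        if media_type == "*/*":
            return DEFAULT_TYPE, REPRESENTATIONS[DEFAULT_TYPE]
    return None  # a server would typically answer 406 Not Acceptable


semantic_client = negotiate("application/xml;q=0.5, text/turtle")
browser_like = negotiate("text/html, */*;q=0.1")
```

    Because the sensor transmits only its URI, the choice of description format is deferred to whoever later dereferences that URI.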

  3. A Method for Automating Geospatial Dataset Metadata

    Directory of Open Access Journals (Sweden)

    Robert I. Dunfey

    2009-11-01

    Metadata have long been recognised as crucial to geospatial asset management and discovery, and yet undertaking their creation remains an unenviable task often to be avoided. This paper proposes a practical approach designed to address such concerns, decomposing various data creation, management, update and documentation process steps that are subsequently leveraged to contribute towards metadata record completion. Using a customised utility embedded within a common GIS application, metadata elements are computationally derived from an imposed feature metadata standard, dataset geometry, an integrated storage protocol and pre-prepared content, and instantiated within a common geospatial discovery convention. Yielding 27 of a total of 32 metadata elements (or 15 of 17 mandatory elements), the approach demonstrably lessens the burden of metadata authorship. It also encourages improved geospatial asset management whilst outlining core requisites for developing a more open metadata strategy not bound to any particular application domain.

  4. A Solution to Metadata: Using XML Transformations to Automate Metadata

    Science.gov (United States)

    2010-06-01

    database development that requires little or no programming knowledge and skill; however, a working knowledge of schemas and metadata standards is...against the FGDC CSDGM schema using NCDDC’s MERMAid tool [15]. All randomly sampled records passed validation. Record content was also visually inspected...Beverley, MA: Altova GmbH & Altova, Inc., 2003. [15] NOAA National Coastal Data Development Center, MERMAid (computer software), version 1.2, Stennis Space Center, MS.

  5. Provenance Description of Metadata Vocabularies for the Long-term Maintenance of Metadata

    Directory of Open Access Journals (Sweden)

    Chunqiu Li

    2017-03-01

    Purpose: The purpose of this paper is to discuss provenance description of metadata terms and of metadata vocabularies as sets of metadata terms. Provenance is crucial information for keeping track of changes to metadata terms and metadata vocabularies for their consistent maintenance. Design/methodology/approach: The W3C PROV standard for general provenance description and the Resource Description Framework (RDF) are adopted as the base models to formally define provenance description for metadata vocabularies. Findings: This paper defines a few primitive change types of metadata terms, and a provenance description model of the metadata terms based on the primitive change types. We also provide examples of provenance description in RDF graphs to show the proposed model. Research limitations: The model proposed in this paper is defined based on a few primitive relationships (e.g. addition, deletion, and replacement) between pre-version and post-version of a metadata term. The model is simplified and the practical changes of metadata terms can be more complicated than the primitive relationships discussed in the model. Practical implications: Formal provenance description of metadata vocabularies can improve maintainability of metadata vocabularies over time. Conventional maintenance of metadata terms is the maintenance of documents of terms. The proposed model enables effective and automated tracking of the change history of metadata vocabularies using a simple formal description scheme defined based on widely-used standards. Originality/value: Changes in metadata vocabularies may cause inconsistencies in the long-term use of metadata. This paper proposes a simple and formal scheme of provenance description of metadata vocabularies. The proposed model works as the basis of automated maintenance of metadata terms and their vocabularies and is applicable to various types of changes.
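
    The primitive change types named in this abstract (addition, deletion, replacement) can be illustrated with a small sketch. It is not the paper's model: the `ex:` namespace, the `ex:deletedIn` property, the version-suffixed identifiers, and the treatment of a changed definition as a "replacement" are all assumptions made here for illustration. Only `prov:wasRevisionOf` and `prov:Entity` are taken from the W3C PROV vocabulary. Triples are kept as plain tuples so that the example needs no RDF library.

```python
def diff_vocabulary(pre, post):
    """Classify each term as an addition, deletion, or replacement.

    `pre` and `post` map term names to definitions for two versions
    of a vocabulary. Unchanged terms produce no change record.
    """
    changes = []
    for term in sorted(set(pre) | set(post)):
        if term not in pre:
            changes.append(("addition", term))
        elif term not in post:
            changes.append(("deletion", term))
        elif pre[term] != post[term]:
            changes.append(("replacement", term))
    return changes


def to_triples(changes, pre_version, post_version):
    """Express change records as (subject, predicate, object) triples."""
    triples = []
    for kind, term in changes:
        old = f"ex:{term}/{pre_version}"   # hypothetical versioned identifiers
        new = f"ex:{term}/{post_version}"
        if kind == "replacement":
            triples.append((new, "prov:wasRevisionOf", old))
        elif kind == "addition":
            triples.append((new, "rdf:type", "prov:Entity"))
        else:
            # ex:deletedIn is an invented property, not part of PROV.
            triples.append((old, "ex:deletedIn", f"ex:vocabulary/{post_version}"))
    return triples


# Invented example vocabulary versions.
v1 = {
    "creator": "An entity responsible for making the resource.",
    "source": "A related resource.",
}
v2 = {
    "creator": "An agent responsible for making the resource.",
    "license": "A legal document.",
}

changes = diff_vocabulary(v1, v2)
triples = to_triples(changes, "v1", "v2")
```

    Recording changes as triples rather than as edits to a term document is what makes the change history queryable by machine.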

  6. Evolution in Metadata Quality: Common Metadata Repository's Role in NASA Curation Efforts

    Science.gov (United States)

    Gilman, Jason; Shum, Dana; Baynes, Katie

    2016-01-01

    Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This poster covers how we use humanizers, a technique for dealing with the symptoms of metadata issues, as well as our plans for future metadata validation enhancements. The CMR currently indexes 35K collections and 300M granules.

  7. phosphorus retention data and metadata

    Science.gov (United States)

    Phosphorus retention in wetlands data and metadata. This dataset is associated with the following publication: Lane, C., and B. Autrey. Phosphorus retention of forested and emergent marsh depressional wetlands in differing land uses in Florida, USA. Wetlands Ecology and Management. Springer Science and Business Media B.V.; formerly Kluwer Academic Publishers B.V., GERMANY, 24(1): 45-60, (2016).

  8. ncISO Facilitating Metadata and Scientific Data Discovery

    Science.gov (United States)

    Neufeld, D.; Habermann, T.

    2011-12-01

    Increasing the usability and availability of climate and oceanographic datasets for environmental research requires improved metadata and tools to rapidly locate and access relevant information for an area of interest. Because of the distributed nature of most environmental geospatial data, a common approach is to use catalog services that support queries on metadata harvested from remote map and data services. A key component to effectively using these catalog services is the availability of high quality metadata associated with the underlying data sets. In this presentation, we examine the use of ncISO and Geoportal as open source tools that can be used to document and facilitate access to ocean and climate data available from Thematic Realtime Environmental Distributed Data Services (THREDDS) data services. Many atmospheric and oceanographic spatial data sets are stored in the Network Common Data Form (netCDF) and served through the Unidata THREDDS Data Server (TDS). NetCDF and THREDDS are becoming increasingly accepted in both the scientific and geographic research communities, as demonstrated by the recent adoption of netCDF as an Open Geospatial Consortium (OGC) standard. One important source for ocean and atmospheric based data sets is NOAA's Unified Access Framework (UAF), which serves over 3000 gridded data sets from across NOAA and NOAA-affiliated partners. Due to the large number of datasets, browsing the data holdings to locate data is impractical. Working with Unidata, we have created a new service for the TDS called "ncISO", which allows automatic generation of ISO 19115-2 metadata from attributes and variables in TDS datasets. The ncISO metadata records can be harvested by catalog services such as ESSI-labs GI-Cat catalog service, and ESRI's Geoportal, which supports query through a number of services, including OpenSearch and Catalog Services for the Web (CSW). ESRI's Geoportal Server provides a number of user friendly search capabilities for end users

  9. Evolution of the ATLAS Metadata Interface (AMI)

    CERN Document Server

    Odier, Jerome; The ATLAS collaboration; Fulachier, Jerome; Lambert, Fabian

    2015-01-01

    The ATLAS Metadata Interface (AMI) can be considered to be a mature application because it has existed for at least 10 years. Over the years, the number of users and the number of functions provided for these users have increased. It has been necessary to adapt the hardware infrastructure in a seamless way so that the Quality of Service remains high. We will describe the evolution of the application from the initial one, using a single server with a MySQL backend database, to the current state, where we use a cluster of Virtual Machines on the French Tier 1 Cloud at Lyon, an ORACLE database backend also at Lyon, with replication to CERN using ORACLE streams behind a back-up server.

  10. Security in a Replicated Metadata Catalogue

    CERN Document Server

    Koblitz, B

    2007-01-01

    The gLite-AMGA metadata catalogue has been developed by NA4 to provide simple relational metadata access for the EGEE user community. As advanced features, which will be the focus of this presentation, AMGA provides very fine-grained security, also in connection with the built-in support for replication and federation of metadata. AMGA is extensively used by the biomedical community to store medical image metadata, by digital libraries, in HEP for logging and bookkeeping data, and in the climate community. The biomedical community intends to deploy a distributed metadata system for medical images consisting of various sites, which range from hospitals to computing centres. Only safe sharing of the highly sensitive metadata as provided in AMGA makes such a scenario possible. Other scenarios are digital libraries, which federate copyright protected (meta-) data into a common catalogue. The biomedical and digital libraries have been deployed using a centralized structure already for some time. They now intend to decentralize ...

  11. XML for catalogers and metadata librarians

    CERN Document Server

    Cole, Timothy W

    2013-01-01

    How are today's librarians to manage and describe the ever-expanding volumes of resources, in both digital and print formats? The use of XML in cataloging and metadata workflows can improve metadata quality, the consistency of cataloging workflows, and adherence to standards. This book is intended to enable current and future catalogers and metadata librarians to progress beyond a bare surface-level acquaintance with XML, thereby enabling them to integrate XML technologies more fully into their cataloging workflows. Building on the wealth of work on library descriptive practices, cataloging, and metadata, XML for Catalogers and Metadata Librarians explores the use of XML to serialize, process, share, and manage library catalog and metadata records. The authors' expert treatment of the topic is written to be accessible to those with little or no prior practical knowledge of or experience with how XML is used. Readers will gain an educated appreciation of the nuances of XML and grasp the benefit of more advanced ...

  12. A Distributed Infrastructure for Metadata about Metadata: The HDMM Architectural Style and PORTAL-DOORS System

    Directory of Open Access Journals (Sweden)

    Carl Taswell

    2010-06-01

    Both the IRIS-DNS System and the PORTAL-DOORS System share a common architectural style for pervasive metadata networks that operate as distributed metadata management systems with hierarchical authorities for entity registering and attribute publishing. Hierarchical control of metadata redistribution throughout the registry-directory networks constitutes an essential characteristic of this architectural style, called Hierarchically Distributed Mobile Metadata (HDMM), with its focus on moving the metadata for who, what, and where as fast as possible from servers in response to requests from clients. The novel concept of multilevel metadata about metadata has also been defined for the PORTAL-DOORS System with the use of entity, record, infoset, representation and message metadata. Other new features implemented include the use of aliases, priorities and metaresources.

  13. Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins.

    CERN Document Server

    The ATLAS collaboration; Odier, Jerome; Fulachier, Jerome

    2016-01-01

    The ATLAS Metadata Interface (AMI) is a mature application that has existed for more than 15 years. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system; therefore, the service must guarantee a high level of availability. We describe our monitoring and administration systems, and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand.

  14. Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins

    Science.gov (United States)

    Lambert, F.; Odier, J.; Fulachier, J.; ATLAS Collaboration

    2017-10-01

    The ATLAS Metadata Interface (AMI) is a mature application that has existed for more than 15 years. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system; therefore, the service must guarantee a high level of availability. We describe our monitoring and administration systems, and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand.

  15. International Metadata Standards and Enterprise Data Quality Metadata Systems

    Science.gov (United States)

    Habermann, Ted

    2016-01-01

    Well-documented data quality is critical in situations where scientists and decision-makers need to combine multiple datasets from different disciplines and collection systems to address scientific questions or difficult decisions. Standardized data quality metadata could be very helpful in these situations. Many efforts at developing data quality standards falter because of the diversity of approaches to measuring and reporting data quality. The one size fits all paradigm does not generally work well in this situation. I will describe these and other capabilities of ISO 19157 with examples of how they are being used to describe data quality across the NASA EOS Enterprise and also compare these approaches with other standards.

  16. Automated Atmospheric Composition Dataset Level Metadata Discovery. Difficulties and Surprises

    Science.gov (United States)

    Strub, R. F.; Falke, S. R.; Kempler, S.; Fialkowski, E.; Goussev, O.; Lynnes, C.

    2015-12-01

    The Atmospheric Composition Portal (ACP) is an aggregator and curator of information related to remotely sensed atmospheric composition data and analysis. It uses existing tools and technologies and, where needed, enhances those capabilities to provide interoperable access, tools, and contextual guidance for scientists and value-adding organizations using remotely sensed atmospheric composition data. The initial focus is on Essential Climate Variables identified by the Global Climate Observing System: CH4, CO, CO2, NO2, O3, SO2 and aerosols. This poster addresses our efforts in building the ACP Data Table, an interface to help discover and understand remotely sensed data that are related to atmospheric composition science and applications. We harvested the GCMD, CWIC, and GEOSS metadata catalogs using machine-to-machine technologies (OpenSearch, Web Services). We also manually investigated the plethora of CEOS data providers' portals and other catalogs where that data might be aggregated. This poster is our experience of the excellence, variety, and challenges we encountered. Conclusions: 1. The significant benefits that the major catalogs provide are their machine-to-machine tools like OpenSearch and Web Services rather than any GUI usability improvements, due to the large amount of data in their catalog. 2. There is a trend at the large catalogs towards simulating small data provider portals through advanced services. 3. Populating metadata catalogs using ISO 19115 is too complex for users to do in a consistent way, difficult to parse visually or with XML libraries, and too complex for Java XML binders like CASTOR. 4. The ability to search for IDs first and then for data (GCMD and ECHO) is better for machine-to-machine operations than the timeouts experienced when returning the entire metadata entry at once. 5. Metadata harvest and export activities between the major catalogs have led to a significant amount of duplication. (This is currently being addressed.) 6. Most (if not

  17. Metadata for Content-Based Image Retrieval

    Directory of Open Access Journals (Sweden)

    Adrian Sterca

    2010-12-01

    This paper presents an image retrieval technique that combines content based image retrieval with pre-computed metadata-based image retrieval. The resulting system will have the advantages of both approaches: the speed/efficiency of metadata-based image retrieval and the accuracy/power of content-based image retrieval.

  18. GlamMap : Visualizing library metadata

    NARCIS (Netherlands)

    Betti, Arianna; Gerrits, Dirk; Speckmann, Bettina; van den Berg, Hein

    2014-01-01

    Libraries provide access to large amounts of library metadata. Unfortunately, many libraries only offer textual interfaces for searching and browsing their holdings. Visualisations provide simpler, faster, and more efficient ways to navigate, search and study large quantities of metadata. This paper

  19. A Dynamic Metadata Community Profile for CUAHSI

    Science.gov (United States)

    Bermudez, L.; Piasecki, M.

    2004-12-01

    Common metadata standards typically lack domain-specific elements, have limited extensibility, and do not always resolve semantic heterogeneities that could occur in the annotations. To facilitate the use and extension of metadata specifications, a methodology called Dynamic Community Profiles (DCP) is presented. The methodology allows element definitions to be overridden and core elements to be specified as metadata tree paths. DCP uses the Web Ontology Language (OWL), the Resource Description Framework (RDF) and XML syntax to formalize specifications and to create controlled vocabularies in ontologies, which enhances interoperability. This methodology was employed to create a metadata profile for the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI). The profile was created by extending the ISO-19115:2003 geographic metadata standard and restricting the permissible values of some elements. The values used as controlled vocabularies were inferred from hydrologic keywords found in the Global Change Master Directory (GCMD) and from measurement units found in the Hydrologic Handbook. Also, a core metadata set for CUAHSI was formally expressed as tree paths, containing the ISO core set plus additional elements. Finally, a tool was developed to test the extension and to allow creation of metadata instances in RDF/XML that conform to the profile. This tool is also able to export the core elements to other schema formats such as Metadata Template Files (MTF).

  20. A Metadata-Rich File System

    Energy Technology Data Exchange (ETDEWEB)

    Ames, S; Gokhale, M B; Maltzahn, C

    2009-01-07

    Despite continual improvements in the performance and reliability of large-scale file systems, the management of file system metadata has changed little in the past decade. The mismatch between the size and complexity of large-scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, metadata, and file relationships are all first-class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS includes Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations, and superior performance on normal file metadata operations.

  1. ATLAS Metadata Infrastructure Evolution for Run 2 and Beyond

    CERN Document Server

    van Gemmeren, Peter; The ATLAS collaboration; Cranshaw, Jack; Vaniachine, Alexandre

    2015-01-01

    ATLAS developed and employed for Run 1 of the Large Hadron Collider a sophisticated infrastructure for metadata handling in event processing jobs. This infrastructure profits from a rich feature set provided by the ATLAS execution control framework, including standardized interfaces and invocation mechanisms for tools and services, segregation of transient data stores with concomitant object lifetime management, and mechanisms for handling occurrences asynchronous to the control framework’s state machine transitions. This metadata infrastructure is evolving and being extended for Run 2 to allow its use and reuse in downstream physics analyses, analyses that may or may not utilize the ATLAS control framework. At the same time, multiprocessing versions of the control framework and the requirements of future multithreaded frameworks are leading to redesign of components that use an incident-handling approach to asynchrony. The increased use of scatter-gather architectures, both local and distributed, requires ...

  2. Improving Earth Science Metadata: Modernizing ncISO

    Science.gov (United States)

    O'Brien, K.; Schweitzer, R.; Neufeld, D.; Burger, E. F.; Signell, R. P.; Arms, S. C.; Wilcox, K.

    2016-12-01

    ncISO is a package of tools developed at NOAA's National Center for Environmental Information (NCEI) that facilitates the generation of ISO 19115-2 metadata from NetCDF data sources. The tool currently exists in two iterations: a command line utility and a web-accessible service within the THREDDS Data Server (TDS). Several projects, including NOAA's Unified Access Framework (UAF), depend upon ncISO to generate the ISO-compliant metadata from their data holdings and use the resulting information to populate discovery tools such as NCEI's ESRI Geoportal and NOAA's data.noaa.gov CKAN system. In addition to generating ISO 19115-2 metadata, the tool calculates a rubric score based on how well the dataset follows the Attribute Conventions for Dataset Discovery (ACDD). The result of this rubric calculation, along with information about what has been included and what is missing, is displayed in an HTML document generated by the ncISO software package. Recently ncISO has fallen behind in supporting updates to conventions, such as updates to the ACDD. With the blessing of the original programmer, NOAA's UAF has been working to modernize the ncISO software base. In addition to upgrading ncISO to utilize version 1.3 of the ACDD, we have been working with partners at Unidata and IOOS to unify the tool's code base. In essence, we are merging the command line capabilities into the same software that will now be used by the TDS service, allowing easier updates when conventions such as ACDD are updated in the future. In this presentation, we will discuss the work the UAF project has done to support updated conventions within ncISO, as well as describe how the updated tool is helping to improve metadata throughout the earth and ocean sciences.
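
    The rubric idea described above can be illustrated with a minimal sketch. This is not ncISO's actual scoring code: the function name `rubric`, the equal weighting, and the short list of checked attributes are assumptions chosen for illustration (the four names are global attributes from ACDD, but the real rubric checks many more).

```python
# Hypothetical, simplified rubric: an illustrative subset of ACDD global
# attributes, each counted equally. The real ncISO rubric is more detailed.
CHECKED_ATTRIBUTES = ("title", "summary", "keywords", "Conventions")


def rubric(attrs, checked=CHECKED_ATTRIBUTES):
    """Report which checked attributes are present and non-blank."""
    present = [name for name in checked if str(attrs.get(name, "")).strip()]
    missing = [name for name in checked if name not in present]
    return {
        "score": len(present) / len(checked),  # fraction of attributes found
        "present": present,
        "missing": missing,
    }


if __name__ == "__main__":
    # Global attributes as they might be read from a NetCDF file (made up).
    report = rubric({"title": "Sea surface temperature", "keywords": "ocean"})
    print(report)
```

    Returning the present and missing names alongside the score mirrors the abstract's description of a report that shows both what has been included and what is missing.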

  3. Twenty-first century metadata operations challenges, opportunities, directions

    CERN Document Server

    Lee Eden, Bradford

    2014-01-01

    It has long been apparent to academic library administrators that the current technical services operations within libraries need to be redirected and refocused in terms of both format priorities and human resources. A number of developments and directions have made this reorganization imperative, many of which have been accelerated by the current economic crisis. All of the chapters detail some aspect of technical services reorganization due to downsizing and/or reallocation of human resources, retooling professional and support staff in higher level duties and/or non-MARC metadata, "value-a

  4. Approach to Facilitating Geospatial Data and Metadata Publication Using a Standard Geoservice

    Directory of Open Access Journals (Sweden)

    Sergio Trilles

    2017-04-01

    Nowadays, the existence of metadata is one of the most important aspects of effective discovery of geospatial data published in Spatial Data Infrastructures (SDIs). However, because the data workflow lacks efficient integrated mechanisms to assist users in metadata generation, a lot of low-quality and outdated metadata are stored in the catalogues. This paper presents a mechanism for generating and publishing metadata through a publication service. This mechanism is provided as a web service implemented with a standard interface called a web processing service, which improves interoperability between other SDI components. This work extends previous research, in which a publication service was designed in the framework of the European Directive Infrastructure for Spatial Information in Europe (INSPIRE) as a solution to assist users in automatically publishing geospatial data and metadata in order to improve, among other aspects, SDI maintenance and usability. This work also adds further features in order to support more geospatial formats, such as sensor data.

  5. Metadata Standards and Workflow Systems

    Science.gov (United States)

    Habermann, T.

    2012-12-01

    All modern workflow systems include mechanisms for recording inputs, outputs and processes. These descriptions can include details required to reproduce the workflows exactly and, in some cases, can include virtual images of the hardware and operating system. There are several on-going and emerging standards for representing these detailed workflows including the Open Provenance Model (OPM) and the W3C PROV. At the same time, ISO metadata standards include a simple provenance or lineage model that includes many important elements of workflows. The ISO model could play a critical role in sharing and discovering workflow information for collections and perhaps in recording some details in granules. In order for this goal to be reached, connections between the detailed standards and ISO must be understood and conventions for using them must be developed.

  6. Reflecting on the challenges of building a rich interconnected metadata database to describe the experiments of phase six of the coupled climate model intercomparison project (CMIP6) for the Earth System Documentation Project (ES-DOC) and anticipating the opportunities that tooling and services based on rich metadata can provide.

    Science.gov (United States)

    Pascoe, C. L.

    2017-12-01

    The Coupled Model Intercomparison Project (CMIP) has coordinated climate model experiments involving multiple international modelling teams since 1995. This has led to a better understanding of past, present, and future climate. The 2017 sixth phase of the CMIP process (CMIP6) consists of a suite of common experiments, and 21 separate CMIP-Endorsed Model Intercomparison Projects (MIPs) making a total of 244 separate experiments. Precise descriptions of the suite of CMIP6 experiments have been captured in a Common Information Model (CIM) database by the Earth System Documentation Project (ES-DOC). The database contains descriptions of forcings, model configuration requirements, ensemble information and citation links, as well as text descriptions and information about the rationale for each experiment. The database was built from statements about the experiments found in the academic literature, the MIP submissions to the World Climate Research Programme (WCRP), WCRP summary tables and correspondence with the principal investigators for each MIP. The database was collated using spreadsheets which are archived in the ES-DOC Github repository and then rendered on the ES-DOC website. A diagrammatic view of the workflow of building the database of experiment metadata for CMIP6 is shown in the attached figure. The CIM provides the formalism to collect detailed information from diverse sources in a standard way across all the CMIP6 MIPs. The ES-DOC documentation acts as a unified reference for CMIP6 information to be used both by data producers and consumers. This is especially important given the federated nature of the CMIP6 project. Because the CIM allows forcing constraints and other experiment attributes to be referred to by more than one experiment, we can streamline the process of collecting information from modelling groups about how they set up their models for each experiment. End users of the climate model archive will be able to ask questions enabled by the

  7. CanCore: Metadata for Learning Objects

    Directory of Open Access Journals (Sweden)

    Norm Friesen

    2002-10-01

    The vision of reusable digital learning resources or objects, made accessible through coordinated repository architectures and metadata technologies, has gained considerable attention within distance education and training communities. However, the pivotal role of metadata in this vision raises important and longstanding issues about classification, description and meaning. The purpose of this paper is to provide an overview of this vision, focusing specifically on issues of semantics. It will describe the CanCore Learning Object Metadata Application Profile as an important first step in addressing these issues in the context of the discovery, reuse and management of learning resources or objects.

  8. Metadata in Chaos: how researchers tag radio broadcasts

    DEFF Research Database (Denmark)

    Lykke, Marianne; Lund, Haakon; Skov, Mette

    2015-01-01

    . To optimally support the researchers a user-centred approach was taken to develop the platform and related metadata scheme. Based on the requirements a three level metadata scheme was developed: (1) core archival metadata, (2) LARM metadata, and (3) project-specific metadata. The paper analyses how researchers...... apply the metadata scheme in their research work. The study consists of two studies, a) a qualitative study of subjects and vocabulary of the applied metadata and annotations, and 5 semi-structured interviews about goals for tagging. The findings clearly show that the primary role of LARM...

  9. Generation of Multiple Metadata Formats from a Geospatial Data Repository

    Science.gov (United States)

    Hudspeth, W. B.; Benedict, K. K.; Scott, S.

    2012-12-01

    The Earth Data Analysis Center (EDAC) at the University of New Mexico is partnering with the CYBERShARE and Environmental Health Group from the Center for Environmental Resource Management (CERM), located at the University of Texas, El Paso (UTEP), the Biodiversity Institute at the University of Kansas (KU), and the New Mexico Geo-Epidemiology Research Network (GERN) to provide a technical infrastructure that enables investigation of a variety of climate-driven human/environmental systems. Two significant goals of this NASA-funded project are: a) to increase the use of NASA Earth observational data at EDAC by various modeling communities through enabling better discovery, access, and use of relevant information, and b) to expose these communities to the benefits of provenance for improving understanding and usability of heterogeneous data sources and derived model products. To realize these goals, EDAC has leveraged the core capabilities of its Geographic Storage, Transformation, and Retrieval Engine (Gstore) platform, developed with support of the NSF EPSCoR Program. The Gstore geospatial services platform provides general purpose web services based upon the REST service model, and is capable of data discovery, access, and publication functions, metadata delivery functions, data transformation, and auto-generated OGC services for those data products that can support those services. Central to the NASA ACCESS project is the delivery of geospatial metadata in a variety of formats, including ISO 19115-2/19139, FGDC CSDGM, and the Proof Markup Language (PML). This presentation details the extraction and persistence of relevant metadata in the Gstore data store, and their transformation into multiple metadata formats that are increasingly utilized by the geospatial community to document not only core library catalog elements (e.g. title, abstract, publication data, geographic extent, projection information, and database elements), but also the processing steps used to

  10. Better Living Through Metadata: Examining Archive Usage

    Science.gov (United States)

    Becker, G.; Winkelman, S.; Rots, A.

    2013-10-01

    The primary purpose of an observatory's archive is to provide access to the data through various interfaces. User interactions with the archive are recorded in server logs, which can be used to answer basic questions like: Who has downloaded dataset X? When did she do this? Which tools did she use? The answers to questions like these fill in patterns of data access (e.g., how many times dataset X has been downloaded in the past three years). Analysis of server logs provides metrics of archive usage and provides feedback on interface use which can be used to guide future interface development. The Chandra X-ray Observatory is fortunate in that a database to track data access and downloads has been continuously recording such transactions for years; however, it is overdue for an update. We will detail changes we hope to effect and the differences the changes may make to our usage metadata picture. We plan to gather more information about the geographic location of users without compromising privacy; create improved archive statistics; and track and assess the impact of web “crawlers” and other scripted access methods on the archive. With the improvements to our download tracking we hope to gain a better understanding of the dissemination of Chandra's data; how effectively it is being done; and perhaps discover ideas for new services.
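
    The kind of question the abstract poses of server logs (who downloaded dataset X, and how often in a given period) can be sketched as follows. The log format, field order, and function names here are hypothetical and are not taken from the Chandra archive's actual tracking database.

```python
from collections import Counter
from datetime import datetime

# Assumed (hypothetical) log line format, whitespace separated:
#   <ISO timestamp> <user> <dataset id> <tool>


def parse_line(line):
    """Split one log line into (timestamp, user, dataset, tool)."""
    ts, user, dataset, tool = line.split()
    return datetime.fromisoformat(ts), user, dataset, tool


def downloads_per_dataset(lines, since):
    """Count downloads of each dataset at or after the `since` timestamp."""
    counts = Counter()
    for line in lines:
        ts, _user, dataset, _tool = parse_line(line)
        if ts >= since:
            counts[dataset] += 1
    return counts


def who_downloaded(lines, dataset):
    """Return the set of users that downloaded the given dataset."""
    return {parse_line(line)[1] for line in lines if parse_line(line)[2] == dataset}


if __name__ == "__main__":
    log = [
        "2011-03-01T09:00:00 alice obs100 web",
        "2013-06-10T14:30:00 bob obs100 cli",
        "2013-07-01T08:15:00 alice obs200 web",
    ]
    print(downloads_per_dataset(log, datetime(2012, 1, 1)))
    print(who_downloaded(log, "obs100"))
```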

  11. Design of a Metadata System for a Data Warehouse, with a Revenue Tracking Case Study at PT. Telkom Divre V Jawa Timur

    Directory of Open Access Journals (Sweden)

    Yudhi Purwananto

    2004-07-01

    A data warehouse is a company's data storage medium, drawing on various systems, that can be used for various purposes such as analysis and reporting. At PT Telkom Divre V Jawa Timur, a data warehouse called the Regional Database has been built. The Regional Database requires an important data warehouse component, namely metadata. Simply defined, metadata is "data about data". In this research, a metadata system is designed, with Revenue Tracking as a case study, as an analysis and reporting component of the Regional Database. Metadata is essential for managing the data warehouse and for providing information about it. The processes within the data warehouse and the components related to it must be integrated with one another to realize the data warehouse characteristics of being subject-oriented, integrated, time-variant, and non-volatile. Metadata must therefore also be able to exchange information among the components of the data warehouse. A web service is used as this exchange mechanism. The web service uses XML technology and the HTTP protocol to communicate. With the web service, each component

  12. Structural Metadata Research in the Ears Program

    National Research Council Canada - National Science Library

    Liu, Yang; Shriberg, Elizabeth; Stolcke, Andreas; Peskin, Barbara; Ang, Jeremy; Hillard, Dustin; Ostendorf, Mari; Tomalin, Marcus; Woodland, Phil; Harper, Mary

    2005-01-01

    Both human and automatic processing of speech require recognition of more than just words. In this paper we provide a brief overview of research on structural metadata extraction in the DARPA EARS rich transcription program...

  13. USGS Digital Orthophoto Quad (DOQ) Metadata

    Data.gov (United States)

    Minnesota Department of Natural Resources — Metadata for the USGS DOQ Orthophoto Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the quarter-quad tile...

  14. WDCC Metadata Generation with GeoNetwork

    Science.gov (United States)

    Ramthun, Hans; Lautenschlager, Michael; Winter, Hans-Hermann

    2010-05-01

    Earth system science data, such as modeling output data, are described by metadata. At the WDCC (World Data Center of Climate) the data and metadata are stored inside the CERA (Climate and Environmental Retrieval and Archive) relational database. To fill in the describing metadata, several types of XML documents are used to upload data into the database. GeoNetwork is an Ajax-based web framework, which offers a wide range of XML data handling for search and update and is especially designed to meet the ISO19115/19139 standard. This framework was extended with schemas that allow CERA upload XML records to be created and updated. An upload function is also included, as well as a connection to the local LDAP (Lightweight Directory Access Protocol) for authentication. Keywords: metadata, WDCC, CERA, Ajax

  15. FSA 2003-2004 Digital Orthophoto Metadata

    Data.gov (United States)

    Minnesota Department of Natural Resources — Metadata for the 2003-2004 FSA Color Orthophotos Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the...

  16. Optimising metadata workflows in a distributed information environment

    OpenAIRE

    Robertson, R. John; Barton, Jane

    2005-01-01

    The different purposes present within a distributed information environment create the potential for repositories to enhance their metadata by capitalising on the diversity of metadata available for any given object. This paper presents three conceptual reference models required to achieve this optimisation of metadata workflow: the ecology of repositories, the object lifecycle model, and the metadata lifecycle model. It suggests a methodology for developing the metadata lifecycle model, and ...

  17. Analytics Platform for ATLAS Computing Services

    CERN Document Server

    Vukotic, Ilija; The ATLAS collaboration; Bryant, Lincoln

    2016-01-01

    Big Data technologies have proven to be very useful for storage, processing and visualization of derived metrics associated with ATLAS distributed computing (ADC) services. Log file data and database records, and metadata from a diversity of systems have been aggregated and indexed to create an analytics platform for ATLAS ADC operations analysis. Dashboards, wide area data access cost metrics, user analysis patterns, and resource utilization efficiency charts are produced flexibly through queries against a powerful analytics cluster. Here we explore whether these techniques and analytics ecosystem can be applied to add new modes of open, quick, and pervasive access to ATLAS event data so as to simplify access and broaden the reach of ATLAS public data to new communities of users. An ability to efficiently store, filter, search and deliver ATLAS data at the event and/or sub-event level in a widely supported format would enable or significantly simplify usage of machine learning tools like Spark, Jupyter, R, S...

  18. Automated Test Methods for XML Metadata

    Science.gov (United States)

    2017-12-28

    definition (XSD) format and other standards and conventions. This method should be of interest primarily to parties having tools or applications that...consume RCC metadata standard documents, and may be of interest to developers of tools or applications that produce RCC metadata standard documents...instance document and encodings to verify that the rules engines and other tools work together. 1. Initialize the programming environment. 2. Write test

  19. A New Look at Data Usage by Using Metadata Attributes as Indicators of Data Quality

    Science.gov (United States)

    Won, Young-In; Wanchoo, Lalit; Behnke, Jeanne

    2016-01-01

    This study reviews the key metrics (users, distributed volume, and files) in multiple ways to gain an understanding of the significance of the metadata. Characterizing the usability of data by key metadata elements, such as discipline and study area, will assist in understanding how user needs have evolved over time. The data usage pattern based on product level provides insight into the level of data quality. In addition, the data metrics by various services, such as the Open-source Project for a Network Data Access Protocol (OPeNDAP) and subsets, address how these services have extended the usage of data. Overall, this study presents the usage of data and metadata by metrics analyses, which may assist data centers in better supporting the needs of the users.

  20. Distributed metadata servers for cluster file systems using shared low latency persistent key-value metadata store

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Pedone, Jr., James M.; Tzelnic, Percy; Ting, Dennis P. J.; Ionkov, Latchesar A.; Grider, Gary

    2017-12-26

    A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.
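
    The arrangement described above, with several metadata servers reaching one shared key-value store through an abstract storage interface, can be sketched roughly as follows. All class and method names are hypothetical, and an in-memory dictionary stands in for the low latency persistent store; this is an illustration of the general idea, not the disclosed implementation.

```python
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """Abstract storage interface: metadata servers only see put/get."""

    @abstractmethod
    def put(self, key, value):
        ...

    @abstractmethod
    def get(self, key):
        ...


class InMemoryStore(KeyValueStore):
    """Stand-in for a shared persistent key-value metadata store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = dict(value)  # store a copy of the metadata

    def get(self, key):
        return self._data.get(key)


class MetadataServer:
    """One of several servers; it keeps no metadata state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def set_attrs(self, path, **attrs):
        self.store.put(path, attrs)

    def stat(self, path):
        return self.store.get(path)


if __name__ == "__main__":
    shared = InMemoryStore()
    mds_a = MetadataServer("mds-a", shared)
    mds_b = MetadataServer("mds-b", shared)
    mds_a.set_attrs("/data/run1.dat", size=4096, owner="alice")
    # A different server answers the request without contacting mds-a.
    print(mds_b.stat("/data/run1.dat"))
```

    Because the servers hold no private metadata in this sketch, any one of them can handle a request independently of the others, which is the property the abstract emphasizes.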

  1. Evolution of the architecture of the ATLAS Metadata Interface (AMI)

    Science.gov (United States)

    Odier, J.; Aidel, O.; Albrand, S.; Fulachier, J.; Lambert, F.

    2015-12-01

    The ATLAS Metadata Interface (AMI) is now a mature application. Over the years, the number of users and the number of provided functions has dramatically increased. It is necessary to adapt the hardware infrastructure in a seamless way so that the quality of service remains high. We describe the evolution of AMI from its beginning, when it was served by a single MySQL backend database server, to the current state, with a cluster of virtual machines at the French Tier 1, an Oracle database at Lyon with complementary replication to the Oracle DB at CERN, and an AMI back-up server.

  2. Evolution of the Architecture of the ATLAS Metadata Interface (AMI)

    CERN Document Server

    Odier, Jerome; The ATLAS collaboration; Fulachier, Jerome; Lambert, Fabian

    2015-01-01

    The ATLAS Metadata Interface (AMI) is now a mature application. Over the years, the number of users and the number of provided functions has dramatically increased. It is necessary to adapt the hardware infrastructure in a seamless way so that the quality of service remains high. We describe the evolution from the beginning of the application life, using one server with a MySQL backend database, to the current state in which a cluster of virtual machines on the French Tier 1 cloud at Lyon, an Oracle database also at Lyon, with replication to Oracle at CERN and a back-up server are used.

  3. Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol A Tutorial

    CERN Document Server

    Warner, Simeon

    2001-01-01

    In this article I outline the ideas behind the Open Archives Initiative metadata harvesting protocol (OAIMH), and attempt to clarify some common misconceptions. I then consider how the OAIMH protocol can be used to expose and harvest metadata. Perl code examples are given as practical illustration.
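
    The article's own examples are in Perl and are not reproduced here. As a rough illustration of the harvesting idea, the following Python sketch builds a ListRecords request and extracts titles from a response. It assumes the request arguments and XML namespace of the later OAI-PMH 2.0 specification, which may differ from the protocol version the article describes; the base URL, function names, and sample response are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces as defined by OAI-PMH 2.0 and Dublin Core (assumed version).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a repository's base URL."""
    if resumption_token:
        # A resumption token is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_titles(xml_text):
    """Return (list of dc:title values, resumption token or None)."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=NS)
    return titles, token or None


# Hand-written sample response, trimmed to the elements used above.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Example record</dc:title>
      </oai_dc:dc>
    </metadata></record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_titles(SAMPLE))
```

    A harvester would typically repeat the request with each returned resumption token until none is returned.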

  4. Mapping metadata for SWHi : Aligning schemas with library metadata for a historical ontology

    NARCIS (Netherlands)

    Zhang, Junte; Fahmi, I.; Ellermann, Henk; Bouma, G.; Weske, M; Hacid, MS; Godart, C

    2007-01-01

    What are the possibilities of Semantic Web technologies for organizations which traditionally have lots of structured data, such as metadata, available? A library is such a particular organization. We mapped a digital library's descriptive (bibliographic) metadata for a large historical document

  5. Improvements to the Ontology-based Metadata Portal for Unified Semantics (OlyMPUS)

    Science.gov (United States)

    Linsinbigler, M. A.; Gleason, J. L.; Huffer, E.

    2016-12-01

    The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support Earth Science data consumers and data providers, enabling the latter to register data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS complements the ODISEES data discovery system with an intelligent tool to enable data producers to auto-generate semantically enhanced metadata and upload it to the metadata repository that drives ODISEES. Like ODISEES, the OlyMPUS metadata provisioning tool leverages robust semantics, a NoSQL database and query engine, an automated reasoning engine that performs first- and second-order deductive inferencing, and uses a controlled vocabulary to support data interoperability and automated analytics. The ODISEES data discovery portal leverages this metadata to provide a seamless data discovery and access experience for data consumers who are interested in comparing and contrasting the multiple Earth science data products available across NASA data centers. OlyMPUS will support scientists' services and tools for performing complex analyses and identifying correlations and non-obvious relationships across all types of Earth System phenomena using the full spectrum of NASA Earth Science data available. By providing an intelligent discovery portal that supplies users - both human users and machines - with detailed information about data products, their contents and their structure, ODISEES will reduce the level of effort required to identify and prepare large volumes of data for analysis. This poster will explain how OlyMPUS leverages deductive reasoning and other technologies to create an integrated environment for generating and exploiting semantically rich metadata.

  6. Increasing the international visibility of research data by a joint metadata schema

    Science.gov (United States)

    Svoboda, Nikolai; Zoarder, Muquit; Gärtner, Philipp; Hoffmann, Carsten; Heinrich, Uwe

    2017-04-01

    The BonaRes Project ("Soil as a sustainable resource for the bioeconomy") was launched in 2015 to promote sustainable soil management and to avoid fragmentation of efforts (Wollschläger et al., 2016). For this purpose, an IT infrastructure is being developed to upload, manage, store, and provide research data and its associated metadata. The research data provided by the BonaRes data centre are, in principle, not subject to any restrictions on reuse. For all research data, considerable standardized metadata are the key enablers for the effective use of these data. Providing proper metadata is often viewed as an extra burden that consumes further work and resources. In our lecture we underline the benefits of structured and interoperable metadata, such as accessibility, discovery, interpretation, and linking of data, and we weigh these advantages against the cost in time, personnel, and further expenses. Building on this, we describe the framework of metadata in BonaRes, combining the standards of OGC for description, visualization, exchange and discovery of geodata with the schema of DataCite for the publication and citation of this research data. This enables the generation of a DOI, a unique identifier that provides a permanent link to the citable research data. By using OGC standards, data and metadata become interoperable with numerous research data provided via INSPIRE. It enables further services like CSW for harvesting, WMS for visualization, and WFS for downloading. We explain the mandatory fields that result from our approach and we give a general overview of our metadata architecture implementation. Literature: Wollschläger, U; Helming, K.; Heinrich, U.; Bartke, S.; Kögel-Knabner, I.; Russell, D.; Eberhardt, E. & Vogel, H.-J.: The BonaRes Centre - A virtual institute for soil research in the context of a sustainable bio-economy. Geophysical Research Abstracts, Vol. 18, EGU2016-9087, 2016.

  7. Using XML to encode TMA DES metadata.

    Science.gov (United States)

    Lyttleton, Oliver; Wright, Alexander; Treanor, Darren; Lewis, Paul

    2011-01-01

    The Tissue Microarray Data Exchange Specification (TMA DES) is an XML specification for encoding TMA experiment data. While TMA DES data is encoded in XML, the files that describe its syntax, structure, and semantics are not. The DTD format is used to describe the syntax and structure of TMA DES, and the ISO 11179 format is used to define the semantics of TMA DES. However, XML Schema can be used in place of DTDs, and another XML encoded format, RDF, can be used in place of ISO 11179. Encoding all TMA DES data and metadata in XML would simplify the development and usage of programs which validate and parse TMA DES data. XML Schema has advantages over DTDs such as support for data types, and a more powerful means of specifying constraints on data values. An advantage of RDF encoded in XML over ISO 11179 is that XML defines rules for encoding data, whereas ISO 11179 does not. We created an XML Schema version of the TMA DES DTD. We wrote a program that converted ISO 11179 definitions to RDF encoded in XML, and used it to convert the TMA DES ISO 11179 definitions to RDF. We validated a sample TMA DES XML file that was supplied with the publication that originally specified TMA DES using our XML Schema. We successfully validated the RDF produced by our ISO 11179 converter with the W3C RDF validation service. All TMA DES data could be encoded using XML, which simplifies its processing. XML Schema allows datatypes and valid value ranges to be specified for CDEs, which enables a wider range of error checking to be performed using XML Schemas than could be performed using DTDs.

  8. Using XML to encode TMA DES metadata

    Directory of Open Access Journals (Sweden)

    Oliver Lyttleton

    2011-01-01

    Background: The Tissue Microarray Data Exchange Specification (TMA DES) is an XML specification for encoding TMA experiment data. While TMA DES data is encoded in XML, the files that describe its syntax, structure, and semantics are not. The DTD format is used to describe the syntax and structure of TMA DES, and the ISO 11179 format is used to define the semantics of TMA DES. However, XML Schema can be used in place of DTDs, and another XML encoded format, RDF, can be used in place of ISO 11179. Encoding all TMA DES data and metadata in XML would simplify the development and usage of programs which validate and parse TMA DES data. XML Schema has advantages over DTDs such as support for data types, and a more powerful means of specifying constraints on data values. An advantage of RDF encoded in XML over ISO 11179 is that XML defines rules for encoding data, whereas ISO 11179 does not. Materials and Methods: We created an XML Schema version of the TMA DES DTD. We wrote a program that converted ISO 11179 definitions to RDF encoded in XML, and used it to convert the TMA DES ISO 11179 definitions to RDF. Results: We validated a sample TMA DES XML file that was supplied with the publication that originally specified TMA DES using our XML Schema. We successfully validated the RDF produced by our ISO 11179 converter with the W3C RDF validation service. Conclusions: All TMA DES data could be encoded using XML, which simplifies its processing. XML Schema allows datatypes and valid value ranges to be specified for CDEs, which enables a wider range of error checking to be performed using XML Schemas than could be performed using DTDs.

  9. Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins

    CERN Document Server

    Lambert, Fabian; The ATLAS collaboration

    2016-01-01

    The ATLAS Metadata Interface (AMI) is a mature application that has been in existence for more than 15 years. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system, so the service must guarantee a high level of availability. We describe our monitoring system and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand. Moreover, we describe how to switch to a distant replica in case of downtime.

  10. Omics Metadata Management Software v. 1 (OMMS)

    Energy Technology Data Exchange (ETDEWEB)

    2013-09-09

    Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible and extensible, and is easily installed and run by operators with general system administration and scripting language literacy.

  11. DataNet: A flexible metadata overlay over file resources

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Managing and sharing data stored in files is a challenge because of the amounts of data produced by various scientific experiments [1]. While solutions such as Globus Online [2] focus on file transfer and synchronization, in this work we propose an additional layer of metadata over file resources which helps to categorize and structure the data, as well as to integrate it efficiently with web-based research gateways. A basic concept of the proposed solution [3] is a data model consisting of entities built from primitive types such as numbers and texts, and also from files and relationships among different entities. This allows for building complex data structure definitions and mixing metadata and file data into a single model tailored for a given scientific field. A data model becomes actionable after being deployed as a data repository which is done automatically by the proposed framework by using one of the available PaaS (platform-as-a-service) platforms and is exposed to the world as a REST service, which...

  12. INSPIRE: Managing Metadata in a Global Digital Library for High-Energy Physics

    CERN Document Server

    Martin Montull, Javier

    2011-01-01

    Four leading laboratories in the High-Energy Physics (HEP) field are collaborating to roll out the next-generation scientific information portal: INSPIRE. The goal of this project is to replace the popular 40-year-old SPIRES database. INSPIRE already provides access to about 1 million records and includes services such as fulltext search, automatic keyword assignment, ingestion and automatic display of LaTeX, citation analysis, automatic author disambiguation, metadata harvesting, extraction of figures from fulltext and search in figure captions. In order to achieve high quality metadata both automatic processing and manual curation are needed. The different tools available in the system use modern web technologies to give curators maximum efficiency while dealing with the MARC standard format. The project is under heavy development in order to provide new features including semantic analysis, crowdsourcing of metadata curation, user tagging, recommender systems, integration of OAIS standards a...

  13. Development of a Metadata Generator Application for Cultural Heritage Collections (Pembuatan Aplikasi Metadata Generator untuk Koleksi Peninggalan Warisan Budaya)

    Directory of Open Access Journals (Sweden)

    Wimba Agra Wicesa

    2017-03-01

    Cultural heritage is an important asset that serves as a source of information for the study of history. Managing cultural heritage data deserves attention in order to preserve the integrity of that data in the future. Creating cultural heritage metadata is one step that can be taken to preserve the value of an artifact. With metadata, the information about each cultural heritage object becomes easy to read, manage, and retrieve even after it has been stored for a long time. In addition, metadata allows information about cultural heritage to be used by many systems. Cultural heritage metadata is fairly large, so building it takes a considerable amount of time. Human error can also hinder the process of building cultural heritage metadata. Generating cultural heritage metadata with the Metadata Generator Application becomes faster and easier because it is done automatically by the system. The application can also reduce human error, making the generation process more efficient.

  14. Defining an Open Metadata Framework for Proteomics: The PROMIS Project

    OpenAIRE

    MacMullen, W. John; Parmelee, Mary C.; Fenstermacher, David A.; Hemminger, Bradley M.

    2002-01-01

    This presentation describes the PROMIS project under development at UNC Chapel Hill. PROMIS (Proteomics Metadata Interchange Schema) is a proof-of-concept prototype of an open metadata standard for compositional proteomics.

  15. Geodesy Data and Metadata Integration Strategies for Collaborative Global Research Infrastructures

    Science.gov (United States)

    Boler, Fran; Meertens, Charles

    2017-04-01

    Through multiple pathways, UNAVCO is collaborating with US and international partners to integrate geodesy-related research infrastructures. One of the earliest of UNAVCO's efforts at an integrated research infrastructure for geodesy was the Geodesy Seamless Archive Centers (GSAC) software, a web services-based data and metadata search and access system that was pioneered by UNAVCO and collaborators at Scripps and NASA. GSAC was adopted as an enabling technology in the early phases of the European Plate Observing System through the CoopEUS European and US initiative. GSAC is also a core piece of the infrastructure used in Dataworks for GNSS (Global Navigation Satellite System), a UNAVCO effort to build integrated GNSS data system components. In addition to GSAC, Dataworks has components that facilitate data download from a network of GNSS receivers, and data and metadata management. Dataworks has been deployed for capacity building in the Caribbean. The web services approach continues to be a major focus for UNAVCO and has been implemented within the NSF EarthCube Building Block project GeoWS, which takes the web services concept from an inter-domain infrastructure capability (across institutions but within geodesy) to the next level as a cross-domain (geodesy, seismology, marine geophysics) infrastructure capability through definition of common, standards-based vocabularies and exchange formats. In a separate effort focused on metadata, UNAVCO is working under the Data Centers Working Group of the International GNSS Service to establish metadata formats and exchange mechanisms using standards via the GeodesyML effort of Geosciences Australia and others for Open Geospatial Consortium web services for metadata.

  16. Taxonomy of XML-based metadata in a real-time digiTV deployment environment: digital broadcast item taxonomy

    Science.gov (United States)

    Lugmayr, Artur R.; Kalli, Seppo

    2003-04-01

    XML-based metadata schemes and interactive digital television (digiTV) are two new paradigms in the world of multimedia. The two paradigms should converge and provide an integrated solution for the several participants in a digital, interactive television broadcast. The local digiTV equipment and software requirements for metadata-based service provision are moving toward an integrated multimedia experience. To be able to present a heterogeneous solution to the participants, certain preliminary assignments and structures have to be introduced. One integral requirement is the conceptualization of an XML-based real-time metadata architecture for digiTV that can apply advanced interactive narrative patterns (e.g. parallel stories), content descriptions (based on MPEG-7), and descriptions of the items exchanged between users and the broadcast and interaction service provider (e.g. MPEG-21). Within the scope of this research work we focus on the application of basic metadata concepts, real-time constraints, and the design of description schemes for interactive broadcasts, and we cover conceptual design issues, the metadata life cycle, and synchronization mechanisms. We treat Digital Video Broadcasting (DVB) compliant design as an overall requirement and show how metadata can be usefully applied in accordance with this standard.

  17. Creating metadata that work for digital libraries and Google

    OpenAIRE

    Dawson, Alan

    2004-01-01

    For many years metadata has been recognised as a significant component of the digital information environment. Substantial work has gone into creating complex metadata schemes for describing digital content. Yet increasingly Web search engines, and Google in particular, are the primary means of discovering and selecting digital resources, although they make little use of metadata. This article considers how digital libraries can gain more value from their metadata by adapting it for Google us...

  18. Revision of IRIS/IDA Seismic Station Metadata

    Science.gov (United States)

    Xu, W.; Davis, P.; Auerbach, D.; Klimczak, E.

    2017-12-01

    Trustworthy data quality assurance has always been one of the goals of seismic network operators and data management centers. This task is considerably complex and evolving due to the huge quantities as well as the rapidly changing characteristics and complexities of seismic data. Published metadata usually reflect instrument response characteristics and their accuracies, which include zero-frequency sensitivity for both seismometer and data logger as well as other, frequency-dependent elements. In this work, we mainly focus on studying the variation over time of the seismometer sensitivity of IRIS/IDA seismic recording systems, with the goal of improving metadata accuracy over the history of the network. There are several ways to measure the accuracy of seismometer sensitivity for the seismic stations in service. An effective practice recently developed is to collocate a reference seismometer in proximity to verify the in-situ sensors' calibration. For those stations with a secondary broadband seismometer, IRIS' MUSTANG metric computation system introduced a transfer function metric to reflect two sensors' gain ratios in the microseism frequency band. In addition, a simulation approach based on M2 tidal measurements has been proposed and proven to be effective. We compare and analyze the results from the three different methods and conclude that the collocated-sensor method is the most stable and reliable, with the smallest uncertainties throughout. However, for epochs with neither a collocated sensor nor a secondary seismometer, we rely on the results from the tide method. For IDA station data since 1992, we computed over 600 revised seismometer sensitivities covering all the IRIS/IDA network calibration epochs. Hopefully, further revision procedures will help to guarantee that the data are accurately reflected by the metadata of these stations.

  19. Multimedia Learning Systems Based on IEEE Learning Object Metadata (LOM).

    Science.gov (United States)

    Holzinger, Andreas; Kleinberger, Thomas; Muller, Paul

    One of the "hottest" topics in recent information systems and computer science is metadata. Learning Object Metadata (LOM) appears to be a very powerful mechanism for representing metadata, because of the great variety of LOM Objects. This is one of the reasons why the LOM standard is repeatedly cited in projects in the field of eLearning…

  20. Handling multiple metadata streams regarding digital learning material

    NARCIS (Netherlands)

    Roes, J.B.M.; Vuuren, J. van; Verbeij, N.; Nijstad, H.

    2010-01-01

    This paper presents the outcome of a study performed in the Netherlands on handling multiple metadata streams regarding digital learning material. The paper describes the present metadata architecture in the Netherlands, the present suppliers and users of metadata and digital learning materials. It

  1. A quick scan on possibilities for automatic metadata generation

    NARCIS (Netherlands)

    Benneker, Frank

    2006-01-01

    The Quick Scan is a report on research into useable solutions for automatic generation of metadata or parts of metadata. The aim of this study is to explore possibilities for facilitating the process of attaching metadata to learning objects. This document is aimed at developers of digital learning

  2. PROGRAM SYSTEM AND INFORMATION METADATA BANK OF TERTIARY PROTEIN STRUCTURES

    Directory of Open Access Journals (Sweden)

    T. A. Nikitin

    2013-01-01

    The article deals with the architecture of a metadata storage model for the results of checks on three-dimensional protein structures. A conceptual database model was built. The service and procedure for updating the database, as well as data transformation algorithms for protein structures and their quality, are presented. The most important information about entries, and the forms in which they are submitted for storage, access, and delivery to users, is highlighted. A software suite was developed to implement the functional tasks using the Java programming language in the NetBeans v.7.0 environment, with JQL used to query and interact with the JavaDB database. The service was tested, and the results showed that the system is effective at filtering protein structures.

  3. Metadata Guidelines for Digital Moving Images

    National Research Council Canada - National Science Library

    Flynn, Marcy

    2000-01-01

    ...." Examples for each data element and sample records are presented. Technical metadata essential to the preservation and management of digital materials is also addressed in the Guidelines. This manual is also available at the Defense Virtual Library Web site, http://dvl.dtic.mil:8100/notes.html.

  4. Metadata management and semantics in microarray repositories.

    Science.gov (United States)

    Kocabaş, F; Can, T; Baykal, N

    2011-12-01

    The number of microarray and other high-throughput experiments on primary repositories keeps increasing, as do the size and complexity of the results in response to biomedical investigations. Initiatives have been started on standardization of content, object model, exchange format and ontology. However, there are backlogs and an inability to exchange data between microarray repositories, which indicate that there is a great need for a standard format and data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework), can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that the backlogs can be reduced and that exchange of information and asking of knowledge discovery questions can become possible with the use of this metadata framework.

  5. Metadata-catalogue of European spatial datasets

    NARCIS (Netherlands)

    Willemen, J.P.M.; Kooistra, L.

    2004-01-01

    In order to facilitate a more effective accessibility of European spatial datasets, an assessment was carried out by the GeoDesk of the WUR to identify and describe key datasets that will be relevant for research carried out within WUR and MNP. The outline of the Metadata catalogue European spatial

  6. Metadata Effectiveness in Internet Discovery: An Analysis of Digital Collection Metadata Elements and Internet Search Engine Keywords

    Science.gov (United States)

    Yang, Le

    2016-01-01

    This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…

  7. A Flexible Online Metadata Editing and Management System

    Energy Technology Data Exchange (ETDEWEB)

    Aguilar, Raul [Arizona State University; Pan, Jerry Yun [ORNL; Gries, Corinna [Arizona State University; Inigo, Gil San [University of New Mexico, Albuquerque; Palanisamy, Giri [ORNL

    2010-01-01

    A metadata editing and management system is being developed employing state of the art XML technologies. A modular and distributed design was chosen for scalability, flexibility, options for customizations, and the possibility to add more functionality at a later stage. The system consists of a desktop design tool or schema walker used to generate code for the actual online editor, a native XML database, and an online user access management application. The design tool is a Java Swing application that reads an XML schema, provides the designer with options to combine input fields into online forms and give the fields user friendly tags. Based on design decisions, the tool generates code for the online metadata editor. The code generated is an implementation of the XForms standard using the Orbeon Framework. The design tool fulfills two requirements: First, data entry forms based on one schema may be customized at design time and second data entry applications may be generated for any valid XML schema without relying on custom information in the schema. However, the customized information generated at design time is saved in a configuration file which may be re-used and changed again in the design tool. Future developments will add functionality to the design tool to integrate help text, tool tips, project specific keyword lists, and thesaurus services. Additional styling of the finished editor is accomplished via cascading style sheets which may be further customized and different look-and-feels may be accumulated through the community process. The customized editor produces XML files in compliance with the original schema, however, data from the current page is saved into a native XML database whenever the user moves to the next screen or pushes the save button independently of validity. Currently the system uses the open source XML database eXist for storage and management, which comes with third party online and desktop management tools. However, access to

  8. Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time

    Science.gov (United States)

    Ulbricht, Damian; Klump, Jens; Bertelmann, Roland

    2013-04-01

    Inspired by the wish to re-use research data, a lot of work is being done to bring the data systems of the earth sciences together. Discovery metadata is disseminated to data portals to allow the building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists who were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications, as DataCite [1] does, again requires a specific metadata schema, and every new use context of the measured data may require yet another metadata schema sharing only a subset of information with the meta information already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam, we established a solution to store file-based research data and describe them with an arbitrary number of metadata schemas. The core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc [2] "items". The eSciDoc content model allows assigning files to "items" and adding any number of metadata records to these "items". The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available through the internet worldwide. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes by providing temporary web links for external reviewers who do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP-based web application that allows data to be described with any XML-based schema. It uses the eSciDoc infrastructures

  9. Mining Building Metadata by Data Stream Comparison

    DEFF Research Database (Denmark)

    Holmegaard, Emil; Kjærgaard, Mikkel Baun

    2016-01-01

    ways to annotate sensor and actuation points. This makes it difficult to create intuitive queries for retrieving data streams from points. Another problem is the amount of insufficient or missing metadata. We introduce Metafier, a tool for extracting metadata from comparing data streams. Metafier...... are Dynamic Time Warping (DTW), Empirical Mode Decomposition (EMD), and the differential coefficient. Two of the algorithms compare the slope of the data stream in the values. EMD finds similarities based on the frequency bands among the data stream. By using several algorithms the system is robust enough...... to handle data streams with only slightly similar patterns. We have evaluated Metafier with points and data from one building located in Denmark. We have evaluated Metafier with 903 points, and the overall accuracy, with only 3 known examples, was 94.71%. Furthermore we found that using DTW for mining...
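
    Dynamic Time Warping, one of the algorithms named in this abstract, can be sketched in a few lines. The code below is a textbook DTW distance plus a nearest-example labelling step. It is not Metafier's implementation, and the point labels and sample values are invented for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two sequences.

    Uses absolute difference as the local cost and allows the three
    standard moves (insertion, deletion, match).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def label_by_nearest(stream, labelled_examples):
    """Return the label of the known example closest to `stream` under DTW.

    `labelled_examples` maps a label (hypothetical point type) to one
    example data stream.
    """
    return min(
        labelled_examples,
        key=lambda label: dtw_distance(stream, labelled_examples[label]),
    )


if __name__ == "__main__":
    known = {
        "temperature": [20, 21, 22, 21],
        "co2": [400, 450, 500, 450],
    }
    print(label_by_nearest([20, 20, 21, 22, 22, 21], known))
```

    Because DTW lets one sequence stretch against the other, two streams with the same shape but different sampling or lag can still match closely, which is useful when only a handful of labelled examples exist.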

  10. Metadata Analysis at the Command-Line

    Directory of Open Access Journals (Sweden)

    Mark Phillips

    2013-01-01

    Over the past few years the University of North Texas Libraries' Digital Projects Unit (DPU) has developed a set of metadata analysis tools, processes, and methodologies aimed at helping to focus limited quality control resources on the areas of the collection where they might have the most benefit. The key to this work lies in its simplicity: records harvested from OAI-PMH-enabled digital repositories are transformed into a format that makes them easily parsable using traditional Unix/Linux-based command-line tools. This article describes the overall methodology, introduces two simple open-source tools developed to help with the aforementioned harvesting and breaking, and provides example commands to demonstrate some common metadata analysis requests. All software tools described in the article are available with an open-source license via the author's GitHub account.
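
    The general idea of flattening harvested records into a line-oriented form can be sketched as follows. This is not one of the tools from the article: the function names and the tab-separated output layout are assumptions, and the sketch handles only Dublin Core elements inside a standard OAI-PMH response.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def flatten_records(oai_xml):
    """Turn an OAI-PMH response into (identifier, field, value) rows."""
    root = ET.fromstring(oai_xml)
    rows = []
    for record in root.iter(OAI + "record"):
        ident = record.findtext(f"{OAI}header/{OAI}identifier", default="")
        for element in record.iter():
            # Keep only non-empty Dublin Core elements.
            if element.tag.startswith(DC) and element.text and element.text.strip():
                rows.append((ident, element.tag[len(DC):], element.text.strip()))
    return rows


def count_fields(rows):
    """Count how often each Dublin Core field occurs across all rows."""
    return Counter(field for _, field, _ in rows)


def to_lines(rows):
    """Render rows as tab-separated lines for tools like sort, cut and grep."""
    return ["\t".join(row) for row in rows]
```

    With one value per line, questions such as which records lack a subject, or which values occur most often in a field, reduce to ordinary text processing.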

  11. A Highly Available Grid Metadata Catalog

    DEFF Research Database (Denmark)

    Jensen, Henrik Thostrup; Kleist, Joshva

    2009-01-01

    This article presents a metadata catalog, intended for use in grids. The catalog provides high availability, by replication across several hosts. The replicas are kept consistent using a replication protocol based on the Paxos algorithm. A majority of the replicas must be available in order...... HTTP with proxy certificates, and uses GACL for flexible access control. The performance of the catalog is tested in several ways, including a distributed setup between geographically separated sites....
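
    In Paxos, a value counts as chosen once a majority of acceptors have accepted it, which is where a majority requirement like the one mentioned above comes from. The arithmetic is sketched below as a general property of majority quorums. How the catalog itself applies it is elided in this abstract, and the function names are chosen only for illustration.

```python
def majority(replica_count):
    """Smallest number of replicas that forms a majority."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    return replica_count // 2 + 1


def has_quorum(reachable, replica_count):
    """True if the reachable replicas form a majority."""
    return reachable >= majority(replica_count)


def tolerated_failures(replica_count):
    """How many replicas may be down while a majority remains."""
    return replica_count - majority(replica_count)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

    Note that going from three to four replicas does not increase the number of tolerated failures, which is why odd replica counts are the usual choice for majority-based protocols.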

  12. GraphMeta: Managing HPC Rich Metadata in Graphs

    Energy Technology Data Exchange (ETDEWEB)

    Dai, Dong; Chen, Yong; Carns, Philip; Jenkins, John; Zhang, Wei; Ross, Robert

    2016-01-01

    High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This ‘rich’ metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution to uniformly manage HPC rich metadata due to its flexibility and generality. However, at the same time, graph-based HPC rich metadata management also introduces significant challenges to the underlying infrastructure. In this study, we first identify the challenges on the underlying infrastructure to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.

  13. Mdmap: A Tool for Metadata Collection and Matching

    Directory of Open Access Journals (Sweden)

    Rico Simke

    2014-10-01

    This paper describes a front-end for the semi-automatic collection, matching, and generation of bibliographic metadata obtained from different sources for use within a digitization architecture. The Library of a Billion Words project is building an infrastructure for digitizing text that requires high-quality bibliographic metadata, but currently only sparse metadata from digitized editions is available. The project’s approach is to collect metadata for each digitized item from as many sources as possible. An expert user can then use an intuitive front-end tool to choose matching metadata. The collected metadata are centrally displayed in an interactive grid view. The user can choose which metadata they want to assign to a certain edition, and export these data as MARCXML. This paper presents a new approach to bibliographic work and metadata correction. We try to achieve a high quality of the metadata by generating a large amount of metadata to choose from, as well as by giving librarians an intuitive tool to manage their data.

  14. Leveraging Metadata to Create Interactive Images... Today!

    Science.gov (United States)

    Hurt, Robert L.; Squires, G. K.; Llamas, J.; Rosenthal, C.; Brinkworth, C.; Fay, J.

    2011-01-01

    The image gallery for NASA's Spitzer Space Telescope has been newly rebuilt to fully support the Astronomy Visualization Metadata (AVM) standard to create a new user experience both on the website and in other applications. We encapsulate all the key descriptive information for a public image, including color representations and astronomical and sky coordinates and make it accessible in a user-friendly form on the website, but also embed the same metadata within the image files themselves. Thus, images downloaded from the site will carry with them all their descriptive information. Real-world benefits include display of general metadata when such images are imported into image editing software (e.g. Photoshop) or image catalog software (e.g. iPhoto). More advanced support in Microsoft's WorldWide Telescope can open a tagged image after it has been downloaded and display it in its correct sky position, allowing comparison with observations from other observatories. An increasing number of software developers are implementing AVM support in applications and an online image archive for tagged images is under development at the Spitzer Science Center. Tagging images following the AVM offers ever-increasing benefits to public-friendly imagery in all its standard forms (JPEG, TIFF, PNG). The AVM standard is one part of the Virtual Astronomy Multimedia Project (VAMP); http://www.communicatingastronomy.org

  15. The Value of Data and Metadata Standardization for Interoperability in Giovanni

    Science.gov (United States)

    Smit, C.; Hegde, M.; Strub, R. F.; Bryant, K.; Li, A.; Petrenko, M.

    2017-12-01

    Giovanni (https://giovanni.gsfc.nasa.gov/giovanni/) is a data exploration and visualization tool at the NASA Goddard Earth Sciences Data Information Services Center (GES DISC). It has been around in one form or another for more than 15 years. Giovanni calculates simple statistics and produces 22 different visualizations for more than 1600 geophysical parameters from more than 90 satellite and model products. Giovanni relies on external data format standards to ensure interoperability, including the NetCDF CF Metadata Conventions. Unfortunately, these standards were insufficient to make Giovanni's internal data representation truly simple to use. Finding and working with dimensions can be convoluted with the CF Conventions. Furthermore, the CF Conventions are silent on machine-friendly descriptive metadata such as the parameter's source product and product version. In order to simplify analyzing disparate earth science data parameters in a unified way, we developed Giovanni's internal standard. First, the format standardizes parameter dimensions and variables so they can be easily found. Second, the format adds all the machine-friendly metadata Giovanni needs to present our parameters to users in a consistent and clear manner. At a glance, users can grasp all the pertinent information about parameters both during parameter selection and after visualization. This poster gives examples of how our metadata and data standards, both external and internal, have both simplified our code base and improved our users' experiences.

  16. The Semantics of Metadata: Avalon Media System and the Move to RDF

    Directory of Open Access Journals (Sweden)

    Juliet L. Hardesty

    2017-07-01

    The Avalon Media System (Avalon) provides access and management for digital audio and video collections in libraries and archives. The open source project is led by the libraries of Indiana University Bloomington and Northwestern University and is funded in part by grants from The Andrew W. Mellon Foundation and Institute of Museum and Library Services. Avalon is based on the Samvera Community (formerly Hydra Project) software stack and uses Fedora as the digital repository back end. The Avalon project team is in the process of migrating digital repositories from Fedora 3 to Fedora 4 and incorporating metadata statements using the Resource Description Framework (RDF) instead of XML files accompanying the digital objects in the repository. The Avalon team has worked on the migration path for technical metadata and is now working on the migration paths for structural metadata (PCDM) and descriptive metadata (from MODS XML to RDF). This paper covers the decisions made to begin using RDF for software development and offers a window into how Semantic Web technology functions in the real world.

  17. NCI's national environmental research data collection: metadata management built on standards and preparing for the semantic web

    Science.gov (United States)

    Wang, Jingbo; Bastrakova, Irina; Evans, Ben; Gohar, Kashif; Santana, Fabiana; Wyborn, Lesley

    2015-04-01

    National Computational Infrastructure (NCI) manages national environmental research data collections (10+ PB) as part of its specialized high performance data node of the Research Data Storage Infrastructure (RDSI) program. We manage 40+ data collections using NCI's Data Management Plan (DMP), which is compatible with the ISO 19100 metadata standards. We utilize ISO standards to make sure our metadata is transferable and interoperable for sharing and harvesting. The DMP is used along with metadata from the data itself to create a hierarchy of data collection, dataset and time series catalogues that is then exposed through GeoNetwork for standard discoverability. These hierarchical catalogues are linked using a parent-child relationship. The hierarchical infrastructure of our GeoNetwork catalogue system aims to address both discoverability and in-house administrative use cases. At NCI, we are currently improving the metadata interoperability in our catalogue by linking with standardized community vocabulary services. These emerging vocabulary services are being established to help harmonise data from different national and international scientific communities. One such vocabulary service is currently being established by the Australian National Data Service (ANDS). Data citation is another important aspect of the NCI data infrastructure: it allows tracking of data usage and infrastructure investment, encourages data sharing, and increases trust in research that is reliant on these data collections. We incorporate the standard vocabularies into the data citation metadata so that the data citations become machine readable and semantically friendly for web search as well. By standardizing our metadata structure across our entire data corpus, we are laying the foundation to enable the application of appropriate semantic mechanisms to enhance discovery and analysis of NCI's national environmental research data information. We expect that this will further

  18. An emergent theory of digital library metadata: enrich then filter

    CERN Document Server

    Stevens, Brett

    2015-01-01

    An Emergent Theory of Digital Library Metadata is a reaction to the current digital library landscape, which is being challenged by growing online collections and changing user expectations. The theory provides the conceptual underpinnings for a new approach which moves away from expert-defined, standardised metadata to a user-driven approach with users as metadata co-creators. Moving away from definitive, authoritative metadata to a system that reflects the diversity of users’ terminologies, it changes the current focus on metadata simplicity and efficiency to one of metadata enriching, which is a continuous and evolving process of data linking. The shift is from predefined description to information conceptualised, contextualised and filtered at the point of delivery. By presenting this shift, this book provides a coherent structure in which future technological developments can be considered.

  19. Understanding Condom Use Decision Making Among Homeless Youth Using Event-Level Data.

    Science.gov (United States)

    Rana, Yashodhara; Brown, Ryan A; Kennedy, David P; Ryan, Gery W; Stern, Stefanie; Tucker, Joan S

    2015-01-01

    This is one of the first qualitative event-based studies to understand the various mechanisms through which multiple factors influence condom use decision making among homeless youth. Event-level interviews that explore characteristics of the environment surrounding sexual events were conducted with 29 youth who were asked to describe two recent sexual encounters. In thematic analyses of data across events, reasons that youth gave for engaging in unprotected sex included the expectation of having sex and use of alternative methods of protection against pregnancy. Other nonevent factors that influenced condom use decision making were related to attributes of the partnership (e.g., testing, trust and love, and assessments of risk) and attributes of the youth (e.g., perceptions of diseases, concerns over pregnancy, and discomfort using condoms). Additional event analyses conducted within the same individuals found that decision making was influenced by multiple interacting factors, with different pathways operating for event and nonevent factors. Future interventions should consider taking a multilevel and individualized approach that focuses on event-based determinants of risky sex in this population.

  20. Demographic Predictors of Event-Level Associations between Alcohol Consumption and Sexual Behavior.

    Science.gov (United States)

    Wells, Brooke E; Rendina, H Jonathon; Kelly, Brian C; Golub, Sarit A; Parsons, Jeffrey T

    2016-02-01

    Alcohol consumption is associated with sexual behavior and outcomes, though research indicates a variety of moderating factors, including demographic characteristics. To better target interventions aimed at alcohol-related sexual risk behavior, our analyses simultaneously examine demographic predictors of both day- and event-level associations between alcohol consumption and sexual behavior in a sample of young adults (N = 301) who are sexually active and consume alcohol. Young adults (aged 18-29) recruited using time-space sampling and incentivized snowball sampling completed a survey and a timeline follow-back calendar reporting alcohol consumption and sexual behavior in the past 30 days. On a given day, a greater number of drinks consumed was associated with higher likelihood of sex occurring, particularly for women and single participants. During a given sexual event, number of drinks consumed was not associated with condom use, nor did any demographic predictors predict that association. Findings highlight associations between alcohol and sexual behavior, though not between alcohol and sexual risk behavior, highlighting the need for additional research exploring the complex role of alcohol in sexual risk behavior and the need to develop prevention efforts to minimize the role of alcohol in the initiation of sexual encounters.

  1. The role of metadata in managing large environmental science datasets. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Melton, R.B.; DeVaney, D.M. [eds.] [Pacific Northwest Lab., Richland, WA (United States)]; French, J. C. [Univ. of Virginia (United States)]

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  2. Assessing Metadata Quality of a Federally Sponsored Health Data Repository

    OpenAIRE

    Marc, David T.; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui

    2017-01-01

    The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata is provided for each dataset and is the sole source of information to find and retrieve data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency of applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and con...
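
    As an illustration of what an automated completeness assessment can look like, the sketch below scores a metadata record as the fraction of required fields that are actually filled in. This is not the study's method: the field list, the function names, and the definition of "filled" are assumptions made for the example.

```python
# Illustrative sketch only: the required-field list is hypothetical and is
# not taken from the study or from HealthData.gov.
REQUIRED_FIELDS = ("title", "description", "keyword", "publisher")


def is_filled(value):
    """Treat None, blank strings, and empty containers as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness(record, required=REQUIRED_FIELDS):
    """Return the fraction of required fields that are filled in a record."""
    if not required:
        raise ValueError("at least one required field is needed")
    filled = sum(1 for field in required if is_filled(record.get(field)))
    return filled / len(required)


# Hypothetical record: two of the four required fields carry real content.
record = {
    "title": "Hospital Compare",
    "description": "   ",
    "keyword": [],
    "publisher": "HHS",
}
score = completeness(record)
```

    Averaging such a score over all records published in a given year would give a per-year figure that can be compared across years, which is the kind of comparison the abstract reports.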

  3. CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises

    Science.gov (United States)

    Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.

    2011-12-01

    JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep sea submersibles. Generally, metadata are essential to identify how data and samples were obtained. At JAMSTEC, cruise metadata include cruise information such as cruise ID, name of vessel, and research theme, and diving information such as dive number, name of submersible, and position of diving point. They are submitted by chief scientists of research cruises in the Microsoft Excel® spreadsheet format, and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via "JAMSTEC Data Site for Research Cruises" within two months after the end of a cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is duplicated effort and unsynchronized metadata across multiple distribution websites, caused by administrators manually entering metadata into individual websites. The other is that the data types and representation of metadata differ from website to website. To solve those problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be connected from the data management database to several distribution websites. CMO comprises three components: an Extensible Markup Language (XML) database, Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility for any change of metadata. Daily differential uptake of metadata from the data management database to the XML database is automatically processed via the EAI software. Some metadata are entered into the XML database using the web

  4. CHIME: A Metadata-Based Distributed Software Development Environment

    National Research Council Canada - National Science Library

    Dossick, Stephen E; Kaiser, Gail E

    2005-01-01

    We introduce CHIME, the Columbia Hypermedia IMmersion Environment, a metadata-based information environment, and describe its potential applications for internet and intranet-based distributed software development...

  5. Migration of the ATLAS Metadata Interface (AMI) to Web 2.0 and cloud

    Science.gov (United States)

    Odier, J.; Albrand, S.; Fulachier, J.; Lambert, F.

    2015-12-01

    The ATLAS Metadata Interface (AMI), a mature application of more than 10 years of existence, is currently under adaptation to some recently available technologies. The web interfaces, which previously manipulated XML documents using XSL transformations, are being migrated to Asynchronous JavaScript (AJAX). Web development is considerably simplified by the introduction of a framework based on JQuery and Twitter Bootstrap. Finally, the AMI services are being migrated to an OpenStack cloud infrastructure.

  6. Migration of the ATLAS Metadata Interface (AMI) to Web 2.0 and cloud

    CERN Document Server

    Odier, Jerome; The ATLAS collaboration; Fulachier, Jerome; Lambert, Fabian

    2015-01-01

    The ATLAS Metadata Interface (AMI), a mature application of more than 10 years of existence, is currently under adaptation to some recently available technologies. The web interfaces, which previously manipulated XML documents using XSL transformations, are being migrated to Asynchronous JavaScript (AJAX). Web development is considerably simplified by the introduction of a framework based on JQuery and Twitter Bootstrap. Finally, the AMI services are being migrated to an OpenStack cloud infrastructure.

  7. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Science.gov (United States)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the main objectives of the NERIES EC project is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable users' activities such as discovering, aggregating, describing and sharing datasets, so as to reduce the replication of similar data queries towards the network and spare the data centers from having to guess and create useful pre-packed products. We've started to transfer this task more and more towards the user community, where the users' composed data products could be extensively re-used. The main link to the data is represented by a centralized webservice (SeismoLink) acting as a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs that identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (e.g. the EMSC-QuakeML earthquake catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map-based portlet which allows the dynamic composition of a data product, binding a seismic event's parameters with a set of seismic stations. The requested data is collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  8. Integrating Semantic Information in Metadata Descriptions for a Geoscience-wide Resource Inventory.

    Science.gov (United States)

    Zaslavsky, I.; Richard, S. M.; Gupta, A.; Valentine, D.; Whitenack, T.; Ozyurt, I. B.; Grethe, J. S.; Schachne, A.

    2016-12-01

    Integrating semantic information into legacy metadata catalogs is a challenging issue and so far has been mostly done on a limited scale. We present experience of CINERGI (Community Inventory of Earthcube Resources for Geoscience Interoperability), an NSF Earthcube Building Block project, in creating a large cross-disciplinary catalog of geoscience information resources to enable cross-domain discovery. The project developed a pipeline for automatically augmenting resource metadata, in particular generating keywords that describe metadata documents harvested from multiple geoscience information repositories or contributed by geoscientists through various channels including surveys and domain resource inventories. The pipeline examines available metadata descriptions using text parsing, vocabulary management and semantic annotation and graph navigation services of GeoSciGraph. GeoSciGraph, in turn, relies on a large cross-domain ontology of geoscience terms, which bridges several independently developed ontologies or taxonomies including SWEET, ENVO, YAGO, GeoSciML, GCMD, SWO, and CHEBI. The ontology content enables automatic extraction of keywords reflecting science domains, equipment used, geospatial features, measured properties, methods, processes, etc. We specifically focus on issues of cross-domain geoscience ontology creation, resolving several types of semantic conflicts among component ontologies or vocabularies, and constructing and managing facets for improved data discovery and navigation. The ontology and keyword generation rules are iteratively improved as pipeline results are presented to data managers for selective manual curation via a CINERGI Annotator user interface. We present lessons learned from applying CINERGI metadata augmentation pipeline to a number of federal agency and academic data registries, in the context of several use cases that require data discovery and integration across multiple earth science data catalogs of varying quality

  9. Multi-Unit Initiating Event Analysis for a Single-Unit Internal Events Level 1 PSA

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Dong San; Park, Jin Hee; Lim, Ho Gon [KAERI, Daejeon (Korea, Republic of)]

    2016-05-15

    The Fukushima nuclear accident in 2011 highlighted the importance of considering the risks from multi-unit accidents at a site. The ASME/ANS probabilistic risk assessment (PRA) standard also includes some requirements related to multi-unit aspects, one of which (IE-B5) is as follows: 'For multi-unit sites with shared systems, DO NOT SUBSUME multi-unit initiating events if they impact mitigation capability [1].' However, the existing single-unit PSA models do not explicitly consider multi-unit initiating events, and hence systems shared by multiple units (e.g., the alternate AC diesel generator) are fully credited to the single unit, ignoring the need for the shared systems by other units at the same site [2]. This paper describes the results of the multi-unit initiating event (IE) analysis performed as a part of the at-power internal events Level 1 probabilistic safety assessment (PSA) for an OPR1000 single unit ('reference unit'). In this study, a multi-unit initiating event analysis for a single-unit PSA was performed, and using the results, a dual-unit LOOP initiating event was added to the existing PSA model for the reference unit (OPR1000 type). Event trees were developed for dual-unit LOOP and for dual-unit SBO, which can be transferred from dual-unit LOOP. Moreover, CCF basic events for 5 diesel generators were modelled. For the case of simultaneous SBO occurrences in both units, this study compared two different assumptions on the availability of the AAC D/G. As a result, when the dual-unit LOOP initiating event was added to the existing single-unit PSA model, the total CDF increased by 1-2% depending on the probability that the AAC D/G is available to a specific unit in case of simultaneous SBO in both units.

  10. Metadata For Identity Management of Population Registers

    Directory of Open Access Journals (Sweden)

    Olivier Glassey

    2011-04-01

    A population register is an inventory of residents within a country, with their characteristics (date of birth, sex, marital status, etc.) and other socio-economic data, such as occupation or education. However, data on population are also stored in numerous other public registers such as tax, land, building and housing, military, foreigners, vehicles, etc. Altogether they contain vast amounts of personal and sensitive information. Access to public information is granted by law in many countries, but this transparency is generally subject to tensions with data protection laws. This paper proposes a framework to analyze data access (or protection) requirements, as well as a model of metadata for data exchange.

  11. Metadata: A user's view

    Energy Technology Data Exchange (ETDEWEB)

    Bretherton, F.P. [Univ. of Wisconsin, Madison, WI (United States)]; Singley, P.T. [Oak Ridge National Lab., TN (United States)]

    1994-12-31

    An analysis is presented of the uses of metadata from four aspects of database operations: (1) search, query, retrieval, (2) ingest, quality control, processing, (3) application to application transfer; (4) storage, archive. Typical degrees of database functionality ranging from simple file retrieval to interdisciplinary global query with metadatabase-user dialog and involving many distributed autonomous databases, are ranked in approximate order of increasing sophistication of the required knowledge representation. An architecture is outlined for implementing such functionality in many different disciplinary domains utilizing a variety of off the shelf database management subsystems and processor software, each specialized to a different abstract data model.

  12. Information resource description: creating and managing metadata

    CERN Document Server

    Hider, Philip

    2012-01-01

    An overview of the field of information organization that examines resource description as both a product and process of the contemporary digital environment. This timely book employs the unifying mechanism of the semantic web and the resource description framework to integrate the various traditions and practices of information and knowledge organization. Uniquely, it covers both the domain-specific traditions and practices and the practices of the 'metadata movement' through a single lens: that of resource description in the broadest, semantic web sense. This approach more readily accommodate

  13. Metadata Laws, Journalism and Resistance in Australia

    Directory of Open Access Journals (Sweden)

    Benedetta Brevini

    2017-03-01

    The intelligence leaks from Edward Snowden in 2013 unveiled the sophistication and extent of data collection by the United States’ National Security Agency and major global digital firms, prompting domestic and international debates about the balance between security and privacy, openness and enclosure, accountability and secrecy. It is difficult not to see a clear connection with the Snowden leaks in the sharp acceleration of new national security legislations in Australia, a long term member of the Five Eyes Alliance. In October 2015, the Australian federal government passed controversial laws that require telecommunications companies to retain the metadata of their customers for a period of two years. The new acts pose serious threats for the profession of journalism as they enable government agencies to easily identify and pursue journalists’ sources. Bulk data collections of this type of information deter future whistleblowers from approaching journalists, making the performance of the latter’s democratic role a challenge. After situating this debate within the scholarly literature at the intersection between surveillance studies and communication studies, this article discusses the political context in which journalists are operating and working in Australia; assesses how metadata laws have affected journalism practices and addresses the possibility for resistance.

  14. Metadata Access Tool for Climate and Health

    Science.gov (United States)

    Trtanji, J.

    2012-12-01

    The need for health information resources to support climate change adaptation and mitigation decisions is growing, both in the United States and around the world, as the manifestations of climate change become more evident and widespread. In many instances, these information resources are not specific to a changing climate, but have either been developed or are highly relevant for addressing health issues related to existing climate variability and weather extremes. To help address the need for more integrated data, the Interagency Cross-Cutting Group on Climate Change and Human Health, a working group of the U.S. Global Change Research Program, has developed the Metadata Access Tool for Climate and Health (MATCH). MATCH is a gateway to relevant information that can be used to solve problems at the nexus of climate science and public health by facilitating research, enabling scientific collaborations in a One Health approach, and promoting data stewardship that will enhance the quality and application of climate and health research. MATCH is a searchable clearinghouse of publicly available Federal metadata including monitoring and surveillance data sets, early warning systems, and tools for characterizing the health impacts of global climate change. Examples of relevant databases include the Centers for Disease Control and Prevention's Environmental Public Health Tracking System and NOAA's National Climate Data Center's national and state temperature and precipitation data. This presentation will introduce the audience to this new web-based geoportal and demonstrate its features and potential applications.

  15. Forensic devices for activism: Metadata tracking and public proof

    NARCIS (Netherlands)

    van der Velden, L.

    2015-01-01

    The central topic of this paper is a mobile phone application, ‘InformaCam’, which turns metadata from a surveillance risk into a method for the production of public proof. InformaCam allows one to manage and delete metadata from images and videos in order to diminish surveillance risks related to

  16. Metadata as a means for correspondence on digital media

    NARCIS (Netherlands)

    Stouffs, R.; Kooistra, J.; Tuncer, B.

    2004-01-01

    Metadata derive their action from their association to data and from the relationship they maintain with this data. An interpretation of this action is that the metadata lays claim to the data collection to which it is associated, where the claim is successful if the data collection gains quality as

  17. Shared Geospatial Metadata Repository for Ontario University Libraries: Collaborative Approaches

    Science.gov (United States)

    Forward, Erin; Leahey, Amber; Trimble, Leanne

    2015-01-01

    Successfully providing access to special collections of digital geospatial data in academic libraries relies upon complete and accurate metadata. Creating and maintaining metadata using specialized standards is a formidable challenge for libraries. The Ontario Council of University Libraries' Scholars GeoPortal project, which created a shared…

  18. Learning Object Metadata in a Web-Based Learning Environment

    NARCIS (Netherlands)

    Avgeriou, Paris; Koutoumanos, Anastasios; Retalis, Symeon; Papaspyrou, Nikolaos

    2000-01-01

    The plethora and variance of learning resources embedded in modern web-based learning environments require a mechanism to enable their structured administration. This goal can be achieved by defining metadata on them and constructing a system that manages the metadata in the context of the learning

  19. Making the Case for Embedded Metadata in Digital Images

    DEFF Research Database (Denmark)

    Smith, Kari R.; Saunders, Sarah; Kejser, U.B.

    2014-01-01

    This paper discusses the standards, methods, use cases, and opportunities for using embedded metadata in digital images. In this paper we explain the past and current work engaged with developing specifications, standards for embedding metadata of different types, and the practicalities of data exchange in heritage institutions and the culture sector. Our examples and findings support the case for embedded metadata in digital images and the opportunities for such use more broadly in non-heritage sectors as well. We encourage the adoption of embedded metadata by digital image content creators and curators as well as those developing software and hardware that support the creation or re-use of digital images. We conclude that the usability of born digital images as well as physical objects that are digitized can be extended and the files preserved more readily with embedded metadata.

  20. Interpreting the ASTM 'content standard for digital geospatial metadata'

    Science.gov (United States)

    Nebert, Douglas D.

    1996-01-01

    ASTM and the Federal Geographic Data Committee have developed a content standard for spatial metadata to facilitate documentation, discovery, and retrieval of digital spatial data using vendor-independent terminology. Spatial metadata elements are identifiable quality and content characteristics of a data set that can be tied to a geographic location or area. Several Office of Management and Budget Circulars and initiatives have been issued that specify improved cataloguing of and accessibility to federal data holdings. An Executive Order further requires the use of the metadata content standard to document digital spatial data sets. Collection and reporting of spatial metadata for field investigations performed for the federal government is an anticipated requirement. This paper provides an overview of the draft spatial metadata content standard and a description of how the standard could be applied to investigations collecting spatially-referenced field data.

  1. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal

    Directory of Open Access Journals (Sweden)

    Ed Baker

    2013-09-01

    Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they have wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated field in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads and it is therefore possible to upload hundreds of images easily with automatic metadata (EXIF, XMP and IPTC) extraction and mapping.
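
    The idea of a customised mapping from embedded metadata to form fields can be sketched as follows. This is a minimal illustration in Python, not the Drupal module's code (which is not shown in this record). The tag labels, the form field names, and the "first non-empty source wins" rule are all assumptions made for the example, and the embedded metadata is assumed to have been extracted into a dictionary already.

```python
# Illustrative sketch only: tag labels and field names are hypothetical,
# and extraction of the embedded metadata is assumed to have happened.

def apply_mapping(embedded, mapping):
    """Copy embedded metadata values into form fields.

    `mapping` maps an embedded tag label to a form field name. Several tags
    may target the same field; the first one with a non-empty value wins.
    """
    form = {}
    for tag, field in mapping.items():
        value = embedded.get(tag)
        if value is None or value == "":
            continue  # nothing embedded for this tag
        form.setdefault(field, value)
    return form


# Hypothetical mapping: two possible sources for the creator field.
mapping = {
    "XMP:dc:creator": "field_creator",
    "IPTC:By-line": "field_creator",
    "EXIF:DateTimeOriginal": "field_date",
}

# Hypothetical metadata extracted from one uploaded image.
embedded = {
    "XMP:dc:creator": "",
    "IPTC:By-line": "E. Baker",
    "EXIF:DateTimeOriginal": "2013:09:01 10:00:00",
}

form_values = apply_mapping(embedded, mapping)
```

    Running the same function over every file in a bulk upload would fill the form fields for hundreds of images without manual re-entry, which is the workflow the abstract describes.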

  2. Managing ebook metadata in academic libraries: taming the tiger

    CERN Document Server

    Frederick, Donna E

    2016-01-01

    Managing ebook Metadata in Academic Libraries: Taming the Tiger tackles the topic of ebooks in academic libraries, a trend that has been welcomed by students, faculty, researchers, and library staff. However, at the same time, the reality of acquiring ebooks, making them discoverable, and managing them presents library staff with many new challenges. Traditional methods of cataloging and managing library resources are no longer relevant where the purchasing of ebooks in packages and demand driven acquisitions are the predominant models for acquiring new content. Most academic libraries have a complex metadata environment wherein multiple systems draw upon the same metadata for different purposes. This complexity makes the need for standards-based interoperable metadata more important than ever. In addition to complexity, the nature of the metadata environment itself typically varies slightly from library to library making it difficult to recommend a single set of practices and procedures which would be releva...

  3. Metafier - a Tool for Annotating and Structuring Building Metadata

    DEFF Research Database (Denmark)

    Holmegaard, Emil; Johansen, Aslak; Kjærgaard, Mikkel Baun

    2017-01-01

    , describing the instrumentation of the building. We have created Metafier, a tool for annotating and structuring metadata for buildings. Metafier optimizes the workflow of establishing metadata for buildings by enabling a human-in-the-loop to validate, search and group points. We have evaluated Metafier...... for two buildings, with different sizes, locations, ages and purposes. The evaluation was performed as a user test with three subjects with different backgrounds. The evaluation results indicate that the tool enabled the users to validate, search and group points while annotating metadata. One challenge...... is to get users to understand the concept of metadata for the tool to be useable. Based on our evaluation, we have listed guidelines for creating a tool for annotating building metadata....

  4. Department of the Interior metadata implementation guide—Framework for developing the metadata component for data resource management

    Science.gov (United States)

    Obuch, Raymond C.; Carlino, Jennifer; Zhang, Lin; Blythe, Jonathan; Dietrich, Christopher; Hawkinson, Christine

    2018-04-12

    The Department of the Interior (DOI) is a Federal agency with over 90,000 employees across 10 bureaus and 8 agency offices. Its primary mission is to protect and manage the Nation’s natural resources and cultural heritage; provide scientific and other information about those resources; and honor its trust responsibilities or special commitments to American Indians, Alaska Natives, and affiliated island communities. Data and information are critical in day-to-day operational decision making and scientific research. DOI is committed to creating, documenting, managing, and sharing high-quality data and metadata in and across its various programs that support its mission. Documenting data through metadata is essential in realizing the value of data as an enterprise asset. The completeness, consistency, and timeliness of metadata affect users’ ability to search for and discover the most relevant data for the intended purpose; and facilitate the interoperability and usability of these data among DOI bureaus and offices. Fully documented metadata describe data usability, quality, accuracy, provenance, and meaning. Across DOI, there are different maturity levels and phases of information and metadata management implementations. The Department has organized a committee consisting of bureau-level points-of-contacts to collaborate on the development of more consistent, standardized, and more effective metadata management practices and guidance to support this shared mission and the information needs of the Department. DOI’s metadata implementation plans establish key roles and responsibilities associated with metadata management processes, procedures, and a series of actions defined in three major metadata implementation phases including: (1) Getting started—Planning Phase, (2) Implementing and Maintaining Operational Metadata Management Phase, and (3) the Next Steps towards Improving Metadata Management Phase. DOI’s phased approach for metadata management addresses

  5. Drinking motives moderate the effect of the social environment on alcohol use: An event-level study among young adults

    NARCIS (Netherlands)

    Smit, K.; Groefsema, M.M.; Luijten, M.; Engels, R.C.M.E.; Kuntsche, E.N.

    2015-01-01

    Objective: The purpose of this study was to test (a) whether drinking motives predict event-level drinking on weekend evenings; (b) whether the number of friends present in social situations was associated with drinking on weekend evenings; and (c) whether drinking motives moderate the association

  6. Taxonomic names, metadata, and the Semantic Web

    Directory of Open Access Journals (Sweden)

    Roderic D. M. Page

    2006-01-01

    Life Science Identifiers (LSIDs) offer an attractive solution to the problem of globally unique identifiers for digital objects in biology. However, I suggest that in the context of taxonomic names, the most compelling benefit of adopting these identifiers comes from the metadata associated with each LSID. By using existing vocabularies wherever possible, and using a simple vocabulary for taxonomy-specific concepts we can quickly capture the essential information about a taxonomic name in the Resource Description Framework (RDF) format. This opens up the prospect of using technologies developed for the Semantic Web to add "taxonomic intelligence" to biodiversity databases. This essay explores some of these ideas in the context of providing a taxonomic framework for the phylogenetic database TreeBASE.
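
    As a rough illustration of the idea of describing a taxonomic name in RDF with a mix of existing and taxonomy-specific vocabularies, the Python sketch below writes three N-Triples statements about one name. The Dublin Core `title` and RDF `type` IRIs are existing terms; the LSID, the `example.org` vocabulary and the helper names are invented for this example and are not taken from the essay.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical taxonomy-specific vocabulary and identifier (illustration only).
EX_TAXON_NAME = "http://example.org/taxonomy#TaxonName"
EX_RANK = "http://example.org/taxonomy#rank"
EXAMPLE_LSID = "urn:lsid:example.org:names:12345"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def name_to_ntriples(lsid: str, name: str, rank: str) -> str:
    """Describe one taxonomic name as three N-Triples statements."""
    triples = [
        (iri(lsid), iri(RDF_TYPE), iri(EX_TAXON_NAME)),
        (iri(lsid), iri(DC_TITLE), literal(name)),
        (iri(lsid), iri(EX_RANK), literal(rank)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(name_to_ntriples(EXAMPLE_LSID, "Homo sapiens", "species"))
```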

  7. Metadata specification in a dynamic geometry software

    Science.gov (United States)

    Radaković, Davorka; Herceg, Đorđe

    2017-07-01

    Attributes in C# are a mechanism that provides association of declarative information with C# code such as classes, types, methods, properties, namespaces etc. Once defined and associated with a program entity, an attribute can be queried at run time. However, the attributes have certain restrictions which limit their application to representing complex metadata necessary for development of dynamic geometry software (DGS). We have devised a solution, independent of attributes, which was developed to overcome the limitations, while maintaining the functionality of attributes. Our solution covers a wide range of uses, from providing extensibility to a functional programming language and declaring new data types and operations, to being a foundation for runtime optimizations of expression tree evaluation, and helpful user interface features, such as code completion.

  8. Semantic Metadata for Heterogeneous Spatial Planning Documents

    Science.gov (United States)

    Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.

    2016-09-01

    Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

  9. Design and Implementation of a Metadata-rich File System

    Energy Technology Data Exchange (ETDEWEB)

    Ames, S; Gokhale, M B; Maltzahn, C

    2010-01-19

    Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

  10. Metadata Creation, Management and Search System for your Scientific Data

    Science.gov (United States)

    Devarakonda, R.; Palanisamy, G.

    2012-12-01

    Mercury Search Systems is a set of tools for creating, searching, and retrieving biogeochemical metadata. The Mercury toolset provides orders of magnitude improvements in search speed, support for any metadata format, integration with Google Maps for spatial queries, multi-faceted search, search suggestions, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. Mercury's metadata editor provides an easy way to create metadata, and Mercury's search interface provides a single portal to search for data and information contained in disparate data management systems, each of which may use any metadata format including FGDC, ISO-19115, Dublin-Core, Darwin-Core, DIF, ECHO, and EML. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury is being used by more than 14 different projects across 4 federal agencies. It was originally developed for NASA, with continuing development funded by NASA, USGS, and DOE for a consortium of projects. Mercury search won NASA's Earth Science Data Systems Software Reuse Award in 2008. References: R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics DOI: 10.1007/s12145-010-0073-0, (2010);
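
    The harvest-then-search pattern in this abstract (metadata collected from distributed servers into one central index that supports fielded search) can be illustrated with a toy in-memory index. The Python sketch below is an assumption-laden illustration, not Mercury's implementation: the `CentralIndex` class, its method names and the sample records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Record = Dict[str, str]  # field name -> free text


class CentralIndex:
    """Toy centralized index: postings are keyed by (field, token)."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._sources: Dict[str, str] = {}

    def harvest(self, source: str, records: Dict[str, Record]) -> None:
        """Index the metadata records offered by one contributing server."""
        for record_id, record in records.items():
            # Only metadata is copied; the data itself stays with the provider.
            self._sources[record_id] = source
            for field, text in record.items():
                for token in text.lower().split():
                    self._postings[(field, token)].add(record_id)

    def search(self, field: str, term: str) -> Set[str]:
        """Fielded search: ids of records whose `field` contains `term`."""
        return set(self._postings.get((field, term.lower()), set()))

    def source_of(self, record_id: str) -> str:
        """Return the server that still holds the data for this record."""
        return self._sources[record_id]


if __name__ == "__main__":
    index = CentralIndex()
    index.harvest("server-a", {"a1": {"title": "Soil carbon flux"}})
    index.harvest("server-b", {"b1": {"title": "Carbon dioxide towers"}})
    print(sorted(index.search("title", "carbon")))
```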

  11. Composing Distributed Services for Selection and Retrieval of Event Data in the ATLAS Experiment

    CERN Document Server

    Vinek, E; The ATLAS collaboration

    2011-01-01

    TAGs are event-level metadata allowing a quick search for interesting events for further analysis, based on selection criteria defined by the user. They are stored in a file-based format as well as in relational databases. The overall TAG system encompasses a range of web services providing functionality for the required use cases. The data as well as the services are replicated to several ATLAS sites, i.e. inside each service group there exist several concrete deployments, differing only in site-related non-functional attributes. In order to satisfy a user's request, the above mentioned atomic data sources and web services have to be composed on demand to provide the required functionality. As several instances of each service exist, one service has to be selected out of each group. The overall goal is to maximize the system’s throughput, in order to give to as many users as possible efficient access to the TAGs, while meeting end-to-end quality of service (QoS) requirements. Many approaches can be found t...

  12. Composing Distributed Services for Selection and Retrieval of Event Data in the ATLAS Experiment

    CERN Document Server

    Vinek, E; The ATLAS collaboration; Zhang, Q

    2010-01-01

    TAGs are event-level metadata allowing a quick search for interesting events for further analysis, based on selection criteria defined by the user. They are stored in a file-based format as well as in relational databases. The overall TAG system encompasses a range of web services providing functionality for the required use cases. The data as well as the services are replicated to several ATLAS sites, i.e. inside each service group there exist several concrete deployments, differing only in site-related non-functional attributes. In order to satisfy a user’s request, the above mentioned atomic data sources and web services have to be composed on demand to provide the full functionality. As several instances of each service exist, one service has to be selected out of each group. The overall goal is to maximize the system’s throughput, in order to give to as many users as possible efficient access to the TAGs, while meeting end-to-end quality of service (QoS) requirements. Many approaches can be found to ...

  13. Web Approach for Ontology-Based Classification, Integration, and Interdisciplinary Usage of Geoscience Metadata

    Directory of Open Access Journals (Sweden)

    B Ritschel

    2012-10-01

    The Semantic Web is a W3C approach that integrates the different sources of semantics within documents and services using ontology-based techniques. The main objective of this approach in the geoscience domain is the improvement of understanding, integration, and usage of Earth and space science related web content in terms of data, information, and knowledge for machines and people. The modeling and representation of semantic attributes and relations within and among documents can be realized by human readable concept maps and machine readable OWL documents. The objectives for the usage of the Semantic Web approach in the GFZ data center ISDC project are the design of an extended classification of metadata documents for product types related to instruments, platforms, and projects as well as the integration of different types of metadata related to data product providers, users, and data centers. Sources of content and semantics for the description of Earth and space science product types and related classes are standardized metadata documents (e.g., DIF documents), publications, grey literature, and Web pages. Other sources are information provided by users, such as tagging data and social navigation information. The integration of controlled vocabularies as well as folksonomies plays an important role in the design of well formed ontologies.

  14. Building a High Performance Metadata Broker using Clojure, NoSQL and Message Queues

    Science.gov (United States)

    Truslove, I.; Reed, S.

    2013-12-01

    In practice, Earth and Space Science Informatics often relies on getting more done with less: fewer hardware resources, less IT staff, fewer lines of code. As a capacity-building exercise focused on rapid development of high-performance geoinformatics software, the National Snow and Ice Data Center (NSIDC) built a prototype metadata brokering system using a new JVM language, modern database engines and virtualized or cloud computing resources. The metadata brokering system was developed with the overarching goals of (i) demonstrating a technically viable product with as little development effort as possible, (ii) using very new yet very popular tools and technologies in order to get the most value from the least legacy-encumbered code bases, and (iii) being a high-performance system by using scalable subcomponents, and implementation patterns typically used in web architectures. We implemented the system using the Clojure programming language (an interactive, dynamic, Lisp-like JVM language), Redis (a fast in-memory key-value store) as both the data store for original XML metadata content and as the provider for the message queueing service, and ElasticSearch for its search and indexing capabilities to generate search results. On evaluating the results of the prototyping process, we believe that the technical choices did in fact allow us to do more for less, due to the expressive nature of the Clojure programming language and its easy interoperability with Java libraries, and the successful reuse or re-application of high performance products or designs. This presentation will describe the architecture of the metadata brokering system, cover the tools and techniques used, and describe lessons learned, conclusions, and potential next steps.

  15. Opening up the cartographic heritage of the Spanish Geographical Institute by means of publishing standardized, Inspire compatible metadata

    Directory of Open Access Journals (Sweden)

    Joan Capdevila Subirana

    2013-02-01

    By making the most of the versatility that the Internet provides, and following the latest guidelines developed by the European Union through the Inspire and PSI Directives, the Spanish Geographical Institute has in the last few years been releasing geographic information in a free and interoperable way. Essential factors for this task are both creating metadata to describe and better use that information, and applying standards, both in the data model and in the web services. In this piece of work we will explain the application of these principles to the historical information held in the Technical Archive of the Spanish Geographical Institute. We will present the recent publication of 120,000 metadata records described as ISO (NEM model), which can be accessed through an Inspire compliant discovery service. This service is interrogated by an open source software catalogue client developed by the Spanish Geographical Institute.

  16. Requirements for multimedia metadata schemes in surveillance applications for security

    NARCIS (Netherlands)

    Rest, J.H.C. van; Grootjen, F.A.; Grootjen, M.; Wijn, R.; Aarts, O.A.J.; Roelofs, M.L.; Burghouts, G.J.; Bouma, H.; Alic, L.; Kraaij, W.

    2013-01-01

    Surveillance for security requires communication between systems and humans, involves behavioural and multimedia research, and demands an objective benchmarking for the performance of system components. Metadata representation schemes are extremely important to facilitate (system) interoperability

  17. Ontology-based Metadata Portal for Unified Semantics

    Data.gov (United States)

    National Aeronautics and Space Administration — The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS) will extend the prototype Ontology-Driven Interactive Search Environment for Earth Sciences...

  18. Large geospatial images discovery: metadata model and technological framework

    Directory of Open Access Journals (Sweden)

    Lukáš Brůha

    2015-12-01

    The advancements in geospatial web technology triggered efforts for disclosure of valuable resources of historical collections. This paper focuses on the role of spatial data infrastructures (SDI) in such efforts. The work describes the interplay between SDI technologies and potential use cases in libraries such as cartographic heritage. The metadata model is introduced to link up the sources from these two distinct fields. To enhance the data search capabilities, the work focuses on the representation of the content-based metadata of raster images, which is the crucial prerequisite to target the search in a more effective way. The architecture of the prototype system for automatic raster data processing, storage, analysis and distribution is introduced. The architecture responds to the characteristics of input datasets, namely to the continuous flow of very large raster data and related metadata. Proposed solutions are illustrated on the case study of cartometric analysis of digitised early maps and related metadata encoding.

  19. Distributed metadata in a high performance computing environment

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Zhang, Zhenhua; Liu, Xuezhao; Tang, Haiying

    2017-07-11

    A computer-executable method, system, and computer program product for managing meta-data in a distributed storage system, wherein the distributed storage system includes one or more burst buffers enabled to operate with a distributed key-value store, the computer-executable method, system, and computer program product comprising receiving a request for meta-data associated with a block of data stored in a first burst buffer of the one or more burst buffers in the distributed storage system, wherein the meta-data is associated with a key-value, determining which of the one or more burst buffers stores the requested meta-data, and upon determination that a first burst buffer of the one or more burst buffers stores the requested meta-data, locating the key-value in a portion of the distributed key-value store accessible from the first burst buffer.
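
    The request flow in this abstract (receive a metadata request, determine which burst buffer holds it, then look up the key in that buffer's portion of the key-value store) might be sketched in Python as below. The abstract does not say how the owning buffer is determined; placing keys by hash modulo the number of buffers is an assumption made here for illustration, and the class and method names are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


class BurstBuffer:
    """One burst buffer with its local portion of the key-value store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.kv_shard: Dict[str, str] = {}


class MetadataDirectory:
    """Routes metadata requests to the burst buffer that owns the key."""

    def __init__(self, buffers: List[BurstBuffer]) -> None:
        if not buffers:
            raise ValueError("at least one burst buffer is required")
        self.buffers = buffers

    def owner_of(self, key: str) -> BurstBuffer:
        # Assumption: hash-modulo placement; the source names no placement scheme.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.buffers[int.from_bytes(digest[:8], "big") % len(self.buffers)]

    def put(self, key: str, metadata: str) -> None:
        """Store metadata in the shard of the buffer that owns the key."""
        self.owner_of(key).kv_shard[key] = metadata

    def get(self, key: str) -> Optional[str]:
        """Determine the owning buffer, then look the key up in its shard."""
        return self.owner_of(key).kv_shard.get(key)


if __name__ == "__main__":
    directory = MetadataDirectory([BurstBuffer("bb0"), BurstBuffer("bb1")])
    directory.put("block-42", "offset=0;len=4096")
    print(directory.owner_of("block-42").name, directory.get("block-42"))
```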

  20. Metadata and Metacognition: How can we stimulate reflection for learning?

    NARCIS (Netherlands)

    Specht, Marcus

    2012-01-01

    Specht, M. (2012, 12 September). Metadata and Metacognition: How can we stimulate reflection for learning? Invited presentation given at the seminar on awareness and reflection in learning at the University of Leuven, Leuven, Belgium.

  1. USGS 24k Digital Raster Graphic (DRG) Metadata

    Data.gov (United States)

    Minnesota Department of Natural Resources — Metadata for the scanned USGS 24k Topographic Map Series (also known as 24k Digital Raster Graphic). Each scanned map is represented by a polygon in the layer and the...

  2. Linked Metadata - lightweight semantics for data integration (Invited)

    Science.gov (United States)

    Hendler, J. A.

    2013-12-01

    fly integration may prefer to do more traditional data queries and then convert and link the 'views' returned at retrieval time, providing another means of using the linked data infrastructure without having to convert whole datasets to triples to provide linking. Web companies have been taking advantage of 'lightweight' semantic metadata for search quality and optimization (cf. schema.org), linking networks within and without web sites (cf. Facebook's Open Graph Protocol), and in doing various kinds of advertisement and user modeling across datasets. Scientific metadata, on the other hand, has traditionally been geared at being largescale and highly descriptive, and scientific ontologies have been aimed at high expressivity, essentially providing complex reasoning services rather than the less expressive vocabularies needed for data discovery and simple mappings that can allow humans (or more complex systems) when full scale integration is needed. Although this work is just the beginning for providing integration, as the community creates more and more datasets, discovery of these data resources on the Web becomes a crucial starting place. Simple descriptors, that can be combined with textual fields and/or common community vocabularies, can be a great starting place on bringing scientific data into the Web of Data that is growing in other communities. References: [1] Pouchard, Line C., et al. "A Linked Science investigation: enhancing climate change data discovery with semantic technologies." Earth science informatics 6.3 (2013): 175-185.

  3. Assigning creative commons licenses to research metadata: issues and cases

    OpenAIRE

    Poblet, Marta

    2016-01-01

    This paper discusses the problem of the lack of clear licensing and transparency of usage terms and conditions for research metadata. Making research data connected, discoverable and reusable is a key enabler of the new data revolution in research. We discuss how the lack of transparency hinders discovery of research data and disconnects it from the publication and other trusted research outcomes. In addition, we discuss the application of Creative Commons licenses for research metadata...

  4. Massive Meta-Data: A New Data Mining Resource

    Science.gov (United States)

    Hugo, W.

    2012-04-01

    Worldwide standardisation, and interoperability initiatives such as GBIF, Open Access and GEOSS (to name but three of many) have led to the emergence of interlinked and overlapping meta-data repositories containing, potentially, tens of millions of entries collectively. This forms the backbone of an emerging global scientific data infrastructure that is both driven by changes in the way we work, and opens up new possibilities in management, research, and collaboration. Several initiatives are concentrated on building a generalised, shared, easily available, scalable, and indefinitely preserved scientific data infrastructure to aid future scientific work. This paper deals with the parallel aspect of the meta-data that will be used to support the global scientific data infrastructure. There are obvious practical issues (semantic interoperability and speed of discovery being the most important), but we are here more concerned with some of the less obvious conceptual questions and opportunities: 1. Can we use meta-data to assess, pinpoint, and reduce duplication of meta-data? 2. Can we use it to reduce overlaps of mandates in data portals, research collaborations, and research networks? 3. What possibilities exist for mining the relationships that exist implicitly in very large meta-data collections? 4. Is it possible to define an explicit 'scientific data infrastructure' as a complex, multi-relational network database, that can become self-maintaining and self-organising in true Web 2.0 and 'social networking' fashion? The paper provides a blueprint for a new approach to massive meta-data collections, and how this can be processed using established analysis techniques to answer the questions posed. It assesses the practical implications of working with standard meta-data definitions (such as ISO 19115, Dublin Core, and EML) in a meta-data mining context, and makes recommendations in respect of extension to support self-organising, semantically oriented 'networks of

  5. A Metadata Schema for Geospatial Resource Discovery Use Cases

    Directory of Open Access Journals (Sweden)

    Darren Hardy

    2014-07-01

    We introduce a metadata schema that focuses on GIS discovery use cases for patrons in a research library setting. Text search, faceted refinement, and spatial search and relevancy are among GeoBlacklight's primary use cases for federated geospatial holdings. The schema supports a variety of GIS data types and enables contextual, collection-oriented discovery applications as well as traditional portal applications. One key limitation of GIS resource discovery is the general lack of normative metadata practices, which has led to a proliferation of metadata schemas and duplicate records. The ISO 19115/19139 and FGDC standards specify metadata formats, but are intricate, lengthy, and not focused on discovery. Moreover, they require sophisticated authoring environments and cataloging expertise. Geographic metadata standards target preservation and quality measure use cases, but they do not provide for simple inter-institutional sharing of metadata for discovery use cases. To this end, our schema reuses elements from Dublin Core and GeoRSS to leverage their normative semantics, community best practices, open-source software implementations, and extensive examples already deployed in discovery contexts such as web search and mapping. Finally, we discuss a Solr implementation of the schema using a "geo" extension to MODS.

  6. Using Metadata to Build Geographic Information Sharing Environment on Internet

    Directory of Open Access Journals (Sweden)

    Chih-hong Sun

    1999-12-01

    The Internet provides a convenient environment to share geographic information. Web GIS (Geographic Information System) even provides users with direct access to geographic databases through the Internet. However, the complexity of geographic data makes it difficult for users to understand the real content and the limitations of geographic information. In some cases, users may misuse the geographic data and make wrong decisions. Meanwhile, geographic data are distributed across various government agencies, academic institutes, and private organizations, which makes it even more difficult for users to fully understand the content of these complex data. To overcome these difficulties, this research uses metadata as a guiding mechanism for users to fully understand the content and the limitations of geographic data. We introduce three metadata standards commonly used for geographic data and metadata authoring tools available in the US. We also review the current development of the geographic metadata standard in Taiwan. Two metadata authoring tools are developed in this research, which will enable users to build their own geographic metadata easily. [Article content in Chinese]

  7. Forensic devices for activism: Metadata tracking and public proof

    Directory of Open Access Journals (Sweden)

    Lonneke van der Velden

    2015-10-01

    The central topic of this paper is a mobile phone application, ‘InformaCam’, which turns metadata from a surveillance risk into a method for the production of public proof. InformaCam allows one to manage and delete metadata from images and videos in order to diminish surveillance risks related to online tracking. Furthermore, it structures and stores the metadata in such a way that the documentary material becomes better accommodated to evidentiary settings, if needed. In this paper I propose InformaCam should be interpreted as a ‘forensic device’. By using the conceptualization of forensics and work on socio-technical devices the paper discusses how InformaCam, through a range of interventions, rearranges metadata into a technology of evidence. InformaCam explicitly recognizes mobile phones as context aware, uses their sensors, and structures metadata in order to facilitate data analysis after images are captured. Through these modifications it invents a form of ‘sensory data forensics’. By treating data in this particular way, surveillance resistance does more than seeking awareness. It becomes engaged with investigatory practices. Considering the extent to which states conduct metadata surveillance, the project can be seen as a timely response to the unequal distribution of power over data.

  8. Automated metadata--final project report

    Energy Technology Data Exchange (ETDEWEB)

    Schissel, David [General Atomics, San Diego, CA (United States)

    2016-04-01

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research illustrating the general applicability of the project’s toolkit. Feedback was very positive on the project’s toolkit and the value of such automatic workflow documentation to the scientific endeavor.

  9. Automated metadata--final project report

    International Nuclear Information System (INIS)

    Schissel, David

    2016-01-01

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project successfully created a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research, illustrating the general applicability of the project's toolkit. Feedback was very positive on the project's toolkit and the value of such automatic workflow documentation to the scientific endeavor.

  10. Educational Rationale Metadata for Learning Objects

    Directory of Open Access Journals (Sweden)

    Tom Carey

    2002-10-01

    Full Text Available Instructors searching for learning objects in online repositories will be guided in their choices by the content of the object, the characteristics of the learners addressed, and the learning process embodied in the object. We report here on a feasibility study for metadata to record process-oriented information about instructional approaches for learning objects, through a set of Educational Rationale [ER] tags which would allow authors to describe the critical elements in their design intent. The prototype ER tags describe activities which have been demonstrated to be of value in learning, and authors select the activities whose support was critical in their design decisions. The prototype ER tag set consists of descriptors of the instructional approach used in the design, plus optional sub-elements for Comments, Importance and Features which implement the design intent. The tag set was tested by creators of four learning object modules, three intended for post-secondary learners and one for K-12 students and their families. In each case the creators reported that the ER tag set allowed them to express succinctly the key instructional approaches embedded in their designs. These results confirmed the overall feasibility of the ER tag approach as a means of capturing design intent from creators of learning objects. Much work remains to be done before a usable ER tag set could be specified, including evaluating the impact of ER tags during design to improve instructional quality of learning objects.

  11. Migration of the ATLAS Metadata Interface (AMI) to Web 2.0 and cloud

    CERN Document Server

    Odier, Jerome; The ATLAS collaboration; Fulachier, Jerome; Lambert, Fabian

    2015-01-01

    The ATLAS Metadata Interface (AMI) can be considered to be a mature application because it has existed for at least 10 years. Over the last year, we have been adapting the application to some recently available technologies. The web interface, which previously manipulated XML documents using XSL transformations, has been migrated to Asynchronous JavaScript (AJAX). Web development has been considerably simplified by the development of a framework for AMI based on jQuery and Twitter Bootstrap. Finally, there has been a major upgrade of the Python web service client.

  12. Facilitating Centralized Access to Earth Science Imagery & Metadata with NASA GIBS

    Science.gov (United States)

    De Cesare, C.; Alarcon, C.; Huang, T.; Cechini, M. F.; Boller, R. A.

    2015-12-01

    Global Imagery Browse Services (GIBS), part of NASA's Earth Observing System Data and Information System (EOSDIS), is a system that provides full resolution imagery from a broad set of Earth science disciplines to the public. Behind this service lies The Imagery Exchange (TIE), a workflow data management solution developed at the Jet Propulsion Laboratory. TIE is an Open Archival Information System responsible for orchestrating the workflow for acquisition, preparation, generation, and archiving of imagery to be served by the GIBS web mapping tile service, OnEarth. Before GIBS and TIE, users could access this imagery and metadata only by making requests directly to different providers and DAACs. GIBS solves this problem by offering a centralized way to access this data. TIE provides GIBS with a mash-up of data by supporting a variety of different interfaces, security protocols, and metadata standards that are found across different providers. This presentation will detail the challenges we've faced during the implementation of this data mash-up, and will highlight our current efforts to make GIBS a robust resource for our scientific community.

  13. VT Wireless Internet Service Providers 2007

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) The VT Wireless Internet Service Provider (ISP) dataset (WISP2007) includes polygons depicting the extent of Vermont's WISP broadband system as of...

  14. From European Standard to User Service

    DEFF Research Database (Denmark)

    Jacobi, Ole Illum; Lind, Morten

    1997-01-01

    Today’s public administration and planning need access to proper spatial information. The tremendous growth in the area of maps and other geographically referenced databases increases the needs of the user as well as the supplier of information for an overview of the jungle of spatial data....... The answer to this need is a metadata service that gives relevant and up-to-date, at-your-fingertips information on available geographical datasets. As a result of the work in the standardization organizations, we are now, luckily, able to take the first steps towards an implementation of metadata services...... in the design of the next generation of metadata services. On the basis of recent Danish experiences with implementation of the CEN/TC 287 standard into a WWW Geographical Information metadata service, we will present and discuss some general issues: The conceptual strategy, the implementation of dataset...

  15. A prospective event-level analysis of condom use experiences following STI testing among patients in three US cities.

    Science.gov (United States)

    Crosby, Richard; Shrier, Lydia A; Charnigo, Richard J; Weathers, Chandra; Sanders, Stephanie A; Graham, Cynthia A; Milhausen, Robin; Yarber, William L

    2012-10-01

    This study prospectively assessed and compared the incidence of condom use errors/problems among clinic patients testing positive for one or more of 3 sexually transmitted diseases (STDs) and those testing negative. The study also identified event-level condom use errors associated with condom breakage and slippage during sex. Enrolled clinic patients (N = 928) were tested for 3 STDs, then patients electronically recorded sexual intercourse and condom use behaviors daily for up to 6 months. Data were available on condom use errors and problems for the >10,000 sex events involving condoms. Assessed errors/problems were as follows: (1) not using a new condom, (2) allowing condoms to contact sharp objects, (3) not using condoms from start to finish of sex, (4) condoms drying out, (5) erection loss during condom use, (6) breakage, (7) slippage during sex, and (8) slippage after sex. Because the event-level measures were correlated within individual, generalized estimation equation models were used for analyses. All 8 forms of errors/problems with condom use occurred, with varying levels of frequency, without significant differences by baseline STD status for either men or women. Condom breakage was associated with contact with sharp objects (P condom may not confer adequate protection. Problems found to be associated with condom breakage and slippage are potentially amenable to counseling interventions.

  16. Interoperable Solar Data and Metadata via LISIRD 3

    Science.gov (United States)

    Wilson, A.; Lindholm, D. M.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.

    2015-12-01

    LISIRD 3 is a major upgrade of the LASP Interactive Solar Irradiance Data Center (LISIRD), which serves several dozen space based solar irradiance and related data products to the public. Through interactive plots, LISIRD 3 provides data browsing supported by data subsetting and aggregation. Incorporating a semantically enabled metadata repository, LISIRD 3 users see current, vetted, consistent information about the datasets offered. Users can now also search for datasets based on metadata fields such as dataset type and/or spectral or temporal range. This semantic database enables metadata browsing, so users can discover the relationships between datasets, instruments, spacecraft, mission and PI. The database also enables creation and publication of metadata records in a variety of formats, such as SPASE or ISO, making these datasets more discoverable. The database also enables the possibility of a public SPARQL endpoint, making the metadata browsable in an automated fashion. LISIRD 3's data access middleware, LaTiS, provides dynamic, on demand reformatting of data and timestamps, subsetting and aggregation, and other server side functionality via a RESTful OPeNDAP compliant API, enabling interoperability between LASP datasets and many common tools. LISIRD 3's templated front end design, coupled with the uniform data interface offered by LaTiS, allows easy integration of new datasets. Consequently the number and variety of datasets offered by LISIRD has grown to encompass several dozen, with many more to come. This poster will discuss design and implementation of LISIRD 3, including tools used, capabilities enabled, and issues encountered.

  17. Meta-Data Objects as the Basis for System Evolution

    CERN Document Server

    Estrella, Florida; Tóth, N; Kovács, Z; Le Goff, J M; Clatchey, Richard Mc; Toth, Norbert; Kovacs, Zsolt; Goff, Jean-Marie Le

    2001-01-01

    One of the main factors driving object-oriented software development in the Web age is the need for systems to evolve as user requirements change. A crucial factor in the creation of adaptable systems dealing with changing requirements is the suitability of the underlying technology in allowing the evolution of the system. A reflective system utilizes an open architecture where implicit system aspects are reified to become explicit first-class (meta-data) objects. These implicit system aspects are often fundamental structures which are inaccessible and immutable, and their reification as meta-data objects can serve as the basis for changes and extensions to the system, making it self-describing. To address the evolvability issue, this paper proposes a reflective architecture based on two orthogonal abstractions - model abstraction and information abstraction. In this architecture the modeling abstractions allow for the separation of the description meta-data from the system aspects they represent so that th...

  18. Metadata for fine-grained processing at ATLAS

    CERN Document Server

    Cranshaw, Jack; The ATLAS collaboration

    2016-01-01

    High energy physics experiments are implementing highly parallel solutions for event processing on resources that support concurrency at multiple levels. These range from the inherent large-scale parallelism of HPC resources to the multiprocessing and multithreading needed for effective use of multi-core and GPU-augmented nodes. Such modes of processing, and the efficient opportunistic use of transiently-available resources, lead to finer-grained processing of event data. Previously metadata systems were tailored to jobs that were atomic and processed large, well-defined units of data. The new environment requires a more fine-grained approach to metadata handling, especially with regard to bookkeeping. For opportunistic resources metadata propagation needs to work even if individual jobs are not finalized. This contribution describes ATLAS solutions to this problem in the context of the multiprocessing framework currently in use for LHC Run 2, development underway for the ATLAS multithreaded framework (Athena...

  19. Statistical Data Processing with R – Metadata Driven Approach

    Directory of Open Access Journals (Sweden)

    Rudi SELJAK

    2016-06-01

    Full Text Available In recent years the Statistical Office of the Republic of Slovenia has put a lot of effort into re-designing its statistical process. We replaced the classical stove-pipe oriented production system with general software solutions, based on the metadata driven approach. This means that one general program code, which is parametrized with process metadata, is used for data processing for a particular survey. Currently, the general program code is entirely based on SAS macros, but in the future we would like to explore how successfully the statistical software R can be used for this approach. The paper describes the metadata driven principle for data validation, the generic software solution, and the main issues connected with the use of the statistical software R for this approach.
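
    The metadata driven principle described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the SAS or R mentioned in the abstract, and it is not the Statistical Office's implementation: the rule vocabulary (`required`, `range`, `allowed`), the `Rule` class, the `validate` function and the survey variables are all hypothetical names chosen for illustration. The point is only that the validation code stays generic while the per-survey behaviour lives in the rule metadata.

```python
from dataclasses import dataclass

# Hypothetical rule vocabulary: each rule type maps to one generic check.
# Missing values (None) are only flagged by the "required" check.
CHECKS = {
    "required": lambda value, params: value is not None,
    "range": lambda value, params: value is None
    or params["min"] <= value <= params["max"],
    "allowed": lambda value, params: value is None or value in params["values"],
}


@dataclass(frozen=True)
class Rule:
    """One row of process metadata: which check applies to which variable."""
    variable: str
    check: str
    params: dict


def validate(records, rules):
    """Generic validator: apply every rule to every record.

    Returns a list of (record index, variable, check) tuples for failures.
    """
    failures = []
    for index, record in enumerate(records):
        for rule in rules:
            value = record.get(rule.variable)
            if not CHECKS[rule.check](value, rule.params):
                failures.append((index, rule.variable, rule.check))
    return failures


# Process metadata for one hypothetical survey; a different survey would
# supply a different rule list and reuse validate() unchanged.
SURVEY_RULES = [
    Rule("age", "required", {}),
    Rule("age", "range", {"min": 0, "max": 120}),
    Rule("region", "allowed", {"values": {"east", "west"}}),
]

records = [
    {"age": 34, "region": "east"},
    {"age": 150, "region": "north"},
    {"age": None, "region": "west"},
]

failures = validate(records, SURVEY_RULES)
for failure in failures:
    print(failure)
```

    Under these assumptions, adding a survey means adding rule metadata, not code, which is the property the abstract attributes to the general program code.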

  20. EU Law and Mass Internet Metadata Surveillance in the Post-Snowden Era

    Directory of Open Access Journals (Sweden)

    Nora Ni Loideain

    2015-09-01

    Full Text Available Legal frameworks exist within democracies to prevent the misuse and abuse of personal data that law enforcement authorities obtain from private communication service providers. The fundamental rights to respect for private life and the protection of personal data underpin this framework within the European Union. Accordingly, the protection of the principles and safeguards required by these rights is key to ensuring that the oversight of State surveillance powers is robust and transparent. Furthermore, without the robust scrutiny of independent judicial review, the principles and safeguards guaranteed by these rights may become more illusory than real. Following the Edward Snowden revelations, major concerns have been raised worldwide regarding the legality, necessity and proportionality standards governing these laws. In 2014, the highest court in the EU struck down the legal framework that imposed a mandatory duty on communication service providers to undertake the mass retention of metadata for secret intelligence and law enforcement authorities across the EU. This article considers the influence of the Snowden revelations on this landmark judgment. Subsequently, the analysis explores the significance of this ruling for the future reform of EU law governing metadata surveillance and its contribution to the worldwide debate on indiscriminate and covert monitoring in the post-Snowden era.

  1. Large-Scale Data Collection Metadata Management at the National Computation Infrastructure

    Science.gov (United States)

    Wang, J.; Evans, B. J. K.; Bastrakova, I.; Ryder, G.; Martin, J.; Duursma, D.; Gohar, K.; Mackey, T.; Paget, M.; Siddeswara, G.

    2014-12-01

    Data Collection management has become an essential activity at the National Computation Infrastructure (NCI) in Australia. NCI's partners (CSIRO, Bureau of Meteorology, Australian National University, and Geoscience Australia), supported by the Australian Government and Research Data Storage Infrastructure (RDSI), have established a national data resource that is co-located with high-performance computing. This paper addresses the metadata management of these data assets over their lifetime. NCI manages 36 data collections (10+ PB) categorised as earth system sciences, climate and weather model data assets and products, earth and marine observations and products, geosciences, terrestrial ecosystem, water management and hydrology, astronomy, social science and biosciences. The data is largely sourced from NCI partners, the custodians of many of the national scientific records, and major research community organisations. The data is made available in an HPC and data-intensive environment - a ~56000 core supercomputer, virtual labs on a 3000 core cloud system, and data services. By assembling these large national assets, new opportunities have arisen to harmonise the data collections, making a powerful cross-disciplinary resource. To support the overall management, a Data Management Plan (DMP) has been developed to record the workflows, procedures, the key contacts and responsibilities. The DMP has fields that can be exported to the ISO19115 schema and to the collection level catalogue of GeoNetwork. The subset or file level metadata catalogues are linked with the collection level through parent-child relationship definition using UUID. A number of tools have been developed that support interactive metadata management, bulk loading of data, and support for computational workflows or data pipelines. NCI creates persistent identifiers for each of the assets. The data collection is tracked over its lifetime, and the recognition of the data providers, data owners, data

  2. Facilitating the production of ISO-compliant metadata of geospatial datasets

    Science.gov (United States)

    Giuliani, Gregory; Guigoz, Yaniss; Lacroix, Pierre; Ray, Nicolas; Lehmann, Anthony

    2016-02-01

    Metadata are recognized as an essential element to enable efficient and effective discovery of geospatial data published in spatial data infrastructures (SDI). However, metadata production is still perceived as a complex, tedious and time-consuming task. This typically results in little metadata production and can seriously hinder the objective of facilitating data discovery. In response to this issue, this paper presents a proof of concept based on an interoperable workflow between a data publication server and a metadata catalog to automatically generate ISO-compliant metadata. The proposed approach facilitates metadata creation by embedding this task in daily data management workflows; ensures that data and metadata are permanently up-to-date; significantly reduces the obstacles of metadata production; and potentially facilitates contributions to initiatives like the Global Earth Observation System of Systems (GEOSS) by making geospatial resources discoverable.

  3. Linked data for libraries, archives and museums how to clean, link and publish your metadata

    CERN Document Server

    Hooland, Seth van

    2014-01-01

    This highly practical handbook teaches you how to unlock the value of your existing metadata through cleaning, reconciliation, enrichment and linking and how to streamline the process of new metadata creation. Libraries, archives and museums are facing up to the challenge of providing access to fast growing collections whilst managing cuts to budgets. Key to this is the creation, linking and publishing of good quality metadata as Linked Data that will allow their collections to be discovered, accessed and disseminated in a sustainable manner. Metadata experts Seth van Hooland and Ruben Verborgh introduce the key concepts of metadata standards and Linked Data and how they can be practically applied to existing metadata, giving readers the tools and understanding to achieve maximum results with limited re...

  4. A novel framework for assessing metadata quality in epidemiological and public health research settings.

    Science.gov (United States)

    McMahon, Christiana; Denaxas, Spiros

    2016-01-01

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. 96 individuals completed the survey; of those who submitted data, most only assessed metadata quality sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly.

  5. Definition of a CDI metadata profile and its ISO 19139 based encoding

    Science.gov (United States)

    Boldrini, Enrico; de Korte, Arjen; Santoro, Mattia; Schaap, Dick M. A.; Nativi, Stefano; Manzella, Giuseppe

    2010-05-01

    The Common Data Index (CDI) is the middleware service adopted by SeaDataNet for discovery and query. The primary goal of the EU funded project SeaDataNet is to develop a system which provides transparent access to marine data sets and data products from 36 countries in and around Europe. The European context of SeaDataNet requires that the developed system complies with the European Directive INSPIRE. In order to assure the required conformity a GI-cat based solution is proposed. GI-cat is a broker service able to mediate between different metadata sources and publish them through a consistent and unified interface. In this case GI-cat is used as a front end to the SeaDataNet portal, publishing the original data, based on the CDI v.1 XML schema, through an ISO 19139 application profile catalog interface (OGC CSW AP ISO). The choice of ISO 19139 is supported and driven by the INSPIRE Implementing Rules, which have been used as a reference throughout the whole development process. A mapping from the CDI data model to ISO 19139 hence had to be implemented in GI-cat, and a first draft was quickly developed, as both CDI v.1 and ISO 19139 happen to be XML implementations based on the same abstract data model (standard ISO 19115 - metadata about geographic information). This first draft mapping pointed out the differences between the CDI metadata model and ISO 19115, as it was not possible to accommodate all the information contained in CDI v.1 into ISO 19139. Moreover, some modifications were needed in order to reach INSPIRE compliance. The subsequent work consisted of defining the CDI metadata model as a profile of ISO 19115. This included checking all the metadata elements present in CDI and their cardinality. A comparison was made with respect to ISO 19115 and possible extensions were identified. ISO 19139 was then chosen as a natural XML implementation of this new CDI metadata profile. The mapping and the profile definition processes were iteratively refined leading up to a

  6. Alcohol Mixed with Energy Drink Use as an Event-Level Predictor of Physical and Verbal Aggression in Bar Conflicts.

    Science.gov (United States)

    Miller, Kathleen E; Quigley, Brian M; Eliseo-Arras, Rebecca K; Ball, Natalie J

    2016-01-01

    Young adult use of alcohol mixed with caffeinated energy drinks (AmEDs) has been globally linked with increased odds of interpersonal aggression, compared with the use of alcohol alone. However, no prior research has linked these behaviors at the event level in bar drinking situations. The present study assessed whether AmED use is associated with the perpetration of verbal and physical aggression in bar conflicts at the event level. In Fall 2014, a community sample of 175 young adult AmED users (55% female) completed a web survey describing a recent conflict experienced while drinking in a bar. Use of both AmED and non-AmED alcoholic drinks in the incident were assessed, allowing calculation of our main predictor variable, the proportion of AmEDs consumed (AmED/total drinks consumed). To measure perpetration of aggression, participants reported on the occurrence of 6 verbal and 6 physical acts during the bar conflict incident. Linear regression analyses showed that the proportion of AmEDs consumed predicted scores for perpetration of both verbal aggression (β = 0.16, p bar environments, and total number of drinks. Results of this study suggest that in alcohol-related bar conflicts, higher levels of young adult AmED use are associated with higher levels of aggression perpetration than alcohol use alone and that the elevated risk is not attributable to individual differences between AmED users and nonusers or to contextual differences in bar drinking settings. While future research is needed to identify motivations, dosages, and sequencing issues associated with AmED use, these beverages should be considered a potential risk factor in the escalation of aggressive bar conflicts. Copyright © 2016 by the Research Society on Alcoholism.

  7. Overview of long-term field experiments in Germany - metadata visualization

    Science.gov (United States)

    Muqit Zoarder, Md Abdul; Heinrich, Uwe; Svoboda, Nikolai; Grosse, Meike; Hierold, Wilfried

    2017-04-01

    BonaRes ("soil as a sustainable resource for the bioeconomy") is collecting data and metadata from agricultural long-term field experiments (LTFE) in Germany. It is funded by the German Federal Ministry of Education and Research (BMBF) under the umbrella of the National Research Strategy BioEconomy 2030. BonaRes consists of ten interdisciplinary research project consortia and the 'BonaRes - Centre for Soil Research'. The BonaRes Data Centre is responsible for collecting all LTFE data and related metadata in an enterprise database with a high level of security, and for visualizing the data and metadata through a data portal. Within the BonaRes project, we are compiling an overview of long-term field experiments in Germany based on a literature review, the results of an online survey and direct contacts with LTFE operators. Information about research topic, contact person, website, experiment setup and analyzed parameters is collected. Based on the collected LTFE data, an enterprise geodatabase has been developed and a GIS-based web information system about LTFE in Germany has been set up. Various aspects of the LTFE, such as experiment type, land-use type, agricultural category and duration of experiment, are presented in thematic maps. The information system is dynamically linked to the database, so changes in the data directly affect the presentation. A simple search by LTFE name, location or operator and dynamic layer selection make the web application user-friendly. Dispersing and visualizing overlapping LTFE points on the overview map is also challenging; the application handles this automatically at every zoom level. The application provides both the spatial location and the meta-information of LTFEs, and is backed by an enterprise geodatabase, a GIS server for hosting map services and a JavaScript API for web application development.

  8. A Window to the World: Lessons Learned from NASA's Collaborative Metadata Curation Effort

    Science.gov (United States)

    Bugbee, K.; Dixon, V.; Baynes, K.; Shum, D.; le Roux, J.; Ramachandran, R.

    2017-12-01

    Well-written descriptive metadata adds value to data by making data easier to discover and increases the use of data by providing the context or appropriateness of use. While many data centers acknowledge the importance of correct, consistent and complete metadata, allocating resources to curate existing metadata is often difficult. To lower resource costs, many data centers seek guidance on best practices for curating metadata but struggle to identify those recommendations. In order to assist data centers in curating metadata and to also develop best practices for creating and maintaining metadata, NASA has formed a collaborative effort to improve the Earth Observing System Data and Information System (EOSDIS) metadata in the Common Metadata Repository (CMR). This effort has taken significant steps in building consensus around metadata curation best practices. However, this effort has also revealed gaps in EOSDIS enterprise policies and procedures within the core metadata curation task. This presentation will explore the mechanisms used for building consensus on metadata curation, the gaps identified in policies and procedures, the lessons learned from collaborating with both the data centers and metadata curation teams, and the proposed next steps for the future.

  9. Competence Based Educational Metadata for Supporting Lifelong Competence Development Programmes

    NARCIS (Netherlands)

    Sampson, Demetrios; Fytros, Demetrios

    2008-01-01

    Sampson, D., & Fytros, D. (2008). Competence Based Educational Metadata for Supporting Lifelong Competence Development Programmes. In P. Diaz, Kinshuk, I. Aedo & E. Mora (Eds.), Proceedings of the 8th IEEE International Conference on Advanced Learning Technologies (ICALT 2008), pp. 288-292. July,

  10. The evolution of chondrichthyan research through a metadata ...

    African Journals Online (AJOL)

    We compiled metadata from Sharks Down Under (1991) and the two Sharks International conferences (2010 and 2014), spanning 23 years. Analysis of the data highlighted taxonomic biases towards charismatic species, a declining number of studies in fundamental science such as those related to taxonomy and basic life ...

  11. Standardizing metadata and taxonomic identification in metabarcoding studies

    NARCIS (Netherlands)

    Tedersoo, Leho; Ramirez, Kelly; Nilsson, R; Kaljuvee, Aivi; Koljalg, Urmas; Abarenkov, Kessy

    2015-01-01

    High-throughput sequencing-based metabarcoding studies produce vast amounts of ecological data, but a lack of consensus on standardization of metadata and how to refer to the species recovered severely hampers reanalysis and comparisons among studies. Here we propose an automated workflow covering

  12. Metadata for data rescue and data at risk

    Science.gov (United States)

    Anderson, William L.; Faundeen, John L.; Greenberg, Jane; Taylor, Fraser

    2011-01-01

    Scientific data age, become stale, fall into disuse and run tremendous risks of being forgotten and lost. These problems can be addressed by archiving and managing scientific data over time, and establishing practices that facilitate data discovery and reuse. Metadata documentation is integral to this work and essential for measuring and assessing high priority data preservation cases. The International Council for Science: Committee on Data for Science and Technology (CODATA) has a newly appointed Data-at-Risk Task Group (DARTG), participating in the general arena of rescuing data. The DARTG primary objective is building an inventory of scientific data that are at risk of being lost forever. As part of this effort, the DARTG is testing an approach for documenting endangered datasets. The DARTG is developing a minimal and easy to use set of metadata properties for sufficiently describing endangered data, which will aid global data rescue missions. The DARTG metadata framework supports rapid capture, and easy documentation, across an array of scientific domains. This paper reports on the goals and principles supporting the DARTG metadata schema, and provides a description of the preliminary implementation.

  13. A metadata schema for data objects in clinical research.

    Science.gov (United States)

    Canham, Steve; Ohmann, Christian

    2016-11-24

    A large number of stakeholders have accepted the need for greater transparency in clinical research and, in the context of various initiatives and systems, have developed a diverse and expanding number of repositories for storing the data and documents created by clinical studies (collectively known as data objects). To make the best use of such resources, we assert that it is also necessary for stakeholders to agree and deploy a simple, consistent metadata scheme. The relevant data objects and their likely storage are described, and the requirements for metadata to support data sharing in clinical research are identified. Issues concerning persistent identifiers, for both studies and data objects, are explored. A scheme is proposed that is based on the DataCite standard, with extensions to cover the needs of clinical researchers, specifically to provide (a) study identification data, including links to clinical trial registries; (b) data object characteristics and identifiers; and (c) data covering location, ownership and access to the data object. The components of the metadata scheme are described. The metadata schema is proposed as a natural extension of a widely agreed standard to fill a gap not tackled by other standards related to clinical research (e.g., Clinical Data Interchange Standards Consortium, Biomedical Research Integrated Domain Group). The proposal could be integrated with, but is not dependent on, other moves to better structure data in clinical research.

  14. Training and Best Practice Guidelines: Implications for Metadata Creation

    Science.gov (United States)

    Chuttur, Mohammad Y.

    2012-01-01

    In response to the rapid development of digital libraries over the past decade, researchers have focused on the use of metadata as an effective means to support resource discovery within online repositories. With the increasing involvement of libraries in digitization projects and the growing number of institutional repositories, it is anticipated…

  15. Metadata Schema Used in OCLC Sampled Web Pages

    Directory of Open Access Journals (Sweden)

    Fei Yu

    2005-12-01

    The tremendous growth of Web resources has made information organization and retrieval more and more difficult. As one approach to this problem, metadata schemas have been developed to characterize Web resources. However, many questions have been raised about the use of metadata schemas: Which metadata schemas have been used on the Web? How did they describe Web-accessible information? What is the distribution of these metadata schemas among Web pages? Do certain schemas dominate the others? To address these issues, this study analyzed 16,383 Web pages with meta tags extracted from 200,000 OCLC sampled Web pages in 2000. It found that only 8.19% of Web pages used meta tags; description tags, keyword tags, and Dublin Core tags were the only three schemas used in the Web pages. This article reveals the use of meta tags in terms of their function distribution, syntax characteristics, granularity of the Web pages, and the length distribution and word number distribution of both description and keyword tags.

  16. Transforming and enhancing metadata for enduser discovery: a case study

    Directory of Open Access Journals (Sweden)

    Edward M. Corrado

    2014-05-01

    The Libraries’ workflow and portions of code will be shared; issues and challenges involved will be discussed. While this case study is specific to Binghamton University Libraries, examples of strategies used at other institutions will also be introduced. This paper should be useful to anyone interested in describing large quantities of photographs or other materials with preexisting embedded metadata.

  17. Aspect oriented implementation of design patterns using metadata ...

    African Journals Online (AJOL)

    Aspect oriented programming extends object oriented programming by managing crosscutting concerns using aspects. Two of the most important criticisms of aspect oriented programming are the “tyranny of the dominant signature” and the lack of visibility of a program's flow. Metadata, in the form of Java annotations, is a solution to ...

  18. Big Earth Data Initiative: Metadata Improvement: Case Studies

    Science.gov (United States)

    Kozimor, John; Habermann, Ted; Farley, John

    2016-01-01

    The Big Earth Data Initiative (BEDI) invests in standardizing and optimizing the collection, management and delivery of the U.S. Government's civil Earth observation data to improve discovery, access, use, and understanding of Earth observations by the broader user community. Complete and consistent standard metadata helps address all three goals.

  19. ONEMercury: Towards Automatic Annotation of Earth Science Metadata

    Science.gov (United States)

    Tuarob, S.; Pouchard, L. C.; Noy, N.; Horsburgh, J. S.; Palanisamy, G.

    2012-12-01

    Earth sciences have become more data-intensive, requiring access to heterogeneous data collected from multiple places, times, and thematic scales. For example, research on climate change may involve exploring and analyzing observational data such as the migration of animals and temperature shifts across the earth, as well as various model-observation inter-comparison studies. Recently, DataONE, a federated data network built to facilitate access to and preservation of environmental and ecological data, has come to exist. ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for discovering and accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple data repositories and makes it searchable via a common search interface built upon cutting edge search engine technology, allowing users to interact with the system, intelligently filter the search results on the fly, and fetch the data from distributed data sources. Linking data from heterogeneous sources always has a cost. A problem that ONEMercury faces is the different levels of annotation in the harvested metadata records. Poorly annotated records tend to be missed during the search process as they lack meaningful keywords. Furthermore, such records would not be compatible with the advanced search functionality offered by ONEMercury as the interface requires a metadata record be semantically annotated. The explosion of the number of metadata records harvested from an increasing number of data repositories makes it impossible to annotate the harvested records manually, urging the need for a tool capable of automatically annotating poorly curated metadata records. In this paper, we propose a topic-model (TM) based approach for automatic metadata annotation. Our approach mines topics in the set of well annotated records and suggests keywords for poorly annotated records based on topic similarity. We utilize the

  20. Provenance metadata gathering and cataloguing of EFIT++ code execution

    International Nuclear Information System (INIS)

    Lupelli, I.; Muir, D.G.; Appel, L.; Akers, R.; Carr, M.; Abreu, P.

    2015-01-01

    Highlights: • An approach for automatic gathering of provenance metadata has been presented. • A provenance metadata catalogue has been created. • The overhead in the code runtime is less than 10%. • The metadata/data size ratio is about 20%. • A visualization interface based on Gephi has been presented. - Abstract: Journal publications, as the final product of research activity, are the result of an extensive complex modeling and data analysis effort. It is of paramount importance, therefore, to capture the origins and derivation of the published data in order to achieve high levels of scientific reproducibility, transparency, internal and external data reuse and dissemination. The consequence of the modern research paradigm is that high performance computing and data management systems, together with metadata cataloguing, have become crucial elements within the nuclear fusion scientific data lifecycle. This paper describes an approach to the task of automatically gathering and cataloguing provenance metadata, currently under development and testing at Culham Centre for Fusion Energy. The approach is being applied to a machine-agnostic code that calculates the axisymmetric equilibrium force balance in tokamaks, EFIT++, as a proof of principle test. The proposed approach avoids any code instrumentation or modification. It is based on the observation and monitoring of input preparation, workflow and code execution, system calls, log file data collection and interaction with the version control system. Pre-processing, post-processing, and data export and storage are monitored during the code runtime. Input data signals are captured using a data distribution platform called IDAM. The final objective of the catalogue is to create a complete description of the modeling activity, including user comments, and the relationship between data output, the main experimental database and the execution environment. For an intershot or post-pulse analysis (∼1000

  1. A Metadata Standard for Hydroinformatic Data Conforming to International Standards

    Science.gov (United States)

    Notay, Vikram; Carstens, Georg; Lehfeldt, Rainer

    2017-04-01

    The affordable availability of computing power and digital storage has been a boon for the scientific community. The hydroinformatics community has also benefitted from the so-called digital revolution, which has enabled the tackling of more and more complex physical phenomena using hydroinformatic models, instruments, sensors, etc. With models getting more and more complex, computational domains getting larger and the resolution of computational grids and measurement data getting finer, a large amount of data is generated and consumed in any hydroinformatics related project. The ubiquitous availability of the internet also contributes to this phenomenon, with data being collected through sensor networks connected to telecommunications networks and the internet long before the term Internet of Things existed. Although generally good, this exponential increase in the number of available datasets gives rise to the need to describe this data in a standardised way, not only to be able to get a quick overview of the data but also to facilitate interoperability of data from different sources. The Federal Waterways Engineering and Research Institute (BAW) is a federal authority of the German Federal Ministry of Transport and Digital Infrastructure. BAW acts as a consultant for the safe and efficient operation of the German waterways. As part of its consultation role, BAW operates a number of physical and numerical models for sections of inland and marine waterways. In order to uniformly describe the data produced and consumed by these models throughout BAW and to ensure interoperability with other federal and state institutes on the one hand and with EU countries on the other, a metadata profile for hydroinformatic data has been developed at BAW. The metadata profile is composed in its entirety using the ISO 19115 international standard for metadata related to geographic information. Due to the widespread use of the ISO 19115 standard in the existing geodata infrastructure

  2. Serious Games for Health: The Potential of Metadata.

    Science.gov (United States)

    Göbel, Stefan; Maddison, Ralph

    2017-02-01

    Numerous serious games and health games exist, either as commercial products (typically with a focus on entertaining a broad user group) or smaller games and game prototypes, often resulting from research projects (typically tailored to a smaller user group with a specific health characteristic). A major drawback of existing health games is that they are not very well described and attributed with (machine-readable, quantitative, and qualitative) metadata such as the characterizing goal of the game, the target user group, or expected health effects well proven in scientific studies. This makes it difficult or even impossible for end users to find and select the most appropriate game for a specific situation (e.g., health needs). Therefore, the aim of this article was to motivate the need and potential/benefit of metadata for the description and retrieval of health games and to describe a descriptive model for the qualitative description of games for health. It was not the aim of the article to describe a stable, running system (portal) for health games. This will be addressed in future work. Building on previous work toward a metadata format for serious games, a descriptive model for the formal description of games for health is introduced. For the conceptualization of this model, classification schemata of different existing health game repositories are considered. The classification schema consists of three levels: a core set of mandatory descriptive fields relevant for all games for health application areas, a detailed level with more comprehensive, optional information about the games, and so-called extension as level three with specific descriptive elements relevant for dedicated health games application areas, for example, cardio training. A metadata format provides a technical framework to describe, find, and select appropriate health games matching the needs of the end user. Future steps to improve, apply, and promote the metadata format in the health games

  3. Empirical Analysis of Errors on Human-Generated Learning Objects Metadata

    Science.gov (United States)

    Cechinel, Cristian; Sánchez-Alonso, Salvador; Sicilia, Miguel Ángel

    Learning object metadata is considered crucial for the right management of learning objects stored in public repositories. Search operations, in particular, rely on the quality of these metadata as an essential precondition for finding results adequate to users' requirements and needs. However, learning object metadata are not always reliable, as many factors have a negative influence on metadata quality (for instance, human annotators lacking the minimum skills, involuntary mistakes, and lack of information). This paper analyses human-generated learning object metadata records described according to the IEEE LOM standard, identifies the most significant errors committed and points out which parts of the standard should be improved for the sake of quality.

  4. The ATLAS EventIndex: data flow and inclusion of other metadata

    CERN Document Server

    Prokoshin, Fedor; The ATLAS collaboration; Cardenas Zarate, Simon Ernesto; Favareto, Andrea; Fernandez Casani, Alvaro; Gallas, Elizabeth; Garcia Montoro, Carlos; Gonzalez de la Hoz, Santiago; Hrivnac, Julius; Malon, David; Salt, Jose; Sanchez, Javier; Toebbicke, Rainer; Yuan, Ruijun

    2016-01-01

    The ATLAS EventIndex is the catalogue of the event-related metadata for the information obtained from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event, as well as trigger decision information. The main use cases for the EventIndex are event picking, providing information for the Event Service, and data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the GRID, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalog AMI and the Rucio data man...

  5. The Ontological Perspectives of the Semantic Web and the Metadata Harvesting Protocol: Applications of Metadata for Improving Web Search.

    Science.gov (United States)

    Fast, Karl V.; Campbell, D. Grant

    2001-01-01

    Compares the implied ontological frameworks of the Open Archives Initiative Protocol for Metadata Harvesting and the World Wide Web Consortium's Semantic Web. Discusses current search engine technology, semantic markup, indexing principles of special libraries and online databases, and componentization and the distinction between data and…

  6. Alcohol Use, Partner Type, and Risky Sexual Behavior Among College Students: Findings from an Event-Level Study

    Science.gov (United States)

    Brown, Jennifer L.; Vanable, Peter A.

    2009-01-01

    Alcohol use is prevalent among college students and may contribute to elevated rates of sexual risk taking. Using event-level data, the hypothesis that partner type would moderate the effect of alcohol consumption on condom use was tested. Sexually active college students (N = 330; 67% female) reported on characteristics of their most recent sexual encounter, including partner type, alcohol use, and condom use, along with measures of sex-related alcohol expectancies, sensation seeking, and typical alcohol use. Unprotected vaginal sex (UVS) was reported by 39% of the sample and 32% reported alcohol use prior to sex. For the complete sample, UVS was just as likely for non-drinking events as for events involving alcohol use. However, for sexual encounters involving a non-steady partner, alcohol consumption was associated with an increase in UVS, whereas rates of UVS did not vary by drinking status for encounters involving a steady partner. These effects remained in analyses that controlled for sex-related alcohol expectancies, sensation seeking, and typical alcohol use. Findings confirm that the effects of alcohol vary according to the context in which it is used. PMID:17611038

  7. Treating metadata as annotations: separating the content markup from the content

    Directory of Open Access Journals (Sweden)

    Fredrik Paulsson

    2007-11-01

    The use of digital learning resources creates an increasing need for semantic metadata, describing the whole resource, as well as parts of resources. Traditionally, schemas such as Text Encoding Initiative (TEI) have been used to add semantic markup for parts of resources. This is not sufficient for use in a “metadata ecology”, where metadata is distributed, coherent to different Application Profiles, and added by different actors. A new methodology, where metadata is “pointed in” as annotations using XPointers and RDF, is proposed. A suggestion for how such infrastructure can be implemented, using existing open standards for metadata and for the web, is presented. We argue that such methodology and infrastructure is necessary to realize the decentralized metadata infrastructure needed for a “metadata ecology”.

  8. Metadata for the description of broadcast assets

    DEFF Research Database (Denmark)

    Efthimiadis, Efthimis N.; Mai, Jens Erik; Burrows, Paul E.

    2003-01-01

    The Corporation for Public Broadcasting (CPB) and public broadcasters consider Media Asset Management (MAM) of critical importance since without a concerted and cooperative plan to manage their vast library of content, broadcasters are unable to reach their potential for service in the digital ag...... is currently under development; the Disney Nomenclature Registry at Disney Corporation; and the ViDe (video access group) MPEG-7 and Dublin Core mapping....

  9. A network analysis using metadata to investigate innovation in clean-tech – Implications for energy policy

    International Nuclear Information System (INIS)

    Marra, Alessandro; Antonelli, Paola; Dell’Anna, Luca; Pozzi, Cesare

    2015-01-01

    Clean-technology (clean-tech) is a large and increasing sector. Research and development (R&D) is the lifeline of the industry and innovation is fostered by a plethora of high-tech start-ups and small and medium-sized enterprises (SMEs). Any empirical-based attempt to detect the pattern of technological innovation in the industry is challenging. This paper proposes an investigation of innovation in clean-tech using metadata provided by CrunchBase. Metadata reveal information on markets, products, services and technologies driving innovation in the clean-tech industry worldwide and for San Francisco, the leader in clean-tech innovation with more than two hundred specialised companies. A network analysis using metadata is the employed methodology and the main metrics of the resulting networks are discussed from an economic point of view. The purpose of the paper is to understand the specializations and technological complementarities underlying innovative companies, to detect emerging industrial clusters at the global and local/metropolitan level and, finally, to suggest a way to determine whether observed start-ups, SMEs and clusters follow a technological path of complementary innovation and market opportunity or, instead, present a risk of lock-in. The discussion of the results of the network analysis shows interesting implications for energy policy, particularly useful from an operational point of view. - Highlights: • Metadata provide information on companies' products and technologies. • A network analysis enables detection of specializations and complementarities. • An investigation of the network allows emerging industrial clusters to be identified. • Metrics help to appreciate complementary innovation and market opportunity. • Results of the network analysis show interesting policy implications.

  10. Perception of intoxication in a field study of the night-time economy: Blood alcohol concentration, patron characteristics, and event-level predictors.

    Science.gov (United States)

    Kaestle, Christine E; Droste, Nicolas; Peacock, Amy; Bruno, Raimondo; Miller, Peter

    2018-01-01

    Determine the relationship of subjective intoxication to blood alcohol concentration (BAC) and examine whether patron and event-level characteristics modify the relationship of BAC to subjective intoxication. An in-situ systematic random sample of alcohol consumers attending night-time entertainment districts between 10pm and 3am on Friday and Saturday nights in five Australian cities completed a brief interview (n=4628). Participants reported age, sex, and pre-drinking, energy drink, tobacco, illicit stimulant and other illicit drug use that night, and their subjective intoxication and BAC were assessed. Male and female drinkers displayed equally low sensitivity to the impact of alcohol consumption when self-assessing their intoxication (BAC only explained 19% of variance). The marginal effect of BAC was not constant. At low BAC, participants were somewhat sensitive to increases in alcohol consumption, but at higher BAC levels that modest sensitivity dissipated (actual BAC had less impact on self-assessed intoxication). The slope ultimately leveled out to be non-responsive to additional alcohol intake. Staying out late, pre-drinking, and being young introduced biases resulting in higher self-assessed intoxication regardless of actual BAC. Further, both energy drinks and stimulant use modified the association between BAC and perceived intoxication, resulting in more compressed changes in self-assessment as BAC varies up or down, indicating less ability to perceive differences in BAC level. The ability of intoxicated patrons to detect further intoxication is impaired. Co-consumption of energy drinks and/or stimulant drugs is associated with impaired intoxication judgment, creating an additional challenge for the responsible service and consumption of alcohol. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Event-level analysis of antecedents for youth violence: comparison of dating violence with non-dating violence.

    Science.gov (United States)

    Epstein-Ngo, Quyen M; Walton, Maureen A; Chermack, Stephen T; Blow, Frederic C; Zimmerman, Marc A; Cunningham, Rebecca M

    2014-01-01

    Dating violence (DV) has emerged as a major concern among youth with links to substance use, injuries, and death. The emergency department (ED) provides an opportunity for violence screening and prevention interventions. Additional data are needed regarding antecedents of DV versus non-dating violence (NDV; e.g., acquaintance, stranger) to develop ED-based violence interventions for youth. Participants were 575 patients screening positive for past 6-month drug use in an urban ED who completed timeline follow-back aggression modules at baseline and 6- and 12-months, indicating event-specific antecedents of violence. Multi-level logistic regressions using event-level data, nested by individual and time (i.e. baseline, 6- and 12-month assessment intervals), were used to examine antecedents of DV vs. NDV. Post-hoc analyses examined substance use×reasons and gender interactions. Prescription sedative/opioid misuse was more likely to be reported prior to DV whereas alcohol only, and co-ingested alcohol and marijuana only, were more likely to be reported prior to NDV. Reasons for DV included: "personal belongings", "angry/bad mood," "jealousy," "drunk/high on drugs" and "arguing about sex". Reasons for NDV included: "rumors," "retaliation," "personal space" and "aid (someone) due to physical attack". Substance use before/during conflicts and reasons for conflicts were both uniquely associated with DV versus NDV. Two gender interactions were found. ED based interventions for urban youth need to be tailored by gender, substance use (alcohol, cocaine, sedatives), reasons for violence, and type of violence (DV vs. NDV). © 2013.

  12. Alcohol Use and Unprotected Sex Among HIV-Infected Ugandan Adults: Findings from an Event-Level Study.

    Science.gov (United States)

    Woolf-King, Sarah E; Fatch, Robin; Cheng, Debbie M; Muyindike, Winnie; Ngabirano, Christine; Kekibiina, Allen; Emenyonu, Nneka; Hahn, Judith A

    2018-01-11

    While alcohol is a known risk factor for HIV infection in sub-Saharan Africa (SSA), studies designed to investigate the temporal relationship between alcohol use and unprotected sex are lacking. The purpose of this study was to determine whether alcohol used at the time of a sexual event is associated with unprotected sex at that same event. Data for this study were collected as part of two longitudinal studies of HIV-infected Ugandan adults. A structured questionnaire was administered at regularly scheduled cohort study visits in order to assess the circumstances (e.g., alcohol use, partner type) of the most recent sexual event (MRSE). Generalized estimating equation logistic regression models were used to examine the association between alcohol use (by the participant, the sexual partner, or both the participant and the partner) and the odds of unprotected sex at the sexual event while controlling for participant gender, age, months since HIV diagnosis, unhealthy alcohol use in the prior 3 months, partner type, and HIV status of partner. A total of 627 sexually active participants (57% women) reported 1817 sexual events. Of these events, 19% involved alcohol use and 53% were unprotected. Alcohol use by one's sexual partner (aOR 1.70; 95% CI 1.14, 2.54) or by both partners (aOR 1.78; 95% CI 1.07, 2.98) during the MRSE significantly increased the odds of unprotected sex at that same event. These results add to the growing event-level literature in SSA and support a temporal association between alcohol used prior to a sexual event and subsequent unprotected sex.

  13. The NERC DataGrid services.

    Science.gov (United States)

    Latham, S E; Cramer, R; Grant, M; Kershaw, P; Lawrence, B N; Lowry, R; Lowe, D; O'Neill, K; Miller, P; Pascoe, S; Pritchard, M; Snaith, H; Woolf, A

    2009-03-13

    This short paper outlines the key components of the NERC DataGrid: a discovery service, a vocabulary service and a software stack deployed both centrally to provide a data discovery portal, and at data providers to provide local portals and data and metadata services.

  14. Leveraging Python to improve ebook metadata selection, ingest, and management

    Directory of Open Access Journals (Sweden)

    Kelly Thompson

    2017-10-01

    Libraries face many challenges in managing descriptive metadata for ebooks, including quality control, completeness of coverage, and ongoing management. The recent emergence of library management systems that automatically provide descriptive metadata for e-resources activated in system knowledge bases means that ebook management models are moving toward both greater efficiency and more complex implementation and maintenance choices. Automated and data-driven processes for ebook management have always been desirable, but in the current environment, they become necessary. In addition to initial selection of a record source, automation can be applied to quality control processes and ongoing maintenance in order to keep manual, eyes-on work to a minimum while providing the best possible discovery and access. In this article, we describe how we are using Python scripts to address these challenges.

  15. Automatic Metadata Extraction - The High Energy Physics Use Case

    CERN Document Server

    Boyd, Joseph; Rajman, Martin

    Automatic metadata extraction (AME) of scientific papers has been described as one of the hardest problems in document engineering. Heterogeneous content, varying style, and unpredictable placement of article components render the problem inherently indeterministic. Conditional random fields (CRF), a machine learning technique, can be used to classify document metadata amidst this uncertainty, annotating document contents with semantic labels. High energy physics (HEP) papers, such as those written at CERN, have unique content and structural characteristics, with scientific collaborations of thousands of authors altering article layouts dramatically. The distinctive qualities of these papers necessitate the creation of specialised datasets and model features. In this work we build an unprecedented training set of HEP papers and propose and evaluate a set of innovative features for CRF models. We build upon state-of-the-art AME software, GROBID, a tool coordinating a hierarchy of CRF models in a full document ...

  16. Conditions and configuration metadata for the ATLAS experiment

    CERN Document Server

    Gallas, E J; Albrand, S; Fulachier, J; Lambert, F; Pachal, K E; Tseng, J C L; Zhang, Q

    2012-01-01

    In the ATLAS experiment, a system called COMA (Conditions/Configuration Metadata for ATLAS), has been developed to make globally important run-level metadata more readily accessible. It is based on a relational database storing directly extracted, refined, reduced, and derived information from system-specific database sources as well as information from non-database sources. This information facilitates a variety of unique dynamic interfaces and provides information to enhance the functionality of other systems. This presentation will give an overview of the components of the COMA system, enumerate its diverse data sources, and give examples of some of the interfaces it facilitates. We list important principles behind COMA schema and interface design, and how features of these principles create coherence and eliminate redundancy among the components of the overall system. In addition, we elucidate how interface logging data has been used to refine COMA content and improve the value and performance of end-user...

  17. Quality in Learning Objects: Evaluating Compliance with Metadata Standards

    Science.gov (United States)

    Vidal, C. Christian; Segura, N. Alejandra; Campos, S. Pedro; Sánchez-Alonso, Salvador

    Ensuring a certain level of quality of learning objects used in e-learning is crucial to increase the chances of success of automated systems in recommending or finding these resources. This paper presents a proposal for the implementation of a quality model for learning objects based on the ISO 9126 international standard for the evaluation of software quality. Feature indicators associated with the conformance sub-characteristic are defined. Some instruments for feature evaluation are advised, which allow collecting expert opinion on evaluation items. Other quality model features are evaluated using only the information from the metadata, using semantic web technologies. Finally, we propose an ontology-based application that allows automatic evaluation of a quality feature. The IEEE LOM metadata standard was used in experimentation, and the results showed that most of the learning objects analyzed do not comply with the standard.

  18. Enhancing Media Personalization by Extracting Similarity Knowledge from Metadata

    DEFF Research Database (Denmark)

    Butkus, Andrius

    ) using Latent Semantic Analysis (one of the unsupervised machine learning techniques). It presents three separate cases to illustrate the similarity knowledge extraction from the metadata, where the emotional components in each case represents different abstraction levels – genres, synopsis and lyrics...... with many interrelated parts – recommendation engines, content metadata, contextual information and user profiles. In the center of any type of recommendation lies the notion of similarity. The most popular way to approach similarity is to look for the feature overlaps. This results often in recommending...... only “more of the same” type of content which does not necessarily lead to the meaningful personalization. Another way to approach similarity is to find a similar underlying meaning in the content. Aspects of meaning in media can be represented using Gardenfors Conceptual Spaces theory, which can...

  19. Indexing of ATLAS data management and analysis system metadata

    CERN Document Server

    Grigoryeva, Maria; The ATLAS collaboration

    2017-01-01

    This manuscript is devoted to the development of a system to manage the metainformation of modern HENP experiments. The main purpose of the system is to provide scientists with transparent access to current and historical metadata related to data analysis, processing and modeling. The system design addresses the following goals: providing a flexible and fast search for metadata on various combinations of keywords, and generating aggregated reports categorized according to selected parameters, such as the studied physical process, scientific topic, physical group, etc. The article presents the architecture of the developed indexing and search system, as well as the results of performance tests. A comparison of query execution speed between the developed system and the original relational databases showed that the developed system provides results faster. The new system also allows much more complex search requests than the original storage systems.
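    The two goals named in this abstract, keyword-combination search and aggregated reports, can be illustrated with a small inverted index. This is a minimal sketch only: the abstract does not say which indexing technology the authors used, and the record fields (`id`, `keywords`, `group`) and sample values are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {"id": 1, "keywords": {"higgs", "simulation"}, "group": "A"},
    {"id": 2, "keywords": {"higgs", "reprocessing"}, "group": "B"},
    {"id": 3, "keywords": {"top", "simulation"}, "group": "A"},
]

def build_index(records):
    """Map each keyword to the set of record ids that carry it."""
    index = defaultdict(set)
    for record in records:
        for keyword in record["keywords"]:
            index[keyword].add(record["id"])
    return index

def search(index, *keywords):
    """Return the ids of records that carry every one of the given keywords."""
    id_sets = [index.get(keyword, set()) for keyword in keywords]
    return set.intersection(*id_sets) if id_sets else set()

def count_by(records, ids, field):
    """Aggregate the matching records by one categorical field."""
    return Counter(r[field] for r in records if r["id"] in ids)

index = build_index(records)
hits = search(index, "higgs", "simulation")
report = count_by(records, search(index, "simulation"), "group")
```

    Precomputing the keyword-to-record mapping is what makes combined searches cheap: each query reduces to set intersections rather than a scan over every record.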

  20. National Weather Service (NWS) Station Information System (SIS), Version 2

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — National Weather Service (NWS) Station Information System (SIS) contains observing station metadata from November 2016 to present. These renditions are used for...

  1. EPOS Data and Service Provision

    Science.gov (United States)

    Bailo, Daniele; Jeffery, Keith G.; Atakan, Kuvvet; Harrison, Matt

    2017-04-01

    EPOS is now in IP (implementation phase) after a successful PP (preparatory phase). EPOS consists of essentially two components, one ICS (Integrated Core Services) representing the integrating ICT (Information and Communication Technology) and many TCS (Thematic Core Services) representing the scientific domains. The architecture developed, demonstrated and agreed within the project during the PP is now being developed utilising co-design with the TCS teams and agile, spiral methods within the ICS team. The 'heart' of EPOS is the metadata catalog. This provides for the ICS a digital representation of the TCS assets (services, data, software, equipment, expertise…) thus facilitating access, interoperation and (re-)use. A major part of the work has been interactions with the TCS. The original intention to harvest information from the TCS required (and still requires) discussions to understand fully the TCS organisational structures linked with rights, security and privacy; their (meta)data syntax (structure) and semantics (meaning); their workflows and methods of working and the services offered. To complicate matters further the TCS are each at varying stages of development and the ICS design has to accommodate pre-existing, developing and expected future standards for metadata, data, software and processes. Through information documents, questionnaires and interviews/meetings the EPOS ICS team has collected DDSS (Data, Data Products, Software and Services) information from the TCS. The ICS team developed a simplified metadata model for presentation to the TCS and the ICS team will perform the mapping and conversion from this model to the internal detailed technical metadata model using CERIF (an EU recommendation to Member States maintained, developed and promoted by euroCRIS, www.eurocris.org). At the time of writing the final modifications of the EPOS metadata model are being made, and the mappings to CERIF designed, prior to the main phase of (meta)data

  2. Standardizing metadata and taxonomic identification in metabarcoding studies.

    Science.gov (United States)

    Tedersoo, Leho; Ramirez, Kelly S; Nilsson, R Henrik; Kaljuvee, Aivi; Kõljalg, Urmas; Abarenkov, Kessy

    2015-01-01

    High-throughput sequencing-based metabarcoding studies produce vast amounts of ecological data, but a lack of consensus on standardization of metadata and how to refer to the species recovered severely hampers reanalysis and comparisons among studies. Here we propose an automated workflow covering data submission, compression, storage and public access to allow easy data retrieval and inter-study communication. Such standardized and readily accessible datasets facilitate data management, taxonomic comparisons and compilation of global metastudies.

  3. Interactive multimedia ethnography: Archiving workflow, interface aesthetics and metadata

    OpenAIRE

    Matthews, P.; Aston, J.

    2012-01-01

    Digital heritage archives often lack engaging user interfaces that strike a balance between providing narrative context and affording user interaction and exploration. It seems nevertheless feasible for metadata tagging and a “joined up” workflow to provide a basis for this rich interaction. After outlining relevant research from within and outside the heritage domain, we present our project FINE (Fluid Interfaces for Narrative Exploration), an effort to develop such a system. Based on conten...

  4. ISO, FGDC, DIF and Dublin Core - Making Sense of Metadata Standards for Earth Science Data

    Science.gov (United States)

    Jones, P. R.; Ritchey, N. A.; Peng, G.; Toner, V. A.; Brown, H.

    2014-12-01

    Metadata standards provide common definitions of metadata fields for information exchange across user communities. Despite the broad adoption of metadata standards for Earth science data, there are still heterogeneous and incompatible representations of information due to differences between the many standards in use and how each standard is applied. Federal agencies are required to manage and publish metadata in different metadata standards and formats for various data catalogs. In 2014, the NOAA National Climatic data Center (NCDC) managed metadata for its scientific datasets in ISO 19115-2 in XML, GCMD Directory Interchange Format (DIF) in XML, DataCite Schema in XML, Dublin Core in XML, and Data Catalog Vocabulary (DCAT) in JSON, with more standards and profiles of standards planned. Of these standards, the ISO 19115-series metadata is the most complete and feature-rich, and for this reason it is used by NCDC as the source for the other metadata standards. We will discuss the capabilities of metadata standards and how these standards are being implemented to document datasets. Successful implementations include developing translations and displays using XSLTs, creating links to related data and resources, documenting dataset lineage, and establishing best practices. Benefits, gaps, and challenges will be highlighted with suggestions for improved approaches to metadata storage and maintenance.
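    The idea of keeping one feature-rich source record and deriving the other formats from it can be sketched as a field crosswalk. This is a simplified illustration, not the XSLT-based translations the abstract mentions: the flat source record and its field names are hypothetical stand-ins for nested ISO 19115-2 XML, and only a few target element names are shown.

```python
import json

# Hypothetical, heavily simplified source record; real ISO 19115-2 metadata is nested XML.
source = {
    "title": "Example dataset",
    "abstract": "Illustrative abstract.",
    "keywords": ["temperature", "precipitation"],
}

# Crosswalks from source field names to target element names.
DUBLIN_CORE = {"title": "dc:title", "abstract": "dc:description", "keywords": "dc:subject"}
DCAT = {"title": "dct:title", "abstract": "dct:description", "keywords": "dcat:keyword"}

def translate(record, crosswalk):
    """Rename the fields of `record` according to `crosswalk`, dropping unmapped fields."""
    return {target: record[field] for field, target in crosswalk.items() if field in record}

dc_record = translate(source, DUBLIN_CORE)
dcat_json = json.dumps(translate(source, DCAT), sort_keys=True)
```

    Because every target format is generated from the same source, a correction made once in the source record propagates to all derived records the next time the crosswalks are run.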

  5. Precision Pointing Reconstruction and Geometric Metadata Generation for Cassini Images

    Science.gov (United States)

    French, Robert S.; Showalter, Mark R.; Gordon, Mitchell K.

    2014-11-01

    Analysis of optical remote sensing (ORS) data from the Cassini spacecraft is a complicated and labor-intensive process. First, small errors in Cassini’s pointing information (up to ~40 pixels for the Imaging Science Subsystem Narrow Angle Camera) must be corrected so that the line of sight vector for each pixel is known. This process involves matching the image contents with known features such as stars, ring edges, or moon limbs. Second, metadata for each pixel must be computed. Depending on the object under observation, this metadata may include lighting geometry, moon or planet latitude and longitude, and/or ring radius and longitude. Both steps require mastering the SPICE toolkit, a highly capable piece of software with a steep learning curve. Only after these steps are completed can the actual scientific investigation begin. We are embarking on a three-year project to perform these steps for all 300,000+ Cassini ISS images as well as images taken by the VIMS, UVIS, and CIRS instruments. The result will be a series of SPICE kernels that include accurate pointing information and a series of backplanes that include precomputed metadata for each pixel. All data will be made public through the PDS Rings Node (http://www.pds-rings.seti.org). We expect this project to dramatically decrease the time required for scientists to analyze Cassini data. In this poster we discuss the project, our current status, and our plans for the next three years.

  6. Embedding Metadata and Other Semantics in Word Processing Documents

    Directory of Open Access Journals (Sweden)

    Peter Sefton

    2009-10-01

    This paper describes a technique for embedding document metadata, and potentially other semantic references, inline in word processing documents, which the authors have implemented with the help of a software development team. Several assumptions underlie the approach: it must be available across computing platforms and work with both Microsoft Word (because of its user base) and OpenOffice.org (because of its free availability). Further, the application needs to be acceptable to and usable by users, so the initial implementation covers only a small number of features, which will only be extended after user-testing. Within these constraints the system provides a mechanism for encoding not only simple metadata, but for inferring hierarchical relationships between metadata elements from a ‘flat’ word processing file. The paper includes links to open source code implementing the techniques as part of a broader suite of tools for academic writing. This addresses tools and software, semantic web and data curation, integrating curation into research workflows and will provide a platform for integrating work on ontologies, vocabularies and folksonomies into word processing tools.

  7. Accuracy VS Performance: Finding the Sweet Spot in the Geospatial Resolution of Satellite Metadata

    Science.gov (United States)

    Baskin, W. E.; Mangosing, D. C.; Rinsland, P. L.

    2010-12-01

    NASA’s Atmospheric Science Data Center (ASDC) and the Cloud-Aerosol LIDAR and Infrared Pathfinder Satellite Observation (CALIPSO) team at the NASA Langley Research Center recently collaborated in the development of a new CALIPSO Search and Subset web application. The web application comprises three elements: (1) a PostGIS-enabled PostgreSQL database system, which is used to store temporal and geospatial metadata from CALIPSO’s LIDAR, Infrared, and Wide Field Camera datasets, (2) the SciFlo engine, which is a data flow engine that enables semantic, scientific data flow executions in a grid or clustered network computational environment, and (3) a PHP-based web application that incorporates some Web 2.0 / AJAX technologies used in the web interface. The search portion of the web application leverages geodetic indexing and search capabilities that became available in the February 2010 release of PostGIS version 1.5. This presentation highlights the lessons learned in experimenting with various geospatial resolutions of CALIPSO’s LIDAR sensor ground track metadata. Details of the various spatial resolutions, spatial database schema designs, spatial indexing strategies, and performance results will be discussed. The focus will be on illustrating our findings on the spatial resolutions for ground track metadata that optimized search time and search accuracy in the CALIPSO Search and Subset Application. The CALIPSO satellite provides new insight into the role that clouds and atmospheric aerosols (airborne particles) play in regulating Earth's weather, climate, and air quality. CALIPSO combines an active LIDAR instrument with passive infrared and visible imagers to probe the vertical structure and properties of thin clouds and aerosols over the globe. The CALIPSO satellite was launched on April 28, 2006 and is part of the A-train satellite constellation. The ASDC in Langley’s Science Directorate leads NASA’s program for the processing, archival and

  8. An event-level analysis of adding exogenous lubricant to condoms in a sample of men who have vaginal sex with women.

    Science.gov (United States)

    Reece, Michael; Mark, Kristen; Herbenick, Debby; Hensel, Devon J; Jawed-Wessel, Sofia; Dodge, Brian

    2012-03-01

    Little is known about the characteristics of sexual events during which individuals choose to use lubricant with condoms. The aims of this article were to evaluate the determinants of adding lubricant to condoms during baseline and at the event level, to assess the event-level variables' influence on adding lubricants to condoms, and to assess the event-level influence of using condoms with lubricant on event-level condom attitudes. A total of 1,874 men completed a 30-day Internet-based prospective daily diary study of sexual behavior and condom use. Baseline data included demographic variables and information about condom education. Daily diary data included reports of penile-vaginal sex regarding intercourse duration, intercourse intensity, intoxication level, condom application method, partner contraceptive method, and partner and relationship characteristics. Lubricant was added to 24.3% of the study-provided condoms and 26.2% of the condoms selected by study participants. Those with more education and those who were married were more likely to add lubrication to condoms. Adding lubricant to condoms, a female partner putting the condom on with her hands and using contraception, and the event occurring with a wife (vs. girlfriend) was significantly associated with longer intercourse. Event-level lubricant and condom use significantly predicted lower willingness to buy the condom it was used with, as well as to recommend the condom. Adding exogenous lubricant was not related to the participants' confidence in condoms as a method to prevent pregnancy and sexually transmitted infections. The event-level nature of this study provided for a more comprehensive assessment of the situational factors that are associated with applying lubricant to condoms. Findings from this study suggest that men are adding lubricant to condoms for reasons other than to increase condom efficacy. © 2012 International Society for Sexual Medicine.

  9. Mining the GBT Metadata Archive: Statistics on Radio Frequency Use, 2002 - 2010

    Science.gov (United States)

    Blatnik, Michael; Clegg, A. W.; Beaudet, C.; Maddalena, R. J.

    2011-01-01

    The metadata from all standard archived Robert C. Byrd Green Bank Telescope (GBT) data have been mined to develop accurate and detailed statistics on radio frequency use. The purpose of the exercise is to help answer several long-standing questions within the radio astronomy spectrum management community: What fraction of observing time is actually spent within exclusive allocated radio astronomy bands, within bands shared with transmitting services, or in bands that have no allocation to the radio astronomy service? To answer these questions, we developed an automated data mining system (which leveraged existing data analysis tools) and applied it to 6 TB of files in the GBT data archive. The data spanned the range August 30, 2002 - July 24, 2010. Data acquired prior to that pre-dated the standard GBT "production-mode" archive format, and data acquired after that range will be added to the analysis on a yearly basis. GBT data from the antenna, receiver, IF, and other systems are acquired asynchronously in separate FITS files, which must then be time-matched and merged into a single data file to extract the necessary metadata. The automated system merged the FITS files and extracted 73 available parameters, such as sky frequency, bandwidth, azimuth, elevation, RA, DEC, system temperature, and many others. All data that generated full standard FITS data sets, including calibration and drift scans, were included in the statistics. Some observing modes, such as pulsar observations using custom backends without the parallel use of a standard GBT backend, could not be incorporated in the initial analysis. This presentation will summarize observing statistics that are relevant to spectrum management activities, and will provide answers to the above questions. Future work should include extending this analysis to other major radio astronomy facilities, such as Arecibo, the VLA, and the VLBA.
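    The central statistic in this abstract, the fraction of observing time spent inside allocated bands, can be sketched once sky frequency, bandwidth and duration have been extracted for each scan. This is a minimal illustration and not the authors' data mining system: the `Scan` type, the sample scans, and the rule that a scan counts only when its whole passband lies inside a band are assumptions, and the band list holds a single illustrative allocation.

```python
from dataclasses import dataclass

@dataclass
class Scan:
    """Hypothetical per-scan metadata extracted from merged FITS files."""
    center_mhz: float
    bandwidth_mhz: float
    duration_s: float

def fraction_in_bands(scans, bands):
    """Fraction of total observing time whose passband lies entirely inside one band."""
    total = sum(scan.duration_s for scan in scans)
    if total == 0:
        return 0.0
    inside = 0.0
    for scan in scans:
        low = scan.center_mhz - scan.bandwidth_mhz / 2
        high = scan.center_mhz + scan.bandwidth_mhz / 2
        if any(band_low <= low and high <= band_high for band_low, band_high in bands):
            inside += scan.duration_s
    return inside / total

# Illustrative band list (MHz); a real analysis would use the full allocation table.
bands = [(1400.0, 1427.0)]
scans = [
    Scan(1420.0, 10.0, 600.0),  # 1415-1425 MHz: inside the band
    Scan(1665.0, 20.0, 200.0),  # outside the band
    Scan(1410.0, 40.0, 200.0),  # 1390-1430 MHz: wider than the band, not counted
]
fraction = fraction_in_bands(scans, bands)
```

    Running the same function with separate band lists for exclusive, shared and unallocated spectrum would give the three-way breakdown the abstract asks about.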

  10. Metadata Wizard: an easy-to-use tool for creating FGDC-CSDGM metadata for geospatial datasets in ESRI ArcGIS Desktop

    Science.gov (United States)

    Ignizio, Drew A.; O'Donnell, Michael S.; Talbert, Colin B.

    2014-01-01

    Creating compliant metadata for scientific data products is mandated for all federal Geographic Information Systems professionals and is a best practice for members of the geospatial data community. However, the complexity of the Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata, the limited availability of easy-to-use tools, and recent changes in the ESRI software environment continue to make metadata creation a challenge. Staff at the U.S. Geological Survey Fort Collins Science Center have developed a Python toolbox for ESRI ArcDesktop to facilitate a semi-automated workflow to create and update metadata records in ESRI’s 10.x software. The U.S. Geological Survey Metadata Wizard tool automatically populates several metadata elements: the spatial reference, spatial extent, geospatial presentation format, vector feature count or raster column/row count, native system/processing environment, and the metadata creation date. Once the software auto-populates these elements, users can easily add attribute definitions and other relevant information in a simple Graphical User Interface. The tool, which offers a simple design free of esoteric metadata language, has the potential to save many government and non-government organizations a significant amount of time and costs by facilitating the development of metadata compliant with the Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata for ESRI software users. A working version of the tool is now available for ESRI ArcDesktop, versions 10.0, 10.1, and 10.2 (downloadable at http://www.sciencebase.gov/metadatawizard).

  11. Data catalog project—A browsable, searchable, metadata system

    Energy Technology Data Exchange (ETDEWEB)

    Stillerman, Joshua, E-mail: jas@psfc.mit.edu [MIT Plasma Science and Fusion Center, Cambridge, MA (United States); Fredian, Thomas; Greenwald, Martin [MIT Plasma Science and Fusion Center, Cambridge, MA (United States); Manduchi, Gabriele [Consorzio RFX, Euratom-ENEA Association, Corso Stati Uniti 4, Padova 35127 (Italy)

    2016-11-15

    Modern experiments are typically conducted by large, extended groups, where researchers rely on other team members to produce much of the data they use. The experiments record very large numbers of measurements that can be difficult for users to find, access and understand. We are developing a system for users to annotate their data products with structured metadata, providing data consumers with a discoverable, browsable data index. Machine understandable metadata captures the underlying semantics of the recorded data, which can then be consumed by both programs, and interactively by users. Collaborators can use these metadata to select and understand recorded measurements. The data catalog project is a data dictionary and index which enables users to record general descriptive metadata, use cases and rendering information as well as providing them a transparent data access mechanism (URI). Users describe their diagnostic including references, text descriptions, units, labels, example data instances, author contact information and data access URIs. The list of possible attribute labels is extensible, but limiting the vocabulary of names increases the utility of the system. The data catalog is focused on the data products and complements process-based systems like the Metadata Ontology Provenance project [Greenwald, 2012; Schissel, 2015]. This system can be coupled with MDSplus to provide a simple platform for data driven display and analysis programs. Sites which use MDSplus can describe tree branches, and if desired create ‘processed data trees’ with homogeneous node structures for measurements. Sites not currently using MDSplus can either use the database to reference local data stores, or construct an MDSplus tree whose leaves reference the local data store. A data catalog system can provide a useful roadmap of data acquired from experiments or simulations making it easier for researchers to find and access important data and understand the meaning of the

  12. Data catalog project—A browsable, searchable, metadata system

    International Nuclear Information System (INIS)

    Stillerman, Joshua; Fredian, Thomas; Greenwald, Martin; Manduchi, Gabriele

    2016-01-01

    Modern experiments are typically conducted by large, extended groups, where researchers rely on other team members to produce much of the data they use. The experiments record very large numbers of measurements that can be difficult for users to find, access and understand. We are developing a system for users to annotate their data products with structured metadata, providing data consumers with a discoverable, browsable data index. Machine understandable metadata captures the underlying semantics of the recorded data, which can then be consumed by both programs, and interactively by users. Collaborators can use these metadata to select and understand recorded measurements. The data catalog project is a data dictionary and index which enables users to record general descriptive metadata, use cases and rendering information as well as providing them a transparent data access mechanism (URI). Users describe their diagnostic including references, text descriptions, units, labels, example data instances, author contact information and data access URIs. The list of possible attribute labels is extensible, but limiting the vocabulary of names increases the utility of the system. The data catalog is focused on the data products and complements process-based systems like the Metadata Ontology Provenance project [Greenwald, 2012; Schissel, 2015]. This system can be coupled with MDSplus to provide a simple platform for data driven display and analysis programs. Sites which use MDSplus can describe tree branches, and if desired create ‘processed data trees’ with homogeneous node structures for measurements. Sites not currently using MDSplus can either use the database to reference local data stores, or construct an MDSplus tree whose leaves reference the local data store. A data catalog system can provide a useful roadmap of data acquired from experiments or simulations making it easier for researchers to find and access important data and understand the meaning of the

  13. Galileo Declassified: IOV Spacecraft Metadata and Its Impact on Precise Orbit Determination

    Science.gov (United States)

    Dilssner, Florian; Schönemann, Erik; Springer, Tim; Flohrer, Claudia; Enderle, Werner

    2017-04-01

    In December 2016, shortly after the declaration of Galileo Initial Services, the European GNSS Agency (GSA) disclosed Galileo spacecraft metadata relevant to precise orbit determination (POD), such as antenna phase center parameters, dimensions of the solar panels and the main body, specularity and reflectivity coefficients for the surface materials, yaw attitude steering law, and signal group delays. The metadata relates to the first four operational Galileo satellites, known as the In-Orbit Validation (IOV) satellites, and is publicly available through the European GNSS Service Center (GSC) web site. One of the dataset's major benefits is that it includes nearly all information about the satellites' surface properties needed to develop a physically meaningful analytical solar radiation pressure (SRP) macro model, or "box-wing" (BW) model. Such a BW model for the IOV spacecraft has now been generated for use in NAPEOS, the European Space Operation Centre's (ESOC's) main geodetic software package for POD. The model represents the satellite as a simple six-sided box with two attached panels, or "wings", and allows for the a priori computation of the direct and indirect (Earth albedo) SRP force. Further valuable parameters of the metadata set are the IOV navigation antenna (NAVANT) phase center offsets (PCOs) and variations (PCVs) inferred from pre-launch anechoic chamber measurements. In this work, we report on the validation of the Galileo IOV metadata and its impact on POD, an activity ESOC has been deeply committed to since the launch of the first Galileo experimental satellite, GIOVE-A, in 2005. We first reanalyze the full history of Galileo tracking data the global International GNSS Service (IGS) network has collected since 2012. We generate orbit and clock solutions based on the widely used Empirical CODE Orbit Model (ECOM) with and without the IOV a priori BW model. For the satellite antennas, we apply the new as well as the standard IGS-recommended phase

  14. Improving Access to NASA Earth Science Data through Collaborative Metadata Curation

    Science.gov (United States)

    Sisco, A. W.; Bugbee, K.; Shum, D.; Baynes, K.; Dixon, V.; Ramachandran, R.

    2017-12-01

    The NASA-developed Common Metadata Repository (CMR) is a high-performance metadata system that currently catalogs over 375 million Earth science metadata records. It serves as the authoritative metadata management system of NASA's Earth Observing System Data and Information System (EOSDIS), enabling NASA Earth science data to be discovered and accessed by a worldwide user community. The size of the EOSDIS data archive is steadily increasing, and the ability to manage and query this archive depends on the input of high quality metadata to the CMR. Metadata that does not provide adequate descriptive information diminishes the CMR's ability to effectively find and serve data to users. To address this issue, an innovative and collaborative review process is underway to systematically improve the completeness, consistency, and accuracy of metadata for approximately 7,000 data sets archived by NASA's twelve EOSDIS data centers, or Distributed Active Archive Centers (DAACs). The process involves automated and manual metadata assessment of both collection and granule records by a team of Earth science data specialists at NASA Marshall Space Flight Center. The team communicates results to DAAC personnel, who then make revisions and reingest improved metadata into the CMR. Implementation of this process relies on a network of interdisciplinary collaborators leveraging a variety of communication platforms and long-range planning strategies. Curating metadata at this scale and resolving metadata issues through community consensus improves the CMR's ability to serve current and future users and also introduces best practices for stewarding the next generation of Earth Observing System data. This presentation will detail the metadata curation process, its outcomes thus far, and also share the status of ongoing curation activities.

  15. An Assessment of the Evolving Common Metadata Repository Standards for Airborne Field Campaigns

    Science.gov (United States)

    Northup, E. A.; Chen, G.; Early, A. B.; Beach, A. L., III; Walter, J.; Conover, H.

    2016-12-01

    The NASA Earth Venture Program has led to a dramatic increase in airborne observations, requiring updated data management practices with clearly defined data standards and protocols for metadata. While the current data management practices demonstrate some success in serving airborne science team data user needs, existing metadata models and standards such as NASA's Unified Metadata Model (UMM) for Collections (UMM-C) present challenges with respect to accommodating certain features of airborne science metadata. UMM is the model implemented in the Common Metadata Repository (CMR), which catalogs all metadata records for NASA's Earth Observing System Data and Information System (EOSDIS). One example of these challenges is with representation of spatial and temporal metadata. In addition, many airborne missions target a particular geophysical event, such as a developing hurricane. In such cases, metadata about the event is also important for understanding the data. While coverage of satellite missions is highly predictable based on orbit characteristics, airborne missions feature complicated flight patterns where measurements can be spatially and temporally discontinuous. Therefore, existing metadata models will need to be expanded for airborne measurements and sampling strategies. An Airborne Metadata Working Group was established under the auspices of NASA's Earth Science Data Systems Working Group (ESDSWG) to identify specific features of airborne metadata that cannot currently be represented in the UMM and to develop new recommendations. The group includes representation from airborne data users and providers. This presentation will discuss the challenges and recommendations in an effort to demonstrate how airborne metadata curation/management can be improved to streamline data ingest and discoverability to a broader user community.

  16. Making Information Visible, Accessible, and Understandable: Meta-Data and Registries

    National Research Council Canada - National Science Library

    Robinson, Clay

    2007-01-01

    .... Expanded use of metadata leads to better-informed decision making, improved management of information, increased return on investment for digital asset production and publishing, improved security...

  17. Defense Virtual Library: Technical Metadata for the Long-Term Management of Digital Materials: Preliminary Guidelines

    National Research Council Canada - National Science Library

    Flynn, Marcy

    2002-01-01

    ... of the digital materials being preserved. This report, prepared by Silver Image Management (SIM), proposes technical metadata elements appropriate for digital objects in the Defense Virtual Library...

  18. Studies of Big Data metadata segmentation between relational and non-relational databases

    Science.gov (United States)

    Golosova, M. V.; Grigorieva, M. A.; Klimentov, A. A.; Ryabinkin, E. A.; Dimitrov, G.; Potekhin, M.

    2015-12-01

    In recent years the concepts of Big Data have become well established in IT. Systems managing large data volumes produce metadata that describe data and workflows. These metadata are used to obtain information about current system state and for statistical and trend analysis of the processes these systems drive. Over time, the amount of stored metadata can grow dramatically. In this article we present our studies to demonstrate how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.

  19. Studies of Big Data metadata segmentation between relational and non-relational databases

    CERN Document Server

    Golosova, M V; Klimentov, A A; Ryabinkin, E A; Dimitrov, G; Potekhin, M

    2015-01-01

    In recent years the concepts of Big Data have become well established in IT. Systems managing large data volumes produce metadata that describe data and workflows. These metadata are used to obtain information about current system state and for statistical and trend analysis of the processes these systems drive. Over time, the amount of stored metadata can grow dramatically. In this article we present our studies to demonstrate how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.

  20. The Theory and Implementation for Metadata in Digital Library/Museum

    Directory of Open Access Journals (Sweden)

    Hsueh-hua Chen

    1998-12-01

    Full Text Available Digital Libraries and Museums (DL/M) have become one of the important research issues of Library and Information Science as well as other related fields. This paper describes the basic concepts of DL/M and briefly introduces the development of the Taiwan Digital Museum Project. Based on the features of various collections, we discuss how to maintain, manage and exchange metadata, especially from the viewpoint of users. We propose a draft metadata format, MICI (Metadata Interchange for Chinese Information), developed by the ROSS (Resources Organization and Searching Specification) team. Finally, current problems and future development of metadata are touched upon. [Article content in Chinese]

  1. Capturing Sensor Metadata for Cross-Domain Interoperability

    Science.gov (United States)

    Fredericks, J.

    2015-12-01

    Envision a world where a field operator turns on an instrument, and is queried for information needed to create standardized encoded descriptions that, together with the sensor manufacturer knowledge, fully describe the capabilities, limitations and provenance of observational data. The Cross-Domain Observational Metadata Environmental Sensing Network (X-DOMES) pilot project (with support from the NSF/EarthCube IA) is taking the first steps needed in realizing this vision. The knowledge of how an observable physical property becomes a measured observation must be captured at each stage of its creation. Each sensor-based observation is made through the use of applied technologies, each with specific limitations and capabilities. Environmental sensors typically provide a variety of options that can be configured differently for each unique deployment, affecting the observational results. By capturing the information (metadata) at each stage of its generation, a more complete and accurate description of data provenance can be communicated. By documenting the information in machine-harvestable, standards-based encodings, metadata can be shared across disciplinary and geopolitical boundaries. Using standards-based frameworks enables automated harvesting and translation to other community-adopted standards, which facilitates the use of shared tools and workflows. The establishment of a cross-domain network of stakeholders (sensor manufacturers, data providers, domain experts, data centers), called the X-DOMES Network, provides a unifying voice for the specification of content and implementation of standards, as well as a central repository for sensor profiles, vocabularies, guidance and product vetting. The ability to easily share fully described observational data provides a better understanding of data provenance and enables the use of common data processing and assessment workflows, fostering a greater trust in our shared global resources. The X-DOMES Network

  2. Standardized metadata for human pathogen/vector genomic sequences.

    Directory of Open Access Journals (Sweden)

    Vivien G Dugan

    Full Text Available High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will

  3. Radiological dose and metadata management; Radiologisches Dosis- und Metadatenmanagement

    Energy Technology Data Exchange (ETDEWEB)

    Walz, M.; Madsack, B. [TUeV SUeD Life Service GmbH, Aerztliche Stelle fuer Qualitaetssicherung in der Radiologie, Nuklearmedizin und Strahlentherapie Hessen, Frankfurt (Germany); Kolodziej, M. [INFINITT Europe GmbH, Frankfurt/M (Germany)

    2016-12-15

    This article describes the features of management systems currently available in Germany for extraction, registration and evaluation of metadata from radiological examinations, particularly in the digital imaging and communications in medicine (DICOM) environment. In addition, the probable relevant developments in this area concerning radiation protection legislation, terminology, standardization and information technology are presented. (orig.)

  4. Technical Evaluation Report 40: The International Learning Object Metadata Survey

    Directory of Open Access Journals (Sweden)

    Norm Friesen

    2004-11-01

    Full Text Available A wide range of projects and organizations is currently making digital learning resources (learning objects) available to instructors, students, and designers via systematic, standards-based infrastructures. One standard that is central to many of these efforts and infrastructures is known as Learning Object Metadata (IEEE 1484.12.1-2002), or LOM. This report builds on Report #11 in this series, and discusses the findings of the author's recent study of ways in which the LOM standard is being used internationally.

  5. A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.

    Science.gov (United States)

    Kothari, Cartik R; Payne, Philip R O

    2015-01-01

    In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.

  6. Metadata Design in the New PDS4 Standards - Something for Everybody

    Science.gov (United States)

    Raugh, Anne C.; Hughes, John S.

    2015-11-01

    The Planetary Data System (PDS) archives, supports, and distributes data of diverse targets, from diverse sources, to diverse users. One of the core problems addressed by the PDS4 data standard redesign was that of metadata - how to accommodate the increasingly sophisticated demands of search interfaces, analytical software, and observational documentation into label standards without imposing limits and constraints that would impinge on the quality or quantity of metadata that any particular observer or team could supply. And yet, as an archive, PDS must have detailed documentation for the metadata in the labels it supports, or the institutional knowledge encoded into those attributes will be lost - putting the data at risk. The PDS4 metadata solution is based on a three-step approach. First, it is built on two key ISO standards: ISO 11179 "Information Technology - Metadata Registries", which provides a common framework and vocabulary for defining metadata attributes; and ISO 14721 "Space Data and Information Transfer Systems - Open Archival Information System (OAIS) Reference Model", which provides the framework for the information architecture that enforces the object-oriented paradigm for metadata modeling. Second, PDS has defined a hierarchical system that allows it to divide its metadata universe into namespaces ("data dictionaries", conceptually), and more importantly to delegate stewardship for a single namespace to a local authority. This means that a mission can develop its own data model with a high degree of autonomy and effectively extend the PDS model to accommodate its own metadata needs within the common ISO 11179 framework. Finally, within a single namespace - even the core PDS namespace - existing metadata structures can be extended and new structures added to the model as new needs are identified. This poster illustrates the PDS4 approach to metadata management and highlights the expected return on the development investment for PDS, users and data

  7. Ecosystem services project: Brigalow region of Queensland

    OpenAIRE

    University of Southern Queensland (USQ). Land Use Study Centre

    2006-01-01

    Metadata only record The goals of this project were to develop and apply the concepts and methods of ecosystem services and ecosystem health to the management of Brigalow remnants with a view to better identifying and quantifying the range of productive and non-productive values found in this ecosystem. PES-1 (Payments for Environmental Services Associate Award)

  8. ARIADNE: a tracking system for relationships in LHCb metadata

    International Nuclear Information System (INIS)

    Shapoval, I; Clemencic, M; Cattaneo, M

    2014-01-01

    The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and databases states to architecture specificators and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time dependent geometry and conditions data, and the LHCb software, which is the data processing applications (used for simulation, high level triggering, reconstruction and analysis of physics data). The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process. It means that relationships between a CondDB state and LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in the LHCb production, varying from unexpected application crashes to incorrect data processing results. In this paper we present Ariadne – a generic metadata relationships tracking system based on the novel NoSQL Neo4j graph database. Its aim is to track and analyze many thousands of evolving relationships for cases such as the one described above, and several others, which would otherwise remain unmanaged and potentially harmful. The highlights of the paper include the system's implementation and management details, infrastructure needed for running it, security issues, first experience of usage in the LHCb production and potential of the system to be applied to a wider set of LHCb tasks.

  9. ARIADNE: a Tracking System for Relationships in LHCb Metadata

    Science.gov (United States)

    Shapoval, I.; Clemencic, M.; Cattaneo, M.

    2014-06-01

    The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and databases states to architecture specificators and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time dependent geometry and conditions data, and the LHCb software, which is the data processing applications (used for simulation, high level triggering, reconstruction and analysis of physics data). The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process. It means that relationships between a CondDB state and LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in the LHCb production, varying from unexpected application crashes to incorrect data processing results. In this paper we present Ariadne - a generic metadata relationships tracking system based on the novel NoSQL Neo4j graph database. Its aim is to track and analyze many thousands of evolving relationships for cases such as the one described above, and several others, which would otherwise remain unmanaged and potentially harmful. The highlights of the paper include the system's implementation and management details, infrastructure needed for running it, security issues, first experience of usage in the LHCb production and potential of the system to be applied to a wider set of LHCb tasks.

  10. Metadata in Multiple Dialects and the Rosetta Stone

    Science.gov (United States)

    Habermann, T.; Monteleone, K.; Armstrong, E. M.; White, B.

    2012-12-01

    As data are shared across multiple communities and re-used in unexpected ways, it is critical to be able to share metadata about who collected and stewarded the data; where the data are available; how the data were collected and processed; and, how they were used in the past. It is even more critical that the new tools can access this information and present it in ways that new users can understand and, if necessary, integrate into their analyses. Unfortunately, as communities develop and use conventions for these metadata, it becomes more and more difficult to share them across community boundaries. This is true even though these conventions are really dialects of a common documentation language that share many important concepts. Breaking down these barriers requires developing community consensus about these concepts and tools for translating between common representations. Ontologies and connections between them have been used to address this problem across datasets from multiple disciplines. Can these tools help solve similar problems with documentation?

  11. Event metadata records as a testbed for scalable data mining

    International Nuclear Information System (INIS)

    Gemmeren, P van; Malon, D

    2010-01-01

    At a data rate of 200 hertz, event metadata records ('TAGs,' in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise 'data mining,' but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.
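
    As a minimal illustration of the kind of one-pass algorithm the abstract refers to, the sketch below keeps a running mean and variance over a stream of values using Welford's method, so that no record has to be stored or revisited. It is not taken from the paper: the class name `RunningStats` and the sample values are hypothetical, and a real TAG attribute stream would replace the list.

```python
class RunningStats:
    """One-pass mean and sample variance (Welford's method); illustrative only."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one new value into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two values.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
# Hypothetical per-event quantity arriving one record at a time.
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

    Because each update touches only three numbers, such an accumulator could in principle sit at any point in a processing chain without buffering events.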

  12. Metadata In, Library Out. A Simple, Robust Digital Library System

    Directory of Open Access Journals (Sweden)

    Tonio Loewald

    2010-06-01

    Full Text Available Tired of being held hostage to expensive systems that did not meet our needs, the University of Alabama Libraries developed an XML schema-agnostic, light-weight digital library delivery system based on the principles of "Keep It Simple, Stupid!" Metadata and derivatives reside in openly accessible web directories, which support the development of web agents and new usability software, as well as modification and complete retrieval at any time. The file name structure is echoed in the file system structure, enabling the delivery software to make inferences about relationships, sequencing, and complex object structure without having to encapsulate files in complex metadata schemas. The web delivery system, Acumen, is built of PHP, JSON, JavaScript and HTML5, using MySQL to support fielded searching. Recognizing that spreadsheets are more user-friendly than XML, an accompanying widget, Archivists Utility, transforms spreadsheets into MODS based on rules selected by the user. Acumen, Archivists Utility, and all supporting software scripts will be made available as open source.

  13. iLOG: A Framework for Automatic Annotation of Learning Objects with Empirical Usage Metadata

    Science.gov (United States)

    Miller, L. D.; Soh, Leen-Kiat; Samal, Ashok; Nugent, Gwen

    2012-01-01

    Learning objects (LOs) are digital or non-digital entities used for learning, education or training commonly stored in repositories searchable by their associated metadata. Unfortunately, based on the current standards, such metadata is often missing or incorrectly entered making search difficult or impossible. In this paper, we investigate…

  14. An Assistant for Loading Learning Object Metadata: An Ontology Based Approach

    Science.gov (United States)

    Casali, Ana; Deco, Claudia; Romano, Agustín; Tomé, Guillermo

    2013-01-01

    In recent years, the development of repositories of learning objects has increased. Users can retrieve these resources for reuse and personalization through searches in web repositories. High-quality metadata is key to successful retrieval. Learning Objects are described with metadata usually in the standard…

  15. Document Classification in Support of Automated Metadata Extraction From Heterogeneous Collections

    Science.gov (United States)

    Flynn, Paul K.

    2014-01-01

    A number of federal agencies, universities, laboratories, and companies are placing their documents online and making them searchable via metadata fields such as author, title, and publishing organization. To enable this, every document in the collection must be catalogued using the metadata fields. Though time consuming, the task of identifying…

  16. The potential of metadata for linked open data and its value for users and publishers

    NARCIS (Netherlands)

    Zuiderwijk, A.M.G.; Jeffery, K.G.; Janssen, M.F.W.H.A.

    2012-01-01

    Public and private organizations increasingly release their data to gain benefits such as transparency and economic growth. The use of these open data can be supported and stimulated by providing considerable metadata (data about the data), including discovery, contextual and detailed metadata. In

  17. Characterization of Educational Resources in e-Learning Systems Using an Educational Metadata Profile

    Science.gov (United States)

    Solomou, Georgia; Pierrakeas, Christos; Kameas, Achilles

    2015-01-01

    The ability to effectively administer educational resources in terms of accessibility, reusability and interoperability lies in the adoption of an appropriate metadata schema, capable of adequately describing them. A considerable number of different educational metadata schemas can be found in the literature, with the IEEE LOM being the most widely…

  18. Metadata in the changing learning environment: developing skills to achieve the blue skies

    Directory of Open Access Journals (Sweden)

    Juanita Foster-Jones

    2002-12-01

    Full Text Available This short paper will examine the importance of metadata and its role in the changing learning environment, beginning with an introduction about what metadata is, and the benefits to be gained from applying it to all academic resources. Two Open University projects, Portfolio and the Reusable Educational Software Library, will be described and used to illustrate how the IMS Learning Resource Metadata scheme is being applied, and the issues that have been encountered by the Open University and how it is attempting to resolve them. The need for change in organizational culture so that metadata becomes part of the creation process, rather than an afterthought, will then be discussed. The paper concludes with a glimpse into the blue skies of the future - where all resources will have metadata as standard practice, and institutions can share and utilize their resources effectively.

  19. Review of Metadata Elements within the Web Pages Resulting from Searching in General Search Engines

    Directory of Open Access Journals (Sweden)

    Sima Shafi’ie Alavijeh

    2009-12-01

    Full Text Available The present investigation aimed to study the extent to which Dublin Core metadata elements and HTML meta tags are present in web pages. Ninety web pages were chosen by searching general search engines (Google, Yahoo and MSN). The extent of metadata elements (Dublin Core and HTML meta tags) present in these pages, as well as the existence of a significant correlation between the presence of meta elements and the type of search engine, were investigated. Findings indicated a very low presence of both Dublin Core metadata elements and HTML meta tags in the pages retrieved, which in turn illustrates the very low usage of metadata elements in web pages. Furthermore, findings indicated that there is no significant correlation between the type of search engine used and the presence of metadata elements. From the standpoint of including metadata in retrieval of web sources, search engines do not significantly differ from one another.

  20. The Earthscope USArray Array Network Facility (ANF): Metadata, Network and Data Monitoring, Quality Assurance During the Second Year of Operations

    Science.gov (United States)

    Eakins, J. A.; Vernon, F. L.; Martynov, V.; Newman, R. L.; Cox, T. A.; Lindquist, K. L.; Hindley, A.; Foley, S.

    2005-12-01

    The Array Network Facility (ANF) for the Earthscope USArray Transportable Array seismic network is responsible for: the delivery of all Transportable Array stations (400 at full deployment) and telemetered Flexible Array stations (up to 200) to the IRIS Data Management Center; station command and control; verification and distribution of metadata; providing useful remotely accessible world wide web interfaces for personnel at the Array Operations Facility (AOF) to access state of health information; and quality control for all data. To meet these goals, we use the Antelope software package to facilitate data collection and transfer, generation and merging of the metadata, real-time monitoring of dataloggers, generation of station noise spectra, and analyst review of individual events. Recently, an Antelope extension to the PHP scripting language has been implemented which facilitates the dynamic presentation of the real-time data to local web pages. Metadata transfers have been simplified by the use of orb transfer technologies at the ANF and receiver end points. Web services are being investigated as a means to make a potentially complicated set of operations easy to follow and reproduce for each newly installed or decommissioned station. As part of the quality control process, daily analyst review has highlighted areas where neither the regional network bulletins nor the USGS global bulletin have published solutions. Currently four regional networks (Anza, BDSN, SCSN, and UNR) contribute data to the Transportable Array with additional contributors expected. The first 100 stations (42 new Earthscope stations) were operational by September 2005 with all but one of the California stations installed. By year's end, weather permitting, the total number of stations deployed is expected to be around 145. Visit http://anf.ucsd.edu for more information on the project and current status.

  1. Statistical Metadata Analysis of the Variability of Latency, Device Transfer Time, and Coordinate Position from Smartphone-Recorded Infrasound Data

    Science.gov (United States)

    Garces, E. L.; Garces, M. A.; Christe, A.

    2017-12-01

    The RedVox infrasound recorder app uses microphones and barometers in smartphones to record infrasound, low-frequency sound below the threshold of human hearing. We study a device's metadata, which includes position, latency time, the differences between the device's internal times and the server times, and the machine time, searching for patterns and possible errors or discontinuities in these scaled parameters. We highlight metadata variability through scaled multivariate displays (histograms, distribution curves, scatter plots), all created and organized through software development in Python. This project is helpful in ascertaining variability and honing the accuracy of smartphones, aiding the emergence of portable devices as viable geophysical data collection instruments. It can also improve the app and cloud service by increasing efficiency and accuracy, making it possible to better document and foresee drastic events like tsunamis, earthquakes, volcanic eruptions, storms, rocket launches, and meteor impacts; recorded data can later be used for studies and analysis by a variety of professions. We expect our final results to produce insight on how to counteract problematic issues in data mining and improve accuracy in smartphone data-collection. By eliminating lurking variables and minimizing the effect of confounding variables, we hope to discover efficient processes to reduce superfluous precision, unnecessary errors, and data artifacts. These methods should conceivably be transferable to other areas of software development, data analytics, and statistics-based experiments, contributing a precedent of smartphone metadata studies from geophysical rather than societal data. The results should facilitate the rise of civilian-accessible, hand-held, data-gathering mobile sensor networks and yield more straightforward data mining techniques.

  2. A Semantically Enabled Metadata Repository for Solar Irradiance Data Products

    Science.gov (United States)

    Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.

    2014-12-01

    The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in Atmospheric and Space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric science, and climate. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of instances of subject-predicate-object data entities identifiable with a URI. This capability coupled with SPARQL over HTTP read access enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source semantic web application, to manage and create new ontologies and populate repository content. A variety of ontologies were used in
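
    To make the triplestore idea concrete, the following sketch stores a few triples in memory and answers a pattern query in which `None` acts as a wildcard, which is roughly what a single SPARQL triple pattern does. It is only an illustration under stated assumptions: it does not use SPARQL or VIVO, and every identifier in it (the `ex:` names, the `match` function) is hypothetical rather than part of the LEMR schema.

```python
# Each entry is one (subject, predicate, object) triple.
# All identifiers are made up for illustration; they are not LEMR terms.
TRIPLES = {
    ("ex:dataset/tsi", "ex:hasParameter", "ex:param/irradiance"),
    ("ex:dataset/tsi", "ex:currentVersion", "17"),
    ("ex:dataset/ssi", "ex:hasParameter", "ex:param/irradiance"),
    ("ex:dataset/ssi", "ex:hasParameter", "ex:param/wavelength"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Which datasets have an irradiance parameter?"
datasets_with_irradiance = [
    s for s, _, _ in match(TRIPLES, p="ex:hasParameter", o="ex:param/irradiance")
]
```

    A real deployment would express the same question as a SPARQL query sent over HTTP; the point here is only that every fact is a three-part statement and a query is a partially filled-in statement.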

  3. Metadata與OAI-PMH在新聞數位典藏之整合應用 Integrated Application of Metadata and OAI-PMH to Digital News Archive

    Directory of Open Access Journals (Sweden)

    Sinn-Cheng Lin

    2005-09-01

    Full Text Available This paper studies the metadata integration problem between the Taiwan Baseball News Database and the Union Catalog of the National Digital Archives Program. First, we design a metadata transformation system that can transform news articles into four popular metadata formats: DC, NITF, RSS and DAC. The system supports not only automatic metadata transformation but also RSS subscription and batch export to the Union Catalog. Second, based on the OAI-PMH architecture, the digital news archive acts as a Repository, and we extend the system into a Data Provider. It can be queried by any Service Provider, including the Union Catalog; it parses the Verbs and then responds to the Service Provider with Record Sets in the requested metadata format. The experimental results indicate that the system can effectively promote the dissemination and sharing of digital news.
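
    The verb-parsing step of a Data Provider can be sketched as follows. This is a hedged illustration, not the system described in the paper: the function name `parse_request` and the returned dictionary layout are assumptions, while the six verb names and the `badVerb` error code come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import parse_qs

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def parse_request(query_string):
    """Split an OAI-PMH query string into a verb and its arguments.

    Hypothetical helper: returns a dict describing either the parsed
    request or a protocol error code.
    """
    # Keep only the first value of each parameter.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    verb = params.pop("verb", None)
    if verb not in OAI_VERBS:
        return {"error": "badVerb"}
    return {"verb": verb, "arguments": params}


request = parse_request("verb=ListRecords&metadataPrefix=oai_dc")
```

    A full provider would go on to validate the arguments for each verb and serialize the matching records as XML in the requested `metadataPrefix` format.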

  4. GSO: Designing a Well-Founded Service Ontology to Support Dynamic Service Discovery and Composition

    NARCIS (Netherlands)

    Bonino da Silva Santos, L.O.; Guizzardi, G.; Guizzardi-Silva Souza, R.; Goncalves da Silva, Eduardo; Ferreira Pires, Luis; van Sinderen, Marten J.

    A pragmatic and straightforward approach to semantic service discovery is to match inputs and outputs of user requests with the input and output requirements of registered service descriptions. This approach can be extended by using pre-conditions, effects and semantic annotations (meta-data) in an

  5. Metadata behind the Interoperability of Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Miguel Angel Manso Callejo

    2009-05-01

    Full Text Available Wireless Sensor Networks (WSNs) produce changes of status that are frequent, dynamic and unpredictable, and cannot be represented using a linear cause-effect approach. Consequently, a new approach is needed to handle these changes in order to support dynamic interoperability. Our approach is to introduce the notion of context as an explicit representation of changes of a WSN status inferred from metadata elements, which in turn leads towards a decision-making process about how to maintain dynamic interoperability. This paper describes the context model developed to represent and reason over different WSN statuses based on four types of contexts, which have been identified as sensing, node, network and organisational contexts. The reasoning has been addressed by developing contextualising and bridge rules. As a result, we were able to demonstrate how contextualising rules have been used to reason on changes of WSN status as a first step towards maintaining dynamic interoperability.

  6. Visualizing and Validating Metadata Traceability within the CDISC Standards

    Science.gov (United States)

    Hume, Sam; Sarnikar, Surendra; Becnel, Lauren; Bennett, Dorine

    2017-01-01

    The Food & Drug Administration has begun requiring that electronic submissions of regulated clinical studies utilize the Clinical Data Interchange Standards Consortium (CDISC) data standards. Within regulated clinical research, traceability is a requirement and indicates that the analysis results can be traced back to the original source data. Current solutions for clinical research data traceability are limited in terms of querying, validation and visualization capabilities. This paper describes (1) the development of metadata models to support computable traceability and traceability visualizations that are compatible with industry data standards for the regulated clinical research domain, (2) adaptation of graph traversal algorithms to make them capable of identifying traceability gaps and validating traceability across the clinical research data lifecycle, and (3) development of a traceability query capability for retrieval and visualization of traceability information. PMID:28815125
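
    The idea of finding traceability gaps by graph traversal can be illustrated with a small sketch. It is not the algorithm from the paper, whose details the abstract does not give: it assumes a hypothetical `derived_from` mapping from each data item to the items it was derived from, and reports every item from which no original source item can be reached. All item names in the example are invented.

```python
def find_traceability_gaps(derived_from, sources):
    """Return the sorted items that cannot be traced back to any source item.

    derived_from maps each item to the list of items it was derived from;
    sources is the set of original source-data items.
    """
    gaps = []
    for start in derived_from:
        stack, seen = [start], set()
        reached_source = False
        # Depth-first walk along the "derived from" edges.
        while stack:
            item = stack.pop()
            if item in seen:
                continue
            seen.add(item)
            if item in sources:
                reached_source = True
                break
            stack.extend(derived_from.get(item, []))
        if not reached_source:
            gaps.append(start)
    return sorted(gaps)


# Invented example: one analysis value traces to a case report form field,
# the other depends on a tabulation item with no recorded origin.
example_links = {
    "analysis.mean_bp": ["tabulation.bp"],
    "tabulation.bp": ["crf.bp"],
    "analysis.orphan": ["tabulation.missing"],
}
example_gaps = find_traceability_gaps(example_links, sources={"crf.bp"})
```

    The `seen` set keeps the walk from looping if the links contain a cycle, which would itself be a data-quality problem worth reporting separately.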

  7. QualityML: a dictionary for quality metadata encoding

    Science.gov (United States)

    Ninyerola, Miquel; Sevillano, Eva; Serral, Ivette; Pons, Xavier; Zabala, Alaitz; Bastin, Lucy; Masó, Joan

    2014-05-01

    The scenario of rapidly growing geodata catalogues requires tools that help users choose products. Having quality fields populated in metadata allows users to rank and then select the best fit-for-purpose products. In this direction, we have developed QualityML (http://qualityml.geoviqua.org), a dictionary that contains hierarchically structured concepts to precisely define and relate quality levels: from quality classes to quality measurements. Generically, a quality element is the path that goes from the highest level (quality class) to the lowest levels (statistics or quality metrics). This path is used to encode the quality of datasets in the corresponding metadata schemas. The benefits of having encoded quality, in the case of data producers, are related to improvements in their product discovery and better transmission of their characteristics. Data users, particularly decision-makers, would find quality and uncertainty measures that help them make the best decisions and perform dataset intercomparison. It also allows other components (such as visualization, discovery, or comparison tools) to be quality-aware and interoperable. On one hand, QualityML is a profile of the ISO geospatial metadata standards providing a set of rules, structured in 6 levels, for precisely documenting quality indicator parameters. On the other hand, QualityML includes semantics and vocabularies for the quality concepts. Whenever possible, it uses statistical expressions from the UncertML dictionary (http://www.uncertml.org) encoding. However, it also extends UncertML to provide a list of alternative metrics that are commonly used to quantify quality. A specific example, based on a temperature dataset, is shown below. The annual mean temperature map has been validated with independent in-situ measurements to obtain a global error of 0.5 °C. Level 0: Quality class (e.g., Thematic accuracy) Level 1: Quality indicator (e.g., Quantitative

  8. ATLAS File and Dataset Metadata Collection and Use

    CERN Document Server

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment including software release management, tracking of reconstructed event sizes and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example the total number of events contained in a dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  9. Accountable Metadata-Hiding Escrow: A Group Signature Case Study

    Directory of Open Access Journals (Sweden)

    Kohlweiss Markulf

    2015-06-01

    Full Text Available A common approach to demands for lawful access to encrypted data is to allow a trusted third party (TTP) to gain access to private data. However, there is no way to verify that this trust is well placed as the TTP may open all messages indiscriminately. Moreover, existing approaches do not scale well when, in addition to the content of the conversation, one wishes to hide one’s identity. Given the importance of metadata this is a major problem. We propose a new approach in which users can retroactively verify cryptographically whether they were wiretapped. As a case study, we propose a new signature scheme that can act as an accountable replacement for group signatures, accountable forward and backward tracing signatures.

  10. ARIADNE: a Tracking System for Relationships in LHCb Metadata

    CERN Document Server

    Shapoval, I; Cattaneo, M

    2014-01-01

    The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and database states to architecture specificators and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time dependent geometry and conditions data, and the LHCb software, which comprises the data processing applications (used for simulation, high level triggering, reconstruction and analysis of physics data). The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process. This means that relationships between a CondDB state and an LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in the LHCb production, varying from unexpected application crashes to incorrect data processing results. In this paper we present Ari...

  11. Metadata and their impact on processes in Building Information Modeling

    Directory of Open Access Journals (Sweden)

    Vladimir Nyvlt

    2014-04-01

    Full Text Available Building Information Modeling (BIM) has huge potential to increase the effectiveness of every project across its whole life cycle: from the initial investment plan, through design and construction activities, to long-term use, property maintenance, and finally demolition. Knowledge management, or more precisely knowledge sharing, covers two sets of tools: managerial and technological. Managers' needs reflect the real expectations and desires of final users regarding how they could benefit from managing long-term projects over the whole life cycle, in terms of saving investment money and other resources. The technology employed can help BIM processes support and deliver these benefits to users. We consider how to use this technology for data and metadata collection, storage, and sharing, and which processes these new technologies may deploy. We also touch on a proposal for optimized processes that better and more smoothly support knowledge sharing within the project time scale and across its whole life cycle.

  12. AMCO Scribe Sampling Data Map Service, Oakland CA, 2017, US EPA Region 9

    Data.gov (United States)

    U.S. Environmental Protection Agency — This map service contains a single layer: Groundwater Samples. The layer draws at all scales. Full FGDC metadata for the layer may be found by clicking the layer...

  13. AMCO Off-Site Air Monitoring Map Service, Oakland CA, 2017, US EPA Region 9

    Data.gov (United States)

    U.S. Environmental Protection Agency — This map service contains a single layer: Off-Site Air Monitors. The layer draws at all scales. Full FGDC metadata for the layer may be found by clicking the layer...

  14. AMCO On-Site Air Monitoring Map Service, Oakland CA, Live 2017, US EPA Region 9

    Data.gov (United States)

    U.S. Environmental Protection Agency — This map service contains the following layers: All On-Site Air Monitors, TCE, PCE, and Vinyl Chloride. The layers draw at all scales. Full FGDC metadata for the...

  15. Phonion: Practical Protection of Metadata in Telephony Networks

    Directory of Open Access Journals (Sweden)

    Heuser Stephan

    2017-01-01

    Full Text Available The majority of people across the globe rely on telephony networks as their primary means of communication. As such, many of the most sensitive personal, corporate and government related communications pass through these systems every day. Unsurprisingly, such connections are subject to a wide range of attacks. Of increasing concern is the use of metadata contained in Call Detail Records (CDRs), which contain source, destination, start time and duration of a call. This information is potentially dangerous as the very act of two parties communicating can reveal significant details about their relationship and put them in the focus of targeted observation or surveillance, which is highly critical especially for journalists and activists. To address this problem, we develop the Phonion architecture to frustrate such attacks by separating call setup functions from call delivery. Specifically, Phonion allows users to preemptively establish call circuits across multiple providers and technologies before dialing into the circuit and does not require constant Internet connectivity. Since no single carrier can determine the ultimate destination of the call, it provides unlinkability for its users and helps them to avoid passive surveillance. We define and discuss a range of adversary classes and analyze why current obfuscation technologies fail to protect users against such metadata attacks. In our extensive evaluation we further analyze advanced anonymity technologies (e.g., VoIP over Tor), which do not preserve our functional requirements for high voice quality in the absence of constant broadband Internet connectivity and compatibility with landline and feature phones. Phonion is the first practical system to provide guarantees of unlinkable communication against a range of practical adversaries in telephony systems.

  16. Development of health information search engine based on metadata and ontology.

    Science.gov (United States)

    Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae

    2014-04-01

    The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced better search results than existing search engines. A health information search engine based on metadata and ontology will provide reliable health information to both information producers and information consumers.

  17. Foundations of a metadata repository for databases of registers and trials.

    Science.gov (United States)

    Stausberg, Jürgen; Löbe, Matthias; Verplancke, Philippe; Drepper, Johannes; Herre, Heinrich; Löffler, Markus

    2009-01-01

    The planning of case report forms (CRFs) in clinical trials or databases in registers is mostly an informal process starting from scratch involving domain experts, biometricians, and documentation specialists. The Telematikplattform für Medizinische Forschungsnetze, an umbrella organization for medical research in Germany, aims at supporting and improving this process with a metadata repository, covering the variables and value lists used in databases of registers and trials. The use cases for the metadata repository range from a specification of case report forms to the harmonization of variable collections, variables, and value lists through a formal review. The warehouse used for the storage of the metadata should at least fulfill the definition of part 3 "Registry metamodel and basic attributes" of ISO/IEC 11179 Information technology - Metadata registries. An implementation of the metadata repository should offer an import and export of metadata in the Operational Data Model standard of the Clinical Data Interchange Standards Consortium. It will facilitate the creation of CRFs and data models, improve the quality of CRFs and data models, support the harmonization of variables and value lists, and support the mapping of metadata and data.

  18. Describing Geospatial Assets in the Web of Data: A Metadata Management Scenario

    Directory of Open Access Journals (Sweden)

    Cristiano Fugazza

    2016-12-01

    Full Text Available Metadata management is an essential enabling factor for geospatial assets because discovery, retrieval, and actual usage of the latter are tightly bound to the quality of these descriptions. Unfortunately, the multi-faceted landscape of metadata formats, requirements, and conventions makes it difficult to identify editing tools that can be easily tailored to the specificities of a given project, workgroup, and Community of Practice. Our solution is a template-driven metadata editing tool that can be customised to any XML-based schema. Its output consists of standards-compliant metadata records that also have a semantics-aware counterpart eliciting novel exploitation techniques. Moreover, external data sources can easily be plugged in to provide autocompletion functionalities on the basis of the data structures made available on the Web of Data. Besides presenting the essentials on customisation of the editor by means of two use cases, we extend the methodology to the whole life cycle of geospatial metadata. We demonstrate the novel capabilities enabled by RDF-based metadata representation with respect to traditional metadata management in the geospatial domain.

  19. The ANSS Station Information System: A Centralized Station Metadata Repository for Populating, Managing and Distributing Seismic Station Metadata

    Science.gov (United States)

    Thomas, V. I.; Yu, E.; Acharya, P.; Jaramillo, J.; Chowdhury, F.

    2015-12-01

    Maintaining and archiving accurate site metadata is critical for seismic network operations. The Advanced National Seismic System (ANSS) Station Information System (SIS) is a repository of seismic network field equipment, equipment response, and other site information. Currently, there are 187 different sensor models and 114 data-logger models in SIS. SIS has a web-based user interface that allows network operators to enter information about seismic equipment and assign response parameters to it. It allows users to log entries for sites, equipment, and data streams. Users can also track when equipment is installed, updated, and/or removed from sites. When seismic equipment configurations change for a site, SIS computes the overall gain of a data channel by combining the response parameters of the underlying hardware components. Users can then distribute this metadata in standardized formats such as FDSN StationXML or dataless SEED. One powerful advantage of SIS is that existing data in the repository can be leveraged: e.g., new instruments can be assigned response parameters from the Incorporated Research Institutions for Seismology (IRIS) Nominal Response Library (NRL), or from a similar instrument already in the inventory, thereby reducing the amount of time needed to determine parameters when new equipment (or models) are introduced into a network. SIS is also useful for managing field equipment that does not produce seismic data (e.g., power systems, telemetry devices, or GPS receivers) and gives the network operator a comprehensive view of site field work. SIS allows users to generate field logs to document activities and inventory at sites. Thus, operators can also use SIS reporting capabilities to improve planning and maintenance of the network. Queries such as how many sensors of a certain model are installed or what pieces of equipment have active problem reports are just a few examples of the type of information that is available to SIS users.
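
    The step of combining component response parameters into an overall channel gain can be illustrated with a simple calculation. The abstract does not say how SIS performs it, so the sketch below assumes the common simplification that the overall gain is the product of the scalar gains of the stages in the signal chain (sensor, preamplifier, digitizer), ignoring any frequency dependence. The function name and the numeric values are invented for the example.

```python
from math import prod


def overall_gain(stage_gains):
    """Multiply the scalar gains of successive stages in a signal chain."""
    gains = list(stage_gains)
    if not gains:
        raise ValueError("at least one stage gain is required")
    return prod(gains)


# Invented example values:
#   sensor sensitivity   1500.0   V per (m/s)
#   preamplifier gain       1.0   (dimensionless)
#   digitizer gain     400000.0   counts per V
example_gain = overall_gain([1500.0, 1.0, 400000.0])  # counts per (m/s)
```

    With these values the channel gain comes out to 600000000.0 counts per (m/s). Replacing one component, say a digitizer with a different counts-per-volt figure, changes only one factor, which is why recomputing on each configuration change is cheap.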

  20. A Spatio-Temporal Enhanced Metadata Model for Interdisciplinary Instant Point Observations in Smart Cities

    Directory of Open Access Journals (Sweden)

    Nengcheng Chen

    2017-02-01

    Full Text Available Due to the incomplete and inconsistent description of spatial and temporal information for city data observed by sensors in various fields, it is a great challenge to share the massive, multi-source and heterogeneous interdisciplinary instant point observation data resources. In this paper, a spatio-temporal enhanced metadata model for point observation data sharing is proposed. The proposed Data Meta-Model (DMM) focuses on the spatio-temporal characteristics and formulates a ten-tuple information description structure to provide a unified and spatio-temporal enhanced description of the point observation data. To verify the feasibility of the point observation data sharing based on DMM, a prototype system was established, and the performance improvement of Sensor Observation Service (SOS) for the instant access and insertion of point observation data was realized through the proposed MongoSOS, which is a Not Only SQL (NoSQL) SOS based on the MongoDB database and has the capability of distributed storage. For example, the response time of access and insertion for navigation and positioning data can reach the millisecond level. Case studies were conducted, including gas concentration monitoring for gas leak emergency response and smart city public vehicle monitoring based on the BeiDou Navigation Satellite System (BDS) used for recording the dynamic observation information. The results demonstrated the versatility and extensibility of the DMM, and the spatio-temporal enhanced sharing for interdisciplinary instant point observations in smart cities.

  1. RPPAML/RIMS: a metadata format and an information management system for reverse phase protein arrays.

    Science.gov (United States)

    Stanislaus, Romesh; Carey, Mark; Deus, Helena F; Coombes, Kevin; Hennessy, Bryan T; Mills, Gordon B; Almeida, Jonas S

    2008-12-22

    Reverse Phase Protein Arrays (RPPA) are convenient assay platforms to investigate the presence of biomarkers in tissue lysates. As with other high-throughput technologies, substantial amounts of analytical data are generated. Over 1,000 samples may be printed on a single nitrocellulose slide. Up to 100 different proteins may be assessed using immunoperoxidase or immunofluorescence techniques in order to determine relative amounts of protein expression in the samples of interest. In this report an RPPA Information Management System (RIMS) is described and made available with open source software. In order to implement the proposed system, we propose a metadata format known as reverse phase protein array markup language (RPPAML). RPPAML would enable researchers to describe, document and disseminate RPPA data. The complexity of the data structure needed to describe the results and the graphic tools necessary to visualize them require a software deployment distributed between a client and a server application. This was achieved without sacrificing interoperability between individual deployments through the use of an open source semantic database, S3DB. This data service backbone is available to multiple client side applications that can also access other server side deployments. The RIMS platform was designed to interoperate with other data analysis and data visualization tools such as Cytoscape. The proposed RPPAML data format aims to standardize RPPA data. Standardization of data would result in diverse client applications being able to operate on the same set of data. Additionally, having data in a standard format would enable data dissemination and data analysis.

  2. Tags and self-organisation: a metadata ecology for learning resources in a multilingual context

    OpenAIRE

    Vuorikari, Riina Hannuli

    2010-01-01

    Vuorikari, R. (2009). Tags and self-organisation: a metadata ecology for learning resources in a multilingual context. Doctoral thesis. November, 13, 2009, Heerlen, The Netherlands: Open University of the Netherlands, CELSTEC.

  3. Tags and self-organisation: a metadata ecology for learning resources in a multilingual context

    NARCIS (Netherlands)

    Vuorikari, Riina

    2009-01-01

    Vuorikari, R. (2009). Tags and self-organisation: a metadata ecology for learning resources in a multilingual context. Doctoral thesis. November, 13, 2009, Heerlen, The Netherlands: Open University of the Netherlands, CELSTEC.

  4. Scalable Metadata Management for a Large Multi-Source Seismic Data Repository

    Energy Technology Data Exchange (ETDEWEB)

    Gaylord, J. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Dodge, D. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Magana-Zook, S. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Barno, J. G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Knapp, D. R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Thomas, J. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sullivan, D. S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Ruppert, S. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Mellors, R. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-05-26

    In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity.

  5. Preserving Geological Samples and Metadata from Polar Regions

    Science.gov (United States)

    Grunow, A.; Sjunneskog, C. M.

    2011-12-01

    The Office of Polar Programs at the National Science Foundation (NSF-OPP) has long recognized the value of preserving earth science collections due to the inherent logistical challenges and financial costs of collecting geological samples from Polar Regions. NSF-OPP established two national facilities to make Antarctic geological samples and drill cores openly and freely available for research. The Antarctic Marine Geology Research Facility (AMGRF) at Florida State University was established in 1963 and archives Antarctic marine sediment cores, dredge samples and smear slides along with ship logs. The United States Polar Rock Repository (USPRR) at Ohio State University was established in 2003 and archives polar rock samples, marine dredges, unconsolidated materials and terrestrial cores, along with associated materials such as field notes, maps, raw analytical data, paleomagnetic cores, thin sections, microfossil mounts, microslides and residues. The existence of the AMGRF and USPRR helps to minimize redundant sample collecting, lessen the environmental impact of doing polar field work, facilitate field logistics planning and comply with the data sharing requirement of the Antarctic Treaty. USPRR acquires collections through donations from institutions and scientists and then makes these samples available as no-cost loans for research, education and museum exhibits. The AMGRF acquires sediment cores from US-based and international collaborative drilling projects in Antarctica. Destructive research techniques are allowed on the loaned samples and loan requests are accepted from any accredited scientific institution in the world. Currently, the USPRR has more than 22,000 cataloged rock samples available to scientists from around the world. All cataloged samples are relabeled with a USPRR number, weighed, photographed and measured for magnetic susceptibility. Many aspects of the sample metadata are included in the database, e.g. geographical location, sample

  6. Automated Creation of Datamarts from a Clinical Data Warehouse, Driven by an Active Metadata Repository

    Science.gov (United States)

    Rogerson, Charles L.; Kohlmiller, Paul H.; Stutman, Harris

    1998-01-01

    A methodology and toolkit are described which enable the automated metadata-driven creation of datamarts from clinical data warehouses. The software uses schema-to-schema transformation driven by an active metadata repository. Tools for assessing datamart data quality are described, as well as methods for assessing the feasibility of implementing specific datamarts. A methodology for data remediation and the re-engineering of operational data capture is described.

  7. New Tools to Document and Manage Data/Metadata: Example NGEE Arctic and ARM

    Science.gov (United States)

    Crow, M. C.; Devarakonda, R.; Killeffer, T.; Hook, L.; Boden, T.; Wullschleger, S.

    2017-12-01

    Tools used for documenting, archiving, cataloging, and searching data are critical pieces of informatics. This poster describes tools being used in several projects at Oak Ridge National Laboratory (ORNL), with a focus on the U.S. Department of Energy's Next Generation Ecosystem Experiment in the Arctic (NGEE Arctic) and Atmospheric Radiation Measurements (ARM) project, and their usage at different stages of the data lifecycle. The Online Metadata Editor (OME) is used for the documentation and archival stages while a Data Search tool supports indexing, cataloging, and searching. The NGEE Arctic OME Tool [1] provides a method by which researchers can upload their data and provide original metadata with each upload while adhering to standard metadata formats. The tool is built upon a Java SPRING framework to parse user input into, and from, XML output. Many aspects of the tool require use of a relational database including encrypted user-login, auto-fill functionality for predefined sites and plots, and file reference storage and sorting. The Data Search Tool conveniently displays each data record in a thumbnail containing the title, source, and date range, and features a quick view of the metadata associated with that record, as well as a direct link to the data. The search box incorporates autocomplete capabilities for search terms and sorted keyword filters are available on the side of the page, including a map for geo-searching. These tools are supported by the Mercury [2] consortium (funded by DOE, NASA, USGS, and ARM) and developed and managed at Oak Ridge National Laboratory. Mercury is a set of tools for collecting, searching, and retrieving metadata and data. Mercury collects metadata from contributing project servers, then indexes the metadata to make it searchable using Apache Solr, and provides access to retrieve it from the web page. Metadata standards that Mercury supports include: XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115.

  8. Metadata Analytics, Visualization, and Optimization: Experiments in statistical analysis of the Digital Public Library of America (DPLA)

    Directory of Open Access Journals (Sweden)

    Corey A. Harper

    2016-07-01

    Full Text Available This paper presents the concepts of metadata assessment and “quantification” and describes preliminary research results applying these concepts to metadata from the Digital Public Library of America (DPLA). The introductory sections provide a technical outline of data pre-processing, and propose visualization techniques that can help us understand metadata characteristics in a given context. Example visualizations are shown and discussed, leading up to the use of "metadata fingerprints" -- D3 Star Plots -- to summarize metadata characteristics across multiple fields for arbitrary groupings of resources. Fingerprints are shown comparing metadata characteristics for different DPLA "Hubs" and also for used versus not used resources based on Google Analytics "pageview" counts. The closing sections introduce the concept of metadata optimization and explore the use of machine learning techniques to optimize metadata in the context of large-scale metadata aggregators like DPLA. Various statistical models are used to predict whether a particular DPLA item is used based only on its metadata. The article concludes with a discussion of the broad potential for machine learning and data science in libraries, academic institutions, and cultural heritage.

  9. EDI – A Template-Driven Metadata Editor for Research Data

    Directory of Open Access Journals (Sweden)

    Fabio Pavesi

    2016-10-01

    Full Text Available EDI is a general purpose, template-driven metadata editor for creating XML-based descriptions. Originally aimed at defining rich and standard metadata for geospatial resources, it can be easily customised in order to comply with a broad range of schemata and domains. EDI creates HTML5 [9] metadata forms with advanced assisted editing capabilities and compiles them into XML files. The examples included in the distribution implement profiles of the ISO 19139 standard for geographic information [14], such as core INSPIRE metadata [10], as well as the OGC [8] standard for sensor description, SensorML [11]. Templates (the blueprints for a specific metadata format) drive form behaviour by element data types and provide advanced features like codelists underlying combo boxes or autocompletion functionalities. The editing of virtually any metadata format can be supported by creating a specific template. EDI is stored on GitHub at https://github.com/SP7-Ritmare/EDI-NG_client and https://github.com/SP7-Ritmare/EDI-NG_server.

  10. Distributed Metadata Management of Mass Storage System in High Energy Physics

    Science.gov (United States)

    Huang, Qiulan; Du, Ran; Cheng, YaoDong; Shi, Jingyan; Chen, Gang; Kan, Wenxiao

    2017-10-01

    In this contribution, we design and implement a dynamic and scalable distributed metadata management system (StarFS) for High Energy Physics (HEP) mass storage systems. In particular, we discuss the key technologies of the distributed metadata management system. We propose a new algorithm named Adaptive Directory Sub-tree Partition (ADSP) for metadata distribution. ADSP divides the filesystem namespace into sub-trees with directory granularity. Sub-trees are stored on storage devices in a flat structure, with their location and file attributes recorded as extended attributes. The placement of sub-trees is adjusted adaptively according to the workload of the metadata cluster, so that load balance is improved and the metadata cluster can be extended dynamically. Experiments show that ADSP achieves higher metadata performance and scalability compared to Gluster and Lustre. We also propose a new algorithm called Distributed Unified Layout (DULA) to improve dynamic scalability and efficiency of data positioning. A system with DULA can provide uniform data distribution and efficient data positioning. DULA is an improved consistent hashing algorithm that is able to locate data in O(1) without the help of routing information. Experiments show that DULA achieves more uniform data distribution and efficient data access. This work is validated in the YangBaJing Cosmic Ray (YBJ) experiment.
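
    The abstract describes DULA only as an improved consistent hashing algorithm and gives no details. As background, here is a minimal sketch of a plain consistent-hash ring with virtual nodes. It is not DULA itself: the class name `HashRing`, the node names, the MD5-based hash and the choice of 64 virtual nodes are all assumptions made for illustration, and lookup here is a binary search (O(log n)), not the O(1) positioning the abstract claims for DULA.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Plain consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several points on the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def locate(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["mds1", "mds2", "mds3"])
keys = [f"/data/run{i}" for i in range(200)]
before = {k: ring.locate(k) for k in keys}
ring.remove_node("mds3")
after = {k: ring.locate(k) for k in keys}
# Only keys that lived on the removed node change owner.
moved = [k for k in keys if before[k] != after[k]]
```

    The property that matters for dynamic extension is visible in `moved`: when a node leaves, only the keys it owned are reassigned, and every client computes placement locally without a routing table.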

  11. Automating the Extraction of Metadata from Archaeological Data Using iRods Rules

    Directory of Open Access Journals (Sweden)

    David Walling

    2011-10-01

    Full Text Available The Texas Advanced Computing Center and the Institute for Classical Archaeology at the University of Texas at Austin developed a method that uses iRods rules and a Jython script to automate the extraction of metadata from digital archaeological data. The first step was to create a record-keeping system to classify the data. The record-keeping system employs file and directory hierarchy naming conventions designed specifically to maintain the relationship between the data objects and map the archaeological documentation process. The metadata implicit in the record-keeping system is automatically extracted upon ingest, combined with additional sources of metadata, and stored alongside the data in the iRods preservation environment. This method enables a more organized workflow for the researchers, helps them archive their data close to the moment of data creation, and avoids error-prone manual metadata input. We describe the types of metadata extracted and provide technical details of the extraction process and storage of the data and metadata.
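
    The idea of metadata being implicit in file and directory names can be illustrated with a short sketch. The abstract does not give the actual naming convention, the iRods rules or the Jython script, so everything below is hypothetical: the layout `<site>/<year>/<trench>/<context>_<doctype>_<seq>.<ext>`, the field names and the function `extract_metadata` are assumptions chosen only to show the technique.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention: <site>/<year>/<trench>/<context>_<doctype>_<seq>.<ext>
FILENAME_RE = re.compile(
    r"^(?P<context>[A-Za-z0-9]+)_(?P<doctype>[a-z]+)_(?P<seq>\d+)$"
)


def extract_metadata(path: str) -> dict:
    """Derive metadata fields from a path that follows the assumed convention."""
    p = PurePosixPath(path)
    if len(p.parts) != 4:
        raise ValueError(f"expected site/year/trench/file, got: {path}")
    site, year, trench = p.parts[:3]
    if not year.isdigit():
        raise ValueError(f"year directory is not numeric: {year}")
    match = FILENAME_RE.match(p.stem)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {p.name}")
    return {
        "site": site,
        "year": int(year),
        "trench": trench,
        "context": match["context"],
        "doctype": match["doctype"],
        "sequence": int(match["seq"]),
        "format": p.suffix.lstrip(".").lower(),
    }


record = extract_metadata("siteA/2008/T12/C045_photo_003.JPG")
```

    Rejecting paths that do not match, instead of guessing, keeps badly named files out of the archive until someone fixes them, which is the point of enforcing the convention at ingest.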

  12. Survey data and metadata modelling using document-oriented NoSQL

    Science.gov (United States)

    Rahmatuti Maghfiroh, Lutfi; Gusti Bagus Baskara Nugraha, I.

    2018-03-01

    Survey data collected from year to year are subject to metadata changes. However, the data need to be stored in an integrated way so that statistical data can be obtained faster and more easily. A data warehouse (DW) can be used to address this need. However, variables change in every period, and a traditional DW cannot accommodate such changes via Slowly Changing Dimensions (SCD). Previous research handles variable changes in a DW and manages the metadata by using a multiversion DW (MVDW). The MVDW is designed using the relational model. Some studies have also found that a non-relational model in a NoSQL database has faster read times than the relational model. Therefore, we propose to manage metadata changes by using NoSQL. This study proposes a DW model to manage change and algorithms to retrieve data with metadata changes. Evaluation of the proposed model and algorithms shows that a database with the proposed design can retrieve data with metadata changes properly. This paper contributes to the comprehensive analysis of data with metadata changes (especially survey data) in integrated storage.

  13. Recipes for Semantic Web Dog Food — The ESWC and ISWC Metadata Projects

    Science.gov (United States)

    Möller, Knud; Heath, Tom; Handschuh, Siegfried; Domingue, John

    Semantic Web conferences such as ESWC and ISWC offer prime opportunities to test and showcase semantic technologies. Conference metadata about people, papers and talks is diverse in nature and neither so small as to be uninteresting nor so big as to be unmanageable. Many metadata-related challenges that may arise in the Semantic Web at large are also present here. Metadata must be generated from sources which are often unstructured and hard to process, and may originate from many different players; therefore, suitable workflows must be established. Moreover, the generated metadata must use appropriate formats and vocabularies, and be served in a way that is consistent with the principles of linked data. This paper reports on the metadata efforts from ESWC and ISWC, identifies specific issues and barriers encountered during the projects, and discusses how these were approached. Recommendations are made as to how these may be addressed in the future, and we discuss how these solutions may generalize to metadata production for the Semantic Web at large.

  14. Structure constrained by metadata in networks of chess players.

    Science.gov (United States)

    Almeira, Nahuel; Schaigorodsky, Ana L; Perotti, Juan I; Billoni, Orlando V

    2017-11-09

    Chess is an emblematic sport that stands out because of its age, popularity and complexity. It has served to study human behavior from the perspective of a wide range of disciplines, from cognitive skills such as memory and learning, to aspects like innovation and decision-making. Given that extensive documentation of chess games played throughout history is available, it is possible to perform detailed and statistically significant studies about this sport. Here we use one of the most extensive chess databases in the world to construct two networks of chess players. One of the networks includes games that were played over-the-board and the other contains games played on the Internet. We study the main topological characteristics of the networks, such as degree distribution and correlations, transitivity and community structure. We complement the structural analysis by incorporating players' level of play as node metadata. Although both networks are topologically different, we show that in both cases players gather in communities according to their expertise and that an emergent rich-club structure, composed of the top-rated players, is also present.

  15. File and metadata management for BESIII distributed computing

    International Nuclear Information System (INIS)

    Nicholson, C; Zheng, Y H; Lin, L; Deng, Z Y; Li, W D; Zhang, X M

    2012-01-01

    The BESIII experiment at the Institute of High Energy Physics (IHEP), Beijing, uses the high-luminosity BEPCII e+e− collider to study physics in the τ-charm energy region around 3.7 GeV; BEPCII has produced the world's largest samples of J/ψ and ψ’ events to date. An order of magnitude increase in the data sample size over the 2011-2012 data-taking period demanded a move from a very centralized to a distributed computing environment, as well as the development of an efficient file and metadata management system. While BESIII is on a smaller scale than some other HEP experiments, this poses particular challenges for its distributed computing and data management system. These constraints include limited resources and manpower, and low quality of network connections to IHEP. Drawing on the rich experience of the HEP community, a system has been developed which meets these constraints. The design and development of the BESIII distributed data management system, including its integration with other BESIII distributed computing components, such as job management, are presented here.

  16. Data to Pictures to Data: Outreach Imaging Software and Metadata

    Science.gov (United States)

    Levay, Z.

    2011-07-01

    A convergence between astronomy science and digital photography has enabled a steady stream of visually rich imagery from state-of-the-art data. The accessibility of hardware and software has facilitated an explosion of astronomical images for outreach, from space-based observatories, ground-based professional facilities and among the vibrant amateur astrophotography community. Producing imagery from science data involves a combination of custom software to understand FITS data (FITS Liberator), off-the-shelf, industry-standard software to composite multi-wavelength data and edit digital photographs (Adobe Photoshop), and application of photo/image-processing techniques. Some additional effort is needed to close the loop and enable this imagery to be conveniently available for various purposes beyond web and print publication. The metadata paradigms in digital photography are now complying with FITS and science software to carry information such as keyword tags and world coordinates, enabling these images to be usable in more sophisticated, imaginative ways exemplified by Sky in Google Earth and World Wide Telescope.

  17. Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics.

    Science.gov (United States)

    Good, Benjamin M; Tennis, Joseph T; Wilkinson, Mark D

    2009-09-25

    Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource for application developers. Here we characterize the current contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate products. We assessed the metadata that users of CiteULike and Connotea associated with citations in PubMed with the following metrics: coverage of the document space, density of metadata (tags) per document, rates of inter-annotator agreement, and rates of agreement with MeSH indexing. CiteULike and Connotea were very similar on all of the measurements. In comparison to PubMed, document coverage and per-document metadata density were much lower for the social tagging systems. Inter-annotator agreement within the social tagging systems and the agreement between the aggregated social tagging metadata and MeSH indexing were low, though the latter could be increased through voting. The most promising uses of metadata from current academic social tagging repositories will be those that find ways to utilize the novel relationships between users, tags, and documents exposed through these systems. For more traditional kinds of indexing-based applications (such as keyword-based search) to benefit substantially from socially generated metadata in

  18. Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics

    Directory of Open Access Journals (Sweden)

    Tennis Joseph T

    2009-09-01

    Full Text Available Abstract Background Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource for application developers. Here we characterize the current contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate products. Results We assessed the metadata that users of CiteULike and Connotea associated with citations in PubMed with the following metrics: coverage of the document space, density of metadata (tags) per document, rates of inter-annotator agreement, and rates of agreement with MeSH indexing. CiteULike and Connotea were very similar on all of the measurements. In comparison to PubMed, document coverage and per-document metadata density were much lower for the social tagging systems. Inter-annotator agreement within the social tagging systems and the agreement between the aggregated social tagging metadata and MeSH indexing were low, though the latter could be increased through voting. Conclusion The most promising uses of metadata from current academic social tagging repositories will be those that find ways to utilize the novel relationships between users, tags, and documents exposed through these systems. For more traditional kinds of indexing-based applications (such as keyword-based search) to

  19. Effective use of metadata in the integration and analysis of multi-dimensional optical data

    Science.gov (United States)

    Pastorello, G. Z.; Gamon, J. A.

    2012-12-01

    Data discovery and integration rely on adequate metadata. However, creating and maintaining metadata is time consuming and often poorly addressed or avoided altogether, leading to problems in later data analysis and exchange. This is particularly true for research fields in which metadata standards do not yet exist or are under development, or within smaller research groups without enough resources. Vegetation monitoring using in-situ and remote optical sensing is an example of such a domain. In this area, data are inherently multi-dimensional, with spatial, temporal and spectral dimensions usually being well characterized. Other equally important aspects, however, might be inadequately translated into metadata. Examples include equipment specifications and calibrations, field/lab notes and field/lab protocols (e.g., sampling regimen, spectral calibration, atmospheric correction, sensor view angle, illumination angle), data processing choices (e.g., methods for gap filling, filtering and aggregation of data), quality assurance, and documentation of data sources, ownership and licensing. Each of these aspects can be important as metadata for search and discovery, but they can also be used as key data fields in their own right. If each of these aspects is also understood as an "extra dimension," it is possible to take advantage of them to simplify the data acquisition, integration, analysis, visualization and exchange cycle. Simple examples include selecting data sets of interest early in the integration process (e.g., only data collected according to a specific field sampling protocol) or applying appropriate data processing operations to different parts of a data set (e.g., adaptive processing for data collected under different sky conditions). More interesting scenarios involve guided navigation and visualization of data sets based on these extra dimensions, as well as partitioning data sets to highlight relevant subsets to be made available for exchange. The

  20. Metadata Quality in Institutional Repositories May be Improved by Addressing Staffing Issues

    Directory of Open Access Journals (Sweden)

    Elizabeth Stovold

    2016-09-01

    Full Text Available A Review of: Moulaison, S. H., & Dykas, F. (2016). High-quality metadata and repository staffing: Perceptions of United States–based OpenDOAR participants. Cataloging & Classification Quarterly, 54(2), 101-116. http://dx.doi.org/10.1080/01639374.2015.1116480 Objective – To investigate the quality of institutional repository metadata, metadata practices, and identify barriers to quality. Design – Survey questionnaire. Setting – The OpenDOAR online registry of worldwide repositories. Subjects – A random sample of 50 from 358 administrators of institutional repositories in the United States of America listed in the OpenDOAR registry. Methods – The authors surveyed a random sample of administrators of American institutional repositories included in the OpenDOAR registry. The survey was distributed electronically. Recipients were asked to forward the email if they felt someone else was better suited to respond. There were questions about the demographics of the repository, the metadata creation environment, metadata quality, standards and practices, and obstacles to quality. Results were analyzed in Excel, and qualitative responses were coded by two researchers together. Main results – There was a 42% (n=21) response rate to the section on metadata quality, a 40% (n=20) response rate to the metadata creation section, and 40% (n=20) to the section on obstacles to quality. The majority of respondents rated their metadata quality as average (65%, n=13) or above average (30%, n=5). No one rated the quality as high or poor, while 10% (n=2) rated the quality as below average. The survey found that the majority of descriptive metadata was created by professional (84%, n=16) or paraprofessional (53%, n=10) library staff. Professional staff were commonly involved in creating administrative metadata, reviewing the metadata, and selecting standards and documentation. Department heads and advisory committees were also involved in standards and documentation

  1. Predicting age groups of Twitter users based on language and metadata features.

    Directory of Open Access Journals (Sweden)

    Antonio A Morgan-Lopez

    Full Text Available Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fitted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may
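
    For readers unfamiliar with the evaluation metrics named in this abstract, the following is a minimal sketch of how precision, recall and F1 are computed for one class from true and predicted labels. The function name `precision_recall_f1` and the toy labels are assumptions for illustration; they are not the study's data, code or results.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels, invented for this example only.
y_true = ["youth", "youth", "adult", "young_adult", "youth", "adult"]
y_pred = ["youth", "adult", "adult", "youth", "youth", "adult"]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="youth")
```

    F1 is the harmonic mean of precision and recall, so a model only scores well on it when both are reasonably high, which is why the abstract reports all three figures together.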

  2. Geospatial data infrastructure: The development of metadata for geo-information in China

    International Nuclear Information System (INIS)

    Xu, Baiquan; Yan, Shiqiang; Wang, Qianju; Lian, Jian; Wu, Xiaoping; Ding, Keyong

    2014-01-01

    service, convergence, database, product, policy, technology, standard and infrastructure systems. The development of geoinformation stores and services put forward a need for Geospatial Data Infrastructure (GDI) in China. In this paper, some of the ideas envisaged into the development of metadata in China are discussed

  3. Geospatial data infrastructure: The development of metadata for geo-information in China

    Science.gov (United States)

    Xu, Baiquan; Yan, Shiqiang; Wang, Qianju; Lian, Jian; Wu, Xiaoping; Ding, Keyong

    2014-03-01

    , database, product, policy, technology, standard and infrastructure systems. The development of geoinformation stores and services put forward a need for Geospatial Data Infrastructure (GDI) in China. In this paper, some of the ideas envisaged into the development of metadata in China are discussed.

  4. Separation of metadata and pixel data to speed DICOM tag morphing.

    Science.gov (United States)

    Ismail, Mahmoud; Philbin, James

    2013-01-01

    The DICOM information model combines pixel data and metadata in a single DICOM object. It is not possible to access the metadata separately from the pixel data. There are use cases where only metadata is accessed. The current DICOM object format increases the running time of those use cases. Tag morphing is one of those use cases. Tag morphing includes deletion, insertion or manipulation of one or more of the metadata attributes. It is typically used for order reconciliation on study acquisition or to localize the issuer of patient ID (IPID) and the patient ID attributes when data from one domain is transferred to a different domain. In this work, we propose using Multi-Series DICOM (MSD) objects, which separate metadata from pixel data and remove duplicate attributes, to reduce the time required for Tag Morphing. The time required to update a set of study attributes in each format is compared. The results show that the MSD format significantly reduces the time required for tag morphing.

  5. CCR+: Metadata Based Extended Personal Health Record Data Model Interoperable with the ASTM CCR Standard.

    Science.gov (United States)

    Park, Yu Rang; Yoon, Young Jo; Jang, Tae Hun; Seo, Hwa Jeong; Kim, Ju Han

    2014-01-01

    Extension of the standard model while retaining compliance with it is a challenging issue because there is currently no method for semantically or syntactically verifying an extended data model. A metadata-based extended model, named CCR+, was designed and implemented to achieve interoperability between standard and extended models. Furthermore, a multilayered validation method was devised to validate the standard and extended models. The American Society for Testing and Materials (ASTM) Continuity of Care Record (CCR) standard was selected to evaluate the CCR+ model; two CCR and one CCR+ XML files were evaluated. In total, 188 metadata were extracted from the ASTM CCR standard; these metadata are semantically interconnected and registered in the metadata registry. An extended-data-model-specific validation file was generated from these metadata. This file can be used in a smartphone application (Health Avatar CCR+) as a part of a multilayered validation. The new CCR+ model was successfully evaluated via a patient-centric exchange scenario involving multiple hospitals, with the results supporting both syntactic and semantic interoperability between the standard CCR and the extended CCR+ model. A feasible method for delivering an extended model that complies with the standard model is presented herein. There is a great need to extend static standard models such as the ASTM CCR in various domains: the methods presented here represent an important reference for achieving interoperability between standard and extended models.

  6. Virtual Environments for Visualizing Structural Health Monitoring Sensor Networks, Data, and Metadata

    Science.gov (United States)

    Napolitano, Rebecca; Blyth, Anna; Glisic, Branko

    2018-01-01

    Visualization of sensor networks, data, and metadata is becoming one of the most pivotal aspects of the structural health monitoring (SHM) process. Without the ability to communicate efficiently and effectively between disparate groups working on a project, an SHM system can be underused, misunderstood, or even abandoned. For this reason, this work seeks to evaluate visualization techniques in the field, identify flaws in current practices, and devise a new method for visualizing and accessing SHM data and metadata in 3D. More precisely, the work presented here reflects a method and digital workflow for integrating SHM sensor networks, data, and metadata into a virtual reality environment by combining spherical imaging and informational modeling. Both intuitive and interactive, this method fosters communication on a project, enabling diverse practitioners of SHM to efficiently consult and use the sensor networks, data, and metadata. The method is presented through its implementation on a case study, Streicker Bridge on the Princeton University campus. To illustrate the efficiency of the new method, the time and data file size were compared to other potential methods used for visualizing and accessing SHM sensor networks, data, and metadata in 3D. Additionally, feedback from civil engineering students familiar with SHM is used for validation. Recommendations on how different groups working together on an SHM project can create an SHM virtual environment and convey data to the proper audiences are also included. PMID:29337877

  7. Geo-Enrichment and Semantic Enhancement of Metadata Sets to Augment Discovery in Geoportals

    Directory of Open Access Journals (Sweden)

    Bernhard Vockner

    2014-03-01

    Full Text Available Geoportals are established to function as main gateways to find, evaluate, and start “using” geographic information. Still, current geoportal implementations face problems in optimizing the discovery process due to semantic heterogeneity issues, which lead to low recall and low precision in performing text-based searches. Therefore, we propose an enhanced semantic discovery approach that supports multilingualism and information domain context. Thus, we present a workflow that enriches existing structured metadata with synonyms, toponyms, and translated terms derived from user-defined keywords based on multilingual thesauri and ontologies. To make the results easier to understand, we also provide automated translation capabilities for the resource metadata to support the user in conceiving the thematic content of the descriptive metadata, even if it has been documented using a language the user is not familiar with. In addition, to text-enable spatial filtering capabilities, we add location name keywords to metadata sets. These are based on the existing bounding box and shall tweak discovery scores when performing single text line queries. In order to improve the user’s search experience, we tailor faceted search strategies presenting an enhanced query interface for geo-metadata discovery that transparently leverages the underlying thesauri and ontologies.

  8. RPPAML/RIMS: A metadata format and an information management system for reverse phase protein arrays

    Directory of Open Access Journals (Sweden)

    Hennessy Bryan T

    2008-12-01

    Full Text Available Abstract Background Reverse Phase Protein Arrays (RPPA) are convenient assay platforms to investigate the presence of biomarkers in tissue lysates. As with other high-throughput technologies, substantial amounts of analytical data are generated. Over 1000 samples may be printed on a single nitrocellulose slide. Up to 100 different proteins may be assessed using immunoperoxidase or immunofluorescence techniques in order to determine relative amounts of protein expression in the samples of interest. Results In this report an RPPA Information Management System (RIMS) is described and made available with open source software. In order to implement the proposed system, we propose a metadata format known as reverse phase protein array markup language (RPPAML). RPPAML would enable researchers to describe, document and disseminate RPPA data. The complexity of the data structure needed to describe the results and the graphic tools necessary to visualize them require a software deployment distributed between a client and a server application. This was achieved without sacrificing interoperability between individual deployments through the use of an open source semantic database, S3DB. This data service backbone is available to multiple client side applications that can also access other server side deployments. The RIMS platform was designed to interoperate with other data analysis and data visualization tools such as Cytoscape. Conclusion The proposed RPPAML data format hopes to standardize RPPA data. Standardization of data would result in diverse client applications being able to operate on the same set of data. Additionally, having data in a standard format would enable data dissemination and data analysis.

  9. CMR Catalog Service for the Web

    Science.gov (United States)

    Newman, Doug; Mitchell, Andrew

    2016-01-01

    With the impending retirement of Global Change Master Directory (GCMD) Application Programming Interfaces (APIs) the Common Metadata Repository (CMR) was charged with providing a collection-level Catalog Service for the Web (CSW) that provided the same level of functionality as GCMD. This talk describes the capabilities of the CMR CSW API with particular reference to the support of the Committee on Earth Observation Satellites (CEOS) Working Group on Information Systems and Services (WGISS) Integrated Catalog (CWIC).

  10. Spatiotemporal Frequent Pattern Discovery from Solar Event Metadata

    Science.gov (United States)

    Aydin, B.; Angryk, R.; Filali Boubrahimi, S.; Hamdi, S. M.

    2016-12-01

    Solar physics researchers entered the big data era with the launch of NASA's Solar Dynamics Observatory (SDO) mission, which captures approximately 60,000 high-resolution images every day and generates 0.55 petabytes of raster data each year. The big data trend in solar data is anticipated to be sustained by the ground-based DKIST telescope, which is expected to generate three to five petabytes of data each year. Many software modules continuously work on SDO's image data to detect spatial boundaries of solar events. Recently, a solar event tracking algorithm and interpolation methodologies have been proposed for creating large-scale solar event vector data sets in GSU's Data Mining Lab. The solar event tracking algorithm utilizes the spatial locations and corresponding image parameters for linking the polygon-based instances, and therefore creates spatiotemporal trajectory objects with extended geometric representations. Thus, we can access and make use of vector-based solar event metadata, which is in the form of continuously evolving region trajectories. Spatial and temporal patterns such as co-occurrences, sequences, periodicity and convergences frequently transpire among solar event instances. Here, we will concentrate on spatiotemporal co-occurrences and event sequences. Spatiotemporal co-occurrences are the spatial and temporal overlap among two or more solar event instances. On the other hand, spatiotemporal event sequences appear among events that temporally follow each other and are spatially close to each other. Our study includes approximately 120,000 trajectory-based instances of seven solar event types (Active Regions, Coronal Holes, Emerging Flux, Filaments, Flares, Sigmoids, and Sunspots) that occurred between January 2012 and December 2014. The tracked solar events are interpolated at each 10-minute interval. We will present the results of our spatiotemporal co-occurrence pattern mining and spatiotemporal event sequence mining algorithms
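
    The definition of a spatiotemporal co-occurrence (instances overlapping in both space and time) can be made concrete with a small sketch. This is a deliberate simplification and not the authors' mining algorithm: it assumes each trajectory is a mapping from a timestamp (minutes, sampled every 10 minutes as in the abstract) to an axis-aligned bounding box, whereas the abstract describes polygon-based region trajectories. The names `Box` and `co_occurrence_times` and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box standing in for an event's region."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Box") -> bool:
        return (
            self.xmin <= other.xmax
            and other.xmin <= self.xmax
            and self.ymin <= other.ymax
            and other.ymin <= self.ymax
        )


def co_occurrence_times(traj_a: dict, traj_b: dict) -> list:
    """Return timestamps at which both instances exist and their boxes overlap."""
    shared = sorted(traj_a.keys() & traj_b.keys())
    return [t for t in shared if traj_a[t].intersects(traj_b[t])]


# Two invented instances sampled at 10-minute steps.
active_region = {0: Box(0, 0, 2, 2), 10: Box(1, 1, 3, 3), 20: Box(2, 2, 4, 4)}
sunspot = {10: Box(2.5, 2.5, 5, 5), 20: Box(10, 10, 11, 11), 30: Box(0, 0, 1, 1)}
overlap = co_occurrence_times(active_region, sunspot)
```

    A pattern miner would run a test like this over many instance pairs and count, per pair of event types, how often co-occurrences happen; exact polygon intersection would replace the bounding-box check.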

  11. The CARMEN software as a service infrastructure.

    Science.gov (United States)

    Weeks, Michael; Jessop, Mark; Fletcher, Martyn; Hodge, Victoria; Jackson, Tom; Austin, Jim

    2013-01-28

    The CARMEN platform allows neuroscientists to share data, metadata, services and workflows, and to execute these services and workflows remotely via a Web portal. This paper describes how we implemented a service-based infrastructure into the CARMEN Virtual Laboratory. A Software as a Service framework was developed to allow generic new and legacy code to be deployed as services on a heterogeneous execution framework. Users can submit analysis code typically written in Matlab, Python, C/C++ and R as non-interactive standalone command-line applications and wrap them as services in a form suitable for deployment on the platform. The CARMEN Service Builder tool enables neuroscientists to quickly wrap their analysis software for deployment to the CARMEN platform, as a service without knowledge of the service framework or the CARMEN system. A metadata schema describes each service in terms of both system and user requirements. The search functionality allows services to be quickly discovered from the many services available. Within the platform, services may be combined into more complicated analyses using the workflow tool. CARMEN and the service infrastructure are targeted towards the neuroscience community; however, it is a generic platform, and can be targeted towards any discipline.

  12. Parallel file system with metadata distributed across partitioned key-value store

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron

    2017-09-19

    Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
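
    The idea of spreading metadata for one shared file across several partitions can be sketched in a few lines. This is an in-memory illustration only: it does not use PLFS, MDHIM, or a message passing interface, and the hash-based assignment of keys to partitions is an assumption, since the abstract does not say how keys are placed. The class, key names, and metadata fields are hypothetical.

```python
import hashlib


class PartitionedMetadataStore:
    """Toy key-value store whose entries are split across N partitions."""

    def __init__(self, num_partitions):
        # Each dict stands in for the partition held by one compute node.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Assumed placement rule: a stable hash of the key picks the partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key):
        return self.partitions[self._partition_for(key)].get(key)


store = PartitionedMetadataStore(num_partitions=4)
store.put("shared.dat/subfile.0", {"logical_offset": 0, "length": 4096})
store.put("shared.dat/subfile.1", {"logical_offset": 4096, "length": 4096})

print(store.get("shared.dat/subfile.1"))
```

    Because the placement rule is deterministic, any node can work out which partition owns a key without consulting a central index.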

  13. High-performance metadata indexing and search in petascale data storage systems

    International Nuclear Information System (INIS)

    Leung, A W; Miller, E L; Shao, M; Bisson, T; Pasupathy, S

    2008-01-01

    Large-scale storage systems used for scientific applications can store petabytes of data and billions of files, making the organization and management of data in these systems a difficult, time-consuming task. The ability to search file metadata in a storage system can address this problem by allowing scientists to quickly navigate experiment data and code while allowing storage administrators to gather the information they need to properly manage the system. In this paper, we present Spyglass, a file metadata search system that achieves scalability by exploiting storage system properties, providing the scalability that existing file metadata search tools lack. In doing so, Spyglass can achieve search performance up to several thousand times faster than existing database solutions. We show that Spyglass enables important functionality that can aid data management for scientists and storage administrators

  14. Detection of Vandalism in Wikipedia using Metadata Features – Implementation in Simple English and Albanian sections

    Directory of Open Access Journals (Sweden)

    Arsim Susuri

    2017-03-01

    In this paper, we evaluate a list of classifiers in order to use them in the detection of vandalism by focusing on metadata features. Our work is focused on two low-resource data sets (Simple English and Albanian) from Wikipedia. The aim of this research is to prove that this form of vandalism detection applied in one data set (language) can be extended into another data set (language). Article views data sets in Wikipedia have been used rarely for the purpose of detecting vandalism. We will show the benefits of using the article views data set with features from the article revisions data set with the aim of improving the detection of vandalism. The key advantage of using metadata features is that they are language independent and simple to extract because they require minimal processing. This paper shows that application of vandalism models across low-resource languages is possible, and vandalism can be detected through view patterns of articles.

  15. Managing data warehouse metadata using the Web: A Web-based DBA maintenance tool suite

    Energy Technology Data Exchange (ETDEWEB)

    Yow, T. [Oak Ridge National Lab., TN (United States); Grubb, J.; Jennings, S. [Univ. of Tennessee, Knoxville, TN (United States)

    1998-12-31

    The Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC), which is associated with NASA's Earth Observing System Data and Information System (EOSDIS), provides access to datasets used in environmental research. As a data warehouse for NASA, the ORNL DAAC archives and distributes data from NASA's ground-based field experiments. In order to manage its large and diverse data holdings, the DAAC has mined metadata that is stored in several Sybase databases. However, managing the metadata itself has become so complicated that the DAAC has developed a Web-based Graphical User Interface (GUI) called the DBA maintenance Tool Suite. This Web-based tool allows the DBA to maintain the DAAC's metadata databases with the click of a mouse button. This tool greatly reduces the complexities of database maintenance and facilitates the task of data delivery to the DAAC's user community.

  16. Pathogen metadata platform: software for accessing and analyzing pathogen strain information.

    Science.gov (United States)

    Chang, Wenling E; Peterson, Matthew W; Garay, Christopher D; Korves, Tonia

    2016-09-15

    Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide summaries and visualizations needed to readily interpret results. We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers. This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution .

  17. Towards a best practice of modeling unit of measure and related statistical metadata

    CERN Document Server

    Grossmann, Wilfried

    2011-01-01

    Data and metadata exchange between organizations requires a common language for describing structure and content of statistical data and metadata. The SDMX consortium develops content oriented guidelines (COG) recommending harmonized cross-domain concepts and terminology to increase the efficiency of (meta-) data exchange. A recent challenge is a recommended code list for the unit of measure. Based on examples from SDMX sponsor organizations this paper analyses the diversity of "unit of measure" as used in practice, including potential breakdowns and interdependencies of the respective meta-

  18. Title, Description, and Subject are the Most Important Metadata Fields for Keyword Discoverability

    Directory of Open Access Journals (Sweden)

    Laura Costello

    2016-09-01

    A Review of: Yang, L. (2016). Metadata effectiveness in internet discovery: An analysis of digital collection metadata elements and internet search engine keywords. College & Research Libraries, 77(1), 7-19. http://doi.org/10.5860/crl.77.1.7 Objective – To determine which metadata elements best facilitate discovery of digital collections. Design – Case study. Setting – A public research university serving over 32,000 graduate and undergraduate students in the Southwestern United States of America. Subjects – A sample of 22,559 keyword searches leading to the institution’s digital repository between August 1, 2013, and July 31, 2014. Methods – The author used Google Analytics to analyze 73,341 visits to the institution’s digital repository. He determined that 22,559 of these visits were due to keyword searches. Using Random Integer Generator, the author identified a random sample of 378 keyword searches. The author then matched the keywords with the Dublin Core and VRA Core metadata elements on the landing page in the digital repository to determine which metadata field had drawn the keyword searcher to that particular page. Many of these keywords matched to more than one metadata field, so the author also analyzed the metadata elements that generated unique keyword hits and those fields that were frequently matched together. Main Results – Title was the most matched metadata field with 279 matched keywords from searches. Description and Subject were also significant fields with 208 and 79 matches respectively. Slightly more than half of the results, 195 keywords, matched the institutional repository in one field only. Both Title and Description had significant match rates both independently and in conjunction with other elements, but Subject keywords were the sole match in only three of the sampled cases. Conclusion – The Dublin Core elements of Title, Description, and Subject were the most frequently matched fields in keyword

  19. Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.

    Science.gov (United States)

    Hu, Wei; Zaveri, Amrapali; Qiu, Honglei; Dumontier, Michel

    2017-09-18

    The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide the submitter regarding the metadata terms to use, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to home in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data. In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. Our agglomerative clustering algorithm identified metadata keys that were similar to each other, based on (i) name, (ii) core concept and (iii) value similarities, and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). Within the 18 clusters, keys were correctly identified as related to their cluster, except for 13 keys that were not related to the cluster they were placed in. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0
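
    The general technique named in this abstract, agglomerative clustering of metadata keys, can be illustrated with a small sketch. It is not the authors' algorithm: it uses only a name-based similarity (the standard library's `difflib.SequenceMatcher`), single-linkage merging, and a threshold of 0.7, all of which are assumptions chosen for illustration. The sample keys are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(key):
    """Lower-case a key and treat underscores and hyphens as spaces."""
    return key.lower().replace("_", " ").replace("-", " ").strip()


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized key names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_keys(keys, threshold=0.7):
    """Single-linkage agglomerative clustering: start with one cluster per key
    and keep merging any two clusters that contain a sufficiently similar pair."""
    clusters = [[k] for k in keys]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(name_similarity(a, b) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters


keys = ["age", "Age", "cell line", "cell_line", "tissue"]
print(cluster_keys(keys))
```

    Name similarity alone cannot group keys such as "sex" and "gender"; that is the kind of case the concept-based and value-based measures mentioned in the abstract would be needed for.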

  20. The Geospatial Metadata Manager’s Toolbox: Three Techniques for Maintaining Records

    Directory of Open Access Journals (Sweden)

    Bruce Godfrey

    2015-07-01

    Managing geospatial metadata records requires a range of techniques. At the University of Idaho Library, we have tens of thousands of records which need to be maintained, as well as new records which need to be normalized and added to the collections. We show a graphical user interface (GUI) tool that was developed to make simple modifications, a simple XSLT that operates on complex metadata, and a Python script that enables parallel processing to make maintenance tasks more efficient. Throughout, we compare these techniques and discuss when they may be useful.

  1. Indexing and searching hierarchical metadata in an XML database

    OpenAIRE

    Ingebretsen, Knut Bjørke

    2005-01-01

    This thesis looks at how an XML database can be used for indexing and searching hierarchical metadata. It forms part of the work of making information from different collections available to information seekers. The problem addressed is divided into two parts. The first part was to find out how an XML database can be used as a local index for metadata in a hierarchical format. The second part was to find out how a search interface can be extended to exploit ...

  2. Towards an Interoperable Field Spectroscopy Metadata Standard with Extended Support for Marine Specific Applications

    Directory of Open Access Journals (Sweden)

    Barbara A. Rasaiah

    2015-11-01

    This paper presents an approach to developing robust metadata standards for specific applications that serves to ensure a high level of reliability and interoperability for a spectroscopy dataset. The challenges of designing a metadata standard that meets the unique requirements of specific user communities are examined, including in situ measurement of reflectance underwater, using coral as a case in point. Metadata schema mappings from seven existing metadata standards demonstrate that they consistently fail to meet the needs of field spectroscopy scientists for general and specific applications (μ = 22%, σ = 32% conformance with the core metadata requirements and μ = 19%, σ = 18% for the special case of a benthic (e.g., coral) reflectance metadataset). Issues such as field measurement methods, instrument calibration, and data representativeness for marine field spectroscopy campaigns are investigated within the context of submerged benthic measurements. The implications of semantics and syntax for a robust and flexible metadata standard are also considered. A hybrid standard that serves as a “best of breed” incorporating useful modules and parameters within the standards is proposed. This paper is Part 3 in a series of papers in this journal, examining the issues central to a metadata standard for field spectroscopy datasets. The results presented in this paper are an important step towards field spectroscopy metadata standards that address the specific needs of field spectroscopy data stakeholders while facilitating dataset documentation, quality assurance, discoverability and data exchange within large-scale information sharing platforms.

  3. Semantic web-based intelligent geospatial web services

    CERN Document Server

    Yue, Peng

    2013-01-01

    By introducing Semantic Web technologies into geospatial Web services, this book addresses the semantic description of geospatial data and standards-based Web services, discovery of geospatial data and services, and generation of composite services. Semantic descriptions for geospatial data, services, and geoprocessing service chains are structured, organized, and registered in geospatial catalogue services. The ontology-based approach helps to improve the recall and precision of data and services discovery. Semantics-enabled metadata tracking and satisfaction allows analysts to focus on the g

  4. NetCDF4/HDF5 and Linked Data in the Real World - Enriching Geoscientific Metadata without Bloat

    Science.gov (United States)

    Ip, Alex; Car, Nicholas; Druken, Kelsey; Poudjom-Djomani, Yvette; Butcher, Stirling; Evans, Ben; Wyborn, Lesley

    2017-04-01

    NetCDF4 has become the dominant generic format for many forms of geoscientific data, leveraging (and constraining) the versatile HDF5 container format, while providing metadata conventions for interoperability. However, the encapsulation of detailed metadata within each file can lead to metadata "bloat", and difficulty in maintaining consistency where metadata is replicated to multiple locations. Complex conceptual relationships are also difficult to represent in simple key-value netCDF metadata. Linked Data provides a practical mechanism to address these issues by associating the netCDF files and their internal variables with complex metadata stored in Semantic Web vocabularies and ontologies, while complying with and complementing existing metadata conventions. One of the stated objectives of the netCDF4/HDF5 formats is that they should be self-describing: containing metadata sufficient for cataloguing and using the data. However, this objective can be regarded as only partially-met where details of conventions and definitions are maintained externally to the data files. For example, one of the most widely used netCDF community standards, the Climate and Forecasting (CF) Metadata Convention, maintains standard vocabularies for a broad range of disciplines across the geosciences, but this metadata is currently neither readily discoverable nor machine-readable. We have previously implemented useful Linked Data and netCDF tooling (ncskos) that associates netCDF files, and individual variables within those files, with concepts in vocabularies formulated using the Simple Knowledge Organization System (SKOS) ontology. NetCDF files contain Uniform Resource Identifier (URI) links to terms represented as SKOS Concepts, rather than plain-text representations of those terms, so we can use simple, standardised web queries to collect and use rich metadata for the terms from any Linked Data-presented SKOS vocabulary. 
    Geoscience Australia (GA) manages a large volume of diverse

  5. The open research system: a web-based metadata and data repository for collaborative research

    Science.gov (United States)

    Charles M. Schweik; Alexander Stepanov; J. Morgan Grove

    2005-01-01

    Beginning in 1999, a web-based metadata and data repository we call the "open research system" (ORS) was designed and built to assist geographically distributed scientific research teams. The purpose of this innovation was to promote the open sharing of data within and across organizational lines and across geographic distances. As the use of the system...

  6. On the communication of scientific data: The Full-Metadata Format

    DEFF Research Database (Denmark)

    Riede, Moritz; Schueppel, Rico; Sylvester-Hvid, Kristian O.

    2010-01-01

    In this paper, we introduce a scientific format for text-based data files, which facilitates storing and communicating tabular data sets. The so-called Full-Metadata Format builds on the widely used INI-standard and is based on four principles: readable self-documentation, flexible structure, fail...

  7. Study on contexts in tracking usage and attention metadata in multilingual Technology Enhanced Learning

    NARCIS (Netherlands)

    Vuorikari, Riina; Berendt, Bettina

    2009-01-01

    Vuorikari, R., & Berendt, B. (2009). Study on contexts in tracking usage and attention metadata in multilingual Technology Enhanced Learning. In S. Fischer, E. Maehle & R. Reischuk (Eds.), Im Focus das Leben, Lecture Notes in Informatics (LNI) (Vol. 154, pp. 181, 1654-1663). Informatik 2009, Lübeck,

  8. Python, Google Sheets, and the Thesaurus for Graphic Materials for Efficient Metadata Project Workflows

    Directory of Open Access Journals (Sweden)

    Jeremy Bartczak

    2017-01-01

    In 2017, the University of Virginia (U.Va.) will launch a two-year initiative to celebrate the bicentennial anniversary of the University’s founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated with generating digital content and accompanying description within the context of limited resources. This paper describes how technology and new approaches to metadata design have enabled the University of Virginia’s Metadata Analysis and Design Department to rapidly and successfully generate accurate description for these digital objects. Python’s pandas module improves efficiency by cleaning and repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata creation and data quality control.
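
    The CSV-to-MODS step described in this abstract can be sketched as follows. To stay self-contained, the sketch swaps in the standard library's `csv` and `xml.etree.ElementTree` modules in place of pandas and lxml, so it is not the department's actual workflow. The column names (`title`, `date`, `subjects`), the `|` separator for subject headings, and the sample row are hypothetical; the MODS namespace and element names are those of the MODS version 3 schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def row_to_mods(row):
    """Build one MODS record from a CSV row, trimming stray whitespace."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = row["title"].strip()
    origin = ET.SubElement(mods, f"{{{MODS_NS}}}originInfo")
    ET.SubElement(origin, f"{{{MODS_NS}}}dateCreated").text = row["date"].strip()
    # Assumed convention: several subject headings share one cell, separated by "|".
    for heading in row["subjects"].split("|"):
        if heading.strip():
            subject = ET.SubElement(mods, f"{{{MODS_NS}}}subject")
            ET.SubElement(subject, f"{{{MODS_NS}}}topic").text = heading.strip()
    return mods


# Hypothetical table as it might be recorded at digitization.
sample_csv = "title,date,subjects\n  Students on the Lawn ,1968,Students|College campuses\n"
records = [row_to_mods(r) for r in csv.DictReader(io.StringIO(sample_csv))]

print(ET.tostring(records[0], encoding="unicode"))
```

    Keeping the cleaning step (here just `strip()`) inside the row-to-record function means every record passes through the same normalization before any XML is written.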

  9. That obscure object of desire: multimedia metadata on the Web, part 1

    NARCIS (Netherlands)

    F.-M. Nack (Frank); J.R. van Ossenbruggen (Jacco); L. Hardman (Lynda)

    2003-01-01

    This article discusses the state of the art in metadata for audio-visual media in large semantic networks, such as the Semantic Web. Our discussion is predominantly motivated by the two most widely known approaches towards machine-processable and semantic-based content description,

  10. That Obscure Object of Desire: Multimedia Metadata on the Web (Part II)

    NARCIS (Netherlands)

    F.-M. Nack (Frank); J.R. van Ossenbruggen (Jacco); L. Hardman (Lynda)

    2005-01-01

    This article discusses the state of the art in metadata for audio-visual media in large semantic networks, such as the Semantic Web. Our discussion is predominantly motivated by the two most widely known approaches towards machine-processable and semantic-based content description,

  11. The Semantic Mapping of Archival Metadata to the CIDOC CRM Ontology

    Science.gov (United States)

    Bountouri, Lina; Gergatsoulis, Manolis

    2011-01-01

    In this article we analyze the main semantics of archival description, expressed through Encoded Archival Description (EAD). Our main target is to map the semantics of EAD to the CIDOC Conceptual Reference Model (CIDOC CRM) ontology as part of a wider integration architecture of cultural heritage metadata. Through this analysis, it is concluded…

  12. A renaissance in library metadata? The importance of community collaboration in a digital world

    Directory of Open Access Journals (Sweden)

    Sarah Bull

    2016-07-01

    This article summarizes a presentation given by Sarah Bull as part of the Association of Learned and Professional Society Publishers (ALPSP) seminar ‘Setting the Standard’ in November 2015. Representing the library community at the wide-ranging seminar, Sarah was tasked with making the topic of library metadata an engaging and informative one for a largely publisher audience. With help from co-author Amanda Quimby, this article is an attempt to achieve the same aim! It covers the importance of library metadata and standards in the supply chain and also reflects on the role of the community in successful standards development and maintenance. Special emphasis is given to the importance of quality in e-book metadata and the need for publisher and library collaboration to improve discovery, usage and the student experience. The article details the University of Birmingham experience of e-book metadata from a workflow perspective to highlight the complex integration issues which remain between content procurement and discovery.

  13. Scaling the walls of discovery: using semantic metadata for integrative problem solving.

    Science.gov (United States)

    Manning, Maurice; Aggarwal, Amit; Gao, Kevin; Tucker-Kellogg, Greg

    2009-03-01

    Current data integration approaches by bioinformaticians frequently involve extracting data from a wide variety of public and private data repositories, each with a unique vocabulary and schema, via scripts. These separate data sets must then be normalized through the tedious and lengthy process of resolving naming differences and collecting information into a single view. Attempts to consolidate such diverse data using data warehouses or federated queries add significant complexity and have shown limitations in flexibility. The alternative of complete semantic integration of data requires a massive, sustained effort in mapping data types and maintaining ontologies. We focused instead on creating a data architecture that leverages semantic mapping of experimental metadata, to support the rapid prototyping of scientific discovery applications with the twin goals of reducing architectural complexity while still leveraging semantic technologies to provide flexibility, efficiency and more fully characterized data relationships. A metadata ontology was developed to describe our discovery process. A metadata repository was then created by mapping metadata from existing data sources into this ontology, generating RDF triples to describe the entities. Finally an interface to the repository was designed which provided not only search and browse capabilities but complex query templates that aggregate data from both RDF and RDBMS sources. We describe how this approach (i) allows scientists to discover and link relevant data across diverse data sources and (ii) provides a platform for development of integrative informatics applications.

  14. A metadata catalog for organization and systemization of fusion simulation data

    International Nuclear Information System (INIS)

    Greenwald, M.; Fredian, T.; Schissel, D.; Stillerman, J.

    2012-01-01

    Highlights: ► We find that modeling and simulation data need better systemization. ► Workflow, data provenance and relations among data items need to be captured. ► We have begun a design for a simulation metadata catalog that meets these needs. ► The catalog design also supports creation of science notebooks for simulation. - Abstract: Careful management of data and associated metadata is a critical part of any scientific enterprise. Unfortunately, most current fusion simulation efforts lack systematic, project-wide organization of their data. This paper describes an approach to managing simulation data through creation of a comprehensive metadata catalog, currently under development. The catalog is intended to document all past and current simulation activities (including data provenance); to enable global data location and to facilitate data access, analysis and visualization through uniform provision of metadata. The catalog will capture workflow, holding entries for each simulation activity including, at least, data importing and staging, data pre-processing and input preparation, code execution, data storage, post-processing and exporting. The overall aim is that between the catalog and the main data archive, the system would hold a complete and accessible description of the data, all of its attributes and the processes used to generate the data. The catalog will describe data collections, including those representing simulation workflows as well as any other useful groupings. Finally it would be populated with user supplied comments to explain the motivation and results of any activity documented by the catalog.

  15. There's Trouble in Paradise: Problems with Educational Metadata Encountered during the MALTED Project.

    Science.gov (United States)

    Monthienvichienchai, Rachada; Sasse, M. Angela; Wheeldon, Richard

    This paper investigates the usability of educational metadata schemas with respect to the case of the MALTED (Multimedia Authoring Language Teachers and Educational Developers) project at University College London (UCL). The project aims to facilitate authoring of multimedia materials for language learning by allowing teachers to share multimedia…

  16. Open Access Metadata, Catalogers, and Vendors: The Future of Cataloging Records

    Science.gov (United States)

    Flynn, Emily Alinder

    2013-01-01

    The open access (OA) movement is working to transform scholarly communication around the world, but this philosophy can also apply to metadata and cataloging records. While some notable, large academic libraries, such as Harvard University, the University of Michigan, and the University of Cambridge, released their cataloging records under OA…

  17. Mapping, Cross-walking, Converting and Exchanging Oceanographic Metadata Information in Video Data Management System

    Science.gov (United States)

    2010-06-01

    exploration and media products are preserved. Technical metadata reflecting the media format and size, codec type, frame size, encoding speed and...web-based portal from which diverse OER ocean data, including video, still image, and audio files will be accessible via text-driven searches or

  18. A Metadata Model for E-Learning Coordination through Semantic Web Languages

    Science.gov (United States)

    Elci, Atilla

    2005-01-01

    This paper reports on a study aiming to develop a metadata model for e-learning coordination based on semantic web languages. A survey of e-learning modes is done initially in order to identify content such as phases, activities, data schema, rules and relations, etc. relevant for a coordination model. In this respect, the study looks into the…

  19. Towards more transparent and reproducible omics studies through a common metadata checklist and data publications

    Energy Technology Data Exchange (ETDEWEB)

    Kolker, Eugene; Ozdemir, Vural; Martens, Lennart; Hancock, William S.; Anderson, Gordon A.; Anderson, Nathaniel; Aynacioglu, Sukru; Baranova, Ancha; Campagna, Shawn R.; Chen, Rui; Choiniere, John; Dearth, Stephen P.; Feng, Wu-Chun; Ferguson, Lynnette; Fox, Geoffrey; Frishman, Dmitrij; Grossman, Robert; Heath, Allison; Higdon, Roger; Hutz, Mara; Janko, Imre; Jiang, Lihua; Joshi, Sanjay; Kel, Alexander; Kemnitz, Joseph W.; Kohane, Isaac; Kolker, Natali; Lancet, Doron; Lee, Elaine; Li, Weizhong; Lisitsa, Andrey; Llerena, Adrian; MacNealy-Koch, Courtney; Marhsall, Jean-Claude; Masuzzo, Paolo; May, Amanda; Mias, George; Monroe, Matthew E.; Montague, Elizabeth; Monney, Sean; Nesvizhskii, Alexey; Noronha, Santosh; Omenn, Gilbert; Rajasimha, Harsha; Ramamoorthy, Preveen; Sheehan, Jerry; Smarr, Larry; Smith, Charles V.; Smith, Todd; Snyder, Michael; Rapole, Srikanth; Srivastava, Sanjeeva; Stanberry, Larissa; Stewart, Elizabeth; Toppo, Stefano; Uetz, Peter; Verheggen, Kenneth; Voy, Brynn H.; Warnich, Louise; Wilhelm, Steven W.; Yandl, Gregory

    2014-01-01

    Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent, yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These three essential steps require consistent generation, capture, and distribution of the metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. This omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

  20. Placing Music Artists and Songs in Time Using Editorial Metadata and Web Mining Techniques

    NARCIS (Netherlands)

    Bountouridis, D.; Veltkamp, R.C.; Balen, J.M.H. van

    2013-01-01

    This paper investigates the novel task of situating music artists and songs in time, thereby adding contextual information that typically correlates with an artist’s similarities, collaborations and influences. The proposed method makes use of editorial metadata in conjunction with web mining

  1. That obscure object of desire: multimedia metadata on the Web, part 2

    NARCIS (Netherlands)

    F.-M. Nack (Frank); J.R. van Ossenbruggen (Jacco); L. Hardman (Lynda)

    2003-01-01

    This article discusses the state of the art in metadata for audio-visual media in large semantic networks, such as the Semantic Web. Our discussion is predominantly motivated by the two most widely known approaches towards machine-processable and semantic-based content description,

  2. Advancements in Large-Scale Data/Metadata Management for Scientific Data.

    Science.gov (United States)

    Guntupally, K.; Devarakonda, R.; Palanisamy, G.; Frame, M. T.

    2017-12-01

    Scientific data often comes with complex and diverse metadata which are critical for data discovery and users. The Online Metadata Editor (OME) tool, which was developed by an Oak Ridge National Laboratory team, effectively manages diverse scientific datasets across several federal data centers, such as DOE's Atmospheric Radiation Measurement (ARM) Data Center and USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L) project. This presentation will focus mainly on recent developments and future strategies for refining the OME tool within these centers. The ARM OME is a standards-based tool (https://www.archive.arm.gov/armome) that allows scientists to create and maintain metadata about their data products. The tool has been improved with new workflows that help metadata coordinators and submitting investigators to submit and review their data more efficiently. The ARM Data Center's newly upgraded Data Discovery Tool (http://www.archive.arm.gov/discovery) uses rich metadata generated by the OME to enable search and discovery of thousands of datasets, while also providing a citation generator and modern order-delivery techniques like Globus (using GridFTP), Dropbox and THREDDS. The Data Discovery Tool also supports incremental indexing, which allows users to find new data as and when they are added. The USGS CSAS&L search catalog employs a custom version of the OME (https://www1.usgs.gov/csas/ome), which has been upgraded with high-level Federal Geographic Data Committee (FGDC) validations and the ability to reserve and mint Digital Object Identifiers (DOIs). The USGS's Science Data Catalog (SDC) (https://data.usgs.gov/datacatalog) allows users to discover a myriad of science data holdings through a web portal. Recent major upgrades to the SDC and ARM Data Discovery Tool include improved harvesting performance and migration using new search software, such as Apache Solr 6.0 for serving up data/metadata to scientific communities. Our presentation will highlight

  3. Towards a preservation framework for Spatial Data Infrastructures: a metadata profile approach

    Science.gov (United States)

    Shaon, Arif; Woolf, Andrew

    2010-05-01

    , to apply existing widely adopted preservation mechanisms and standards, such as the Open Archival Information System (OAIS) reference model (a widely adopted ISO standard for digital preservation) to the long-term preservation of geospatial data. And Web services technologies enable harmonized and interoperable accessibility of data, which may be useful for developing effective long-term preservation approaches for geospatial data based on the existing approaches. This talk discusses the applicability of the OAIS reference model to develop a preservation-aware Spatial Data Infrastructure (SDI) and presents a preservation profile of ISO 19115 metadata for supporting such an SDI. Notably, the work presented may be seen as a preliminary effort for the European Space Agency (ESA) Long-term Digital Preservation (LTDP) initiative that aims to formulate a coordinated and coherent approach to the long-term preservation of the EO space data archives across different member states of the European Union.

  4. Implementing the Army Net-Centric Data Strategy in a Service-Oriented Environment

    Science.gov (United States)

    2009-04-23

    Data Discovery, Artifact Discovery, Resource Deployment, Metadata Registration, Data Discovery, Data Services...Governance Services, Data Discovery & Access, Shared Services, Metadata Profiles, Query Profiles, Task Profiles, Transform...Profiles. Data Discovery & Access: The Data Discovery and Access is a family of services

  5. Scalable Metadata Management for a Large Multi-Source Seismic Data Repository

    Energy Technology Data Exchange (ETDEWEB)

    Gaylord, J. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Dodge, D. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Magana-Zook, S. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Barno, J. G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Knapp, D. R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-04-11

    In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity. We began the effort with an assessment of open source data flow tools from the Hadoop ecosystem. We then began the construction of a layered architecture that is specifically designed to address many of the scalability and data quality issues we experience with our current pipeline. This included implementing basic functionality in each of the layers, such as establishing a data lake, designing a unified metadata schema, tracking provenance, and calculating data quality metrics. Our original intent was to test and validate the new ingestion framework with data from a large-scale field deployment in a temporary network. This delivered somewhat unsatisfying results, since the new system immediately identified fatal flaws in the data relatively early in the pipeline. Although this is a correct result it did not allow us to sufficiently exercise the whole framework. We then widened our scope to process all available metadata from over a dozen online seismic data sources to further test the implementation and validate the design. This experiment also uncovered a higher than expected frequency of certain types of metadata issues that challenged us to further tune our data management strategy to handle them. Our result from this project is a greatly improved understanding of real world data issues, a validated design, and prototype implementations of major components of an eventual production framework. This successfully forms the basis of future development for the Geophysical Monitoring Program data pipeline, which is a critical asset supporting multiple programs. It also positions us very well to deliver valuable metadata management expertise to our sponsors, and has already resulted in an NNSA Office of Defense Nuclear Nonproliferation

  6. Exploring NASA GES DISC Data with Interoperable Services

    Science.gov (United States)

    Zhao, Peisheng; Yang, Wenli; Hegde, Mahabal; Wei, Jennifer C.; Kempler, Steven; Pham, Long; Teng, William; Savtchenko, Andrey

    2015-01-01

    Overview of NASA GES DISC (NASA Goddard Earth Science Data and Information Services Center) data with interoperable services. Open-standard and interoperable services: improve data discoverability, accessibility, and usability with metadata, catalogue and portal standards; achieve data, information and knowledge sharing across applications with standardized interfaces and protocols. Open Geospatial Consortium (OGC) data services and specifications: Web Coverage Service (WCS), data; Web Map Service (WMS), pictures of data; Web Map Tile Service (WMTS), pictures of data tiles; Styled Layer Descriptors (SLD), rendered styles.

  7. Developing a Common Metadata Model for Competencies Description

    NARCIS (Netherlands)

    Sampson, Demetrios; Karampiperis, Pythagoras; Fytros, Demetrios

    2007-01-01

    Competence-based approaches are frequently adopted as the key paradigm in both formal and non-formal education and training. To support the provision of competence-based learning services, it is necessary to be able to maintain a record of an individual’s competences in a persistent and standard way.

  8. Context-Adaptive Learning Designs by Using Semantic Web Services

    Science.gov (United States)

    Dietze, Stefan; Gugliotta, Alessio; Domingue, John

    2007-01-01

    IMS Learning Design (IMS-LD) is a promising technology aimed at supporting learning processes. IMS-LD packages contain the learning process metadata as well as the learning resources. However, the allocation of resources--whether data or services--within the learning design is done manually at design-time on the basis of the subjective appraisals…

  9. File-Metadata Management System for the LHCb Experiment

    CERN Document Server

    Cioffi, C

    2004-01-01

    The LHCb experiment needs to store all the information about the datasets and their processing history of recorded data resulting from particle collisions at the LHC collider at CERN as well as of simulated data. To achieve this functionality a design based on data warehousing techniques was chosen, where several user-services can be implemented and optimized individually without losing functionality or performance. This approach results in an experiment-independent and flexible system. It allows fast access to the catalogue of available data, to detailed history information and to the catalogue of data replicas. Queries can be made based on these three sets of information. A flexible underlying database schema allows the implementation and evolution of these services without the need to change the basic database schema. The consequent implementation of interfaces based on XML-RPC allows the stored information to be accessed and modified using a well defined encapsulating API.
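
    As an illustration of the XML-RPC style of interface mentioned in this abstract, the following minimal sketch builds and parses an XML-RPC request and reply using Python's standard library, without contacting any server. The method name `getReplicas`, the file name, and the site names are hypothetical placeholders; the abstract does not document the actual LHCb API.

```python
import xmlrpc.client

# Hypothetical method name and argument, chosen only for illustration.
request_xml = xmlrpc.client.dumps(
    ("/example/dataset/file-0001.dst",),
    methodname="getReplicas",
)

# A server would decode the request back into (params, methodname).
params, method = xmlrpc.client.loads(request_xml)

# A hypothetical reply: a list of sites holding replicas of the file.
response_xml = xmlrpc.client.dumps(
    (["site-a", "site-b"],),
    methodresponse=True,
)

# Decoding a response yields a 1-tuple of values and no method name.
(replicas,), _ = xmlrpc.client.loads(response_xml)

print(method, params, replicas)
```

    Because both sides exchange only these XML documents, the database schema behind the service can change without affecting clients, which is the encapsulation the abstract describes.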

  10. Unified Science Information Model for SoilSCAPE using the Mercury Metadata Search System

    Science.gov (United States)

    Devarakonda, Ranjeet; Lu, Kefa; Palanisamy, Giri; Cook, Robert; Santhana Vannan, Suresh; Moghaddam, Mahta; Clewley, Dan; Silva, Agnelo; Akbar, Ruzbeh

    2013-12-01

    SoilSCAPE (Soil moisture Sensing Controller And oPtimal Estimator) introduces a new concept for a smart wireless sensor web technology for optimal measurements of surface-to-depth profiles of soil moisture using in-situ sensors. The objective is to enable a guided and adaptive sampling strategy for the in-situ sensor network to meet the measurement validation objectives of spaceborne soil moisture sensors such as the Soil Moisture Active Passive (SMAP) mission. This work is being carried out at the University of Michigan, the Massachusetts Institute of Technology, University of Southern California, and Oak Ridge National Laboratory. At Oak Ridge National Laboratory we are using the Mercury metadata search system [1] for building a Unified Information System for the SoilSCAPE project. This unified portal primarily comprises three key pieces: Distributed Search/Discovery; Data Collections and Integration; and Data Dissemination. Mercury, a Federally funded software for metadata harvesting, indexing, and searching, would be used for this module. Soil moisture data sources identified as part of this activity, such as SoilSCAPE and FLUXNET (in-situ sensors), AirMOSS (airborne retrieval), and SMAP (spaceborne retrieval), are being indexed and maintained by Mercury. Mercury would be the central repository of data sources for cal/val for soil moisture studies and would provide a mechanism to identify additional data sources. Relevant metadata from existing inventories such as ORNL DAAC, USGS Clearinghouse, ARM, NASA ECHO, GCMD etc. would be brought into this soil-moisture data search/discovery module. The SoilSCAPE [2] metadata records will also be published in broader metadata repositories such as GCMD, data.gov. Mercury can be configured to provide a single portal to soil moisture information contained in disparate data management systems located anywhere on the Internet. Mercury is able to extract metadata systematically from HTML pages or XML files using a variety of

  11. Engaging a community towards marine cyberinfrastructure: Lessons Learned from The Marine Metadata Interoperability initiative

    Science.gov (United States)

    Galbraith, N. R.; Graybeal, J.; Bermudez, L. E.; Wright, D.

    2005-12-01

    The Marine Metadata Interoperability (MMI) initiative promotes the exchange, integration and use of marine data through enhanced data publishing, discovery, documentation and accessibility. The project, operating since late 2004, presents several cultural organizational challenges because of the diversity of participants: scientists, technical experts, and data managers from around the world, all working in organizations with different corporate cultures, funding structures, and systems of decision-making. MMI provides educational resources at several levels. For instance, short introductions to metadata concepts are available, as well as guides and "cookbooks" for the quick and efficient preparation of marine metadata. For those who are building major marine data systems, including ocean-observing capabilities, there are training materials, marine metadata content examples, and resources for mapping elements between different metadata standards. The MMI also provides examples of good metadata practices in existing data systems, including the EU's Marine XML project, and functioning ocean/coastal clearinghouses and atlases developed by MMI team members. Communication tools that help build community: 1) Website, used to introduce the initiative to new visitors, and to provide in-depth guidance and resources to members and visitors. The site is built using Plone, an open source web content management system. Plone allows the site to serve as a wiki, to which every user can contribute material. This keeps the membership engaged and spreads the responsibility for the tasks of updating and expanding the site. 2) Email-lists, to engage the broad ocean sciences community. The discussion forums "news," "ask," and "site-help" are available for receiving regular updates on MMI activities, seeking advice or support on projects and standards, or for assistance with using the MMI site. Internal email lists are provided for the Technical Team, the Steering Committee and

  12. Combined use of semantics and metadata to manage Research Data Life Cycle in Environmental Sciences

    Science.gov (United States)

    Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Pertinez, Esther; Palacio, Aida

    2017-04-01

    The use of metadata to contextualize datasets is widespread in Earth System Sciences. There are some initiatives and available tools to help data managers choose the metadata standard that best fits their use cases, like the DCC Metadata Directory (http://www.dcc.ac.uk/resources/metadata-standards). In our use case, we have been gathering physical, chemical and biological data from a water reservoir since 2010. A good metadata definition is crucial not only to contextualize our own data but also to integrate datasets from other sources like satellites or meteorological agencies. That is why we have chosen EML (Ecological Metadata Language), which integrates many different elements to define a dataset, including the project context, instrumentation and parameters definition, and the software used to process, provide quality controls and include the publication details. Those metadata elements can help both humans and machines to understand and process the dataset. However, the use of metadata is not enough to fully support the data life cycle, from the Data Management Plan definition to the Publication and Re-use. To do so, we need to define not only metadata and attributes but also the relationships between them, so semantics are needed. Ontologies, being a knowledge representation, can contribute to define the elements of a research data life cycle, including DMP, datasets, software, etc. They also can define how the different elements are related to each other and how they interact. The first advantage of developing an ontology of a knowledge domain is that it provides a common vocabulary hierarchy (i.e. a conceptual schema) that can be used and standardized by all the agents interested in the domain (either humans or machines). This way of using ontologies is one of the bases of the Semantic Web, where ontologies are set to play a key role in establishing a common terminology between agents. To develop an ontology we are using a graphical tool

  13. Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

    Directory of Open Access Journals (Sweden)

    Hakenberg Jörg

    2009-01-01

    Full Text Available Background: Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge to automated methods, which identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively. Results: The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path of co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The 'MetaData' approach performed best with 96%, when trained on high-quality data. Its performance deteriorates as quality of the training data decreases. The 'Term Cooc' approach performs better on Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of, but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves on average 80% success rate. Conclusion: Metadata is valuable for disambiguation, but requires high quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater 90

  14. Using Semantic Web technologies for the generation of domain-specific templates to support clinical study metadata standards.

    Science.gov (United States)

    Jiang, Guoqian; Evans, Julie; Endle, Cory M; Solbrig, Harold R; Chute, Christopher G

    2016-01-01

    The Biomedical Research Integrated Domain Group (BRIDG) model is a formal domain analysis model for protocol-driven biomedical research, and serves as a semantic foundation for application and message development in the standards developing organizations (SDOs). The increasing sophistication and complexity of the BRIDG model requires new approaches to the management and utilization of the underlying semantics to harmonize domain-specific standards. The objective of this study is to develop and evaluate a Semantic Web-based approach that integrates the BRIDG model with ISO 21090 data types to generate domain-specific templates to support clinical study metadata standards development. We developed a template generation and visualization system based on an open source Resource Description Framework (RDF) store backend, a SmartGWT-based web user interface, and a "mind map" based tool for the visualization of generated domain-specific templates. We also developed a RESTful Web Service informed by the Clinical Information Modeling Initiative (CIMI) reference model for access to the generated domain-specific templates. A preliminary usability study is performed and all reviewers (n = 3) had very positive responses for the evaluation questions in terms of the usability and the capability of meeting the system requirements (with the average score of 4.6). Semantic Web technologies provide a scalable infrastructure and have great potential to enable computable semantic interoperability of models in the intersection of health care and clinical research.

  15. The ISS SOLAR payload data preservation in the frame of the PERICLES FP-7 project: metadata aspects.

    Science.gov (United States)

    Muller, Christian; Pandey, Praveen

    2016-04-01

    PERICLES (Promoting and Enhancing the Reuse of Information throughout the Content Lifecycle exploiting Evolving Semantics) is an FP7 project that started in February 2013. It aims at preserving by design large and complex data sets. PERICLES is coordinated by King's College London, UK and its partners are University of Borås (Sweden), CERTH-ITI (Greece), DotSoft (Greece), Georg-August-Universität Göttingen (Germany), University of Liverpool (UK), Space Application Services (Belgium), XEROX France and University of Edinburgh (UK). Two additional partners provide the two case studies: Tate Gallery (UK) brings the digital art and media case study and B.USOC (Belgian Users Support and Operations Centre) brings the space science case study. PERICLES addresses the life-cycle of large and complex data sets in order to cater for the evolution of context of data sets and user communities, including groups unanticipated when the data was created. Semantics of data sets are thus also expected to evolve and the project includes elements which could address the reuse of data sets at periods where the data providers and even their institutions are not available any more. PERICLES uses the Linked Resources Model (LRM) which will be compared with the OAIS standard. In this study we present the space science case associated with PERICLES. B.USOC supports experiments on the International Space Station and is the curator of the collected data and operations history. B.USOC has chosen to analyse the SOLAR payload flying since 2008 on the ESA COLUMBUS module of the ISS as the PERICLES prime space science case. Solar observation data are prime candidates for long term data preservation as variabilities of the solar spectral irradiance have an influence on earth climate. The nature of the data to be preserved for the reuse of the current SOLAR series is much more extended than a simple set of time tagged tables of spectral irradiances, it is an important inventory of more than 50 classes

  16. The Genomic Observatories Metadatabase (GeOMe): A new repository for field and sampling event metadata associated with genetic samples

    Science.gov (United States)

    Deck, John; Gaither, Michelle R.; Ewing, Rodney; Bird, Christopher E.; Davies, Neil; Meyer, Christopher; Riginos, Cynthia; Toonen, Robert J.; Crandall, Eric D.

    2017-01-01

    The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information’s (NCBI’s) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science. PMID:28771471

  17. The Genomic Observatories Metadatabase (GeOMe): A new repository for field and sampling event metadata associated with genetic samples.

    Directory of Open Access Journals (Sweden)

    John Deck

    2017-08-01

    Full Text Available The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information's (NCBI's) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science.

  18. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset, metadata with usage metrics and user feedback to objectively extract relevance for improved...

  19. Assuring the Quality of Agricultural Learning Repositories: Issues for the Learning Object Metadata Creation Process of the CGIAR

    Science.gov (United States)

    Zschocke, Thomas; Beniest, Jan

    The Consultative Group on International Agricultural Research (CGIAR) has established a digital repository to share its teaching and learning resources along with descriptive educational information based on the IEEE Learning Object Metadata (LOM) standard. As a critical component of any digital repository, quality metadata are critical not only to enable users to find more easily the resources they require, but also for the operation and interoperability of the repository itself. Studies show that repositories have difficulties in obtaining good quality metadata from their contributors, especially when this process involves many different stakeholders as is the case with the CGIAR as an international organization. To address this issue the CGIAR began investigating the Open ECBCheck as well as the ISO/IEC 19796-1 standard to establish quality protocols for its training. The paper highlights the implications and challenges posed by strengthening the metadata creation workflow for disseminating learning objects of the CGIAR.

  20. Compact graphical representation of phylogenetic data and metadata with GraPhlAn

    Directory of Open Access Journals (Sweden)

    Francesco Asnicar

    2015-06-01

    Full Text Available The increased availability of genomic and metagenomic data poses challenges at multiple analysis levels, including visualization of very large-scale microbial and microbial community data paired with rich metadata. We developed GraPhlAn (Graphical Phylogenetic Analysis), a computational tool that produces high-quality, compact visualizations of microbial genomes and metagenomes. This includes phylogenies spanning up to thousands of taxa, annotated with metadata ranging from microbial community abundances to microbial physiology or host and environmental phenotypes. GraPhlAn has been developed as an open-source command-driven tool in order to be easily integrated into complex, publication-quality bioinformatics pipelines. It can be executed either locally or through an online Galaxy web application. We present several examples including taxonomic and phylogenetic visualization of microbial communities, metabolic functions, and biomarker discovery that illustrate GraPhlAn’s potential for modern microbial and community genomics.

  1. Implementation of a metadata architecture and knowledge collection to support semantic interoperability in an enterprise data warehouse.

    Science.gov (United States)

    Dhaval, Rakesh; Borlawsky, Tara; Ostrander, Michael; Santangelo, Jennifer; Kamal, Jyoti; Payne, Philip R O

    2008-11-06

    In order to enhance interoperability between enterprise systems, and improve data validity and reliability throughout The Ohio State University Medical Center (OSUMC), we have initiated the development of an ontology-anchored metadata architecture and knowledge collection for our enterprise data warehouse. The metadata and corresponding semantic relationships stored in the OSUMC knowledge collection are intended to promote consistency and interoperability across the heterogeneous clinical, research, business and education information managed within the data warehouse.

  2. Benefits of Record Management For Scientific Writing (Study of Metadata Reception of Zotero Reference Management Software in UIN Malang)

    Directory of Open Access Journals (Sweden)

    Moch Fikriansyah Wicaksono

    2018-01-01

    Full Text Available Record creation and management by individuals and organizations is growing rapidly, particularly with the shift from print to electronic records and to the smallest unit of a record (metadata). There is therefore a need to manage record metadata, particularly for students who need to record references and citations. Reference management software (RMS) helps with reference management; one such tool is Zotero. The purpose of this article is to describe the benefits of record management for the writing of scientific papers by students, especially in the biology study program at UIN Malik Ibrahim Malang. The research is descriptive with a quantitative approach. To add depth to the respondents' answers, additional data were collected through interviews. The selected population is 322 students from the classes of 2012 to 2014, sampled at random. This population was chosen because the introduction and use of the reference management software Zotero began three years earlier. The study had 80 respondents, a sample size obtained from the Yamane formula. The results showed that 70% agreed that using reference management software saved time and energy in managing digital file metadata, 71% agreed that digital metadata can be quickly stored in the RMS, 65% agreed on the ease of storing metadata in the reference management software, 70% agreed that it was easy to configure metadata for quotations and bibliographies, 56.6% agreed that the metadata stored in reference management software could be edited, and 73.8% agreed that using metadata makes it easier to write quotations and bibliographies.
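
    For readers unfamiliar with the Yamane formula named in this abstract, a minimal sketch follows. The formula is n = N / (1 + N e^2), where N is the population size and e the margin of error. The abstract does not state the margin of error used, so the value 0.10 below is an assumption chosen for illustration; it gives a sample size close to, but not identical with, the 80 respondents reported, so the study presumably used a slightly different margin or rounded up.

```python
import math


def yamane_sample_size(population: int, margin_of_error: float) -> int:
    """Yamane's formula n = N / (1 + N * e^2), rounded up to a whole person."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# N = 322 comes from the abstract; e = 0.10 is an assumed margin of error.
n = yamane_sample_size(322, 0.10)
print(n)
```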

  3. Using Google Tag Manager and Google Analytics to track DSpace metadata fields as custom dimensions

    OpenAIRE

    Suzanna Conrad

    2015-01-01

    DSpace can be problematic for those interested in tracking download and pageview statistics granularly. Some libraries have implemented code to track events on websites and some have experimented with using Google Tag Manager to automate event tagging in DSpace. While these approaches make it possible to track download statistics, granular details such as authors, content types, titles, advisors, and other fields for which metadata exist are generally not tracked in DSpace or Google Analytics...

  4. Autonomous Underwater Vehicle Data Management and Metadata Interoperability for Coastal Ocean Studies

    Science.gov (United States)

    McCann, M. P.; Ryan, J. P.; Chavez, F. P.; Rienecker, E.

    2004-12-01

    Data from over 1000 km of Autonomous Underwater Vehicle (AUV) surveys of Monterey Bay have been collected and cataloged in an ocean observatory data management system. The Monterey Bay Aquarium Research Institute's AUV is equipped with a suite of instruments that include a conductivity, temperature, depth (CTD) instrument, transmissometers, a fluorometer, a nitrate sensor, and an inertial navigation system. Data are logged on the vehicle and, upon completion of a survey, XML descriptions of the data are submitted to the Shore Side Data System (SSDS). Instrument data are then processed on shore to apply calibrations and produce scientifically useful data products. The SSDS employs a data model that tracks data from the instrument that created it through all the consuming processes that generate derived products. SSDS employs OPeNDAP and netCDF to provide data set interoperability at the data level. The core of SSDS is the metadata that is the catalog of these data sets and their relation to all other relevant data. The metadata is managed in a relational database and governed by an Enterprise Java Bean (EJB) server application. Cross-platform Java applications have been written to manage and visualize these data. A Java Swing application - the Hierarchical Ocean Observatory Visualization and Editing System (HOOVES) - has been developed to provide visualization of data set pedigree and data set variables. Because the SSDS data model is generalized according to "Data Producers" and "Data Containers", many different types of data can be represented in SSDS, allowing for interoperability at a metadata level. Comparisons of appropriate data sets, whether they are from an autonomous underwater vehicle or from a fixed mooring, are easily made using SSDS. The authors will present the SSDS data model and show examples of how the model helps organize data set metadata, allowing for data discovery and interoperability. With improved discovery and interoperability the system is helping us

  5. ETICS meta-data software editing - from check out to commit operations

    International Nuclear Information System (INIS)

    Begin, M-E; Sancho, G D-A; Ronco, S D; Gentilini, M; Ronchieri, E; Selmi, M

    2008-01-01

    People involved in modular projects need to improve the software build process by planning the correct execution order and detecting circular dependencies. The lack of suitable tools may cause delays in the development, deployment and maintenance of the software. Experience in such projects has shown that version control and build systems alone cannot support software development efficiently, because of the large number of errors, each of which breaks the build process. Common causes of errors are, for example, the adoption of new libraries, library incompatibility, and the extension of the current project to support new software modules. In this paper, we describe a possible solution implemented in ETICS, an integrated infrastructure for the automated configuration, build and test of Grid and distributed software. ETICS has defined meta-data software abstractions, from which it is possible to download, build and test software projects, setting for instance dependencies, environment variables and properties. Furthermore, the meta-data information is managed by ETICS in a way that reflects the version control system philosophy, through a meta-data repository and a list of operations such as check out and commit. All the information related to a specific piece of software is stored in the repository only when it is considered to be correct. By means of this solution, we introduce a degree of flexibility inside the ETICS system, allowing users to work according to their needs. Moreover, by introducing this functionality, ETICS will act like a version control system for the management of the meta-data
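
    The two build-planning tasks named in this abstract, ordering modules so that each is built after its dependencies and detecting circular dependencies, can be illustrated with a topological sort. The following is a minimal sketch using Python's standard graphlib module; it is not ETICS code, and the function name build_order and the module graph are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError


def build_order(dependencies):
    """Return a build order in which every module follows its dependencies.

    `dependencies` maps a module name to the set of modules it depends on.
    Raises ValueError naming the modules on a cycle if one exists.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of err.args is the list of nodes forming the cycle.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"circular dependency: {cycle}") from err


# Hypothetical module graph for illustration only (not taken from ETICS).
modules = {
    "app": {"core", "net"},
    "net": {"core"},
    "core": set(),
}
order = build_order(modules)
print(order)
```

    Passing a graph such as {"a": {"b"}, "b": {"a"}} makes build_order raise ValueError, so a circular dependency surfaces before any build step runs.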

  6. Unique in the shopping mall: On the reidentifiability of credit card metadata

    DEFF Research Database (Denmark)

    de Montjoye, Yves-Alexandre; Radaelli, Laura; Singh, Vivek Kumar

    2015-01-01

    We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90% of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22%, on average. Finally, we show that even data sets that provide coarse information at any or all of the dimensions provide little anonymity and that women are more reidentifiable than men in credit card metadata.
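
    The claim that a handful of spatiotemporal points can single out one individual can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration and not the authors' method or data: each trace is a set of hypothetical (shop, day) points, and the helper names matching_users and unicity are invented here.

```python
import random


def matching_users(traces, points):
    """Return the users whose trace contains every one of the given points."""
    return [user for user, trace in traces.items() if points <= trace]


def unicity(traces, p, rng):
    """Fraction of users singled out by p points drawn at random from their own trace."""
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # Sort first so that sampling works on a sequence and is reproducible.
        points = set(rng.sample(sorted(traces[user]), p))
        if matching_users(traces, points) == [user]:
            unique += 1
    return unique / len(eligible)


# Hypothetical traces: sets of (shop, day) points, invented for illustration.
traces = {
    "u1": {("bakery", 1), ("cafe", 2), ("gym", 3)},
    "u2": {("bakery", 1), ("cafe", 2), ("cinema", 3)},
    "u3": {("bakery", 1), ("market", 2), ("gym", 3)},
}

shared = matching_users(traces, {("bakery", 1), ("cafe", 2)})
single = matching_users(traces, {("cafe", 2), ("gym", 3)})
full = unicity(traces, 3, random.Random(0))
```

    In this toy data the bakery and cafe points together still match two users, whereas the cafe and gym points match only one, which is the sense in which a few points can be enough to reidentify a person.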

  7. Online information services in the social sciences

    CERN Document Server

    Jacobs, Neil

    2004-01-01

    Information professionals are increasingly responsible not only for running traditional information and library services but also for providing an online presence for their organisation. This book shows how best practice in delivering online information services should be based on actual user needs and behaviour. A series of case studies provide real life examples of how social science information is being used in the community. The book then draws on these case studies to outline the main issues facing service providers: such as usability, metadata and management. The book concludes with a lo

  8. Asymmetric Programming: A Highly Reliable Metadata Allocation Strategy for MLC NAND Flash Memory-Based Sensor Systems

    Directory of Open Access Journals (Sweden)

    Min Huang

    2014-10-01

    While the NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and an increase in the number of bits stored in each memory cell will inevitably degrade the reliability of NAND flash memory. In particular, it’s critical to enhance metadata reliability, which occupies only a small portion of the storage space, but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits for the first time the property of the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme.

  9. Asymmetric programming: a highly reliable metadata allocation strategy for MLC NAND flash memory-based sensor systems.

    Science.gov (United States)

    Huang, Min; Liu, Zhaoqing; Qiao, Liyan

    2014-10-10

    While the NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and an increase in the number of bits stored in each memory cell will inevitably degrade the reliability of NAND flash memory. In particular, it's critical to enhance metadata reliability, which occupies only a small portion of the storage space, but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits for the first time the property of the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme.

  10. Asymmetric Programming: A Highly Reliable Metadata Allocation Strategy for MLC NAND Flash Memory-Based Sensor Systems

    Science.gov (United States)

    Huang, Min; Liu, Zhaoqing; Qiao, Liyan

    2014-01-01

    While the NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and an increase in the number of bits stored in each memory cell will inevitably degrade the reliability of NAND flash memory. In particular, it's critical to enhance metadata reliability, which occupies only a small portion of the storage space, but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits for the first time the property of the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme. PMID:25310473

  11. Metadata and Tools for Integration and Preservation of Cultural Heritage 3D Information

    Directory of Open Access Journals (Sweden)

    Achille Felicetti

    2011-12-01

    In this paper we investigate many of the storage, portability and interoperability issues that arise among archaeologists and cultural heritage people when dealing with 3D technologies. On the one hand, the available digital repositories often seem unable to guarantee affordable features for managing 3D models and their metadata; on the other hand, most of the available data formats for 3D encoding do not seem to offer the portability that 3D information now requires across different systems. We propose a set of possible solutions to show how integration can be achieved through the use of well-known and widely accepted standards for data encoding and data storage. Using a set of 3D models acquired during various archaeological campaigns and a number of open source tools, we have implemented a straightforward encoding process to generate meaningful semantic data and metadata. We also present the interoperability process carried out to integrate the encoded 3D models with the geographic features produced by the archaeologists. Finally, we report the preliminary (rather encouraging) development of a semantically enabled and persistent digital repository, where 3D models (but also any kind of digital data and metadata) can easily be stored, retrieved and shared with the content of other digital archives.

  12. Java Library for Input and Output of Image Data and Metadata

    Science.gov (United States)

    Deen, Robert; Levoe, Steven

    2003-01-01

    A Java-language library supports input and output (I/O) of image data and metadata (label data) in the format of the Video Image Communication and Retrieval (VICAR) image-processing software and in several similar formats, including a subset of the Planetary Data System (PDS) image file format. The library does the following: It provides a low-level, direct-access layer, enabling an application subprogram to read and write specific image files, lines, or pixels, and manipulate metadata directly. Two coding/decoding subprograms ("codecs" for short) based on the Java Advanced Imaging (JAI) software provide access to VICAR and PDS images in a file-format-independent manner. The VICAR and PDS codecs enable any program that conforms to the specification of the JAI codec to use VICAR or PDS images automatically, without specific knowledge of the VICAR or PDS format. The library also includes Image I/O plug-in subprograms for VICAR and PDS formats. Application programs that conform to the Image I/O specification of Java version 1.4 can utilize any image format for which such a plug-in subprogram exists, without specific knowledge of the format itself. Like the aforementioned codecs, the VICAR and PDS Image I/O plug-in subprograms support reading and writing of metadata.

  13. Using a Core Scientific Metadata Model in Large-Scale Facilities

    Directory of Open Access Journals (Sweden)

    Brian Matthews

    2010-07-01

    In this paper, we present the Core Scientific Metadata Model (CSMD), a model for the representation of scientific study metadata developed within the Science & Technology Facilities Council (STFC) to represent the data generated from scientific facilities. The model has been developed to allow management of and access to the data resources of the facilities in a uniform way, although we believe that the model has wider application, especially in areas of “structural science” such as chemistry, materials science and earth sciences. We give some motivations behind the development of the model, and an overview of its major structural elements, centred on the notion of a scientific study formed by a collection of specific investigations. We give some details of the model, with the description of each investigation associated with a particular experiment on a sample generating data, and the associated data holdings are then mapped to the investigation with the appropriate parameters. We then go on to discuss the instantiation of the metadata model within a production quality data management infrastructure, the Information CATalogue (ICAT), which has been developed within STFC for use in large-scale photon and neutron sources. Finally, we give an overview of the relationship between CSMD and other initiatives, and give some directions for future developments.

  14. Practical management of heterogeneous neuroimaging metadata by global neuroimaging data repositories.

    Science.gov (United States)

    Neu, Scott C; Crawford, Karen L; Toga, Arthur W

    2012-01-01

    Rapidly evolving neuroimaging techniques are producing unprecedented quantities of digital data at the same time that many research studies are evolving into global, multi-disciplinary collaborations between geographically distributed scientists. While networked computers have made it almost trivial to transmit data across long distances, collecting and analyzing this data requires extensive metadata if the data is to be maximally shared. Though it is typically straightforward to encode text and numerical values into files and send content between different locations, it is often difficult to attach context and implicit assumptions to the content. As the number of and geographic separation between data contributors grows to national and global scales, the heterogeneity of the collected metadata increases and conformance to a single standardization becomes implausible. Neuroimaging data repositories must then not only accumulate data but must also consolidate disparate metadata into an integrated view. In this article, using specific examples from our experiences, we demonstrate how standardization alone cannot achieve full integration of neuroimaging data from multiple heterogeneous sources and why a fundamental change in the architecture of neuroimaging data repositories is needed instead.

  15. ECHO - Search and Order Metadata Registry Post EDG to WIST Transition

    Science.gov (United States)

    Bories, C.; Pilone, D.

    2009-12-01

    NASA’s EOS Data and Information System (EOSDIS) seeks to support the data and information management system services that are required to further the mission of NASA’s Science Mission Directorate (SMD). This mission is to develop a scientific understanding of the Earth system and its response to natural and human-induced causes. Two of EOSDIS's central components are the EOSDIS Core System (ECS) and its EOS Clearing House (ECHO). ECHO reached a new level of operational demands in 2008, when the legacy EOS Data Gateway (EDG) client and related components were retired in favor of a new paradigm for search and order. Under this new approach, searches over the 2.5+ petabytes of NASA remote sensing images can be done by querying the associated metadata held in ECHO’s database. Since the completion of the EDG retirement in February 2009, ECHO has assumed an enhanced operational posture, meeting or surpassing all former EDG performance, capabilities and levels of usage. This operational readiness is demonstrated by a number of relevant metrics as outlined in the following paragraphs. The first data center to retire its EDG instance in favor of ECHO’s web-based search-and-order client, the Warehouse Inventory Search Tool (WIST), was the National Snow and Ice Data Center (NSIDC) in August 2008. The Land Processes DAAC (LPDAAC), ECHO’s biggest data provider, followed in September that same year. The last provider (Langley Research Center - LaRC) completed its migration in February 2009. During this transition period over 5000 EDG registered users were migrated successfully by deploying a custom tool that allowed the seamless replication of EDG accounts into the ECHO system. Since the retirement of EDG, ECHO has seen a steady increase of users at a rate of over 100 new users per week on average. As providers migrated out of the legacy system, ECHO saw a steady increase in orders and searches. Within a couple of months after the LPDAAC migration, ECHO had absorbed all of the legacy usage

  16. Physical Samples and Persistent Identifiers: The Implementation of the International Geo Sample Number (IGSN) Registration Service in CSIRO, Australia

    Science.gov (United States)

    Devaraju, Anusuriya; Klump, Jens; Tey, Victor; Fraser, Ryan

    2016-04-01

    Physical samples such as minerals, soil, rocks, water, air and plants are important observational units for understanding the complexity of our environment and its resources. They are usually collected and curated by different entities, e.g., individual researchers, laboratories, state agencies, or museums. Persistent identifiers may facilitate access to physical samples that are scattered across various repositories. They are essential to locate samples unambiguously and to share their associated metadata and data systematically across the Web. The International Geo Sample Number (IGSN) is a persistent, globally unique label for identifying physical samples. The IGSNs of physical samples are registered by end-users (e.g., individual researchers, data centers and projects) through allocating agents. Allocating agents are the institutions acting on behalf of the implementing organization (IGSN e.V.). The Commonwealth Scientific and Industrial Research Organisation (CSIRO) is one of the allocating agents in Australia. To implement IGSN in our organisation, we developed a RESTful service and a metadata model. The web service enables a client to register sub-namespaces and multiple samples, and retrieve samples' metadata programmatically. The metadata model provides a framework in which different types of samples may be represented. It is generic and extensible; therefore, it may be applied in the context of multi-disciplinary projects. The metadata model has been implemented as an XML schema and a PostgreSQL database. The schema is used to handle sample registration requests and to disseminate their metadata, whereas the relational database is used to preserve the metadata records. The metadata schema leverages existing controlled vocabularies to minimize the scope for error and incorporates some simplifications to reduce the complexity of the schema implementation. The solutions developed have been applied and tested in the context of two sample repositories in CSIRO, the

  17. gCube Grid services

    CERN Document Server

    Andrade, Pedro

    2008-01-01

    gCube is a service-based framework for eScience applications requiring collaboratory, on-demand, and intensive information processing. It provides these communities with Virtual Research Environments (VREs) to support their activities. gCube is built on top of standard technologies for computational Grids, namely the gLite middleware. The software was produced by the DILIGENT project and will continue to be supported and further developed by the D4Science project. gCube reflects within its name a three-sided interpretation of the Grid vision of resource sharing: sharing of computational resources, sharing of structured data, and sharing of application services. As such, gCube embodies the defining characteristics of computational Grids, data Grids, and virtual data Grids. Precisely, it builds on gLite middleware for managing distributed computations and unstructured data, includes dedicated services for managing data and metadata, provides services for distributed information retrieval, allows the orchestration...

  18. On the Design of a Comprehensive Authorisation Framework for Service Oriented Architecture (SOA)

    Science.gov (United States)

    2013-07-01

    Service Manager XML eXtensible Markup Language UNCLASSIFIED DSTO-TN-1193 UNCLASSIFIED This page is...metadata providing additional description for ws. wsm is the identity of the Web Service Manager responsible for managing the ws object. sm is the...section, we discuss the administration support provided by the WSAA to manage a collection of Web services. A Web Service Manager (WSM) manages Web

  19. Getting Double the Work Done with Half the Effort: Provenance and Metadata with Semantic Workflows

    Science.gov (United States)

    Gil, Y.

    2012-12-01

    The variety, velocity, and volume of big data are dwarfing our ability to analyze it using the computational tools and models at our disposal. Studies report that researchers spend more than 60% of their time just preparing the data for model input or data-model inter-comparison just to start a baseline in a given science project. Computational workflow systems can assist with these tasks by automating the execution of complex computations. When metadata is available, semantic workflow systems can use it to make intelligent decisions based on the type of data and model requirements. This talk will discuss the importance of provenance-aware software that both generates and uses metadata as the data is being processed, and what new capabilities are enabled for researchers. This combined system was used to develop and test a near-real-time scientific workflow to facilitate the observation of the spatio-temporal distribution of whole-stream metabolism estimates using available monitoring station flow and water quality data. The data integration steps combined data from public government repositories and local sensors with the implication of different associated properties (data integrity, sampling intervals, units), and (2) the variability of the interim flows requires adaptive model selection within the framework of the metabolism calculations. These challenges are addressed by using a data integration system in which metadata and provenance are generated as the data is prepared and then subsequently used by a semantic workflow system to automatically select and configure models, effectively customizing the analysis to the daily data. Data preparation involves the extraction, cleaning, normalization and integration of the data coming from sensors and third-party data sources. In this process, the metadata and provenance captured includes sensor specifications, data types, data properties, and process documentation, and is passed along with the data on to the workflow

  20. A Metadata Management Framework for Collaborative Review of Science Data Products

    Science.gov (United States)

    Hart, A. F.; Cinquini, L.; Mattmann, C. A.; Thompson, D. R.; Wagstaff, K.; Zimdars, P. A.; Jones, D. L.; Lazio, J.; Preston, R. A.

    2012-12-01

    Data volumes generated by modern scientific instruments often preclude archiving the complete observational record. To compensate, science teams have developed a variety of "triage" techniques for identifying data of potential scientific interest and marking it for prioritized processing or permanent storage. This may involve multiple stages of filtering with both automated and manual components operating at different timescales. A promising approach exploits a fast, fully automated first stage followed by a more reliable offline manual review of candidate events. This hybrid approach permits a 24-hour rapid real-time response while also preserving the high accuracy of manual review. To support this type of second-level validation effort, we have developed a metadata-driven framework for the collaborative review of candidate data products. The framework consists of a metadata processing pipeline and a browser-based user interface that together provide a configurable mechanism for reviewing data products via the web, and capturing the full stack of associated metadata in a robust, searchable archive. Our system heavily leverages software from the Apache Object Oriented Data Technology (OODT) project, an open source data integration framework that facilitates the construction of scalable data systems and places a heavy emphasis on the utilization of metadata to coordinate processing activities. OODT provides a suite of core data management components for file management and metadata cataloging that form the foundation for this effort. The system has been deployed at JPL in support of the V-FASTR experiment [1], a software-based radio transient detection experiment that operates commensally at the Very Long Baseline Array (VLBA), and has a science team that is geographically distributed across several countries. Daily review of automatically flagged data is a shared responsibility for the team, and is essential to keep the project within its resource constraints. 
We
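
    The hybrid triage approach described in this abstract, a fast automated first stage followed by manual review whose decisions are kept as metadata, can be sketched as follows. This is a simplified illustration that does not use Apache OODT or any V-FASTR code; the Candidate class, the score threshold, and the rule that a single "keep" vote is sufficient are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A flagged event together with the metadata accumulated about it."""
    event_id: str
    score: float
    metadata: dict = field(default_factory=dict)


def automated_stage(events, threshold):
    """First stage: cheap, fully automated filter that flags candidates for review."""
    return [
        Candidate(event_id, score, {"stage1_score": score})
        for event_id, score in events
        if score >= threshold
    ]


def record_review(candidate, reviewer, keep, note=""):
    """Second stage: attach one manual-review decision to the candidate's metadata."""
    candidate.metadata.setdefault("reviews", []).append(
        {"reviewer": reviewer, "keep": keep, "note": note}
    )
    return candidate


def accepted(candidates):
    """Candidates that at least one reviewer marked to keep (assumed policy)."""
    return [
        c for c in candidates
        if any(review["keep"] for review in c.metadata.get("reviews", []))
    ]


# Hypothetical detector scores, invented for illustration.
events = [("e1", 0.95), ("e2", 0.40), ("e3", 0.80)]
candidates = automated_stage(events, threshold=0.75)
record_review(candidates[0], "alice", keep=True, note="clear transient")
record_review(candidates[1], "bob", keep=False)
kept = accepted(candidates)
```

    Keeping the first-stage score and every review in the same metadata dictionary is one simple way to preserve the full decision history alongside each data product.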

  1. Another face of the metadata: information for management of the digital preservation 10.5007/1518-2924.2010v15n30p1

    Directory of Open Access Journals (Sweden)

    Luís Fernando Sayão

    2010-10-01

    The traditional concept of metadata can be expanded to provide a set of information to support the management activities of the preservation of digital materials. This type of metadata, called preservation metadata, is designed to inform and document the process of long-term digital preservation, assuring that digital content can be accessed and interpreted in the future. In recent years many metadata schemes and infrastructures oriented toward digital preservation have been developed; the greatest challenge they face has been to anticipate what information is actually required to support a particular process of digital preservation. The most important and comprehensive initiative in this field is the PREMIS Data Dictionary, developed based on the conceptual infrastructure defined by the OAIS ISO standard. The basic idea of this paper is to review the main concepts, standards and technologies involved in the development of preservation metadata schemes.

  2. The Geodetic Seamless Archive Centers Service Layer: A System Architecture for Federating Geodesy Data Repositories

    Science.gov (United States)

    McWhirter, J.; Boler, F. M.; Bock, Y.; Jamason, P.; Squibb, M. B.; Noll, C. E.; Blewitt, G.; Kreemer, C. W.

    2010-12-01

    Three geodesy Archive Centers, Scripps Orbit and Permanent Array Center (SOPAC), NASA's Crustal Dynamics Data Information System (CDDIS) and UNAVCO are engaged in a joint effort to define and develop a common Web Service Application Programming Interface (API) for accessing geodetic data holdings. This effort is funded by the NASA ROSES ACCESS Program to modernize the original GPS Seamless Archive Centers (GSAC) technology which was developed in the 1990s. A new web service interface, the GSAC-WS, is being developed to provide uniform and expanded mechanisms through which users can access our data repositories. In total, our respective archives hold tens of millions of files and contain a rich collection of site/station metadata. Though we serve similar user communities, we currently provide a range of different access methods, query services and metadata formats. This leads to a lack of consistency in the user's experience and a duplication of engineering efforts. The GSAC-WS API and its reference implementation in an underlying Java-based GSAC Service Layer (GSL) supports metadata and data queries into site/station oriented data archives. The general nature of this API makes it applicable to a broad range of data systems. The overall goals of this project include providing consistent and rich query interfaces for end users and client programs, the development of enabling technology to facilitate third party repositories in developing these web service capabilities and to enable the ability to perform data queries across a collection of federated GSAC-WS enabled repositories. A fundamental challenge faced in this project is to provide a common suite of query services across a heterogeneous collection of data yet enabling each repository to expose their specific metadata holdings. To address this challenge we are developing a "capabilities" based service where a repository can describe its specific query and metadata capabilities. Furthermore, the architecture of
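
    The "capabilities" idea in this abstract, in which each repository describes the queries it supports so that a client can federate across heterogeneous archives, can be sketched as below. This is not the GSAC-WS API or the Java GSAC Service Layer; the Repository class, the parameter names, and the site records are hypothetical and serve only to show the dispatch logic.

```python
class Repository:
    """A toy repository that advertises which query parameters it understands."""

    def __init__(self, name, capabilities, sites):
        self.name = name
        self.capabilities = set(capabilities)
        self.sites = sites  # list of dicts holding site/station metadata

    def describe(self):
        """Return a machine-readable description of the supported query parameters."""
        return {"name": self.name, "capabilities": sorted(self.capabilities)}

    def query(self, **params):
        """Return the sites matching every parameter; reject unsupported parameters."""
        unsupported = set(params) - self.capabilities
        if unsupported:
            raise ValueError(f"{self.name} does not support: {sorted(unsupported)}")
        return [
            site for site in self.sites
            if all(site.get(key) == value for key, value in params.items())
        ]


def federated_query(repositories, **params):
    """Send the query only to repositories whose capabilities cover all parameters."""
    return {
        repo.name: repo.query(**params)
        for repo in repositories
        if set(params) <= repo.capabilities
    }


# Hypothetical repositories and site records, invented for illustration.
repo_a = Repository("A", {"site_code", "country"},
                    [{"site_code": "S001", "country": "US"}])
repo_b = Repository("B", {"site_code"},
                    [{"site_code": "S001"}])

by_code = federated_query([repo_a, repo_b], site_code="S001")
by_country = federated_query([repo_a, repo_b], country="US")
```

    A query on site_code reaches both repositories, while a query on country is routed only to the repository that advertises that capability, so each archive can expose its specific metadata without breaking the common interface.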

  3. NASA Reverb: Standards-Driven Earth Science Data and Service Discovery

    Science.gov (United States)

    Cechini, M. F.; Mitchell, A.; Pilone, D.

    2011-12-01

    NASA's Earth Observing System Data and Information System (EOSDIS) is a core capability in NASA's Earth Science Data Systems Program. NASA's EOS ClearingHOuse (ECHO) is a metadata catalog for the EOSDIS, providing a centralized catalog of data products and registry of related data services. Working closely with the EOSDIS community, the ECHO team identified a need to develop the next generation EOS data and service discovery tool. This development effort relied on the following principles: + Metadata Driven User Interface - Users should be presented with data and service discovery capabilities based on dynamic processing of metadata describing the targeted data. + Integrated Data & Service Discovery - Users should be able to discover data and associated data services that facilitate their research objectives. + Leverage Common Standards - Users should be able to discover and invoke services that utilize common interface standards. Metadata plays a vital role in facilitating data discovery and access. As data providers enhance their metadata, more advanced search capabilities become available, enriching a user's search experience. Maturing metadata formats such as ISO 19115 provide the necessary depth of metadata that facilitates advanced data discovery capabilities. Data discovery and access is not limited to simply the retrieval of data granules, but is growing into the more complex discovery of data services. These services include, but are not limited to, services facilitating additional data discovery, subsetting, reformatting, and re-projecting. The discovery and invocation of these data services is made significantly simpler through the use of consistent and interoperable standards. By utilizing an adopted standard, standard-specific adapters can be developed to communicate with multiple services implementing a specific protocol. The emergence of metadata standards such as ISO 19119 plays a similarly important role in discovery as the 19115 standard

  4. Building a semantic web-based metadata repository for facilitating detailed clinical modeling in cancer genome studies.

    Science.gov (United States)

    Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian

    2017-06-05

    Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of the study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both ISO11179 metadata standard and Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository with 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the domain "clinical pharmaceutical" in the TCGA data dictionary and demonstrated enriched data elements in the metadata repository are very useful in support of building detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate the detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.

  5. UKRVO Astronomical WEB Services

    Directory of Open Access Journals (Sweden)

    Mazhaev, O.E.

    2017-01-01

    Full Text Available Ukraine Virtual Observatory (UkrVO) has been a member of the International Virtual Observatory Alliance (IVOA) since 2011. The virtual observatory (VO) is not a magic solution to all problems of data storing and processing, but it provides certain standards for building the infrastructure of an astronomical data center. The astronomical databases help data mining and offer users easy access to observation metadata, images within the celestial sphere and results of image processing. The astronomical web services (AWS) of UkrVO give users handy tools for data selection from large astronomical catalogues for a relatively small region of interest in the sky. Examples of AWS usage are shown.

  6. Ready to put metadata on the post-2015 development agenda? Linking data publications to responsible innovation and science diplomacy.

    Science.gov (United States)

    Özdemir, Vural; Kolker, Eugene; Hotez, Peter J; Mohin, Sophie; Prainsack, Barbara; Wynne, Brian; Vayena, Effy; Coşkun, Yavuz; Dereli, Türkay; Huzair, Farah; Borda-Rodriguez, Alexander; Bragazzi, Nicola Luigi; Faris, Jack; Ramesar, Raj; Wonkam, Ambroise; Dandara, Collet; Nair, Bipin; Llerena, Adrián; Kılıç, Koray; Jain, Rekha; Reddy, Panga Jaipal; Gollapalli, Kishore; Srivastava, Sanjeeva; Kickbusch, Ilona

    2014-01-01

    Metadata refer to descriptions about data or as some put it, "data about data." Metadata capture what happens on the backstage of science, on the trajectory from study conception, design, funding, implementation, and analysis to reporting. Definitions of metadata vary, but they can include the context information surrounding the practice of science, or data generated as one uses a technology, including transactional information about the user. As the pursuit of knowledge broadens in the 21st century from traditional "science of whats" (data) to include "science of hows" (metadata), we analyze the ways in which metadata serve as a catalyst for responsible and open innovation, and by extension, science diplomacy. In 2015, the United Nations Millennium Development Goals (MDGs) will formally come to an end. Therefore, we propose that metadata, as an ingredient of responsible innovation, can help achieve the Sustainable Development Goals (SDGs) on the post-2015 agenda. Such responsible innovation, as a collective learning process, has become a key component, for example, of the European Union's 80 billion Euro Horizon 2020 R&D Program from 2014-2020. Looking ahead, OMICS: A Journal of Integrative Biology, is launching an initiative for a multi-omics metadata checklist that is flexible yet comprehensive, and will enable more complete utilization of single and multi-omics data sets through data harmonization and greater visibility and accessibility. The generation of metadata that shed light on how omics research is carried out, by whom and under what circumstances, will create an "intervention space" for integration of science with its socio-technical context. This will go a long way to addressing responsible innovation for a fairer and more transparent society. If we believe in science, then such reflexive qualities and commitments attained by availability of omics metadata are preconditions for a robust and socially attuned science, which can then remain broadly

  7. Web Search Engines and Indexing and Ranking the Content Object Including Metadata Elements Available at the Dynamic Information Environments

    Directory of Open Access Journals (Sweden)

    Faezeh sadat Tabatabai Amiri

    2012-10-01

    Full Text Available The purpose of this research was to examine the indexing and ranking of XML content objects containing Dublin Core and MARC 21 metadata elements in dynamic online information environments by general search engines, and to compare them using a comparative-analytical approach. 100 XML content objects in two groups were analyzed: those with DCXML elements and those with MARCXML elements, published on the website http://www.marcdcmi.ir from late Mordad 1388 until Khordad 1389. The website was then introduced to the Google and Yahoo search engines. Google was able to fully retrieve all the content objects during the study period through their Dublin Core and MARC 21 metadata elements; Yahoo, however, did not respond at all. The indexing of metadata elements embedded in content objects in dynamic online information environments, and the differences between their indexing and ranking, were examined. Findings showed that all Dublin Core and MARC 21 metadata elements were indexed by Google. No difference was observed between the indexing and ranking of DCXML and MARCXML metadata elements in dynamic online information environments by Google.

  8. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.

    Science.gov (United States)

    Hitz, Benjamin C; Rowe, Laurence D; Podduturi, Nikhil R; Glick, David I; Baymuradov, Ulugbek K; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Gabdank, Idan; Narayana, Aditi K; Onate, Kathrina C; Hilton, Jason; Ho, Marcus C; Lee, Brian T; Miyasato, Stuart R; Dreszer, Timothy R; Sloan, Cricket A; Strattan, J Seth; Tanaka, Forrest Y; Hong, Eurie L; Cherry, J Michael

    2017-01-01

    The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.

  9. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.

    Directory of Open Access Journals (Sweden)

    Benjamin C Hitz

    Full Text Available The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.

  10. The Spirit Is Willing, But the Flesh is Weak: Why Young People Drink More Than Intended on Weekend Nights-An Event-Level Study.

    Science.gov (United States)

    Labhart, Florian; Anderson, Kristen G; Kuntsche, Emmanuel

    2017-11-01

    Heavy alcohol use is common among young adults on weekend nights and is assumed to be intentional. However, little is known about the extent to which heavy consumption is planned prior to the onset of drinking and what factors contribute to drinking more than intended. This study investigates drinking intentions at the beginning of an evening and individual and situational factors associated with a subsequent consumption over the course of multiple nights. Using a smartphone application, 176 young people aged 16 to 25 (mean age = 19.1; 49% women) completed questionnaires on drinking intentions, consumption, and drinking environments before, during, and after multiple Friday and Saturday nights (n = 757). Multilevel regressions were used to investigate individual-level and night-level factors associated with previous drinking intentions and subsequent deviations from intentions. Participants intended to consume 2.5 drinks (SD = 2.8) per night yet consumed 3.8 drinks (SD = 3.9) on average. Drinking intentions were higher among those who frequently went out at night and engaged in more frequent predrinking. Participants drank more than intended on 361 nights (47.7%). For both genders, the number of drinks consumed before 8 pm, attending multiple locations, and being with larger groups of friends contributed to higher consumption than intended at the individual and the night levels. Heavier consumption than intended also occurred when drinking away from home for men and when going to nightclubs for women. Making young adults aware of the tendency to drink more than intended, particularly when drinking begins early in the evening, moves from location to location, and includes large groups of friends, may be a fruitful prevention target. Structural measures, including responsible beverage service, may also help in preventing excessive drinking at multiple locations. Copyright © 2017 by the Research Society on Alcoholism.

  11. An Observation Capability Metadata Model for EO Sensor Discovery in Sensor Web Enablement Environments

    Directory of Open Access Journals (Sweden)

    Chuli Hu

    2014-10-01

    Full Text Available Accurate and fine-grained discovery by diverse Earth observation (EO) sensors ensures a comprehensive response to collaborative observation-required emergency tasks. This discovery remains a challenge in an EO sensor web environment. In this study, we propose an EO sensor observation capability metadata model that reuses and extends the existing sensor observation-related metadata standards to enable the accurate and fine-grained discovery of EO sensors. The proposed model is composed of five sub-modules, namely, ObservationBreadth, ObservationDepth, ObservationFrequency, ObservationQuality and ObservationData. The model is applied to different types of EO sensors and is formalized by the Open Geospatial Consortium Sensor Model Language 1.0. The GeosensorQuery prototype retrieves the qualified EO sensors based on the provided geo-event. An actual application to flood emergency observation in the Yangtze River Basin in China is conducted, and the results indicate that sensor inquiry can accurately achieve fine-grained discovery of qualified EO sensors and obtain enriched observation capability information. In summary, the proposed model enables an efficient encoding system that ensures minimum unification to represent the observation capabilities of EO sensors. The model functions as a foundation for the efficient discovery of EO sensors. In addition, the definition and development of this proposed EO sensor observation capability metadata model is a helpful step in extending the Sensor Model Language (SensorML) 2.0 Profile for the description of the observation capabilities of EO sensors.

  12. The PDS4 Data Dictionary Tool - Metadata Design for Data Preparers

    Science.gov (United States)

    Raugh, A.; Hughes, J. S.

    2017-12-01

    One of the major design goals of the PDS4 development effort was to create an extendable Information Model (IM) for the archive, and to allow mission data designers/preparers to create extensions for metadata definitions specific to their own contexts. This capability is critical for the Planetary Data System - an archive that deals with a data collection that is diverse along virtually every conceivable axis. Amid such diversity in the data itself, it is in the best interests of the PDS archive and its users that all extensions to the IM follow the same design techniques, conventions, and restrictions as the core implementation itself. But it is unrealistic to expect mission data designers to acquire expertise in information modeling, model-driven design, ontology, schema formulation, and PDS4 design conventions and philosophy in order to define their own metadata. To bridge that expertise gap and bring the power of information modeling to the data label designer, the PDS Engineering Node has developed the data dictionary creation tool known as "LDDTool". This tool incorporates the same software used to maintain and extend the core IM, packaged with an interface that enables a developer to create his extension to the IM using the same, standards-based metadata framework PDS itself uses. Through this interface, the novice dictionary developer has immediate access to the common set of data types and unit classes for defining attributes, and a straight-forward method for constructing classes. The more experienced developer, using the same tool, has access to more sophisticated modeling methods like abstraction and extension, and can define context-specific validation rules. We present the key features of the PDS Local Data Dictionary Tool, which both supports the development of extensions to the PDS4 IM, and ensures their compatibility with the IM.

  13. Explicet: graphical user interface software for metadata-driven management, analysis and visualization of microbiome data.

    Science.gov (United States)

    Robertson, Charles E; Harris, J Kirk; Wagner, Brandie D; Granger, David; Browne, Kathy; Tatem, Beth; Feazel, Leah M; Park, Kristin; Pace, Norman R; Frank, Daniel N

    2013-12-01

    Studies of the human microbiome, and microbial community ecology in general, have blossomed of late and are now a burgeoning source of exciting research findings. Along with the advent of next-generation sequencing platforms, which have dramatically increased the scope of microbiome-related projects, several high-performance sequence analysis pipelines (e.g. QIIME, MOTHUR, VAMPS) are now available to investigators for microbiome analysis. The subject of our manuscript, the graphical user interface-based Explicet software package, fills a previously unmet need for a robust, yet intuitive means of integrating the outputs of the software pipelines with user-specified metadata and then visualizing the combined data.

  14. A semantically rich and standardised approach enhancing discovery of sensor data and metadata

    Science.gov (United States)

    Kokkinaki, Alexandra; Buck, Justin; Darroch, Louise

    2016-04-01

    The marine environment plays an essential role in the earth's climate. To enhance the ability to monitor the health of this important system, innovative sensors are being produced and combined with state of the art sensor technology. As the number of sensors deployed is continually increasing, it is a challenge for data users to find the data that meet their specific needs. Furthermore, users need to integrate diverse ocean datasets originating from the same or even different systems. Standards provide a solution to the above mentioned challenges. The Open Geospatial Consortium (OGC) has created Sensor Web Enablement (SWE) standards that enable different sensor networks to establish syntactic interoperability. When combined with widely accepted controlled vocabularies, they become semantically rich and semantic interoperability is achievable. In addition, Linked Data is the recommended best practice for exposing, sharing and connecting information on the Semantic Web using Uniform Resource Identifiers (URIs), Resource Description Framework (RDF) and RDF Query Language (SPARQL). As part of the EU-funded SenseOCEAN project, the British Oceanographic Data Centre (BODC) is working on the standardisation of sensor metadata enabling 'plug and play' sensor integration. Our approach combines standards, controlled vocabularies and persistent URIs to publish sensor descriptions, their data and associated metadata as 5 star Linked Data and OGC SWE (SensorML, Observations & Measurements) standard. Thus sensors become readily discoverable, accessible and useable via the web. Content and context based searching is also enabled since sensor descriptions are understood by machines. Additionally, sensor data can be combined with other sensor or Linked Data datasets to form knowledge. This presentation will describe the work done in BODC to achieve syntactic and semantic interoperability in the sensor domain. It will illustrate the reuse and extension of the Semantic Sensor

  15. The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again.

    Science.gov (United States)

    González-Beltrán, Alejandra; Neumann, Steffen; Maguire, Eamonn; Sansone, Susanna-Assunta; Rocca-Serra, Philippe

    2014-01-01

    The ISA-Tab format and software suite have been developed to break the silo effect induced by technology-specific formats for a variety of data types and to better support experimental metadata tracking. Experimentalists seldom use a single technique to monitor biological signals. Providing a multi-purpose, pragmatic and accessible format that abstracts away common constructs for describing Investigations, Studies and Assays, ISA is increasingly popular. To attract further interest towards the format and extend support to ensure reproducible research and reusable data, we present the Risa package, which delivers a central component to support the ISA format by enabling effortless integration with R, the popular, open source data crunching environment. The Risa package bridges the gap between the metadata collection and curation in an ISA-compliant way and the data analysis using the widely used statistical computing environment R. The package offers functionality for: i) parsing ISA-Tab datasets into R objects; ii) augmenting annotation with extra metadata not explicitly stated in the ISA syntax; iii) interfacing with domain specific R packages; iv) suggesting potentially useful R packages available in Bioconductor for subsequent processing of the experimental data described in the ISA format; and finally v) saving back to ISA-Tab files augmented with analysis specific metadata from R. We demonstrate these features by presenting use cases for mass spectrometry data and DNA microarray data. The Risa package is open source (with LGPL license) and freely available through Bioconductor. By making Risa available, we aim to facilitate the task of processing experimental data, encouraging a uniform representation of experimental information and results while delivering tools for ensuring traceability and provenance tracking. The Risa package has been available since Bioconductor 2.11 (version 1.0.0), and version 1.2.1 appeared in Bioconductor 2.12, both along with documentation

  16. GOLD - MicrobeDB.jp | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available List Contact us MicrobeDB.jp GOLD Data detail Data name GOLD DOI 10.18908/lsdba.nbdc01181-008.V002 Version V...henotypes of genome-sequenced microbes in JGI GOLD by using MPO. Data file File name: gold.tar.gz File URL: ... Simple search URL - Data acquisition method Metadata of genome-sequenced microbes were obtained from JGI GOLD... Download License Update History of This Database Site Policy | Contact Us GOLD - MicrobeDB.jp | LSDB Archive ...

  17. Estimating pediatric entrance skin dose from digital radiography examination using DICOM metadata: A quality assurance tool.

    Science.gov (United States)

    Brady, S L; Kaufman, R A

    2015-05-01

    To develop an automated methodology to estimate patient examination dose in digital radiography (DR) imaging using DICOM metadata as a quality assurance (QA) tool. Patient examination and demographic information were gathered from metadata analysis of DICOM header data. The x-ray system radiation output (i.e., air KERMA) was characterized for all filter combinations used for patient examinations. Average patient thicknesses were measured for head, chest, abdomen, knees, and hands using volumetric images from CT. Backscatter factors (BSFs) were calculated from examination kVp. Patient entrance skin air KERMA (ESAK) was calculated by (1) looking up examination technique factors taken from DICOM header metadata (i.e., kVp and mAs) to derive an air KERMA (k air) value based on an x-ray characteristic radiation output curve; (2) scaling k air with a BSF value; and (3) correcting k air for patient thickness. Finally, patient entrance skin dose (ESD) was calculated by multiplying a mass-energy attenuation coefficient ratio by ESAK. Patient ESD calculations were computed for common DR examinations at our institution: dual view chest, anteroposterior (AP) abdomen, lateral (LAT) skull, dual view knee, and bone age (left hand only) examinations. ESD was calculated for a total of 3794 patients; mean age was 11 ± 8 yr (range: 2 months to 55 yr). The mean ESD range was 0.19-0.42 mGy for dual view chest, 0.28-1.2 mGy for AP abdomen, 0.18-0.65 mGy for LAT view skull, 0.15-0.63 mGy for dual view knee, and 0.10-0.12 mGy for bone age (left hand) examinations. A methodology combining DICOM header metadata and basic x-ray tube characterization curves was demonstrated. In a regulatory era where patient dose reporting has become increasingly in demand, this methodology will allow a knowledgeable user the means to establish an automatable dose reporting program for DR and perform patient dose related QA testing for digital x-ray imaging.
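
    The three-step ESAK calculation and the final ESD scaling described in this abstract can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the output curve values, the linear interpolation, the inverse-square thickness correction (patient assumed to lie against the detector), and the numeric BSF and attenuation-ratio inputs are all hypothetical. The header keys mirror the DICOM keywords KVP, Exposure (mAs) and DistanceSourceToDetector (mm); a plain dict stands in for a parsed header.

```python
from bisect import bisect_left

# Hypothetical tube output curve: (kVp, air kerma per mAs in mGy/mAs),
# characterized at REF_DISTANCE_CM from the focal spot. Values are made up.
OUTPUT_CURVE = [(50.0, 0.020), (70.0, 0.040), (90.0, 0.065), (110.0, 0.095)]
REF_DISTANCE_CM = 100.0


def output_per_mas(kvp):
    """Linearly interpolate the output curve at the given kVp."""
    kvps = [k for k, _ in OUTPUT_CURVE]
    if not kvps[0] <= kvp <= kvps[-1]:
        raise ValueError("kVp outside the characterized range")
    i = bisect_left(kvps, kvp)
    if kvps[i] == kvp:
        return OUTPUT_CURVE[i][1]
    (k0, y0), (k1, y1) = OUTPUT_CURVE[i - 1], OUTPUT_CURVE[i]
    return y0 + (y1 - y0) * (kvp - k0) / (k1 - k0)


def entrance_skin_dose(header, thickness_cm, bsf, mu_en_ratio):
    """Estimate ESD (mGy) from header technique factors.

    header       -- dict with KVP, Exposure (mAs), DistanceSourceToDetector (mm)
    thickness_cm -- average patient thickness for the body part (assumed input)
    bsf          -- backscatter factor for this kVp (assumed input)
    mu_en_ratio  -- tissue-to-air mass-energy attenuation coefficient ratio
    """
    # Step 1: technique factors -> air kerma at the reference distance.
    k_air = output_per_mas(float(header["KVP"])) * float(header["Exposure"])
    # Step 2: scale by the backscatter factor.
    k_air *= bsf
    # Step 3: correct for patient thickness. Assumption: inverse-square
    # scaling to the skin entrance plane, patient resting on the detector.
    sid_cm = float(header["DistanceSourceToDetector"]) / 10.0
    skin_distance_cm = sid_cm - thickness_cm
    esak = k_air * (REF_DISTANCE_CM / skin_distance_cm) ** 2
    # ESD = attenuation coefficient ratio x ESAK.
    return mu_en_ratio * esak


example_header = {"KVP": 80.0, "Exposure": 2, "DistanceSourceToDetector": 1000.0}
print(entrance_skin_dose(example_header, thickness_cm=20.0, bsf=1.3, mu_en_ratio=1.06))
```

    Keeping the output curve as data rather than as a fitted formula means that each filter combination could get its own table, which matches the abstract's statement that output was characterized per filter combination.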

  18. The Information Resources in Arcetri Astrophysics Observatory: Between Metadata and Semantic Web

    Science.gov (United States)

    Baglioni, Roberto; Gasperini, Antonella

    It is becoming apparent that libraries are going to play a key role in the new W3C's (World Wide Web Consortium) paradigm for the semantic web. For this reason, the Arcetri library is investigating methods for publishing different kinds of electronic documents on the net and a way of enriching them with semantic metadata. For the first phase, we are focusing on the library catalogue; and, in a second phase, we will consider bibliographies, preprints, technical reports, web pages, archives of astronomical data, and photographic and historical archives.

  19. The VIS-AD data model: Integrating metadata and polymorphic display with a scientific programming language

    Science.gov (United States)

    Hibbard, William L.; Dyer, Charles R.; Paul, Brian E.

    1994-01-01

    The VIS-AD data model integrates metadata about the precision of values, including missing data indicators and the way that arrays sample continuous functions, with the data objects of a scientific programming language. The data objects of this data model form a lattice, ordered by the precision with which they approximate mathematical objects. We define a similar lattice of displays and study visualization processes as functions from data lattices to display lattices. Such functions can be applied to visualize data objects of all data types and are thus polymorphic.

  20. From the inside-out: Retrospectives on a metadata improvement process to advance the discoverability of NASA's earth science data

    Science.gov (United States)

    Hernández, B. E.; Bugbee, K.; le Roux, J.; Beaty, T.; Hansen, M.; Staton, P.; Sisco, A. W.

    2017-12-01

    Earth observation (EO) data collected as part of NASA's Earth Observing System Data and Information System (EOSDIS) is now searchable via the Common Metadata Repository (CMR). The Analysis and Review of CMR (ARC) Team at Marshall Space Flight Center has been tasked with reviewing all NASA metadata records in the CMR ( 7,000 records). Each collection level record and constituent granule level metadata are reviewed for both completeness and compliance with the CMR's set of metadata standards, as specified in the Unified Metadata Model (UMM). NASA's Distributed Active Archive Centers (DAACs) have been harmonizing priority metadata records within the context of the inter-agency federal Big Earth Data Initiative (BEDI), which seeks to improve the discoverability, accessibility, and usability of EO data. Thus, the first phase of this project constitutes reviewing BEDI metadata records, while the second phase will constitute reviewing the remaining non-BEDI records in CMR. This presentation will discuss the ARC team's findings in terms of the overall quality of BEDI records across all DAACs as well as compliance with UMM standards. For instance, only a fifth of the collection-level metadata fields needed correction, compared to a quarter of the granule-level fields. It should be noted that the degree to which DAACs' metadata did not comply with the UMM standards may reflect multiple factors, such as recent changes in the UMM standards, and the utilization of different metadata formats (e.g. DIF 10, ECHO 10, ISO 19115-1) across the DAACs. Insights, constructive criticism, and lessons learned from this metadata review process will be contributed from both ORNL and SEDAC. Further inquiry along such lines may lead to insights which may improve the metadata curation process moving forward. In terms of the broader implications for metadata compliance with the UMM standards, this research has shown that a large proportion of the prioritized collections have already been

  1. Are Tags from Mars and Descriptors from Venus? A Study on the Ecology of Educational Resource Metadata

    Science.gov (United States)

    Vuorikari, Riina; Sillaots, Martin; Panzavolta, Silvia; Koper, Rob

    In this study, over a period of six months, we gathered empirical data from more than 200 users on a learning resource portal with a social bookmarking and tagging feature. Our aim was to study the interrelation of conventional metadata and social tags on the one hand, and their interaction with the environment on the other, where the environment can be understood as the repository, its resources and all stakeholders, including the managers, metadata indexers and the whole community of users. We found an interplay between tags and descriptors and showed how tags can enrich and add value to multilingual controlled vocabularies in various ways. We also showed that, even if many tags can be seen as redundant in terms of the existing LOM, some of them can become a useful source of metadata for repository owners, and help them better understand users’ needs and demands.

  2. ClinData Express--a metadata driven clinical research data management system for secondary use of clinical data.

    Science.gov (United States)

    Li, Zuofeng; Wen, Jingran; Zhang, Xiaoyan; Wu, Chunxiao; Li, Zuogao; Liu, Lei

    2012-01-01

    Aiming to ease the secondary use of clinical data in clinical research, we introduce a metadata-driven, web-based clinical data management system named ClinData Express. ClinData Express is made up of two parts: 1) m-designer, a standalone software tool for metadata definition; 2) a web-based data warehouse system for data management. With ClinData Express, all the researchers need to do is define the metadata and data model in the m-designer. The web interface for data collection and a specific database for data storage are then generated automatically. The standards used in the system and the data export module ensure that the data can be reused. The system has been tested on seven disease-data collections in Chinese and one form from dbGaP. The flexibility of the system gives it great potential for use in clinical research. The system is available at http://code.google.com/p/clindataexpress.
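
    The metadata-driven idea in this abstract, where a form definition is written once and the storage layer is generated from it, can be illustrated with a minimal sketch. Nothing here is taken from ClinData Express itself: the metadata layout, the field names, and the helper functions `create_table_sql` and `validate` are hypothetical, and SQLite from the Python standard library stands in for whatever database the real system generates.

```python
import sqlite3

# Hypothetical metadata definition for one data collection form.
FORM_METADATA = {
    "form": "blood_pressure",
    "fields": [
        {"name": "patient_id", "type": "text", "required": True},
        {"name": "systolic_mmhg", "type": "integer", "required": True},
        {"name": "diastolic_mmhg", "type": "integer", "required": True},
        {"name": "note", "type": "text", "required": False},
    ],
}

SQL_TYPES = {"text": "TEXT", "integer": "INTEGER", "real": "REAL"}
PY_TYPES = {"text": str, "integer": int, "real": float}


def create_table_sql(meta):
    """Generate a CREATE TABLE statement from the metadata definition."""
    columns = []
    for field in meta["fields"]:
        column = f'"{field["name"]}" {SQL_TYPES[field["type"]]}'
        if field["required"]:
            column += " NOT NULL"
        columns.append(column)
    return f'CREATE TABLE "{meta["form"]}" ({", ".join(columns)})'


def validate(meta, record):
    """Return a list of problems found when checking a record against the metadata."""
    errors = []
    for field in meta["fields"]:
        value = record.get(field["name"])
        if value is None:
            if field["required"]:
                errors.append(f"{field['name']}: missing required value")
            continue
        if not isinstance(value, PY_TYPES[field["type"]]):
            errors.append(f"{field['name']}: expected {field['type']}")
    return errors


# Generate the storage table and insert one validated record.
conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(FORM_METADATA))
record = {"patient_id": "P001", "systolic_mmhg": 120, "diastolic_mmhg": 80}
if not validate(FORM_METADATA, record):
    names = [f["name"] for f in FORM_METADATA["fields"]]
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f'INSERT INTO "{FORM_METADATA["form"]}" VALUES ({placeholders})',
        [record.get(n) for n in names],
    )
print(conn.execute('SELECT * FROM "blood_pressure"').fetchall())
```

    Because both the table definition and the validation rules are derived from the same metadata, adding a field to the definition changes storage and checking together, which is the property the abstract attributes to a metadata-driven design.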

  3. Grid Enabled Geospatial Catalogue Web Service

    Science.gov (United States)

    Chen, Ai-Jun; Di, Li-Ping; Wei, Ya-Xing; Liu, Yang; Bui, Yu-Qi; Hu, Chau-Min; Mehrotra, Piyush

    2004-01-01

    Geospatial Catalogue Web Service is a vital service for sharing and interoperating volumes of distributed heterogeneous geospatial resources, such as data, services, applications, and their replicas over the web. Based on the Grid technology and the Open Geospatial Consortium (OGC)'s Catalogue Service - Web Information Model, this paper proposes a new information model for Geospatial Catalogue Web Service, named GCWS, which can securely provide Grid-based publishing, managing and querying of geospatial data and services, and transparent access to the replica data and related services under the Grid environment. This information model integrates the information model of the Grid Replica Location Service (RLS)/Monitoring & Discovery Service (MDS) with the information model of OGC Catalogue Service (CSW), and refers to the geospatial data metadata standards from ISO 19115, FGDC and NASA EOS Core System and service metadata standards from ISO 19119 to extend itself for expressing geospatial resources. Using GCWS, any valid geospatial user who belongs to an authorized Virtual Organization (VO) can securely publish and manage geospatial resources, and especially query on-demand data in the virtual community and get it back through the data-related services, which provide functions such as subsetting, reformatting, reprojection etc. This work facilitates geospatial resource sharing and interoperation under the Grid environment, making geospatial resources Grid enabled and Grid technologies geospatially enabled. It also allows researchers to focus on science, and not on issues with computing ability, data location, processing and management. GCWS is also a key component for workflow-based virtual geospatial data production.

  4. Using Google Tag Manager and Google Analytics to track DSpace metadata fields as custom dimensions

    Directory of Open Access Journals (Sweden)

    Suzanna Conrad

    2015-01-01

    DSpace can be problematic for those interested in tracking download and pageview statistics granularly. Some libraries have implemented code to track events on websites and some have experimented with using Google Tag Manager to automate event tagging in DSpace. While these approaches make it possible to track download statistics, granular details such as authors, content types, titles, advisors, and other fields for which metadata exist are generally not tracked in DSpace or Google Analytics without coding. Moreover, it can be time consuming to track and assess pageview data and relate that data back to particular metadata fields. This article will detail the learning process of incorporating custom dimensions for tracking these detailed fields including trial and error attempts to use the data import function manually in Google Analytics, to automate the data import using Google APIs, and finally to automate the collection of dimension data in Google Tag Manager by mimicking SEO practices for capturing meta tags. This specific case study refers to using Google Tag Manager and Google Analytics with DSpace; however, this method may also be applied to other types of websites or systems.
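
    The idea of capturing meta tags as dimension values can be illustrated offline. The following is only a minimal sketch, not the article's Google Tag Manager configuration: it assumes the page exposes metadata in `<meta name="..." content="...">` tags, and the tag names and dimension labels in `DIMENSION_FOR_META` are hypothetical choices made for the example.

```python
from html.parser import HTMLParser

# Hypothetical mapping from meta tag names to custom dimension labels.
# Neither the tag names nor the labels are taken from the article.
DIMENSION_FOR_META = {
    "citation_author": "author",
    "citation_title": "title",
    "DC.type": "content_type",
}


class MetaTagCollector(HTMLParser):
    """Collect the content of <meta> tags whose name is in the mapping."""

    def __init__(self):
        super().__init__()
        self.dimensions = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name in DIMENSION_FOR_META and content:
            # A record can carry several values (e.g. several authors),
            # so every value is kept in a list.
            label = DIMENSION_FOR_META[name]
            self.dimensions.setdefault(label, []).append(content)


def extract_dimensions(html):
    """Return {dimension label: [values]} for one HTML page."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.dimensions
```

    In a tag manager the same lookup would run in the browser; the sketch only shows which fields would be read and how repeated fields could be kept together.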

  5. The ATLAS EventIndex: data flow and inclusion of other metadata

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00064378; Cardenas Zarate, Simon Ernesto; Favareto, Andrea; Fernandez Casani, Alvaro; Gallas, Elizabeth; Garcia Montoro, Carlos; Gonzalez de la Hoz, Santiago; Hrivnac, Julius; Malon, David; Prokoshin, Fedor; Salt, Jose; Sanchez, Javier; Toebbicke, Rainer; Yuan, Ruijun

    2016-01-01

    The ATLAS EventIndex is the catalogue of the event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event as well as trigger decision information. The main use case for the EventIndex is event picking, as well as data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalogue AMI and the Rucio data management system and information on p...
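
    The event record and the event-picking use case described above can be sketched as a small in-memory catalogue. This is an illustration under stated assumptions, not the EventIndex implementation (which the abstract says is built on Hadoop): the class names, field names, and the choice of (run number, event number) as the lookup key are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    # Field names are illustrative and are not the EventIndex schema.
    run_number: int
    event_number: int
    file_pointers: tuple        # identifiers of the files containing this event
    passed_triggers: frozenset  # names of trigger decisions that accepted it


class EventCatalogue:
    """Toy in-memory catalogue keyed by (run number, event number)."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        key = (record.run_number, record.event_number)
        self._records[key] = record

    def pick(self, run_number, event_number):
        """Event picking: return the file pointers for one event, or ()."""
        record = self._records.get((run_number, event_number))
        return record.file_pointers if record is not None else ()

    def events_passing(self, trigger_name):
        """Return the sorted keys of all events accepted by one trigger."""
        return sorted(
            key
            for key, record in self._records.items()
            if trigger_name in record.passed_triggers
        )
```

    A caller would add one `EventRecord` per event and then call `pick(run_number, event_number)` to find which files to open.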

  6. A federated semantic metadata registry framework for enabling interoperability across clinical research and care domains.

    Science.gov (United States)

    Sinaci, A Anil; Laleci Erturkmen, Gokce B

    2013-10-01

    In order to enable secondary use of Electronic Health Records (EHRs) by bridging the interoperability gap between clinical care and research domains, this paper introduces a unified methodology and supporting framework that bring together the power of metadata registries (MDR) and semantic web technologies. We introduce a federated semantic metadata registry framework by extending the ISO/IEC 11179 standard, and enable integration of data element registries through Linked Open Data (LOD) principles, where each Common Data Element (CDE) can be uniquely referenced, queried and processed to enable syntactic and semantic interoperability. Each CDE and its components are maintained as LOD resources, enabling semantic links with other CDEs, terminology systems and implementation-dependent content models, and hence facilitating semantic search, much more effective reuse and semantic interoperability across different application domains. There are several important efforts addressing semantic interoperability in the healthcare domain, such as the IHE DEX profile proposal, CDISC SHARE and CDISC2RDF. Our architecture complements these by providing a framework to interlink existing data element registries and repositories, multiplying their potential for semantic interoperability. An open source implementation of the federated semantic MDR framework presented in this paper is the core of the semantic interoperability layer of the SALUS project, which enables the execution of post-marketing safety analysis studies on top of existing EHR systems. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. The NOAA OneStop System: From Well-Curated Metadata to Data Discovery

    Science.gov (United States)

    McQuinn, E.; Jakositz, A.; Caldwell, A.; Delk, Z.; Neufeld, D.; Shapiro, J.; Partee, R.; Milan, A.

    2017-12-01

    The NOAA OneStop project is a pathfinder in the realm of enabling users to search for, discover, and access NOAA data. As the project continues along its path to maturity, it has become evident that three areas are of utmost importance to its success in the Earth science community: ensuring quality metadata, building a robust and scalable backend architecture, and keeping the user interface simple to use. Why is this the case? Because, simply put, we are dealing with all aspects of a Big Data problem: large volumes of disparate data needing to be quickly and easily processed and retrieved. In this presentation we discuss the three key aspects of OneStop architecture and how development in each area must be done through cross-team collaboration in order to succeed. We cover aspects of the web-based user interface and OneStop API and how metadata curators and software engineers have worked together to continually iterate on an ever-improving data discovery tool meant to be used by a variety of users searching across a broad assortment of data types.

  8. A Geospatial Data Recommender System based on Metadata and User Behaviour

    Science.gov (United States)

    Li, Y.; Jiang, Y.; Yang, C. P.; Armstrong, E. M.; Huang, T.; Moroni, D. F.; Finch, C. J.; McGibbney, L. J.

    2017-12-01

    Earth observations are produced at high velocity by real-time sensors, reaching terabytes to petabytes of geospatial data daily. Discovering and accessing the right data in this massive volume is like finding a needle in a haystack. To help researchers find the right data for study and decision support, a considerable amount of research on improving search performance has been proposed, including recommendation algorithms. However, few papers have discussed how to implement a recommendation algorithm in a geospatial data retrieval system. To address this problem, we propose a recommendation engine that improves the discovery of relevant geospatial data by mining and utilizing metadata and user behavior data: 1) metadata-based recommendation considers the correlation of each attribute (i.e., spatiotemporal, categorical, and ordinal) to the data to be found. In particular, a phrase extraction method is used to improve the accuracy of the description similarity; 2) user behavior data are utilized to predict the interest of a user through collaborative filtering; 3) an integration method is designed to combine the results of the above two methods to achieve better recommendation. Experiments show that in the hybrid recommendation list, all the precisions are larger than 0.8 from position 1 to 10.
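
    The abstract does not say how the integration step combines the two result sets. As one possible reading, the sketch below blends a metadata-based score and a collaborative-filtering score with a weighted sum and then measures precision at a given list position. The weighted sum, the function names, and the default weight are assumptions made for illustration, not the authors' method.

```python
def hybrid_scores(metadata_scores, behaviour_scores, metadata_weight=0.5):
    """Blend two {dataset id: score} dictionaries with a weighted sum.

    A dataset missing from one dictionary contributes 0.0 for that part.
    """
    if not 0.0 <= metadata_weight <= 1.0:
        raise ValueError("metadata_weight must lie in [0, 1]")
    dataset_ids = set(metadata_scores) | set(behaviour_scores)
    return {
        dataset_id: metadata_weight * metadata_scores.get(dataset_id, 0.0)
        + (1.0 - metadata_weight) * behaviour_scores.get(dataset_id, 0.0)
        for dataset_id in dataset_ids
    }


def recommend(metadata_scores, behaviour_scores, top_k=10, metadata_weight=0.5):
    """Return up to top_k dataset ids, best blended score first."""
    blended = hybrid_scores(metadata_scores, behaviour_scores, metadata_weight)
    # Ties are broken by dataset id so the ordering is deterministic.
    ranked = sorted(blended, key=lambda dataset_id: (-blended[dataset_id], dataset_id))
    return ranked[:top_k]


def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommendations that are relevant."""
    top = recommended[:k]
    if not top:
        return 0.0
    return sum(1 for dataset_id in top if dataset_id in relevant) / len(top)
```

    Evaluating `precision_at_k` for k from 1 to 10 gives the kind of per-position precision figure the abstract reports.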

  9. FIR: An Effective Scheme for Extracting Useful Metadata from Social Media.

    Science.gov (United States)

    Chen, Long-Sheng; Lin, Zue-Cheng; Chang, Jing-Rong

    2015-11-01

    Recently, the use of social media for health information exchange has been expanding among patients, physicians, and other health care professionals. In medical areas, social media allows non-experts to access, interpret, and generate medical information for their own care and the care of others. Researchers have paid much attention to social media in medical education, patient-pharmacist communications, adverse drug reaction detection, impacts of social media on medicine and healthcare, and so on. However, relatively few papers discuss how to effectively extract useful knowledge from a huge amount of textual comments in social media. Therefore, this study proposes a Fuzzy adaptive resonance theory network based Information Retrieval (FIR) scheme that combines a Fuzzy adaptive resonance theory (ART) network, Latent Semantic Indexing (LSI), and association rules (AR) discovery to extract knowledge from social media. In our FIR scheme, the Fuzzy ART network is first employed to segment comments. Next, for each customer segment, we use the LSI technique to retrieve important keywords. Then, in order to make the extracted keywords understandable, association rules mining is used to organize them into metadata. These extracted useful voices of customers will be transformed into design needs by using Quality Function Deployment (QFD) for further decision making. Unlike conventional information retrieval techniques, which acquire too many keywords to get key points, our FIR scheme can extract understandable metadata from social media.

  10. Exploring the CMIP5 multi-model archive with structured meta-data

    Science.gov (United States)

    Juckes, M.; Pascoe, C.; Guilyardi, E.; Lawrence, B. N.; Da Costa, E.

    2012-04-01

    The climate model archive of the Climate Model Inter-comparison Project, Phase 5 (CMIP5), contains results from a broad range of models. At time of submission of this abstract, simulations of the 20th century from 26 models have been delivered. Some of these models have been run both in "atmosphere-ocean" mode, with prescribed atmospheric concentrations of greenhouse gases and in "Earth-system" mode, with prescribed emissions of greenhouse gases. Resolutions of models also vary from close to half a degree to 3 degrees. Not all models are independent, with some modeling groups submitting results from a range of models with varying degrees of complexity or from a model with a range of different parameterisation options. Fortunately, this vast and complex archive is provided with a repository of structured meta-data exploiting the METAFOR Common Information Model. This presentation will exploit this structured meta-data in an exploration of the CMIP5 archive, analysing the dependency of a range of indices and climatologies (both of model fields and derived fields, such as the consecutive dry day index) on the details of model architecture held in the METAFOR repository.

  11. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    Science.gov (United States)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patient information included in a DICOM file, as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and by assigning a specific value to every ROI. The result is stored in DICOM format, for data and trend analysis. The developed GUI is easy to use and fast, and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.

  12. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  13. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  14. A novel approach towards skill-based search and services of Open Educational Resources

    NARCIS (Netherlands)

    Ha, Kyung-Hun; Niemann, Katja; Schwertel, Uta; Holtkamp, Philipp; Pirkkalainen, Henri; Börner, Dirk; Kalz, Marco; Pitsilis, Vassilis; Vidalis, Ares; Pappa, Dimitra; Bick, Markus; Pawlowski, Jan; Wolpers, Martin

    2011-01-01

    Ha, K.-H., Niemann, K., Schwertel, U., Holtkamp, P., Pirkkalainen, H., Börner, D. et al (2011). A novel approach towards skill-based search and services of Open Educational Resources. In E. Garcia-Barriocanal, A. Öztürk, & M. C. Okur (Eds.), Metadata and Semantics Research: 5th International

  15. Defense Technical Information Center Managed Services for DoD Generated Datasets

    Science.gov (United States)

    2010-03-03

    data discovery tools. The tools are intended to search the web and identify potential datasets of interest. The initial steps of the effort...Service Interface – This provides an interface to external systems for exchanging security information, metadata, and data. Discovery Agent – The

  16. Exploiting Dark Information Resources to Create New Value Added Services to Study Earth Science Phenomena

    Science.gov (United States)

    Ramachandran, Rahul; Maskey, Manil; Li, Xiang; Bugbee, Kaylin

    2017-01-01

    This paper presents two research applications exploiting unused metadata resources in novel ways to aid data discovery and exploration capabilities. The results based on the experiments are encouraging and each application has the potential to serve as a useful standalone component or service in a data system. There were also some interesting lessons learned while designing the two applications and these are presented next.

  17. Defining Linkages between the GSC and NSF's LTER Program: How the Ecological Metadata Language (EML) Relates to GCDML and Other Outcomes

    Science.gov (United States)

    Inigo San Gil; Wade Sheldon; Tom Schmidt; Mark Servilla; Raul Aguilar; Corinna Gries; Tanya Gray; Dawn Field; James Cole; Jerry Yun Pan; Giri Palanisamy; Donald Henshaw; Margaret O'Brien; Linda Kinkel; Kathrine McMahon; Renzo Kottmann; Linda Amaral-Zettler; John Hobbie; Philip Goldstein; Robert P. Guralnick; James Brunt; William K. Michener

    2008-01-01

    The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML)....

  18. What Information Does Your EHR Contain? Automatic Generation of a Clinical Metadata Warehouse (CMDW) to Support Identification and Data Access Within Distributed Clinical Research Networks.

    Science.gov (United States)

    Bruland, Philipp; Doods, Justin; Storck, Michael; Dugas, Martin

    2017-01-01

    Data dictionaries provide structural meta-information about data definitions in health information technology (HIT) systems. In this regard, reusing healthcare data for secondary purposes offers several advantages (e.g. reduced documentation times or increased data quality). Prerequisites for data reuse are the quality, availability and identical meaning of the data. In diverse projects, research data warehouses serve as core components between heterogeneous clinical databases and various research applications. Given the complexity (high number of data elements) and dynamics (regular updates) of electronic health record (EHR) data structures, we propose a clinical metadata warehouse (CMDW) based on a metadata registry standard. Metadata of two large hospitals were automatically inserted into two CMDWs containing 16,230 forms and 310,519 data elements. Automatic updates of metadata are possible, as are semantic annotations. A CMDW allows metadata discovery, data quality assessment and similarity analyses. Common data models for distributed research networks can be established based on similarity analyses.

  19. Playing the Metadata Game: Technologies and Strategies Used by Climate Diagnostics Center for Cataloging and Distributing Climate Data.

    Science.gov (United States)

    Schweitzer, R. H.

    2001-05-01

    The Climate Diagnostics Center maintains a collection of gridded climate data primarily for use by local researchers. Because this data is available on fast digital storage and because it has been converted to netCDF using a standard metadata convention (called COARDS), we recognize that this data collection is also useful to the community at large. At CDC we try to use technology and metadata standards to reduce our costs associated with making these data available to the public. The World Wide Web has been an excellent technology platform for meeting that goal. Specifically we have developed Web-based user interfaces that allow users to search, plot and download subsets from the data collection. We have also been exploring use of the Pacific Marine Environmental Laboratory's Live Access Server (LAS) as an engine for this task. This would result in further savings by allowing us to concentrate on customizing the LAS where needed, rather than developing and maintaining our own system. One such customization currently under development is the use of Java Servlets and JavaServer pages in conjunction with a metadata database to produce a hierarchical user interface to LAS. In addition to these Web-based user interfaces all of our data are available via the Distributed Oceanographic Data System (DODS). This allows other sites using LAS and individuals using DODS-enabled clients to use our data as if it were a local file. All of these technology systems are driven by metadata. When we began to create netCDF files, we collaborated with several other agencies to develop a netCDF convention (COARDS) for metadata. At CDC we have extended that convention to incorporate additional metadata elements to make the netCDF files as self-describing as possible. Part of the local metadata is a set of controlled names for the variable, level in the atmosphere and ocean, statistic and data set for each netCDF file. To allow searching and easy reorganization of these metadata, we loaded

  20. Data Service: Distributed Data Capture and Replication

    Science.gov (United States)

    Warner, P. B.; Pietrowicz, S. R.

    2007-10-01

    Data Service is a critical component of the NOAO Data Management and Science Support (DMaSS) Solutions Platform, which is based on a service-oriented architecture, and is to replace the current NOAO Data Transport System. Its responsibilities include capturing data from NOAO and partner telescopes and instruments and replicating the data across multiple (currently six) storage sites. Java 5 was chosen as the implementation language, and Java EE as the underlying enterprise framework. Application metadata persistence is performed using EJB and Hibernate on the JBoss Application Server, with PostgreSQL as the persistence back-end. Although potentially any underlying mass storage system may be used as the Data Service file persistence technology, DTS deployments and Data Service test deployments currently use the Storage Resource Broker from SDSC. This paper presents an overview and high-level design of the Data Service, including aspects of deployment, e.g., for the LSST Data Challenge at the NCSA computing facilities.

  1. Grid computing enhances standards-compatible geospatial catalogue service

    Science.gov (United States)

    Chen, Aijun; Di, Liping; Bai, Yuqi; Wei, Yaxing; Liu, Yang

    2010-04-01

    A catalogue service facilitates sharing, discovery, retrieval, management of, and access to large volumes of distributed geospatial resources, for example data, services, applications, and their replicas on the Internet. Grid computing provides an infrastructure for effective use of computing, storage, and other resources available online. The Open Geospatial Consortium has proposed a catalogue service specification and a series of profiles for promoting the interoperability of geospatial resources. By referring to the profile of the catalogue service for Web, an innovative information model of a catalogue service is proposed to offer Grid-enabled registry, management, retrieval of and access to geospatial resources and their replicas. This information model extends the e-business registry information model by adopting several geospatial data and service metadata standards—the International Organization for Standardization (ISO)'s 19115/19119 standards and the US Federal Geographic Data Committee (FGDC) and US National Aeronautics and Space Administration (NASA) metadata standards for describing and indexing geospatial resources. In order to select the optimal geospatial resources and their replicas managed by the Grid, the Grid data management service and information service from the Globus Toolkits are closely integrated with the extended catalogue information model. Based on this new model, a catalogue service is implemented first as a Web service. Then, the catalogue service is further developed as a Grid service conforming to Grid service specifications. The catalogue service can be deployed in both the Web and Grid environments and accessed by standard Web services or authorized Grid services, respectively. The catalogue service has been implemented at the George Mason University/Center for Spatial Information Science and Systems (GMU/CSISS), managing more than 17 TB of geospatial data and geospatial Grid services. This service makes it easy to share and

  2. Individual-Level, Partnership-Level, and Sexual Event-Level Predictors of Condom Use During Receptive Anal Intercourse Among HIV-Negative Men Who Have Sex with Men in Los Angeles.

    Science.gov (United States)

    Pines, Heather A; Gorbach, Pamina M; Weiss, Robert E; Reback, Cathy J; Landovitz, Raphael J; Mutchler, Matt G; Mitsuyasu, Ronald T

    2016-06-01

    We examined individual-level, partnership-level, and sexual event-level factors associated with condom use during receptive anal intercourse (RAI) among 163 low-income, racially/ethnically diverse, HIV-negative men who have sex with men (MSM) in Los Angeles (2007-2010). At baseline, 3-month, and 12-month visits, computer-assisted self-interviews collected information on ≤3 recent male partners and the last sexual event with those partners. Factors associated with condom use during RAI at the last sexual event were identified using logistic generalized linear mixed models. Condom use during RAI was negatively associated with reporting ≥ high school education (adjusted odds ratio [AOR] = 0.32, 95 % confidence interval [CI] 0.11-0.96) and methamphetamine use, specifically during RAI events with non-main partners (AOR = 0.20, 95 % CI 0.07-0.53) and those that included lubricant use (AOR = 0.20, 95 % CI 0.08-0.53). Condom use during RAI varies according to individual-level, partnership-level, and sexual event-level factors that should be considered in the development of risk reduction strategies for this population.

  3. Lightweight Advertising and Scalable Discovery of Services, Datasets, and Events Using Feedcasts

    Science.gov (United States)

    Wilson, B. D.; Ramachandran, R.; Movva, S.

    2010-12-01

    Broadcast feeds (Atom or RSS) are a mechanism for advertising the existence of new data objects on the web, with metadata and links to further information. Users then subscribe to the feed to receive updates. This concept has already been used to advertise the new granules of science data as they are produced (datacasting), with browse images and metadata, and to advertise bundles of web services (service casting). Structured metadata is introduced into the XML feed format by embedding new XML tags (in defined namespaces), using typed links, and reusing built-in Atom feed elements. This “infocasting” concept can be extended to include many other science artifacts, including data collections, workflow documents, topical geophysical events (hurricanes, forest fires, etc.), natural hazard warnings, and short articles describing a new science result. The common theme is that each infocast contains machine-readable, structured metadata describing the object and enabling further manipulation. For example, service casts contain type links pointing to the service interface description (e.g., WSDL for SOAP services), service endpoint, and human-readable documentation. Our Infocasting project has three main goals: (1) define and evangelize micro-formats (metadata standards) so that providers can easily advertise their web services, datasets, and topical geophysical events by adding structured information to broadcast feeds; (2) develop authoring tools so that anyone can easily author such service advertisements, data casts, and event descriptions; and (3) provide a one-stop, Google-like search box in the browser that allows discovery of service, data and event casts visible on the web, and services & data registered in the GEOSS repository and other NASA repositories (GCMD & ECHO). To demonstrate the event casting idea, a series of micro-articles—with accompanying event casts containing links to relevant datasets, web services, and science analysis workflows--will be
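
    The three ways of adding structured metadata named above (new XML tags in a defined namespace, typed links, and reuse of built-in Atom elements) can be sketched with the standard library. The Atom namespace URI is the real one; the `CAST_NS` namespace, the `serviceType` element, and the link relation names are hypothetical placeholders and are not the micro-formats defined by the Infocasting project.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical extension namespace, used only for this illustration.
CAST_NS = "http://example.org/infocast"

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("cast", CAST_NS)


def build_service_cast_entry(title, interface_url, endpoint_url, docs_url, service_type):
    """Build one Atom entry that advertises a web service."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # Reuse of a built-in Atom element.
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    # Typed links: the rel attribute states what each link points to.
    for relation, href in (
        ("interface", interface_url),
        ("endpoint", endpoint_url),
        ("documentation", docs_url),
    ):
        ET.SubElement(
            entry,
            f"{{{ATOM_NS}}}link",
            {"rel": f"{CAST_NS}#{relation}", "href": href},
        )
    # A new tag embedded in a defined namespace.
    ET.SubElement(entry, f"{{{CAST_NS}}}serviceType").text = service_type
    return entry


def typed_links(entry):
    """Return {rel: href} for every Atom link in the entry."""
    return {
        link.get("rel"): link.get("href")
        for link in entry.findall(f"{{{ATOM_NS}}}link")
    }
```

    A feed reader that knows the extension namespace can call `typed_links` on each parsed entry to find the interface description, and readers that do not know it can still show the title.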

  4. Dynamic Service Selection in Workflows Using Performance Data

    Directory of Open Access Journals (Sweden)

    David W. Walker

    2007-01-01

    An approach to dynamic workflow management and optimisation using near-realtime performance data is presented. Strategies are discussed for choosing an optimal service (based on user-specified criteria) from several semantically equivalent Web services. Such an approach may involve finding "similar" services, by first pruning the set of discovered services based on service metadata, and subsequently selecting an optimal service based on performance data. The current implementation of the prototype workflow framework is described, and demonstrated with a simple workflow. Performance results are presented that show the performance benefits of dynamic service selection. A statistical analysis based on the first order statistic is used to investigate the likely improvement in service response time arising from dynamic service selection.
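
    The two-stage selection described above, pruning by service metadata and then choosing by performance data, can be sketched as follows. The data layout, the function names, and the use of the lowest mean recent response time as the selection criterion are assumptions for the example; the abstract does not specify the criterion used by the prototype.

```python
from statistics import mean


def prune_by_metadata(services, required):
    """Keep the services whose metadata match every required key and value."""
    return [
        service
        for service in services
        if all(service["metadata"].get(key) == value for key, value in required.items())
    ]


def select_service(services, required, recent_response_times):
    """Return the matching service with the lowest mean response time.

    recent_response_times maps a service name to a list of measured
    response times in seconds. Services without measurements are skipped,
    and None is returned if no candidate remains.
    """
    candidates = prune_by_metadata(services, required)
    measured = [
        service
        for service in candidates
        if recent_response_times.get(service["name"])
    ]
    if not measured:
        return None
    return min(
        measured,
        key=lambda service: mean(recent_response_times[service["name"]]),
    )
```

    Because the response times are passed in at call time, the same workflow step can pick a different service on each run as new measurements arrive.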

  5. Experiences with making diffraction image data available: what metadata do we need to archive?

    Energy Technology Data Exchange (ETDEWEB)

    Kroon-Batenburg, Loes M. J., E-mail: l.m.j.kroon-batenburg@uu.nl [Utrecht University, Padualaan 8, 3584 CH Utrecht (Netherlands); Helliwell, John R. [University of Manchester, Brunswick Street, Manchester M14 9PL (United Kingdom); Utrecht University, Padualaan 8, 3584 CH Utrecht (Netherlands)

    2014-10-01

    A local raw ‘diffraction data images’ archive was made available and some data sets were retrieved and reprocessed, which led to analysis of the anomalous difference densities of two partially occupied Cl atoms in cisplatin as well as a re-evaluation of the resolution cutoff in these diffraction data. General questions on storing raw data are discussed. It is also demonstrated that often one needs unambiguous prior knowledge to read the (binary) detector format and the setup of goniometer geometries. Recently, the IUCr (International Union of Crystallography) initiated the formation of a Diffraction Data Deposition Working Group with the aim of developing standards for the representation of raw diffraction data associated with the publication of structural papers. Archiving of raw data serves several goals: to improve the record of science, to verify the reproducibility and to allow detailed checks of scientific data, safeguarding against fraud and to allow reanalysis with future improved techniques. A means of studying this issue is to submit exemplar publications with associated raw data and metadata. In a recent study of the binding of cisplatin and carboplatin to histidine in lysozyme crystals under several conditions, the possible effects of the equipment and X-ray diffraction data-processing software on the occupancies and B factors of the bound Pt compounds were compared. Initially, 35.3 GB of data were transferred from Manchester to Utrecht to be processed with EVAL. A detailed description and discussion of the availability of metadata was published in a paper that was linked to a local raw data archive at Utrecht University and also mirrored at the TARDIS raw diffraction data archive in Australia. By making these raw diffraction data sets available with the article, it is possible for the diffraction community to make their own evaluation. This led to one of the authors of XDS (K. Diederichs) to re-integrate the data from crystals that supposedly

  6. Payments for environmental services: A solution for biodiversity conservation?

    OpenAIRE

    S. Wertz-Kanounnikoff

    2006-01-01

    Metadata only record "Direct payments for environmental services (PES) are increasingly becoming subject of national development strategies and of actions promoted by large networks of non-governmental conservation organizations as means to finance biodiversity conservation. They arose also partly in response to the criticism against the efficiency of traditional approaches to conservation. Based on a literature review, the objective of this note is to assemble lessons learned from PES sch...

  7. Are tags from Mars and descriptors from Venus? A study on the ecology of educational resource metadata

    NARCIS (Netherlands)

    Vuorikari, Riina; Sillaots, Martin; Panzavolta, Silvia; Koper, Rob

    2009-01-01

    Vuorikari, R., Sillaots, M., Panzavolta, S. & Koper, R. (2009). Are tags from Mars and descriptors from Venus? A study on the ecology of educational resource metadata. In M. Spaniol, Q. Li, R. Klamma & R. W. H. Lau (Eds.), Proceedings of the 8th International Conference Advances in Web Based

  8. Study on Information Management for the Conservation of Traditional Chinese Architectural Heritage - 3d Modelling and Metadata Representation

    Science.gov (United States)

    Yen, Y. N.; Weng, K. H.; Huang, H. Y.

    2013-07-01

    After over 30 years of practice and development, Taiwan's architectural conservation field is moving rapidly into digitalization and its applications. Compared to modern buildings, traditional Chinese architecture has considerably more complex elements and forms. To document and digitize these unique heritages in their conservation lifecycle is a new and important issue. This article takes the caisson ceiling of the Taipei Confucius Temple, octagonal with 333 elements in 8 types, as a case study for digitization practice. The application of metadata representation and 3D modelling are the two key issues to discuss. Both Revit and SketchUp were applied in this research to compare their effectiveness for metadata representation. Due to limitations of the Revit database, the final 3D model was built with SketchUp. The research found that, firstly, cultural heritage databases must convey that while many elements are similar in appearance, they are unique in value; although 3D simulations help the general understanding of architectural heritage, software such as Revit and SketchUp, at this stage, could only be used to model basic visual representations, and is ineffective in documenting additional critical data of individually unique elements. Secondly, when establishing conservation lifecycle information for application in management systems, a full and detailed presentation of the metadata must also be implemented; the existing applications of BIM in managing conservation lifecycles are still insufficient. Results of the research recommend SketchUp as a tool for present modelling needs, and BIM for sharing data between users, but the implementation of metadata representation is of the utmost importance.

  9. Summary Record of the First Meeting of the Radioactive Waste Repository Metadata Management (RepMet) Initiative

    International Nuclear Information System (INIS)

    2014-01-01

    National radioactive waste repository programmes are collecting large amounts of data to support the long-term management of their nations' radioactive wastes. The data and related records increase in number, type and quality as programmes proceed through the successive stages of repository development: pre-siting, siting, characterisation, construction, operation and finally closure. Regulatory and societal approvals are included in this sequence. Some programmes are also documenting past repository projects and facing a challenge in allowing both current and future generations to understand actions carried out in the past. Metadata allows context to be stored with data and information so that it can be located, used, updated and maintained. Metadata helps waste management organisations better utilise their data in carrying out their statutory tasks and can also help verify and demonstrate that their programmes are appropriately driven. The NEA Radioactive Waste Repository Metadata Management (RepMet) initiative aims to bring about a better understanding of the identification and administration of metadata - a key aspect of data management - to support national programmes in managing their radioactive waste repository data, information and records in a way that is both harmonised internationally and suitable for long-term management and use. This is a summary record of the first meeting of the RepMet initiative. The actions and decisions from this meeting were sent separately to the group after the meeting, but are also included in this document (Annex A). The list of participants is attached as well (Annex B).

  10. MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies

    Directory of Open Access Journals (Sweden)

    Pankaj Kumar

    2015-01-01

    The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), Biosample, Bioprojects, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe “MetaRNA-Seq,” a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.

  11. Discovering NOAA Climate Data and Product Services (Invited)

    Science.gov (United States)

    Baldwin, R.; Ansari, S.; Reid, G.

    2009-12-01

    The National Climatic Data Center (NCDC) archives climate data for the US and the world. These data are provided through traditional web systems as well as web services. The web service implementation follows standards set by the Open Geospatial Consortium (OGC) and the World Wide Web Consortium (W3C). Simple object access protocol (SOAP) and representational state transfer (REST) are the two types of services provided. Provision of many data and product services from multiple organizations presents consumers with the difficulty of discovery. Standards-based, collection-level metadata describe these data and products. This information, delivered using a catalog service (CSW) in combination with an ontology service, provides a robust mechanism for data discovery. Service endpoints, or clients that use service endpoints, are embedded within the metadata, providing customers with tools to access and interrogate the fine details of the data. These technologies are demonstrated in current NCDC projects such as NOAA Climate Services Portal (NCSP), National Integrated Drought Information System (NIDIS), Pacific Climate Information System (PaCIS) and work with the Consortium of Universities for Advancement of Hydrologic Science (CUAHSI).

  12. Definition of an ISO 19115 metadata profile for SeaDataNet II Cruise Summary Reports and its XML encoding

    Science.gov (United States)

    Boldrini, Enrico; Schaap, Dick M. A.; Nativi, Stefano

    2013-04-01

    SeaDataNet implements a distributed pan-European infrastructure for Ocean and Marine Data Management whose nodes are maintained by 40 national oceanographic and marine data centers from 35 countries riparian to all European seas. A unique portal makes possible distributed discovery, visualization and access of the available sea data across all the member nodes. Geographic metadata play an important role in such an infrastructure, enabling an efficient documentation and discovery of the resources of interest. In particular: - Common Data Index (CDI) metadata describe the sea datasets, including identification information (e.g. product title, area of interest), evaluation information (e.g. data resolution, constraints) and distribution information (e.g. download endpoint, download protocol); - Cruise Summary Reports (CSR) metadata describe cruises and field experiments at sea, including identification information (e.g. cruise title, name of the ship) and acquisition information (e.g. utilized instruments, number of samples taken). In the context of the second phase of SeaDataNet (SeaDataNet 2 EU FP7 project, grant agreement 283607, started on October 1st, 2011 for a duration of 4 years) a major target is the setting, adoption and promotion of common international standards, to the benefit of outreach and interoperability with the international initiatives and communities (e.g. OGC, INSPIRE, GEOSS, …). A standardization effort conducted by CNR with the support of MARIS, IFREMER, STFC, BODC and ENEA has led to the creation of an ISO 19115 metadata profile of CDI and its XML encoding based on ISO 19139. The CDI profile is now in its stable version and is being implemented and adopted by the SeaDataNet community tools and software. The effort has then continued to produce an ISO based metadata model and its XML encoding also for CSR. The metadata elements included in the CSR profile belong to different models: - ISO 19115: E.g. cruise identification information, including

  13. Metadados digitais: revisão bibliográfica da evolução e tendências por meio de categorias funcionais / Digital metadata: bibliographical review on evolution and trends using functional categories

    Directory of Open Access Journals (Sweden)

    Luiz Fernando de Barros Campos

    2007-01-01

    A bibliographical review of digital metadata in the field of information science, aiming to pin down the various functions of metadata. Using thematic content analysis, the review found a recurrence of technologies such as XML, RDFS, ontologies, data warehouses, the Semantic Web and Web services, among others, and of ten categories of metadata functions that underpinned the texts examined. Building on the thematic analysis and adopting a technical and historical perspective, the functional categories were related to the technologies, and it was shown how the categories are treated in nineteen of the reviewed articles, highlighting the emphases applied and the convergence of themes. Works and technologies that deal with standards and models for digital preservation were found to adopt more comprehensive and integrated perspectives, incorporating all or nearly all of the categories. Based on the results, trends and latent issues perceived in the review are discussed, and approaches and methodologies are suggested for the analysis of metadata and related technologies.

  14. Introducing the PRIDE Archive RESTful web services.

    Science.gov (United States)

    Reisinger, Florian; del-Toro, Noemi; Ternent, Tobias; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-07-01

    The PRIDE (PRoteomics IDEntifications) database is one of the world-leading public repositories of mass spectrometry (MS)-based proteomics data and it is a founding member of the ProteomeXchange Consortium of proteomics resources. In the original PRIDE database system, users could access data programmatically by accessing the web services provided by the PRIDE BioMart interface. New REST (REpresentational State Transfer) web services have been developed to serve the most popular functionality provided by BioMart (now discontinued due to data scalability issues) and address the data access requirements of the newly developed PRIDE Archive. Using the API (Application Programming Interface) it is now possible to programmatically query for and retrieve peptide and protein identifications, project and assay metadata and the originally submitted files. Searching and filtering is also possible by metadata information, such as sample details (e.g. species and tissues), instrumentation (mass spectrometer), keywords and other provided annotations. The PRIDE Archive web services were first made available in April 2014. The API has already been adopted by a few applications and standalone tools such as PeptideShaker, PRIDE Inspector, the Unipept web application and the Python-based BioServices package. This application is free and open to all users with no login requirement and can be accessed at http://www.ebi.ac.uk/pride/ws/archive/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
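
    As a rough illustration of the programmatic access described above, the sketch below only builds a search URL against the base address quoted in the abstract. The `project/list` path and the `query`, `speciesFilter`, `page` and `show` parameter names are assumptions made for this example, not taken from the abstract; the service documentation should be checked before relying on them.

```python
from urllib.parse import urlencode, urljoin

# Base URL as quoted in the abstract.
BASE_URL = "http://www.ebi.ac.uk/pride/ws/archive/"


def build_project_query(keyword, species=None, page=0, show=10):
    """Build a project-search URL.

    The endpoint path and parameter names are assumptions for illustration.
    """
    params = {"query": keyword, "page": page, "show": show}
    if species:
        # Hypothetical filter on an NCBI taxonomy identifier.
        params["speciesFilter"] = species
    return urljoin(BASE_URL, "project/list") + "?" + urlencode(params)


url = build_project_query("phosphoproteome", species="9606")
print(url)
```

    The function performs no network request; the resulting URL could be passed to any HTTP client.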

  15. Exploring the Relevance of Europeana Digital Resources: Preliminary Ideas on Europeana Metadata Quality

    Directory of Open Access Journals (Sweden)

    Paulo Alonso Gaona-García

    2017-01-01

    Europeana is a European project that aims to become the modern “Alexandria Digital Library”, as it targets providing access to thousands of resources of European cultural heritage, contributed by more than fifteen hundred institutions such as museums, libraries, archives and cultural centers. This article explores Europeana digital resources as open learning repositories, with a view to re-using digital resources to improve the learning process in the domain of arts and cultural heritage. To this end, we present metadata quality results based on a case study, together with recommendations and suggestions for this type of initiative in our educational context, in order to improve access to digital resources according to specific knowledge areas.

  16. Automated Metadata Formatting for Cornell’s Print-on-Demand Books

    Directory of Open Access Journals (Sweden)

    Dianne Dietrich

    2009-11-01

    Cornell University Library has made Print-On-Demand (POD) books available for many of its digitized out-of-copyright books. The printer must be supplied with metadata from the MARC bibliographic record in order to produce book covers. Although the names of authors are present in MARC records, they are given in an inverted order suitable for alphabetical filing rather than the natural order that is desirable for book covers. This article discusses a process for parsing and manipulating the MARC author strings to identify their various component parts and to create natural order strings. In particular, the article focuses on processing non-name information in author strings, such as titles that were commonly used in older works, e.g., baron or earl, and suffixes appended to names, e.g., "of Bolsena." Relevant patterns are identified and a Python script is used to manipulate the author name strings.
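
    The kind of string manipulation described above might look roughly like the following. This is a minimal sketch and not the article's script: the title list, the date pattern and the handling of "of ..." suffixes are assumptions chosen for illustration, and real MARC author strings contain many more patterns.

```python
import re

# Illustrative, incomplete set of non-name titles (assumption).
TITLES = {"baron", "earl", "sir"}
# Life dates such as "1812-1870", or an open range such as "1950-".
DATE_RANGE = re.compile(r"\d{4}\??-(\d{4}\??)?")


def is_non_name(part):
    """True for components that are titles or 'of ...' suffixes, not forenames."""
    lowered = part.lower()
    return lowered in TITLES or lowered.startswith("of ")


def natural_order(inverted):
    """Turn 'Surname, Forename, extras, dates.' into 'Forename Surname, extras'."""
    # Note: stripping the final period also strips it from a trailing initial.
    parts = [p.strip() for p in inverted.strip().rstrip(".").split(",")]
    parts = [p for p in parts if p and not DATE_RANGE.fullmatch(p)]
    if not parts:
        return ""
    head, rest = parts[0], parts[1:]
    # The second component is a forename unless it looks like a title or suffix.
    forename = rest.pop(0) if rest and not is_non_name(rest[0]) else ""
    name = f"{forename} {head}".strip()
    for extra in rest:
        separator = " " if extra.lower().startswith("of ") else ", "
        name += separator + extra
    return name


print(natural_order("Dickens, Charles, 1812-1870."))
print(natural_order("Christina, of Bolsena"))
```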

  17. Chemical machine vision: automated extraction of chemical metadata from raster images.

    Science.gov (United States)

    Gkoutos, Georgios V; Rzepa, Henry; Clark, Richard M; Adjei, Osei; Johal, Harpal

    2003-01-01

    We present a novel application of machine vision methods for the identification of chemical composition diagrams from two-dimensional digital raster images. The method is based on the use of Gabor wavelets and an energy function to derive feature vectors from digital images. These are used for training and classification purposes using a Kohonen network for classification with the Euclidean distance norm. We compare this method with previous approaches to transforming such images to a molecular connection table, which are designed to achieve complete atom connection table fidelity but at the expense of requiring human interaction. The present texture-based approach is complementary in attempting to recognize higher order features such as the presence of a chemical representation in the original raster image. This information can be used for providing chemical metadata descriptors of the original image as part of a robot-based Internet resource discovery tool.
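
    The classification step mentioned above, assigning a feature vector to the nearest unit of a trained Kohonen map under the Euclidean norm, can be sketched as follows. Gabor feature extraction and map training are omitted, and the unit labels and weight vectors are invented for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(feature, units):
    """Return the label of the unit whose weight vector is closest to `feature`."""
    label, _weights = min(units, key=lambda unit: euclidean(feature, unit[1]))
    return label


# Hypothetical trained units as (label, weight vector) pairs.
UNITS = [
    ("chemical diagram", [0.9, 0.1, 0.4]),
    ("other image", [0.2, 0.8, 0.5]),
]

label = best_matching_unit([0.8, 0.2, 0.5], UNITS)
print(label)
```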

  18. Metadata Modelling of the IPv6 Wireless Sensor Network in the Heihe River Watershed

    Directory of Open Access Journals (Sweden)

    Wanming Luo

    2013-03-01

    Environmental monitoring in ecological and hydrological watershed-scale research is an important and promising area of application for wireless sensor networks. This paper presents the system design of the IPv6 wireless sensor network (IPv6WSN) in the Heihe River watershed in the Gansu province of China to assist ecological and hydrological scientists in collecting field scientific data in an extremely harsh environment. To solve the challenging problems they face, this paper focuses on a key technology adopted in our project, metadata modeling for the IPv6WSN. The system design introduced in this paper provides a solid foundation for effective use of a self-developed IPv6 wireless sensor network by ecological and hydrological scientists.

  19. PCDDB: the Protein Circular Dichroism Data Bank, a repository for circular dichroism spectral and metadata.

    Science.gov (United States)

    Whitmore, Lee; Woollett, Benjamin; Miles, Andrew John; Klose, D P; Janes, Robert W; Wallace, B A

    2011-01-01

    The Protein Circular Dichroism Data Bank (PCDDB) is a public repository that archives and freely distributes circular dichroism (CD) and synchrotron radiation CD (SRCD) spectral data and their associated experimental metadata. All entries undergo validation and curation procedures to ensure completeness, consistency and quality of the data included. A web-based interface enables users to browse and query sample types, sample conditions, experimental parameters and provides spectra in both graphical display format and as downloadable text files. The entries are linked, when appropriate, to primary sequence (UniProt) and structural (PDB) databases, as well as to secondary databases such as the Enzyme Commission functional classification database and the CATH fold classification database, as well as to literature citations. The PCDDB is available at: http://pcddb.cryst.bbk.ac.uk.

  20. Availability of Previously Unprocessed ALSEP Raw Instrument Data, Derivative Data, and Metadata Products

    Science.gov (United States)

    Nagihara, S.; Nakamura, Y.; Williams, D. R.; Taylor, P. T.; Kiefer, W. S.; Hager, M. A.; Hills, H. K.

    2016-01-01

    In 2010, 440 original data archival tapes for the Apollo Lunar Science Experiment Package (ALSEP) experiments were found at the Washington National Records Center. These tapes hold raw instrument data received from the Moon for all the ALSEP instruments for the period of April through June 1975. We have recently completed extraction of binary files from these tapes, and we have delivered them to the NASA Space Science Data Coordinated Archive (NSSDCA). We are currently processing the raw data into higher order data products in file formats more readily usable by contemporary researchers. These data products will fill a number of gaps in the current ALSEP data collection at NSSDCA. In addition, we have established a digital, searchable archive of ALSEP documents and metadata as part of the web portal of the Lunar and Planetary Institute. It currently holds approx. 700 documents totaling approx. 40,000 pages.

  1. Semantic technologies improving the recall and precision of the Mercury metadata search engine

    Science.gov (United States)

    Pouchard, L. C.; Cook, R. B.; Green, J.; Palanisamy, G.; Noy, N.

    2011-12-01

    The Mercury federated metadata system [1] was developed at the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-sponsored effort holding datasets about biogeochemical dynamics, ecological data, and environmental processes. Mercury currently indexes over 100,000 records from several data providers conforming to community standards, e.g. EML, FGDC, FGDC Biological Profile, ISO 19115 and DIF. With the breadth of sciences represented in Mercury, the potential exists to address some key interdisciplinary scientific challenges related to climate change, its environmental and ecological impacts, and mitigation of these impacts. However, this wealth of metadata also hinders pinpointing datasets relevant to a particular inquiry. We implemented a semantic solution after concluding that traditional search approaches cannot improve the accuracy of the search results in this domain because: a) unlike everyday queries, scientific queries seek to return specific datasets with numerous parameters that may or may not be exposed to search (Deep Web queries); b) the relevance of a dataset cannot be judged by its popularity, as each scientific inquiry tends to be unique; and c) each domain science has its own terminology, more or less curated, consensual, and standardized depending on the domain. The same terms may refer to different concepts across domains (homonyms), while different terms may mean the same thing (synonyms). Interdisciplinary research is arduous because an expert in a domain must become fluent in the language of another, just to find relevant datasets. Thus, we decided to use scientific ontologies because they can provide a context for a free-text search, in a way that string-based keywords never will. With added context, relevant datasets are more easily discoverable. To enable search and programmatic access to ontology entities in Mercury, we are using an instance of the BioPortal ontology repository. Mercury accesses ontology entities
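
    One way to picture how ontology context can help a free-text search is synonym expansion of the query terms. The sketch below is an assumption-laden illustration, not Mercury's implementation: the synonym table is hard-coded and invented, whereas a real system would look terms up in an ontology repository.

```python
# Hypothetical synonym sets standing in for ontology lookups.
SYNONYMS = {
    "precipitation": {"rainfall", "rain"},
    "co2": {"carbon dioxide"},
}


def expand_query(text):
    """Return the query terms followed by any known synonyms for each term."""
    terms = []
    for word in text.lower().split():
        terms.append(word)
        # Sorted so that the expansion order is deterministic.
        terms.extend(sorted(SYNONYMS.get(word, ())))
    return terms


expanded = expand_query("Precipitation CO2 flux")
print(expanded)
```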

  2. Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description.

    Science.gov (United States)

    Sahoo, Satya S; Valdez, Joshua; Rueschman, Michael

    2016-01-01

    Scientific reproducibility is key to scientific progress as it allows the research community to build on validated results, protect patients from potentially harmful trial drugs derived from incorrect results, and reduce wastage of valuable resources. The National Institutes of Health (NIH) recently published a systematic guideline titled "Rigor and Reproducibility" for supporting reproducible research studies, which has also been accepted by several scientific journals. These journals will require published articles to conform to these new guidelines. Provenance metadata describes the history or origin of data and it has long been used in computer science to capture metadata information for ensuring data quality and supporting scientific reproducibility. In this paper, we describe the development of the Provenance for Clinical and healthcare Research (ProvCaRe) framework together with a provenance ontology to support scientific reproducibility by formally modeling a core set of data elements representing details of a research study. We extend the PROV Ontology (PROV-O), which has been recommended as the provenance representation model by the World Wide Web Consortium (W3C), to represent both: (a) data provenance, and (b) process provenance. We use 124 study variables from 6 clinical research studies from the National Sleep Research Resource (NSRR) to evaluate the coverage of the provenance ontology. NSRR is the largest repository of NIH-funded sleep datasets with 50,000 studies from 36,000 participants. The provenance ontology reuses ontology concepts from existing biomedical ontologies, for example the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), to model the provenance information of research studies. The ProvCaRe framework is being developed as part of the Big Data to Knowledge (BD2K) data provenance project.

  3. Habitat-Lite: A GSC case study based on free text terms for environmental metadata

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Hirschman, Lynette; Clark, Cheryl; Cohen, K. Bretonnel; Mardis, Scott; Luciano, Joanne; Kottmann, Renzo; Cole, James; Markowitz, Victor; Kyrpides, Nikos; Field, Dawn

    2008-04-01

    There is an urgent need to capture metadata on the rapidly growing number of genomic, metagenomic and related sequences, such as 16S ribosomal genes. This need is a major focus within the Genomic Standards Consortium (GSC), and Habitat is a key metadata descriptor in the proposed 'Minimum Information about a Genome Sequence' (MIGS) specification. The goal of the work described here is to provide a light-weight, easy-to-use (small) set of terms ('Habitat-Lite') that captures high-level information about habitat while preserving a mapping to the recently launched Environment Ontology (EnvO). Our motivation for building Habitat-Lite is to meet the needs of multiple users, such as annotators curating these data, database providers hosting the data, and biologists and bioinformaticians alike who need to search and employ such data in comparative analyses. Here, we report a case study based on semi-automated identification of terms from GenBank and GOLD. We estimate that the terms in the initial version of Habitat-Lite would provide useful labels for over 60% of the kinds of information found in the GenBank isolation-source field, and around 85% of the terms in the GOLD habitat field. We present a revised version of Habitat-Lite and invite the community's feedback on its further development in order to provide a minimum list of terms to capture high-level habitat information and to provide classification bins needed for future studies.

  4. QuakeML: XML for Seismological Data Exchange and Resource Metadata Description

    Science.gov (United States)

    Euchner, F.; Schorlemmer, D.; Becker, J.; Heinloo, A.; Kästli, P.; Saul, J.; Weber, B.; QuakeML Working Group

    2007-12-01

    QuakeML is an XML-based data exchange format for seismology that is under development. Current collaborators are from ETH, GFZ, USC, USGS, IRIS DMC, EMSC, ORFEUS, and ISTI. QuakeML development was motivated by the lack of a widely accepted and well-documented data format that is applicable to a broad range of fields in seismology. The development team brings together expertise from communities dealing with analysis and creation of earthquake catalogs, distribution of seismic bulletins, and real-time processing of seismic data. Efforts to merge QuakeML with existing XML dialects are under way. The first release of QuakeML will cover a basic description of seismic events including picks, arrivals, amplitudes, magnitudes, origins, focal mechanisms, and moment tensors. Further extensions are in progress or planned, e.g., for macroseismic information, location probability density functions, slip distributions, and ground motion information. The QuakeML language definition is supplemented by a concept to provide resource metadata and facilitate metadata exchange between distributed data providers. For that purpose, we introduce unique, location-independent identifiers of seismological resources. As an application of QuakeML, ETH Zurich is currently developing a Python-based seismicity analysis toolkit as a contribution to CSEP (Collaboratory for the Study of Earthquake Predictability). We follow a collaborative and transparent development approach along the lines of the procedures of the World Wide Web Consortium (W3C). QuakeML is currently in working draft status. The standard description will be subjected to a public Request for Comments (RFC) process and eventually reach the status of a recommendation. QuakeML can be found at http://www.quakeml.org.
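
    To make the idea of an XML event description with resource identifiers concrete, the sketch below parses a small document using Python's standard library. The element names, nesting and `publicID` values are simplified assumptions for illustration; the sample omits namespaces and has not been validated against the QuakeML schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample; not a schema-valid QuakeML document.
SAMPLE = """
<eventParameters>
  <event publicID="smi:example.org/event/1">
    <origin publicID="smi:example.org/origin/1">
      <time><value>2007-01-01T00:00:00Z</value></time>
    </origin>
    <magnitude publicID="smi:example.org/magnitude/1">
      <mag><value>5.2</value></mag>
    </magnitude>
  </event>
</eventParameters>
"""


def summarize(xml_text):
    """Extract identifier, origin time and magnitude for each event."""
    root = ET.fromstring(xml_text.strip())
    events = []
    for event in root.iter("event"):
        events.append({
            "id": event.get("publicID"),
            "time": event.findtext("origin/time/value"),
            "magnitude": float(event.findtext("magnitude/mag/value")),
        })
    return events


print(summarize(SAMPLE))
```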

  5. Keyword Search over Data Service Integration for Accurate Results

    CERN Document Server

    Zemleris, Vidmantas; Gwadera, Robert

    2013-01-01

    Virtual data integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn the query language and to get acquainted with data organization, which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  6. Interoperability of Volcano Observation Thematic Core Services with the EPOS Integrated Core Services

    Science.gov (United States)

    Vogfjord, Kristin; Sigurdsson, Sigurdur F.; Reitano, Danilo

    2017-04-01

    The volcano observations community, represented by Volcano Observatories (VO) and Volcano Research Institutions (VRI) participating in the European Plate Observing System (EPOS), will implement services to enable open access to data, data products, software and services (DDSS) from the community. Technical implementation of these services is established within the Volcano Observations Thematic Core Service (VO-TCS), which will coordinate activities among the contributing VOs and VRIs to ensure their interoperability with the EPOS Integrated Core Services (ICS). The goal is to implement a service-oriented architecture (SOA) to guarantee interoperability among the different components of the VO-TCS and the EPOS-ICS architecture. This entails linking and harmonizing the technical implementation of the VO-TCS with the EPOS-ICS, defining standards for TCS-ICS interaction and implementing a prototype for a RESTful service (REpresentational State Transfer). The VO-TCS services will also coordinate with services and platforms already developed and implemented within the two Volcano Supersite projects, FUTUREVOLC and MED-SUV and will utilize some of their already established services to enable initial access to the community's products. To prepare for initial implementation in the fall of 2017, a survey among the VO-TCS participants was carried out to evaluate the maturity level of their different products (DDSSs). The specific goal was to obtain a report for each participating institution describing the real cross-reference between each DDSS status and the TCS requirements, as well as to determine the availability of data and metadata for each DDSS and their level of maturity. Data and metadata similarities between the participants highlighted by the survey results are used to reorganize and simplify the list of products to be made available in the VO-TCS. The presentation will give an overview of the planned services in the Volcano Observations TCS and outline the roadmap

  7. Alcohol Expectancies and Inhibition Conflict as Moderators of the Alcohol-Unprotected Sex Relationship: Event-Level Findings from a Daily Diary Study Among Individuals Living with HIV in Cape Town, South Africa.

    Science.gov (United States)

    Kiene, Susan M; Simbayi, Leickness C; Abrams, Amber; Cloete, Allanise

    2016-01-01

    Literature from sub-Saharan Africa and elsewhere supports a global association between alcohol and HIV risk. However, more rigorous studies using multiple event-level methods find mixed support for this association, suggesting the importance of examining potential moderators of this relationship. The present study explores the assumptions of alcohol expectancy theory and alcohol myopia theory as possible moderators that help elucidate the circumstances under which alcohol may affect individuals' ability to use a condom. Participants were 82 individuals (58 women, 24 men) living with HIV who completed daily phone interviews for 42 days which assessed daily sexual behavior and alcohol consumption. Logistic generalized estimating equation models were used to examine the potential moderating effects of inhibition conflict and sex-related alcohol outcome expectancies. The data provided some support for both theories and in some cases the moderation effects were stronger when both partners consumed alcohol.

  8. Private collection: high correlation of sample collection and patient admission date in clinical microbiological testing complicates sharing of phylodynamic metadata.

    Science.gov (United States)

    Shean, Ryan C; Greninger, Alexander L

    2018-01-01

    Infectious pathogens are known for their rapid evolutionary rates with new mutations arising over days to weeks. The ability to rapidly recover whole genome sequences and analyze the spread and evolution of pathogens using genetic information and pathogen collection dates has led to interest in real-time tracking of infectious transmission and outbreaks. However, the level of temporal resolution afforded by these analyses may conflict with definitions of what constitutes protected health information (PHI) and privacy requirements for de-identification for publication and public sharing of research data and metadata. In the United States, dates and locations associated with patient care that provide greater resolution than year or the first three digits of the zip code are generally considered patient identifiers. Admission and discharge dates are specifically named as identifiers in Department of Health and Human Services guidance. To understand the degree to which one can impute admission dates from specimen collection dates, we examined sample collection dates and patient admission dates associated with more than 270,000 unique microbiological results from the University of Washington Laboratory Medicine Department between 2010 and 2017. Across all positive microbiological tests, the sample collection date exactly matched the patient admission date in 68.8% of tests. Collection dates and admission dates were identical from emergency department and outpatient testing 86.7% and 96.5% of the time, respectively, with >99% of tests collected within 1 day from the patient admission date. Samples from female patients were significantly more likely to be collected closer to admission date than those from male patients. We show that PHI-associated dates such as admission date can confidently be imputed from deposited collection date. 
We suggest that publicly depositing microbiological collection dates at greater resolution than the year may not meet routine Safe Harbor
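
The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the record layout (pairs of collection and admission dates), the function names and the sample values are all hypothetical. It computes the share of tests whose collection date equals the admission date, the share collected within a given number of days, and a year-only coarsening of a date.

```python
from datetime import date

def exact_match_rate(records):
    """Fraction of (collection_date, admission_date) pairs that are identical."""
    if not records:
        raise ValueError("no records")
    matches = sum(1 for collected, admitted in records if collected == admitted)
    return matches / len(records)

def within_days_rate(records, days=1):
    """Fraction of pairs whose dates differ by at most `days` days."""
    if not records:
        raise ValueError("no records")
    hits = sum(1 for collected, admitted in records
               if abs((collected - admitted).days) <= days)
    return hits / len(records)

def coarsen_to_year(d):
    """Reduce a date to year-only resolution before public deposit."""
    return d.year

# Hypothetical sample data, for illustration only.
sample = [
    (date(2015, 3, 2), date(2015, 3, 2)),
    (date(2015, 3, 5), date(2015, 3, 4)),
    (date(2016, 7, 9), date(2016, 7, 1)),
    (date(2017, 1, 1), date(2017, 1, 1)),
]
```

On real data, a high exact-match rate would indicate that a deposited collection date effectively reveals the admission date, which is the concern the abstract raises.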

  9. Impacts and Viability of Open Source Software on Earth Science Metadata Clearing House and Service Registry Applications

    Science.gov (United States)

    Pilone, D.; Cechini, M. F.; Mitchell, A.

    2011-12-01

    Earth Science applications typically deal with large amounts of data and high throughput rates, if not also high transaction rates. While Open Source is frequently used for smaller scientific applications, large scale, highly available systems frequently fall back to "enterprise" class solutions like Oracle RAC or commercial grade JEE Application Servers. NASA's Earth Observing System Data and Information System (EOSDIS) provides end-to-end capabilities for managing NASA's Earth science data from multiple sources - satellites, aircraft, field measurements, and various other programs. A core capability of EOSDIS, the Earth Observing System (EOS) Clearinghouse (ECHO), is a highly available search and order clearinghouse of over 100 million pieces of science data that has evolved from its early R&D days to a fully operational system. Over the course of this maturity ECHO has largely transitioned from commercial frameworks, databases, and operating systems to Open Source solutions...and in some cases, back. In this talk we discuss the progression of our technological solutions and our lessons learned in the areas of: high performance, large scale searching solutions; GeoSpatial search capabilities and dealing with multiple coordinate systems; search and storage of variable format source (science) data; highly available deployment solutions; and scalable (elastic) solutions to visual searching and image handling. Throughout the evolution of the ECHO system we have had to evaluate solutions with respect to performance, cost, developer productivity, reliability, and maintainability in the context of supporting global science users. Open Source solutions have played a significant role in our architecture and development but several critical commercial components remain (or have been reinserted) to meet our operational demands.

  10. A Case for Data and Service Fusions

    Science.gov (United States)

    Huang, T.; Boening, C.; Quach, N. T.; Gill, K.; Zlotnicki, V.; Moore, B.; Tsontos, V. M.

    2015-12-01

    In this distributed, data-intensive era, developing any solution that requires multi-disciplinary data and services requires careful review of interfaces with data and service providers. Information is stored in many different locations and data services are distributed across the Internet. In the design and development of mash-up heterogeneous data systems, the challenge is not entirely technological; it is our ability to document the external interface specifications and to create a coherent environment for our users. While it is impressive to present a complex web of data, the true measure of our success is in the quality of the data we are serving, the throughput of our creation, and user experience. The presentation describes two currently funded NASA projects that require integration of heterogeneous data and services that reside in different locations. The NASA Sea Level Change Portal is designed as a "one-stop" source for current sea level change information. Behind this portal is an architecture that integrates data and services from various sources, including PI-generated products, satellite products from the DAACs, metadata from the ESDIS Common Metadata Repository (CMR) and other sources, and services that reside in the data centers, universities, and ESDIS. The recently funded Distributed Oceanographic Matchup Service (DOMS) is a project under the NASA Advanced Information Systems Technology (AIST) program. DOMS will integrate with satellite products managed by the NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC) and three different in-situ projects that are located in different parts of the U.S. These projects are good examples of delivering content-rich solutions through mash-up of heterogeneous data and systems.

  11. Mining metadata from unidentified ITS sequences in GenBank: A case study in Inocybe (Basidiomycota)

    Directory of Open Access Journals (Sweden)

    Jacobsson Stig

    2008-02-01

    Background: The lack of reference sequences from well-identified mycorrhizal fungi often poses a challenge to the inference of taxonomic affiliation of sequences from environmental samples, and many environmental sequences are thus left unidentified. Such unidentified sequences belonging to the widely distributed ectomycorrhizal fungal genus Inocybe (Basidiomycota) were retrieved from GenBank and divided into species that were identified in a phylogenetic context using a reference dataset from an ongoing study of the genus. The sequence metadata of the unidentified Inocybe sequences stored in GenBank, as well as data from the corresponding original papers, were compiled and used to explore the ecology and distribution of the genus. In addition, the relative occurrence of Inocybe was contrasted to that of other mycorrhizal genera. Results: Most species of Inocybe were found to have less than 3% intraspecific variability in the ITS2 region of the nuclear ribosomal DNA. This cut-off value was used jointly with phylogenetic analysis to delimit and identify unidentified Inocybe sequences to species level. A total of 177 unidentified Inocybe ITS sequences corresponding to 98 species were recovered, 32% of which were successfully identified to species level in this study. These sequences account for an unexpectedly large proportion of the publicly available unidentified fungal ITS sequences when compared with other mycorrhizal genera. Eight Inocybe species were reported from multiple hosts and some even from hosts forming arbutoid or orchid mycorrhizae. Furthermore, Inocybe sequences have been reported from four continents and in climate zones ranging from cold temperate to equatorial climate. Out of the 19 species found in more than one study, six were found in both Europe and North America and one was found in both Europe and Japan, indicating that at least many north temperate species have a wide distribution. Conclusion: Although DNA

  12. Mining metadata from unidentified ITS sequences in GenBank: a case study in Inocybe (Basidiomycota).

    Science.gov (United States)

    Ryberg, Martin; Nilsson, R Henrik; Kristiansson, Erik; Töpel, Mats; Jacobsson, Stig; Larsson, Ellen

    2008-02-18

    The lack of reference sequences from well-identified mycorrhizal fungi often poses a challenge to the inference of taxonomic affiliation of sequences from environmental samples, and many environmental sequences are thus left unidentified. Such unidentified sequences belonging to the widely distributed ectomycorrhizal fungal genus Inocybe (Basidiomycota) were retrieved from GenBank and divided into species that were identified in a phylogenetic context using a reference dataset from an ongoing study of the genus. The sequence metadata of the unidentified Inocybe sequences stored in GenBank, as well as data from the corresponding original papers, were compiled and used to explore the ecology and distribution of the genus. In addition, the relative occurrence of Inocybe was contrasted to that of other mycorrhizal genera. Most species of Inocybe were found to have less than 3% intraspecific variability in the ITS2 region of the nuclear ribosomal DNA. This cut-off value was used jointly with phylogenetic analysis to delimit and identify unidentified Inocybe sequences to species level. A total of 177 unidentified Inocybe ITS sequences corresponding to 98 species were recovered, 32% of which were successfully identified to species level in this study. These sequences account for an unexpectedly large proportion of the publicly available unidentified fungal ITS sequences when compared with other mycorrhizal genera. Eight Inocybe species were reported from multiple hosts and some even from hosts forming arbutoid or orchid mycorrhizae. Furthermore, Inocybe sequences have been reported from four continents and in climate zones ranging from cold temperate to equatorial climate. Out of the 19 species found in more than one study, six were found in both Europe and North America and one was found in both Europe and Japan, indicating that at least many north temperate species have a wide distribution. Although DNA-based species identification and circumscription are associated

  13. Latent Feature Models for Uncovering Human Mobility Patterns from Anonymized User Location Traces with Metadata

    KAUST Repository

    Alharbi, Basma Mohammed

    2017-04-10

    In the mobile era, data capturing individuals’ locations have become unprecedentedly available. Data from Location-Based Social Networks is one example of large-scale user-location data. Such data provide a valuable source for understanding patterns governing human mobility, and thus enable a wide range of research. However, mining and utilizing raw user-location data is a challenging task. This is mainly due to the sparsity of data (at the user level), the imbalance of data, with power-law degree distributions of user and location check-ins (at the global level), and, more importantly, the lack of a uniform low-dimensional feature space describing users. Three latent feature models are proposed in this dissertation. Each proposed model takes as input a collection of user-location check-ins, and outputs a new representation space for users and locations respectively. To avoid invading users’ privacy, the proposed models are designed to learn from anonymized location data where only IDs - not geophysical positioning or category - of locations are utilized. To enrich the inferred mobility patterns, the proposed models incorporate metadata, often associated with user-location data, into the inference process. In this dissertation, two types of metadata are utilized to enrich the inferred patterns: timestamps and social ties. Time adds context to the inferred patterns, while social ties amplify incomplete user-location check-ins. The first proposed model incorporates timestamps by learning from collections of users’ locations sharing the same discretized time. The second proposed model also incorporates time into the learning model, yet takes a further step by considering time at different scales (hour of a day, day of a week, month, and so on). This change in modeling time allows for capturing meaningful patterns over different time scales. The last proposed model incorporates social ties into the learning process to compensate for inactive users who contribute a large volume
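
The idea of discretizing check-in timestamps at several scales, mentioned in this abstract, can be sketched as follows. This is only an illustration of the preprocessing step, not the dissertation's latent feature models; the tuple layout `(user_id, location_id, timestamp)`, the function names and the scale labels are assumptions made for the example.

```python
from datetime import datetime

def time_features(ts):
    """Discretize a timestamp at three scales (labels are hypothetical)."""
    return {
        "hour_of_day": ts.hour,        # 0..23
        "day_of_week": ts.weekday(),   # 0 = Monday .. 6 = Sunday
        "month": ts.month,             # 1..12
    }

def group_by_scale(checkins, scale):
    """Group anonymized (user_id, location_id) pairs by one discretized time scale."""
    groups = {}
    for user_id, location_id, ts in checkins:
        key = time_features(ts)[scale]
        groups.setdefault(key, []).append((user_id, location_id))
    return groups

# Hypothetical anonymized check-ins: only IDs and timestamps, no coordinates.
checkins = [
    ("u1", "L7", datetime(2017, 4, 10, 9, 30)),
    ("u2", "L7", datetime(2017, 4, 10, 9, 5)),
    ("u1", "L3", datetime(2017, 4, 11, 18, 0)),
]
```

Grouping the same check-ins by `"hour_of_day"` and by `"day_of_week"` yields different collections, which is what lets a model pick up patterns at more than one time scale.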

  14. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  15. Employing Metadata Standards in Electronic Records and Document Management a Path before Archives and Documentation and Information Centers

    Directory of Open Access Journals (Sweden)

    Ali Reza Saadat

    2006-10-01

    Archives and specialized documentation and information centers within government offices, companies and organizations house collections of paper documents. The rising number of these documents and limited storage space on the one hand, and the current organizational trend towards e-government on the other, have caused these documents to be increasingly converted into electronic format, with a concomitant change in management and preservation strategy. Electronic Document and Records Management (EDRM) is one such management strategy. The most important management issues are consistency, authority, interface, description and retrieval. These issues emphasize the role of metadata, given its unique capabilities in this respect. The present paper introduces the international standards in Electronic Record Management and discusses common metadata standards such as e-GMS, AGLS, GILS and DC.

  16. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  17. Looking back on 10 years of the ATLAS Metadata Interface. Reflections on architecture, code design and development methods.

    CERN Document Server

    Fulachier, J; The ATLAS collaboration; Albrand, S; Lambert, F

    2014-01-01

    The “ATLAS Metadata Interface” framework (AMI) has been developed in the context of ATLAS, one of the largest scientific collaborations. AMI can be considered to be a mature application, since its basic architecture has been maintained for over 10 years. In this paper we will briefly describe the architecture and the main uses of the framework within the experiment (TagCollector for release management and Dataset Discovery). These two applications, which share almost 2000 registered users, are superficially quite different, however much of the code is shared and they have been developed and maintained over a decade almost completely by the same team of 3 people. We will discuss how the architectural principles established at the beginning of the project have allowed us to continue both to integrate the new technologies and to respond to the new metadata use cases which inevitably appear over such a time period.

  18. Looking back on 10 years of the ATLAS Metadata Interface. Reflections on architecture, code design and development methods

    Science.gov (United States)

    Fulachier, J.; Aidel, O.; Albrand, S.; Lambert, F.; Atlas Collaboration

    2014-06-01

    The "ATLAS Metadata Interface" framework (AMI) has been developed in the context of ATLAS, one of the largest scientific collaborations. AMI can be considered to be a mature application, since its basic architecture has been maintained for over 10 years. In this paper we describe briefly the architecture and the main uses of the framework within the experiment (TagCollector for release management and Dataset Discovery). These two applications, which share almost 2000 registered users, are superficially quite different, however much of the code is shared and they have been developed and maintained over a decade almost completely by the same team of 3 people. We discuss how the architectural principles established at the beginning of the project have allowed us to continue both to integrate the new technologies and to respond to the new metadata use cases which inevitably appear over such a time period.

  19. Looking back on 10 years of the ATLAS Metadata Interface. Reflections on architecture, code design and development methods

    International Nuclear Information System (INIS)

    Fulachier, J; Albrand, S; Lambert, F; Aidel, O

    2014-01-01

    The 'ATLAS Metadata Interface' framework (AMI) has been developed in the context of ATLAS, one of the largest scientific collaborations. AMI can be considered to be a mature application, since its basic architecture has been maintained for over 10 years. In this paper we describe briefly the architecture and the main uses of the framework within the experiment (TagCollector for release management and Dataset Discovery). These two applications, which share almost 2000 registered users, are superficially quite different, however much of the code is shared and they have been developed and maintained over a decade almost completely by the same team of 3 people. We discuss how the architectural principles established at the beginning of the project have allowed us to continue both to integrate the new technologies and to respond to the new metadata use cases which inevitably appear over such a time period.

  20. OntoStudyEdit: a new approach for ontology-based representation and management of metadata in clinical and epidemiological research.

    Science.gov (United States)

    Uciteli, Alexandr; Herre, Heinrich

    2015-01-01

    The specification of metadata in clinical and epidemiological study projects absorbs significant expense. The validity and quality of the collected data depend heavily on the precise and semantically correct representation of their metadata. In various research organizations, which are planning and coordinating studies, the required metadata are specified differently, depending on many conditions, e.g., on the study management software used. The latter does not always meet the needs of a particular research organization, e.g., with respect to the relevant metadata attributes and structuring possibilities. The objective of the research set forth in this paper is the development of a new approach for ontology-based representation and management of metadata. The basic features of this approach are demonstrated by the software tool OntoStudyEdit (OSE). The OSE is designed and developed according to the three-ontology method. This method for developing software is based on the interactions of three different kinds of ontologies: a task ontology, a domain ontology (DO) and a top-level ontology. The OSE can be easily adapted to different requirements, and it supports an ontologically founded representation and efficient management of metadata. The metadata specifications can be imported from various sources; they can be edited with the OSE, and they can be exported to several formats, which are used, e.g., by different study management software. Advantages of this approach are the adaptability of the OSE by integrating suitable domain ontologies, the ontological specification of mappings between the import/export formats and the DO, the specification of the study metadata in a uniform manner and its reuse in different research projects, and an intuitive data entry for non-expert users.