WorldWideScience

Sample records for scalable event-level metadata

  1. Building a scalable event-level metadata service for ATLAS

    Cranshaw, J; Malon, D; Goosens, L; Viegas, F T A; McGlone, H

    2008-01-01

    The ATLAS TAG Database is a multi-terabyte event-level metadata selection system, intended to allow discovery, selection of and navigation to events of interest to an analysis. The TAG Database encompasses file- and relational-database-resident event-level metadata, distributed across all ATLAS Tiers. An oracle hosted global TAG relational database, containing all ATLAS events, implemented in Oracle, will exist at Tier O. Implementing a system that is both performant and manageable at this scale is a challenge. A 1 TB relational TAG Database has been deployed at Tier 0 using simulated tag data. The database contains one billion events, each described by two hundred event metadata attributes, and is currently undergoing extensive testing in terms of queries, population and manageability. These 1 TB tests aim to demonstrate and optimise the performance and scalability of an Oracle TAG Database on a global scale. Partitioning and indexing strategies are crucial to well-performing queries and manageability of the database and have implications for database population and distribution, so these are investigated. Physics query patterns are anticipated, but a crucial feature of the system must be to support a broad range of queries across all attributes. Concurrently, event tags from ATLAS Computing System Commissioning distributed simulations are accumulated in an Oracle-hosted database at CERN, providing an event-level selection service valuable for user experience and gathering information about physics query patterns. In this paper we describe the status of the Global TAG relational database scalability work and highlight areas of future direction

  2. Event metadata records as a testbed for scalable data mining

    Gemmeren, P van; Malon, D

    2010-01-01

    At a data rate of 200 hertz, event metadata records ('TAGs,' in ATLAS parlance) provide fertile grounds for development and evaluation of tools for scalable data mining. It is easy, of course, to apply HEP-specific selection or classification rules to event records and to label such an exercise 'data mining,' but our interest is different. Advanced statistical methods and tools such as classification, association rule mining, and cluster analysis are common outside the high energy physics community. These tools can prove useful, not for discovery physics, but for learning about our data, our detector, and our software. A fixed and relatively simple schema makes TAG export to other storage technologies such as HDF5 straightforward. This simplifies the task of exploiting very-large-scale parallel platforms such as Argonne National Laboratory's BlueGene/P, currently the largest supercomputer in the world for open science, in the development of scalable tools for data mining. Using a domain-neutral scientific data format may also enable us to take advantage of existing data mining components from other communities. There is, further, a substantial literature on the topic of one-pass algorithms and stream mining techniques, and such tools may be inserted naturally at various points in the event data processing and distribution chain. This paper describes early experience with event metadata records from ATLAS simulation and commissioning as a testbed for scalable data mining tool development and evaluation.

  3. Scalable Metadata Management for a Large Multi-Source Seismic Data Repository

    Gaylord, J. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Dodge, D. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Magana-Zook, S. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Barno, J. G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Knapp, D. R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Thomas, J. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sullivan, D. S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Ruppert, S. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Mellors, R. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-05-26

    In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity.

  4. Scalable Metadata Management for a Large Multi-Source Seismic Data Repository

    Gaylord, J. M. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Dodge, D. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Magana-Zook, S. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Barno, J. G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Knapp, D. R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-04-11

    In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity. We began the effort with an assessment of open source data flow tools from the Hadoop ecosystem. We then began the construction of a layered architecture that is specifically designed to address many of the scalability and data quality issues we experience with our current pipeline. This included implementing basic functionality in each of the layers, such as establishing a data lake, designing a unified metadata schema, tracking provenance, and calculating data quality metrics. Our original intent was to test and validate the new ingestion framework with data from a large-scale field deployment in a temporary network. This delivered somewhat unsatisfying results, since the new system immediately identified fatal flaws in the data relatively early in the pipeline. Although this is a correct result it did not allow us to sufficiently exercise the whole framework. We then widened our scope to process all available metadata from over a dozen online seismic data sources to further test the implementation and validate the design. This experiment also uncovered a higher than expected frequency of certain types of metadata issues that challenged us to further tune our data management strategy to handle them. Our result from this project is a greatly improved understanding of real world data issues, a validated design, and prototype implementations of major components of an eventual production framework. This successfully forms the basis of future development for the Geophysical Monitoring Program data pipeline, which is a critical asset supporting multiple programs. It also positions us very well to deliver valuable metadata management expertise to our sponsors, and has already resulted in an NNSA Office of Defense Nuclear Nonproliferation

  5. Metadata

    Zeng, Marcia Lei

    2016-01-01

    Metadata remains the solution for describing the explosively growing, complex world of digital information, and continues to be of paramount importance for information professionals. Providing a solid grounding in the variety and interrelationships among different metadata types, Zeng and Qin's thorough revision of their benchmark text offers a comprehensive look at the metadata schemas that exist in the world of library and information science and beyond, as well as the contexts in which they operate. Cementing its value as both an LIS text and a handy reference for professionals already in the field, this book: * Lays out the fundamentals of metadata, including principles of metadata, structures of metadata vocabularies, and metadata descriptions * Surveys metadata standards and their applications in distinct domains and for various communities of metadata practice * Examines metadata building blocks, from modelling to defining properties, and from designing application profiles to implementing value vocabu...

  6. Metadata

    Pomerantz, Jeffrey

    2015-01-01

    When "metadata" became breaking news, appearing in stories about surveillance by the National Security Agency, many members of the public encountered this once-obscure term from information science for the first time. Should people be reassured that the NSA was "only" collecting metadata about phone calls -- information about the caller, the recipient, the time, the duration, the location -- and not recordings of the conversations themselves? Or does phone call metadata reveal more than it seems? In this book, Jeffrey Pomerantz offers an accessible and concise introduction to metadata. In the era of ubiquitous computing, metadata has become infrastructural, like the electrical grid or the highway system. We interact with it or generate it every day. It is not, Pomerantz tell us, just "data about data." It is a means by which the complexity of an object is represented in a simpler form. For example, the title, the author, and the cover art are metadata about a book. When metadata does its job well, it fades i...

  7. An integrated overview of metadata in ATLAS

    Gallas, E J; Malon, D; Hawkings, R J; Albrand, S; Torrence, E

    2010-01-01

    Metadata (data about data) arise in many contexts, from many diverse sources, and at many levels in ATLAS. Familiar examples include run-level, luminosity-block-level, and event-level metadata, and, related to processing and organization, dataset-level and file-level metadata, but these categories are neither exhaustive nor orthogonal. Some metadata are known a priori, in advance of data taking or simulation; other metadata are known only after processing, and occasionally, quite late (e.g., detector status or quality updates that may appear after initial reconstruction is complete). Metadata that may seem relevant only internally to the distributed computing infrastructure under ordinary conditions may become relevant to physics analysis under error conditions ('What can I discover about data I failed to process?'). This talk provides an overview of metadata and metadata handling in ATLAS, and describes ongoing work to deliver integrated metadata services in support of physics analysis.

  8. Drivers of flood damage on event level

    Kreibich, H.; Aerts, J. C. J. H.; Apel, H.

    2016-01-01

    Flood risk is dynamic and influenced by many processes related to hazard, exposure and vulnerability. Flood damage increased significantly over the past decades, however, resulting overall economic loss per event is an aggregated indicator and it is difficult to attribute causes to this increasing...... trend. Much has been learned about damaging processes during floods at the micro-scale, e.g. building level. However, little is known about the main factors determining the amount of flood damage on event level. Thus, we analyse and compare paired flood events, i.e. consecutive, similar damaging floods...... example are the 2002 and 2013 floods in the Elbe and Danube catchments in Germany. The 2002 flood caused the highest economic damage (EUR 11600 million) due to a natural hazard event in Germany. Damage was so high due to extreme flood hazard triggered by extreme precipitation and a high number...

  9. Active Marine Station Metadata

    National Oceanic and Atmospheric Administration, Department of Commerce — The Active Marine Station Metadata is a daily metadata report for active marine bouy and C-MAN (Coastal Marine Automated Network) platforms from the National Data...

  10. Tethys Acoustic Metadata Database

    National Oceanic and Atmospheric Administration, Department of Commerce — The Tethys database houses the metadata associated with the acoustic data collection efforts by the Passive Acoustic Group. These metadata include dates, locations...

  11. Scalable devices

    Krü ger, Jens J.; Hadwiger, Markus

    2014-01-01

    In computer science in general and in particular the field of high performance computing and supercomputing the term scalable plays an important role. It indicates that a piece of hardware, a concept, an algorithm, or an entire system scales

  12. Metadata aided run selection at ATLAS

    Buckingham, R M; Gallas, E J; Tseng, J C-L; Viegas, F; Vinek, E

    2011-01-01

    Management of the large volume of data collected by any large scale scientific experiment requires the collection of coherent metadata quantities, which can be used by reconstruction or analysis programs and/or user interfaces, to pinpoint collections of data needed for specific purposes. In the ATLAS experiment at the LHC, we have collected metadata from systems storing non-event-wise data (Conditions) into a relational database. The Conditions metadata (COMA) database tables not only contain conditions known at the time of event recording, but also allow for the addition of conditions data collected as a result of later analysis of the data (such as improved measurements of beam conditions or assessments of data quality). A new web based interface called 'runBrowser' makes these Conditions Metadata available as a Run based selection service. runBrowser, based on PHP and JavaScript, uses jQuery to present selection criteria and report results. It not only facilitates data selection by conditions attributes, but also gives the user information at each stage about the relationship between the conditions chosen and the remaining conditions criteria available. When a set of COMA selections are complete, runBrowser produces a human readable report as well as an XML file in a standardized ATLAS format. This XML can be saved for later use or refinement in a future runBrowser session, shared with physics/detector groups, or used as input to ELSSI (event level Metadata browser) or other ATLAS run or event processing services.

  13. Scalable devices

    Krüger, Jens J.

    2014-01-01

    In computer science in general and in particular the field of high performance computing and supercomputing the term scalable plays an important role. It indicates that a piece of hardware, a concept, an algorithm, or an entire system scales with the size of the problem, i.e., it can not only be used in a very specific setting but it\\'s applicable for a wide range of problems. From small scenarios to possibly very large settings. In this spirit, there exist a number of fixed areas of research on scalability. There are works on scalable algorithms, scalable architectures but what are scalable devices? In the context of this chapter, we are interested in a whole range of display devices, ranging from small scale hardware such as tablet computers, pads, smart-phones etc. up to large tiled display walls. What interests us mostly is not so much the hardware setup but mostly the visualization algorithms behind these display systems that scale from your average smart phone up to the largest gigapixel display walls.

  14. USGIN ISO metadata profile

    Richard, S. M.

    2011-12-01

    The USGIN project has drafted and is using a specification for use of ISO 19115/19/39 metadata, recommendations for simple metadata content, and a proposal for a URI scheme to identify resources using resolvable http URI's(see http://lab.usgin.org/usgin-profiles). The principal target use case is a catalog in which resources can be registered and described by data providers for discovery by users. We are currently using the ESRI Geoportal (Open Source), with configuration files for the USGIN profile. The metadata offered by the catalog must provide sufficient content to guide search engines to locate requested resources, to describe the resource content, provenance, and quality so users can determine if the resource will serve for intended usage, and finally to enable human users and sofware clients to obtain or access the resource. In order to achieve an operational federated catalog system, provisions in the ISO specification must be restricted and usage clarified to reduce the heterogeneity of 'standard' metadata and service implementations such that a single client can search against different catalogs, and the metadata returned by catalogs can be parsed reliably to locate required information. Usage of the complex ISO 19139 XML schema allows for a great deal of structured metadata content, but the heterogenity in approaches to content encoding has hampered development of sophisticated client software that can take advantage of the rich metadata; the lack of such clients in turn reduces motivation for metadata producers to produce content-rich metadata. If the only significant use of the detailed, structured metadata is to format into text for people to read, then the detailed information could be put in free text elements and be just as useful. In order for complex metadata encoding and content to be useful, there must be clear and unambiguous conventions on the encoding that are utilized by the community that wishes to take advantage of advanced metadata

  15. Harvesting NASA's Common Metadata Repository

    Shum, D.; Mitchell, A. E.; Durbin, C.; Norton, J.

    2017-12-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.

  16. Studies of Big Data metadata segmentation between relational and non-relational databases

    Golosova, M. V.; Grigorieva, M. A.; Klimentov, A. A.; Ryabinkin, E. A.; Dimitrov, G.; Potekhin, M.

    2015-12-01

    In recent years the concepts of Big Data became well established in IT. Systems managing large data volumes produce metadata that describe data and workflows. These metadata are used to obtain information about current system state and for statistical and trend analysis of the processes these systems drive. Over the time the amount of the stored metadata can grow dramatically. In this article we present our studies to demonstrate how metadata storage scalability and performance can be improved by using hybrid RDBMS/NoSQL architecture.

  17. Studies of Big Data metadata segmentation between relational and non-relational databases

    Golosova, M V; Klimentov, A A; Ryabinkin, E A; Dimitrov, G; Potekhin, M

    2015-01-01

    In recent years the concepts of Big Data became well established in IT. Systems managing large data volumes produce metadata that describe data and workflows. These metadata are used to obtain information about current system state and for statistical and trend analysis of the processes these systems drive. Over the time the amount of the stored metadata can grow dramatically. In this article we present our studies to demonstrate how metadata storage scalability and performance can be improved by using hybrid RDBMS/NoSQL architecture.

  18. NAIP National Metadata

    Farm Service Agency, Department of Agriculture — The NAIP National Metadata Map contains USGS Quarter Quad and NAIP Seamline boundaries for every year NAIP imagery has been collected. Clicking on the map also makes...

  19. ATLAS Metadata Task Force

    ATLAS Collaboration; Costanzo, D.; Cranshaw, J.; Gadomski, S.; Jezequel, S.; Klimentov, A.; Lehmann Miotto, G.; Malon, D.; Mornacchi, G.; Nemethy, P.; Pauly, T.; von der Schmitt, H.; Barberis, D.; Gianotti, F.; Hinchliffe, I.; Mapelli, L.; Quarrie, D.; Stapnes, S.

    2007-04-04

    This document provides an overview of the metadata, which are needed to characterizeATLAS event data at different levels (a complete run, data streams within a run, luminosity blocks within a run, individual events).

  20. Data, Metadata, and Ted

    Borgman, Christine L.

    2014-01-01

    Ted Nelson coined the term “hypertext” and developed Xanadu in a universe parallel to the one in which librarians, archivists, and documentalists were creating metadata to establish cross-connections among the myriad topics of this world. When these universes collided, comets exploded as ontologies proliferated. Black holes were formed as data disappeared through lack of description. Today these universes coexist, each informing the other, if not always happily: the formal rules of metadata, ...

  1. The RBV metadata catalog

    Andre, Francois; Fleury, Laurence; Gaillardet, Jerome; Nord, Guillaume

    2015-04-01

    RBV (Réseau des Bassins Versants) is a French initiative to consolidate the national efforts made by more than 15 elementary observatories funded by various research institutions (CNRS, INRA, IRD, IRSTEA, Universities) that study river and drainage basins. The RBV Metadata Catalogue aims at giving an unified vision of the work produced by every observatory to both the members of the RBV network and any external person interested by this domain of research. Another goal is to share this information with other existing metadata portals. Metadata management is heterogeneous among observatories ranging from absence to mature harvestable catalogues. Here, we would like to explain the strategy used to design a state of the art catalogue facing this situation. Main features are as follows : - Multiple input methods: Metadata records in the catalog can either be entered with the graphical user interface, harvested from an existing catalogue or imported from information system through simplified web services. - Hierarchical levels: Metadata records may describe either an observatory, one of its experimental site or a single dataset produced by one instrument. - Multilingualism: Metadata can be easily entered in several configurable languages. - Compliance to standards : the backoffice part of the catalogue is based on a CSW metadata server (Geosource) which ensures ISO19115 compatibility and the ability of being harvested (globally or partially). On going tasks focus on the use of SKOS thesaurus and SensorML description of the sensors. - Ergonomy : The user interface is built with the GWT Framework to offer a rich client application with a fully ajaxified navigation. - Source code sharing : The work has led to the development of reusable components which can be used to quickly create new metadata forms in other GWT applications You can visit the catalogue (http://portailrbv.sedoo.fr/) or contact us by email rbv@sedoo.fr.

  2. A programmatic view of metadata, metadata services, and metadata flow in ATLAS

    Malon, D; Albrand, S; Gallas, E; Stewart, G

    2012-01-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS are considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and integrated view to physicists, and support both human use and programmatic access. In this paper we consider ATLAS metadata, metadata services, and metadata flow principally from the illustrative perspective of how disparate metadata are made available to executing jobs and, conversely, how metadata generated by such jobs are returned. We describe how metadata are read, how metadata are cached, and how metadata generated by jobs and the tasks of which they are a part are communicated, associated with data products, and preserved. We also discuss the principles that guide decision-making about metadata storage, replication, and access.

  3. A Programmatic View of Metadata, Metadata Services, and Metadata Flow in ATLAS

    CERN. Geneva

    2012-01-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS is considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. Trigger information and data from the Large Hadron Collider itself provide cases in point, but examples abound. Metadata about logical or physics constructs, such as data-taking periods and runs and luminosity blocks and events and algorithms, often need to be mapped to deployment and production constructs, such as datasets and jobs and files and software versions, and vice versa. Metadata at one level of granularity may have implications at another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and ...

  4. Metadata in Scientific Dialects

    Habermann, T.

    2011-12-01

    Discussions of standards in the scientific community have been compared to religious wars for many years. The only things scientists agree on in these battles are either "standards are not useful" or "everyone can benefit from using my standard". Instead of achieving the goal of facilitating interoperable communities, in many cases the standards have served to build yet another barrier between communities. Some important progress towards diminishing these obstacles has been made in the data layer with the merger of the NetCDF and HDF scientific data formats. The universal adoption of XML as the standard for representing metadata and the recent adoption of ISO metadata standards by many groups around the world suggests that similar convergence is underway in the metadata layer. At the same time, scientists and tools will likely need support for native tongues for some time. I will describe an approach that combines re-usable metadata "components" and restful web services that provide those components in many dialects. This approach uses advanced XML concepts of referencing and linking to construct complete records that include reusable components and builds on the ISO Standards as the "unabridged dictionary" that encompasses the content of many other dialects.

  5. Languages for Metadata

    Brussee, R.; Veenstra, M.; Blanken, Henk; de Vries, A.P.; Blok, H.E.; Feng, L.

    2007-01-01

    The term meta origins from the Greek word µ∈τα, meaning after. The word Metaphysics is the title of Aristotle’s book coming after his book on nature called Physics. This has given meta the modern connotation of a nature of a higher order or of a more fundamental kind [1]. Literally, metadata is

  6. Automated Metadata Extraction

    2008-06-01

    Store [4]. The files purchased from the iTunes Music Store include the following metadata. • Name • Email address of purchaser • Year • Album ...6 3. Music : MP3 and AAC .........................................................................7 4. Tagged Image File Format...Expert Group (MPEG) set of standards for music encoding. Open Document Format (ODF) – an open, license-free, and clearly documented file format

  7. Cytometry metadata in XML

    Leif, Robert C.; Leif, Stephanie H.

    2016-04-01

    Introduction: The International Society for Advancement of Cytometry (ISAC) has created a standard for the Minimum Information about a Flow Cytometry Experiment (MIFlowCyt 1.0). CytometryML will serve as a common metadata standard for flow and image cytometry (digital microscopy). Methods: The MIFlowCyt data-types were created, as is the rest of CytometryML, in the XML Schema Definition Language (XSD1.1). The datatypes are primarily based on the Flow Cytometry and the Digital Imaging and Communication (DICOM) standards. A small section of the code was formatted with standard HTML formatting elements (p, h1, h2, etc.). Results:1) The part of MIFlowCyt that describes the Experimental Overview including the specimen and substantial parts of several other major elements has been implemented as CytometryML XML schemas (www.cytometryml.org). 2) The feasibility of using MIFlowCyt to provide the combination of an overview, table of contents, and/or an index of a scientific paper or a report has been demonstrated. Previously, a sample electronic publication, EPUB, was created that could contain both MIFlowCyt metadata as well as the binary data. Conclusions: The use of CytometryML technology together with XHTML5 and CSS permits the metadata to be directly formatted and together with the binary data to be stored in an EPUB container. This will facilitate: formatting, data- mining, presentation, data verification, and inclusion in structured research, clinical, and regulatory documents, as well as demonstrate a publication's adherence to the MIFlowCyt standard, promote interoperability and should also result in the textual and numeric data being published using web technology without any change in composition.

  8. Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators

    Mayernik, Matthew Stephen

    2011-01-01

    As digital data creation technologies become more prevalent, data and metadata management are necessary to make data available, usable, sharable, and storable. Researchers in many scientific settings, however, have little experience or expertise in data and metadata management. In this dissertation, I explore the everyday data and metadata…

  9. Creating preservation metadata from XML-metadata profiles

    Ulbricht, Damian; Bertelmann, Roland; Gebauer, Petra; Hasler, Tim; Klump, Jens; Kirchner, Ingo; Peters-Kottig, Wolfgang; Mettig, Nora; Rusch, Beate

    2014-05-01

    Registration of dataset DOIs at DataCite makes research data citable and comes with the obligation to keep data accessible in the future. In addition, many universities and research institutions measure data that is unique and not repeatable like the data produced by an observational network and they want to keep these data for future generations. In consequence, such data should be ingested in preservation systems, that automatically care for file format changes. Open source preservation software that is developed along the definitions of the ISO OAIS reference model is available but during ingest of data and metadata there are still problems to be solved. File format validation is difficult, because format validators are not only remarkably slow - due to variety in file formats different validators return conflicting identification profiles for identical data. These conflicts are hard to resolve. Preservation systems have a deficit in the support of custom metadata. Furthermore, data producers are sometimes not aware that quality metadata is a key issue for the re-use of data. In the project EWIG an university institute and a research institute work together with Zuse-Institute Berlin, that is acting as an infrastructure facility, to generate exemplary workflows for research data into OAIS compliant archives with emphasis on the geosciences. The Institute for Meteorology provides timeseries data from an urban monitoring network whereas GFZ Potsdam delivers file based data from research projects. To identify problems in existing preservation workflows the technical work is complemented by interviews with data practitioners. Policies for handling data and metadata are developed. Furthermore, university teaching material is created to raise the future scientists awareness of research data management. As a testbed for ingest workflows the digital preservation system Archivematica [1] is used. During the ingest process metadata is generated that is compliant to the

  10. Mercury Toolset for Spatiotemporal Metadata

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris

    2010-06-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily)harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  11. Mercury Toolset for Spatiotemporal Metadata

    Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James

    2010-01-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  12. ATLAS Metadata Interface (AMI), a generic metadata framework

    Fulachier, Jerome; The ATLAS collaboration

    2016-01-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, Javascript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  13. ATLAS Metadata Interface (AMI), a generic metadata framework

    Fulachier, J.; Odier, J.; Lambert, F.; ATLAS Collaboration

    2017-10-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, JavaScript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  14. ATLAS Metadata Interface (AMI), a generic metadata framework

    AUTHOR|(SzGeCERN)573735; The ATLAS collaboration; Odier, Jerome; Lambert, Fabian

    2017-01-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, JavaScript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  15. The future of event-level information repositories, indexing, and selection in ATLAS

    Barberis, D; Cranshaw, J; Malon, D; Gemmeren, P Van; Zhang, Q; Dimitrov, G; Nairz, A; Sorokoletov, R; Doherty, T; Quilty, D; Gallas, E J; Hrivnac, J; Nowak, M

    2014-01-01

    ATLAS maintains a rich corpus of event-by-event information that provides a global view of the billions of events the collaboration has measured or simulated, along with sufficient auxiliary information to navigate to and retrieve data for any event at any production processing stage. This unique resource has been employed for a range of purposes, from monitoring, statistics, anomaly detection, and integrity checking, to event picking, subset selection, and sample extraction. Recent years of data-taking provide a foundation for assessment of how this resource has and has not been used in practice, of the uses for which it should be optimized, of how it should be deployed and provisioned for scalability to future data volumes, and of the areas in which enhancements to functionality would be most valuable. This paper describes how ATLAS event-level information repositories and selection infrastructure are evolving in light of this experience, and in view of their expected roles both in wide-area event delivery services and in an evolving ATLAS analysis model in which the importance of efficient selective access to data can only grow.

  16. Metadata Dictionary Database: A Proposed Tool for Academic Library Metadata Management

    Southwick, Silvia B.; Lampert, Cory

    2011-01-01

    This article proposes a metadata dictionary (MDD) be used as a tool for metadata management. The MDD is a repository of critical data necessary for managing metadata to create "shareable" digital collections. An operational definition of metadata management is provided. The authors explore activities involved in metadata management in…

  17. The metadata manual a practical workbook

    Lubas, Rebecca; Schneider, Ingrid

    2013-01-01

    Cultural heritage professionals have high levels of training in metadata. However, the institutions in which they practice often depend on support staff, volunteers, and students in order to function. With limited time and funding for training in metadata creation for digital collections, there are often many questions about metadata without a reliable, direct source for answers. The Metadata Manual provides such a resource, answering basic metadata questions that may appear, and exploring metadata from a beginner's perspective. This title covers metadata basics, XML basics, Dublin Core, VRA C

  18. METADATA, DESKRIPSI SERTA TITIK AKSESNYA DAN INDOMARC

    Sulistiyo Basuki

    2012-07-01

    Full Text Available lstilah metadata mulai sering muncul dalam literature tentang database management systems (DBMS pada tahun 1980 an. lstilah tersebut digunakan untuk menggambarkan informasi yang diperlukan untuk mencatat karakteristik informasi yang terdapat pada pangkalan data. Banyak sumber yang mengartikan istilah metadata. Metadata dapat diartikan sumber, menunjukan lokasi dokumen, serta memberikan ringkasan yang diperlukan untuk memanfaat-kannya. Secara umum ada 3 bagian yang digunakan untuk membuat metadata sebagai sebuah paket informasi, dan penyandian (encoding pembuatan deskripsi paket informasi, dan penyediaan akses terhadap deskripsi tersebut. Dalam makalah ini diuraikan mengenai konsep data dalam kaitannya dengan perpustakaan. Uraian meliputi definisi metadata; fungsi metadata; standar penyandian (encoding, cantuman bibliografis. surogat, metadata; penciptaan isi cantuman surogat; ancangan terhadap format metadata; serta metadata dan standar metadata.

  19. FSA 2002 Digital Orthophoto Metadata

    Minnesota Department of Natural Resources — Metadata for the 2002 FSA Color Orthophotos Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the quarter-quad...

  20. How libraries use publisher metadata

    Steve Shadle

    2013-11-01

    Full Text Available With the proliferation of electronic publishing, libraries are increasingly relying on publisher-supplied metadata to meet user needs for discovery in library systems. However, many publisher/content provider staff creating metadata are unaware of the end-user environment and how libraries use their metadata. This article provides an overview of the three primary discovery systems that are used by academic libraries, with examples illustrating how publisher-supplied metadata directly feeds into these systems and is used to support end-user discovery and access. Commonly seen metadata problems are discussed, with recommendations suggested. Based on a series of presentations given in Autumn 2012 to the staff of a large publisher, this article uses the University of Washington Libraries systems and services as illustrative examples. Judging by the feedback received from these presentations, publishers (specifically staff not familiar with the big picture of metadata standards work would benefit from a better understanding of the systems and services libraries provide using the data that is created and managed by publishers.

  1. On the Origin of Metadata

    Sam Coppens

    2012-12-01

    Full Text Available Metadata has been around and has evolved for centuries, albeit not recognized as such. Medieval manuscripts typically had illuminations at the start of each chapter, being both a kind of signature for the author writing the script and a pictorial chapter anchor for the illiterates at the time. Nowadays, there is so much fragmented information on the Internet that users sometimes fail to distinguish the real facts from some bended truth, let alone being able to interconnect different facts. Here, the metadata can both act as noise-reductors for detailed recommendations to the end-users, as it can be the catalyst to interconnect related information. Over time, metadata thus not only has had different modes of information, but furthermore, metadata’s relation of information to meaning, i.e., “semantics”, evolved. Darwin’s evolutionary propositions, from “species have an unlimited reproductive capacity”, over “natural selection”, to “the cooperation of mutations leads to adaptation to the environment” show remarkable parallels to both metadata’s different modes of information and to its relation of information to meaning over time. In this paper, we will show that the evolution of the use of (metadata can be mapped to Darwin’s nine evolutionary propositions. As mankind and its behavior are products of an evolutionary process, the evolutionary process of metadata with its different modes of information is on the verge of a new-semantic-era.

  2. THE NEW ONLINE METADATA EDITOR FOR GENERATING STRUCTURED METADATA

    Devarakonda, Ranjeet [ORNL; Shrestha, Biva [ORNL; Palanisamy, Giri [ORNL; Hook, Leslie A [ORNL; Killeffer, Terri S [ORNL; Boden, Thomas A [ORNL; Cook, Robert B [ORNL; Zolly, Lisa [United States Geological Service (USGS); Hutchison, Viv [United States Geological Service (USGS); Frame, Mike [United States Geological Service (USGS); Cialella, Alice [Brookhaven National Laboratory (BNL); Lazer, Kathy [Brookhaven National Laboratory (BNL)

    2014-01-01

    Nobody is better suited to describe data than the scientist who created it. This description about a data is called Metadata. In general terms, Metadata represents the who, what, when, where, why and how of the dataset [1]. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes it portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about the research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via Metadata clearinghouses like Mercury [2][4]. OME is part of ORNL s Mercury software fleet [2][3]. It was jointly developed to support projects funded by the United States Geological Survey (USGS), U.S. Department of Energy (DOE), National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). OME s architecture provides a customizable interface to support project-specific requirements. Using this new architecture, the ORNL team developed OME instances for USGS s Core Science Analytics, Synthesis, and Libraries (CSAS&L), DOE s Next Generation Ecosystem Experiments (NGEE) and Atmospheric Radiation Measurement (ARM) Program, and the international Surface Ocean Carbon Dioxide ATlas (SOCAT). Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. From the information on the form, the Metadata Editor can create an XML file on the server that the editor is installed or to the user s personal computer. Researchers can also use the ORNL Metadata Editor to modify existing XML metadata files. As an example, an NGEE Arctic scientist use OME to register

  3. The Machinic Temporality of Metadata

    Claudio Celis

    2015-03-01

    Full Text Available In 1990 Deleuze introduced the hypothesis that disciplinary societies are gradually being replaced by a new logic of power: control. Accordingly, Matteo Pasquinelli has recently argued that we are moving towards societies of metadata, which correspond to a new stage of what Deleuze called control societies. Societies of metadata are characterised for the central role that meta-information acquires both as a source of surplus value and as an apparatus of social control. The aim of this article is to develop Pasquinelli’s thesis by examining the temporal scope of these emerging societies of metadata. In particular, this article employs Guattari’s distinction between human and machinic times. Through these two concepts, this article attempts to show how societies of metadata combine the two poles of capitalist power formations as identified by Deleuze and Guattari, i.e. social subjection and machinic enslavement. It begins by presenting the notion of metadata in order to identify some of the defining traits of contemporary capitalism. It then examines Berardi’s account of the temporality of the attention economy from the perspective of the asymmetric relation between cyber-time and human time. The third section challenges Berardi’s definition of the temporality of the attention economy by using Guattari’s notions of human and machinic times. Parts four and five fall back upon Deleuze and Guattari’s notions of machinic surplus labour and machinic enslavement, respectively. The concluding section tries to show that machinic and human times constitute two poles of contemporary power formations that articulate the temporal dimension of societies of metadata.

  4. Enriching The Metadata On CDS

    Chhibber, Nalin

    2014-01-01

    The project report revolves around the open source software package called Invenio. It provides the tools for management of digital assets in a repository and drives CERN Document Server. Primary objective is to enhance the existing metadata in CDS with data from other libraries. An implicit part of this task is to manage disambiguation (within incoming data), removal of multiple entries and handle replications between new and existing records. All such elements and their corresponding changes are integrated within Invenio to make the upgraded metadata available on the CDS. Latter part of the report discuss some changes related to the Invenio code-base itself.

  5. The PDS4 Metadata Management System

    Raugh, A. C.; Hughes, J. S.

    2018-04-01

    We present the key features of the Planetary Data System (PDS) PDS4 Information Model as an extendable metadata management system for planetary metadata related to data structure, analysis/interpretation, and provenance.

  6. U.S. EPA Metadata Editor (EME)

    U.S. Environmental Protection Agency — The EPA Metadata Editor (EME) allows users to create geospatial metadata that meets EPA's requirements. The tool has been developed as a desktop application that...

  7. Radiological dose and metadata management

    Walz, M.; Madsack, B.; Kolodziej, M.

    2016-01-01

    This article describes the features of management systems currently available in Germany for extraction, registration and evaluation of metadata from radiological examinations, particularly in the digital imaging and communications in medicine (DICOM) environment. In addition, the probable relevant developments in this area concerning radiation protection legislation, terminology, standardization and information technology are presented. (orig.) [de

  8. Scalable coherent interface

    Alnaes, K.; Kristiansen, E.H.; Gustavson, D.B.; James, D.V.

    1990-01-01

    The Scalable Coherent Interface (IEEE P1596) is establishing an interface standard for very high performance multiprocessors, supporting a cache-coherent-memory model scalable to systems with up to 64K nodes. This Scalable Coherent Interface (SCI) will supply a peak bandwidth per node of 1 GigaByte/second. The SCI standard should facilitate assembly of processor, memory, I/O and bus bridge cards from multiple vendors into massively parallel systems with throughput far above what is possible today. The SCI standard encompasses two levels of interface, a physical level and a logical level. The physical level specifies electrical, mechanical and thermal characteristics of connectors and cards that meet the standard. The logical level describes the address space, data transfer protocols, cache coherence mechanisms, synchronization primitives and error recovery. In this paper we address logical level issues such as packet formats, packet transmission, transaction handshake, flow control, and cache coherence. 11 refs., 10 figs

  9. The essential guide to metadata for books

    Register, Renee

    2013-01-01

    In The Essential Guide to Metadata for Books, you will learn exactly what you need to know to effectively generate, handle and disseminate metadata for books and ebooks. This comprehensive but digestible document will explain the life-cycle of book metadata, industry standards, XML, ONIX and the essential elements of metadata. It will also show you how effective, well-organized metadata can improve your efforts to sell a book, especially when it comes to marketing, discoverability and converting at the point of sale. This information-packed document also includes a glossary of terms

  10. Metadata Life Cycles, Use Cases and Hierarchies

    Ted Habermann

    2018-05-01

    Full Text Available The historic view of metadata as “data about data” is expanding to include data about other items that must be created, used, and understood throughout the data and project life cycles. In this context, metadata might better be defined as the structured and standard part of documentation, and the metadata life cycle can be described as the metadata content that is required for documentation in each phase of the project and data life cycles. This incremental approach to metadata creation is similar to the spiral model used in software development. Each phase also has distinct users and specific questions to which they need answers. In many cases, the metadata life cycle involves hierarchies where latter phases have increased numbers of items. The relationships between metadata in different phases can be captured through structure in the metadata standard, or through conventions for identifiers. Metadata creation and management can be streamlined and simplified by re-using metadata across many records. Many of these ideas have been developed to various degrees in several Geoscience disciplines and are being used in metadata for documenting the integrated life cycle of environmental research in the Arctic, including projects, collection sites, and datasets.

  11. Log-Less Metadata Management on Metadata Server for Parallel File Systems

    Jianwei Liao

    2014-01-01

    Full Text Available This paper presents a novel metadata management mechanism on the metadata server (MDS for parallel and distributed file systems. In this technique, the client file system backs up the sent metadata requests, which have been handled by the metadata server, so that the MDS does not need to log metadata changes to nonvolatile storage for achieving highly available metadata service, as well as better performance improvement in metadata processing. As the client file system backs up certain sent metadata requests in its memory, the overhead for handling these backup requests is much smaller than that brought by the metadata server, while it adopts logging or journaling to yield highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and render a better I/O data throughput, in contrast to conventional metadata management schemes, that is, logging or journaling on MDS. Besides, a complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients, when the metadata server has crashed or gone into nonoperational state exceptionally.

  12. PKI Scalability Issues

    Slagell, Adam J; Bonilla, Rafael

    2004-01-01

    This report surveys different PKI technologies such as PKIX and SPKI and the issues of PKI that affect scalability. Much focus is spent on certificate revocation methodologies and status verification systems such as CRLs, Delta-CRLs, CRS, Certificate Revocation Trees, Windowed Certificate Revocation, OCSP, SCVP and DVCS.

  13. Evolution in Metadata Quality: Common Metadata Repository's Role in NASA Curation Efforts

    Gilman, Jason; Shum, Dana; Baynes, Katie

    2016-01-01

    Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This poster covers how we use humanizers, a technique for dealing with the symptoms of metadata issues, as well as our plans for future metadata validation enhancements. The CMR currently indexes 35K collections and 300M granules.

  14. phosphorus retention data and metadata

    phosphorus retention in wetlands data and metadataThis dataset is associated with the following publication:Lane , C., and B. Autrey. Phosphorus retention of forested and emergent marsh depressional wetlands in differing land uses in Florida, USA. Wetlands Ecology and Management. Springer Science and Business Media B.V;Formerly Kluwer Academic Publishers B.V., GERMANY, 24(1): 45-60, (2016).

  15. Finding Atmospheric Composition (AC) Metadata

    Strub, Richard F..; Falke, Stefan; Fiakowski, Ed; Kempler, Steve; Lynnes, Chris; Goussev, Oleg

    2015-01-01

    The Atmospheric Composition Portal (ACP) is an aggregator and curator of information related to remotely sensed atmospheric composition data and analysis. It uses existing tools and technologies and, where needed, enhances those capabilities to provide interoperable access, tools, and contextual guidance for scientists and value-adding organizations using remotely sensed atmospheric composition data. The initial focus is on Essential Climate Variables identified by the Global Climate Observing System CH4, CO, CO2, NO2, O3, SO2 and aerosols. This poster addresses our efforts in building the ACP Data Table, an interface to help discover and understand remotely sensed data that are related to atmospheric composition science and applications. We harvested GCMD, CWIC, GEOSS metadata catalogs using machine to machine technologies - OpenSearch, Web Services. We also manually investigated the plethora of CEOS data providers portals and other catalogs where that data might be aggregated. This poster is our experience of the excellence, variety, and challenges we encountered.Conclusions:1.The significant benefits that the major catalogs provide are their machine to machine tools like OpenSearch and Web Services rather than any GUI usability improvements due to the large amount of data in their catalog.2.There is a trend at the large catalogs towards simulating small data provider portals through advanced services. 3.Populating metadata catalogs using ISO19115 is too complex for users to do in a consistent way, difficult to parse visually or with XML libraries, and too complex for Java XML binders like CASTOR.4.The ability to search for Ids first and then for data (GCMD and ECHO) is better for machine to machine operations rather than the timeouts experienced when returning the entire metadata entry at once. 5.Metadata harvest and export activities between the major catalogs has led to a significant amount of duplication. (This is currently being addressed) 6.Most (if not all

  16. XML for catalogers and metadata librarians

    Cole, Timothy W

    2013-01-01

    How are today's librarians to manage and describe the everexpanding volumes of resources, in both digital and print formats? The use of XML in cataloging and metadata workflows can improve metadata quality, the consistency of cataloging workflows, and adherence to standards. This book is intended to enable current and future catalogers and metadata librarians to progress beyond a bare surfacelevel acquaintance with XML, thereby enabling them to integrate XML technologies more fully into their cataloging workflows. Building on the wealth of work on library descriptive practices, cataloging, and metadata, XML for Catalogers and Metadata Librarians explores the use of XML to serialize, process, share, and manage library catalog and metadata records. The authors' expert treatment of the topic is written to be accessible to those with little or no prior practical knowledge of or experience with how XML is used. Readers will gain an educated appreciation of the nuances of XML and grasp the benefit of more advanced ...

  17. Security in a Replicated Metadata Catalogue

    Koblitz, B

    2007-01-01

    The gLite-AMGA metadata has been developed by NA4 to provide simple relational metadata access for the EGEE user community. As advanced features, which will be the focus of this presentation, AMGA provides very fine-grained security also in connection with the built-in support for replication and federation of metadata. AMGA is extensively used by the biomedical community to store medical images metadata, digital libraries, in HEP for logging and bookkeeping data and in the climate community. The biomedical community intends to deploy a distributed metadata system for medical images consisting of various sites, which range from hospitals to computing centres. Only safe sharing of the highly sensitive metadata as provided in AMGA makes such a scenario possible. Other scenarios are digital libraries, which federate copyright protected (meta-) data into a common catalogue. The biomedical and digital libraries have been deployed using a centralized structure already for some time. They now intend to decentralize ...

  18. Scalable Resolution Display Walls

    Leigh, Jason; Johnson, Andrew; Renambot, Luc; Peterka, Tom; Jeong, Byungil; Sandin, Daniel J.; Talandis, Jonas; Jagodic, Ratko; Nam, Sungwon; Hur, Hyejung; Sun, Yiwen

    2013-01-01

    This article will describe the progress since 2000 on research and development in 2-D and 3-D scalable resolution display walls that are built from tiling individual lower resolution flat panel displays. The article will describe approaches and trends in display hardware construction, middleware architecture, and user-interaction design. The article will also highlight examples of use cases and the benefits the technology has brought to their respective disciplines. © 1963-2012 IEEE.

  19. Technologies for metadata management in scientific a

    Castro-Romero, Alexander; González-Sanabria, Juan S.; Ballesteros-Ricaurte, Javier A.

    2015-01-01

    The use of Semantic Web technologies has been increasing, so it is common using them in different ways. This article evaluates how these technologies can contribute to improve the indexing in articles in scientific journals. Initially, there is a conceptual review about metadata. Later, studying the most important technologies for the use of metadata in Web and, this way, choosing one of them to apply it in the case of study of scientific articles indexing, in order to determine the metadata ...

  20. Critical Metadata for Spectroscopy Field Campaigns

    Barbara A. Rasaiah

    2014-04-01

    Full Text Available A field spectroscopy metadata standard is defined as those data elements that explicitly document the spectroscopy dataset and field protocols, sampling strategies, instrument properties and environmental and logistical variables. Standards for field spectroscopy metadata affect the quality, completeness, reliability, and usability of datasets created in situ. Currently there is no standardized methodology for documentation of in situ spectroscopy data or metadata. This paper presents results of an international experiment comprising a web-based survey and expert panel evaluation that investigated critical metadata in field spectroscopy. The survey participants were a diverse group of scientists experienced in gathering spectroscopy data across a wide range of disciplines. Overall, respondents were in agreement about a core metadataset for generic campaign metadata, allowing for a prioritization of critical metadata elements to be proposed including those relating to viewing geometry, location, general target and sampling properties, illumination, instrument properties, reference standards, calibration, hyperspectral signal properties, atmospheric conditions, and general project details. Consensus was greatest among individual expert groups in specific application domains. The results allow the identification of a core set of metadata fields that enforce long term data storage and serve as a foundation for a metadata standard. This paper is part one in a series about the core elements of a robust and flexible field spectroscopy metadata standard.

  1. Evaluating the privacy properties of telephone metadata

    Mayer, Jonathan; Mutchler, Patrick; Mitchell, John C.

    2016-01-01

    Since 2013, a stream of disclosures has prompted reconsideration of surveillance law and policy. One of the most controversial principles, both in the United States and abroad, is that communications metadata receives substantially less protection than communications content. Several nations currently collect telephone metadata in bulk, including on their own citizens. In this paper, we attempt to shed light on the privacy properties of telephone metadata. Using a crowdsourcing methodology, we demonstrate that telephone metadata is densely interconnected, can trivially be reidentified, and can be used to draw sensitive inferences. PMID:27185922

  2. Logic programming and metadata specifications

    Lopez, Antonio M., Jr.; Saacks, Marguerite E.

    1992-01-01

    Artificial intelligence (AI) ideas and techniques are critical to the development of intelligent information systems that will be used to collect, manipulate, and retrieve the vast amounts of space data produced by 'Missions to Planet Earth.' Natural language processing, inference, and expert systems are at the core of this space application of AI. This paper presents logic programming as an AI tool that can support inference (the ability to draw conclusions from a set of complicated and interrelated facts). It reports on the use of logic programming in the study of metadata specifications for a small problem domain of airborne sensors, and the dataset characteristics and pointers that are needed for data access.

  3. Integrated Array/Metadata Analytics

    Misev, Dimitar; Baumann, Peter

    2015-04-01

    Data comes in various forms and types, and integration usually presents a problem that is often simply ignored and solved with ad-hoc solutions. Multidimensional arrays are an ubiquitous data type, that we find at the core of virtually all science and engineering domains, as sensor, model, image, statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence quite primitive or non-existent at all in modern relational DBMS. Recognizing this, we extended SQL with a new SQL/MDA part seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman, that already implements SQL/MDA.

  4. Scalable optical quantum computer

    Manykin, E A; Mel' nichenko, E V [Institute for Superconductivity and Solid-State Physics, Russian Research Centre ' Kurchatov Institute' , Moscow (Russian Federation)

    2014-12-31

    A way of designing a scalable optical quantum computer based on the photon echo effect is proposed. Individual rare earth ions Pr{sup 3+}, regularly located in the lattice of the orthosilicate (Y{sub 2}SiO{sub 5}) crystal, are suggested to be used as optical qubits. Operations with qubits are performed using coherent and incoherent laser pulses. The operation protocol includes both the method of measurement-based quantum computations and the technique of optical computations. Modern hybrid photon echo protocols, which provide a sufficient quantum efficiency when reading recorded states, are considered as most promising for quantum computations and communications. (quantum computer)

  5. Scalable optical quantum computer

    Manykin, E A; Mel'nichenko, E V

    2014-01-01

    A way of designing a scalable optical quantum computer based on the photon echo effect is proposed. Individual rare earth ions Pr 3+ , regularly located in the lattice of the orthosilicate (Y 2 SiO 5 ) crystal, are suggested to be used as optical qubits. Operations with qubits are performed using coherent and incoherent laser pulses. The operation protocol includes both the method of measurement-based quantum computations and the technique of optical computations. Modern hybrid photon echo protocols, which provide a sufficient quantum efficiency when reading recorded states, are considered as most promising for quantum computations and communications. (quantum computer)

  6. Leveraging Metadata to Create Better Web Services

    Mitchell, Erik

    2012-01-01

    Libraries have been increasingly concerned with data creation, management, and publication. This increase is partly driven by shifting metadata standards in libraries and partly by the growth of data and metadata repositories being managed by libraries. In order to manage these data sets, libraries are looking for new preservation and discovery…

  7. International Metadata Initiatives: Lessons in Bibliographic Control.

    Caplan, Priscilla

    This paper looks at a subset of metadata schemes, including the Text Encoding Initiative (TEI) header, the Encoded Archival Description (EAD), the Dublin Core Metadata Element Set (DCMES), and the Visual Resources Association (VRA) Core Categories for visual resources. It examines why they developed as they did, major point of difference from…

  8. A Metadata-Rich File System

    Ames, S; Gokhale, M B; Maltzahn, C

    2009-01-07

    Despite continual improvements in the performance and reliability of large scale file systems, the management of file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, metadata, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS includes Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the defacto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

  9. Collection Metadata Solutions for Digital Library Applications

    Hill, Linda L.; Janee, Greg; Dolin, Ron; Frew, James; Larsgaard, Mary

    1999-01-01

    Within a digital library, collections may range from an ad hoc set of objects that serve a temporary purpose to established library collections intended to persist through time. The objects in these collections vary widely, from library and data center holdings to pointers to real-world objects, such as geographic places, and the various metadata schemas that describe them. The key to integrated use of such a variety of collections in a digital library is collection metadata that represents the inherent and contextual characteristics of a collection. The Alexandria Digital Library (ADL) Project has designed and implemented collection metadata for several purposes: in XML form, the collection metadata "registers" the collection with the user interface client; in HTML form, it is used for user documentation; eventually, it will be used to describe the collection to network search agents; and it is used for internal collection management, including mapping the object metadata attributes to the common search parameters of the system.

  10. Incorporating ISO Metadata Using HDF Product Designer

    Jelenak, Aleksandar; Kozimor, John; Habermann, Ted

    2016-01-01

    The need to store in HDF5 files increasing amounts of metadata of various complexity is greatly overcoming the capabilities of the Earth science metadata conventions currently in use. Data producers until now did not have much choice but to come up with ad hoc solutions to this challenge. Such solutions, in turn, pose a wide range of issues for data managers, distributors, and, ultimately, data users. The HDF Group is experimenting on a novel approach of using ISO 19115 metadata objects as a catch-all container for all the metadata that cannot be fitted into the current Earth science data conventions. This presentation will showcase how the HDF Product Designer software can be utilized to help data producers include various ISO metadata objects in their products.

  11. A Flexible Online Metadata Editing and Management System

    Aguilar, Raul [Arizona State University; Pan, Jerry Yun [ORNL; Gries, Corinna [Arizona State University; Inigo, Gil San [University of New Mexico, Albuquerque; Palanisamy, Giri [ORNL

    2010-01-01

    A metadata editing and management system is being developed employing state of the art XML technologies. A modular and distributed design was chosen for scalability, flexibility, options for customizations, and the possibility to add more functionality at a later stage. The system consists of a desktop design tool or schema walker used to generate code for the actual online editor, a native XML database, and an online user access management application. The design tool is a Java Swing application that reads an XML schema, provides the designer with options to combine input fields into online forms and give the fields user friendly tags. Based on design decisions, the tool generates code for the online metadata editor. The code generated is an implementation of the XForms standard using the Orbeon Framework. The design tool fulfills two requirements: First, data entry forms based on one schema may be customized at design time and second data entry applications may be generated for any valid XML schema without relying on custom information in the schema. However, the customized information generated at design time is saved in a configuration file which may be re-used and changed again in the design tool. Future developments will add functionality to the design tool to integrate help text, tool tips, project specific keyword lists, and thesaurus services. Additional styling of the finished editor is accomplished via cascading style sheets which may be further customized and different look-and-feels may be accumulated through the community process. The customized editor produces XML files in compliance with the original schema, however, data from the current page is saved into a native XML database whenever the user moves to the next screen or pushes the save button independently of validity. Currently the system uses the open source XML database eXist for storage and management, which comes with third party online and desktop management tools. However, access to

  12. Building Scalable Knowledge Graphs for Earth Science

    Ramachandran, R.; Maskey, M.; Gatlin, P. N.; Zhang, J.; Duan, X.; Bugbee, K.; Christopher, S. A.; Miller, J. J.

    2017-12-01

    Estimates indicate that the world's information will grow by 800% in the next five years. In any given field, a single researcher or a team of researchers cannot keep up with this rate of knowledge expansion without the help of cognitive systems. Cognitive computing, defined as the use of information technology to augment human cognition, can help tackle large systemic problems. Knowledge graphs, one of the foundational components of cognitive systems, link key entities in a specific domain with other entities via relationships. Researchers could mine these graphs to make probabilistic recommendations and to infer new knowledge. At this point, however, there is a dearth of tools to generate scalable Knowledge graphs using existing corpus of scientific literature for Earth science research. Our project is currently developing an end-to-end automated methodology for incrementally constructing Knowledge graphs for Earth Science. Semantic Entity Recognition (SER) is one of the key steps in this methodology. SER for Earth Science uses external resources (including metadata catalogs and controlled vocabulary) as references to guide entity extraction and recognition (i.e., labeling) from unstructured text, in order to build a large training set to seed the subsequent auto-learning component in our algorithm. Results from several SER experiments will be presented as well as lessons learned.

  13. Scalable photoreactor for hydrogen production

    Takanabe, Kazuhiro; Shinagawa, Tatsuya

    2017-01-01

    Provided herein are scalable photoreactors that can include a membrane-free water- splitting electrolyzer and systems that can include a plurality of membrane-free water- splitting electrolyzers. Also provided herein are methods of using the scalable photoreactors provided herein.

  14. Scalable photoreactor for hydrogen production

    Takanabe, Kazuhiro

    2017-04-06

    Provided herein are scalable photoreactors that can include a membrane-free water- splitting electrolyzer and systems that can include a plurality of membrane-free water- splitting electrolyzers. Also provided herein are methods of using the scalable photoreactors provided herein.

  15. Metadata to Support Data Warehouse Evolution

    Solodovnikova, Darja

    The focus of this chapter is metadata necessary to support data warehouse evolution. We present the data warehouse framework that is able to track evolution process and adapt data warehouse schemata and data extraction, transformation, and loading (ETL) processes. We discuss the significant part of the framework, the metadata repository that stores information about the data warehouse, logical and physical schemata and their versions. We propose the physical implementation of multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that need to be made to the repository metadata and in the database.

  16. Handbook of metadata, semantics and ontologies

    Sicilia, Miguel-Angel

    2013-01-01

    Metadata research has emerged as a discipline cross-cutting many domains, focused on the provision of distributed descriptions (often called annotations) to Web resources or applications. Such associated descriptions are supposed to serve as a foundation for advanced services in many application areas, including search and location, personalization, federation of repositories and automated delivery of information. Indeed, the Semantic Web is in itself a concrete technological framework for ontology-based metadata. For example, Web-based social networking requires metadata describing people and

  17. USGS Digital Orthophoto Quad (DOQ) Metadata

    Minnesota Department of Natural Resources — Metadata for the USGS DOQ Orthophoto Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the quarter-quad tile...

  18. Structural Metadata Research in the Ears Program

    Liu, Yang; Shriberg, Elizabeth; Stolcke, Andreas; Peskin, Barbara; Ang, Jeremy; Hillard, Dustin; Ostendorf, Mari; Tomalin, Marcus; Woodland, Phil; Harper, Mary

    2005-01-01

    Both human and automatic processing of speech require recognition of more than just words. In this paper we provide a brief overview of research on structural metadata extraction in the DARPA EARS rich transcription program...

  19. Mining Building Metadata by Data Stream Comparison

    Holmegaard, Emil; Kjærgaard, Mikkel Baun

    2016-01-01

    to handle data streams with only slightly similar patterns. We have evaluated Metafier with points and data from one building located in Denmark. We have evaluated Metafier with 903 points, and the overall accuracy, with only 3 known examples, was 94.71%. Furthermore we found that using DTW for mining...... ways to annotate sensor and actuation points. This makes it difficult to create intuitive queries for retrieving data streams from points. Another problem is the amount of insufficient or missing metadata. We introduce Metafier, a tool for extracting metadata from comparing data streams. Metafier...... enables a semi-automatic labeling of metadata to building instrumentation. Metafier annotates points with metadata by comparing the data from a set of validated points with unvalidated points. Metafier has three different algorithms to compare points with based on their data. The three algorithms...

  20. FSA 2003-2004 Digital Orthophoto Metadata

    Minnesota Department of Natural Resources — Metadata for the 2003-2004 FSA Color Orthophotos Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the...

  1. Scalable Frequent Subgraph Mining

    Abdelhamid, Ehab

    2017-06-19

    A graph is a data structure that contains a set of nodes and a set of edges connecting these nodes. Nodes represent objects while edges model relationships among these objects. Graphs are used in various domains due to their ability to model complex relations among several objects. Given an input graph, the Frequent Subgraph Mining (FSM) task finds all subgraphs with frequencies exceeding a given threshold. FSM is crucial for graph analysis, and it is an essential building block in a variety of applications, such as graph clustering and indexing. FSM is computationally expensive, and its existing solutions are extremely slow. Consequently, these solutions are incapable of mining modern large graphs. This slowness is caused by the underlying approaches of these solutions which require finding and storing an excessive amount of subgraph matches. This dissertation proposes a scalable solution for FSM that avoids the limitations of previous work. This solution is composed of four components. The first component is a single-threaded technique which, for each candidate subgraph, needs to find only a minimal number of matches. The second component is a scalable parallel FSM technique that utilizes a novel two-phase approach. The first phase quickly builds an approximate search space, which is then used by the second phase to optimize and balance the workload of the FSM task. The third component focuses on accelerating frequency evaluation, which is a critical step in FSM. To do so, a machine learning model is employed to predict the type of each graph node, and accordingly, an optimized method is selected to evaluate that node. The fourth component focuses on mining dynamic graphs, such as social networks. To this end, an incremental index is maintained during the dynamic updates. Only this index is processed and updated for the majority of graph updates. Consequently, search space is significantly pruned and efficiency is improved. The empirical evaluation shows that the

  2. Optimising metadata workflows in a distributed information environment

    Robertson, R. John; Barton, Jane

    2005-01-01

    The different purposes present within a distributed information environment create the potential for repositories to enhance their metadata by capitalising on the diversity of metadata available for any given object. This paper presents three conceptual reference models required to achieve this optimisation of metadata workflow: the ecology of repositories, the object lifecycle model, and the metadata lifecycle model. It suggests a methodology for developing the metadata lifecycle model, and ...

  3. Science friction: data, metadata, and collaboration.

    Edwards, Paul N; Mayernik, Matthew S; Batcheller, Archer L; Bowker, Geoffrey C; Borgman, Christine L

    2011-10-01

    When scientists from two or more disciplines work together on related problems, they often face what we call 'science friction'. As science becomes more data-driven, collaborative, and interdisciplinary, demand increases for interoperability among data, tools, and services. Metadata--usually viewed simply as 'data about data', describing objects such as books, journal articles, or datasets--serve key roles in interoperability. Yet we find that metadata may be a source of friction between scientific collaborators, impeding data sharing. We propose an alternative view of metadata, focusing on its role in an ephemeral process of scientific communication, rather than as an enduring outcome or product. We report examples of highly useful, yet ad hoc, incomplete, loosely structured, and mutable, descriptions of data found in our ethnographic studies of several large projects in the environmental sciences. Based on this evidence, we argue that while metadata products can be powerful resources, usually they must be supplemented with metadata processes. Metadata-as-process suggests the very large role of the ad hoc, the incomplete, and the unfinished in everyday scientific work.

  4. Scalable Nanomanufacturing—A Review

    Khershed Cooper

    2017-01-01

    Full Text Available This article describes the field of scalable nanomanufacturing, its importance and need, its research activities and achievements. The National Science Foundation is taking a leading role in fostering basic research in scalable nanomanufacturing (SNM. From this effort several novel nanomanufacturing approaches have been proposed, studied and demonstrated, including scalable nanopatterning. This paper will discuss SNM research areas in materials, processes and applications, scale-up methods with project examples, and manufacturing challenges that need to be addressed to move nanotechnology discoveries closer to the marketplace.

  5. Scalable Nonlinear Compact Schemes

    Ghosh, Debojyoti [Argonne National Lab. (ANL), Argonne, IL (United States); Constantinescu, Emil M. [Univ. of Chicago, IL (United States); Brown, Jed [Univ. of Colorado, Boulder, CO (United States)

    2014-04-01

    In this work, we focus on compact schemes resulting in tridiagonal systems of equations, specifically the fifth-order CRWENO scheme. We propose a scalable implementation of the nonlinear compact schemes by implementing a parallel tridiagonal solver based on the partitioning/substructuring approach. We use an iterative solver for the reduced system of equations; however, we solve this system to machine zero accuracy to ensure that no parallelization errors are introduced. It is possible to achieve machine-zero convergence with few iterations because of the diagonal dominance of the system. The number of iterations is specified a priori instead of a norm-based exit criterion, and collective communications are avoided. The overall algorithm thus involves only point-to-point communication between neighboring processors. Our implementation of the tridiagonal solver differs from and avoids the drawbacks of past efforts in the following ways: it introduces no parallelization-related approximations (multiprocessor solutions are exactly identical to uniprocessor ones), it involves minimal communication, the mathematical complexity is similar to that of the Thomas algorithm on a single processor, and it does not require any communication and computation scheduling.

  6. Distributed metadata servers for cluster file systems using shared low latency persistent key-value metadata store

    Bent, John M.; Faibish, Sorin; Pedone, Jr., James M.; Tzelnic, Percy; Ting, Dennis P. J.; Ionkov, Latchesar A.; Grider, Gary

    2017-12-26

    A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.

  7. Streamlining geospatial metadata in the Semantic Web

    Fugazza, Cristiano; Pepe, Monica; Oggioni, Alessandro; Tagliolato, Paolo; Carrara, Paola

    2016-04-01

    In the geospatial realm, data annotation and discovery rely on a number of ad-hoc formats and protocols. These have been created to enable domain-specific use cases generalized search is not feasible for. Metadata are at the heart of the discovery process and nevertheless they are often neglected or encoded in formats that either are not aimed at efficient retrieval of resources or are plainly outdated. Particularly, the quantum leap represented by the Linked Open Data (LOD) movement did not induce so far a consistent, interlinked baseline in the geospatial domain. In a nutshell, datasets, scientific literature related to them, and ultimately the researchers behind these products are only loosely connected; the corresponding metadata intelligible only to humans, duplicated on different systems, seldom consistently. Instead, our workflow for metadata management envisages i) editing via customizable web- based forms, ii) encoding of records in any XML application profile, iii) translation into RDF (involving the semantic lift of metadata records), and finally iv) storage of the metadata as RDF and back-translation into the original XML format with added semantics-aware features. Phase iii) hinges on relating resource metadata to RDF data structures that represent keywords from code lists and controlled vocabularies, toponyms, researchers, institutes, and virtually any description one can retrieve (or directly publish) in the LOD Cloud. In the context of a distributed Spatial Data Infrastructure (SDI) built on free and open-source software, we detail phases iii) and iv) of our workflow for the semantics-aware management of geospatial metadata.

  8. Prediction of Solar Eruptions Using Filament Metadata

    Aggarwal, Ashna; Schanche, Nicole; Reeves, Katharine K.; Kempton, Dustin; Angryk, Rafal

    2018-05-01

    We perform a statistical analysis of erupting and non-erupting solar filaments to determine the properties related to the eruption potential. In order to perform this study, we correlate filament eruptions documented in the Heliophysics Event Knowledgebase (HEK) with HEK filaments that have been grouped together using a spatiotemporal tracking algorithm. The HEK provides metadata about each filament instance, including values for length, area, tilt, and chirality. We add additional metadata properties such as the distance from the nearest active region and the magnetic field decay index. We compare trends in the metadata from erupting and non-erupting filament tracks to discover which properties present signs of an eruption. We find that a change in filament length over time is the most important factor in discriminating between erupting and non-erupting filament tracks, with erupting tracks being more likely to have decreasing length. We attempt to find an ensemble of predictive filament metadata using a Random Forest Classifier approach, but find the probability of correctly predicting an eruption with the current metadata is only slightly better than chance.

  9. Handling Metadata in a Neurophysiology Laboratory

    Lyuba Zehl

    2016-07-01

    Full Text Available To date, non-reproducibility of neurophysiological research is a matterof intense discussion in the scientific community. A crucial componentto enhance reproducibility is to comprehensively collect and storemetadata, that is all information about the experiment, the data,and the applied preprocessing steps on the data, such that they canbe accessed and shared in a consistent and simple manner. However,the complexity of experiments, the highly specialized analysis workflowsand a lack of knowledge on how to make use of supporting softwaretools often overburden researchers to perform such a detailed documentation.For this reason, the collected metadata are often incomplete, incomprehensiblefor outsiders or ambiguous. Based on our research experience in dealingwith diverse datasets, we here provide conceptual and technical guidanceto overcome the challenges associated with the collection, organization,and storage of metadata in a neurophysiology laboratory. Through theconcrete example of managing the metadata of a complex experimentthat yields multi-channel recordings from monkeys performing a behavioralmotor task, we practically demonstrate the implementation of theseapproaches and solutions with the intention that they may be generalizedto a specific project at hand. Moreover, we detail five use casesthat demonstrate the resulting benefits of constructing a well-organizedmetadata collection when processing or analyzing the recorded data,in particular when these are shared between laboratories in a modernscientific collaboration. Finally, we suggest an adaptable workflowto accumulate, structure and store metadata from different sourcesusing, by way of example, the odML metadata framework.

  10. Geospatial metadata retrieval from web services

    Ivanildo Barbosa

    Full Text Available Nowadays, producers of geospatial data in either raster or vector formats are able to make them available on the World Wide Web by deploying web services that enable users to access and query on those contents even without specific software for geoprocessing. Several providers around the world have deployed instances of WMS (Web Map Service, WFS (Web Feature Service and WCS (Web Coverage Service, all of them specified by the Open Geospatial Consortium (OGC. In consequence, metadata about the available contents can be retrieved to be compared with similar offline datasets from other sources. This paper presents a brief summary and describes the matching process between the specifications for OGC web services (WMS, WFS and WCS and the specifications for metadata required by the ISO 19115 - adopted as reference for several national metadata profiles, including the Brazilian one. This process focuses on retrieving metadata about the identification and data quality packages as well as indicates the directions to retrieve metadata related to other packages. Therefore, users are able to assess whether the provided contents fit to their purposes.

  11. Building a Disciplinary Metadata Standards Directory

    Alexander Ball

    2014-07-01

    Full Text Available The Research Data Alliance (RDA Metadata Standards Directory Working Group (MSDWG is building a directory of descriptive, discipline-specific metadata standards. The purpose of the directory is to promote the discovery, access and use of such standards, thereby improving the state of research data interoperability and reducing duplicative standards development work.This work builds upon the UK Digital Curation Centre's Disciplinary Metadata Catalogue, a resource created with much the same aim in mind. The first stage of the MSDWG's work was to update and extend the information contained in the catalogue. In the current, second stage, a new platform is being developed in order to extend the functionality of the directory beyond that of the catalogue, and to make it easier to maintain and sustain. Future work will include making the directory more amenable to use by automated tools.

  12. Metadata and Ontologies in Learning Resources Design

    Vidal C., Christian; Segura Navarrete, Alejandra; Menéndez D., Víctor; Zapata Gonzalez, Alfredo; Prieto M., Manuel

    Resource design and development requires knowledge about educational goals, instructional context and information about learner's characteristics among other. An important information source about this knowledge are metadata. However, metadata by themselves do not foresee all necessary information related to resource design. Here we argue the need to use different data and knowledge models to improve understanding the complex processes related to e-learning resources and their management. This paper presents the use of semantic web technologies, as ontologies, supporting the search and selection of resources used in design. Classification is done, based on instructional criteria derived from a knowledge acquisition process, using information provided by IEEE-LOM metadata standard. The knowledge obtained is represented in an ontology using OWL and SWRL. In this work we give evidence of the implementation of a Learning Object Classifier based on ontology. We demonstrate that the use of ontologies can support the design activities in e-learning.

  13. Pembuatan Aplikasi Metadata Generator untuk Koleksi Peninggalan Warisan Budaya

    Wimba Agra Wicesa

    2017-03-01

    Full Text Available Warisan budaya merupakan suatu aset penting yang digunakan sebagai sumber informasi dalam mempelajari ilmu sejarah. Mengelola data warisan budaya menjadi suatu hal yang harus diperhatikan guna menjaga keutuhan data warisan budaya di masa depan. Menciptakan sebuah metadata warisan budaya merupakan salah satu langkah yang dapat diambil untuk menjaga nilai dari sebuah artefak. Dengan menggunakan konsep metadata, informasi dari setiap objek warisan budaya tersebut menjadi mudah untuk dibaca, dikelola, maupun dicari kembali meskipun telah tersimpan lama. Selain itu dengan menggunakan konsep metadata, informasi tentang warisan budaya dapat digunakan oleh banyak sistem. Metadata warisan budaya merupakan metadata yang cukup besar. Sehingga untuk membangun metada warisan budaya dibutuhkan waktu yang cukup lama. Selain itu kesalahan (human error juga dapat menghambat proses pembangunan metadata warisan budaya. Proses pembangkitan metadata warisan budaya melalui Aplikasi Metadata Generator menjadi lebih cepat dan mudah karena dilakukan secara otomatis oleh sistem. Aplikasi ini juga dapat menekan human error sehingga proses pembangkitan menjadi lebih efisien.

  14. EPA Metadata Style Guide Keywords and EPA Organization Names

    The following keywords and EPA organization names listed below, along with EPA’s Metadata Style Guide, are intended to provide suggestions and guidance to assist with the standardization of metadata records.

  15. Scalable algorithms for contact problems

    Dostál, Zdeněk; Sadowská, Marie; Vondrák, Vít

    2016-01-01

    This book presents a comprehensive and self-contained treatment of the authors’ newly developed scalable algorithms for the solutions of multibody contact problems of linear elasticity. The brand new feature of these algorithms is theoretically supported numerical scalability and parallel scalability demonstrated on problems discretized by billions of degrees of freedom. The theory supports solving multibody frictionless contact problems, contact problems with possibly orthotropic Tresca’s friction, and transient contact problems. It covers BEM discretization, jumping coefficients, floating bodies, mortar non-penetration conditions, etc. The exposition is divided into four parts, the first of which reviews appropriate facets of linear algebra, optimization, and analysis. The most important algorithms and optimality results are presented in the third part of the volume. The presentation is complete, including continuous formulation, discretization, decomposition, optimality results, and numerical experimen...

  16. Creating metadata that work for digital libraries and Google

    Dawson, Alan

    2004-01-01

    For many years metadata has been recognised as a significant component of the digital information environment. Substantial work has gone into creating complex metadata schemes for describing digital content. Yet increasingly Web search engines, and Google in particular, are the primary means of discovering and selecting digital resources, although they make little use of metadata. This article considers how digital libraries can gain more value from their metadata by adapting it for Google us...

  17. Scalable cloud without dedicated storage

    Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.

    2015-05-01

    We present a prototype of a scalable computing cloud. It is intended to be deployed on the basis of a cluster without the separate dedicated storage. The dedicated storage is replaced by the distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improves fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with a relatively low initial and maintenance cost. The solution is built on the basis of the open source components like OpenStack, CEPH, etc.

  18. Building a High Performance Metadata Broker using Clojure, NoSQL and Message Queues

    Truslove, I.; Reed, S.

    2013-12-01

    In practice, Earth and Space Science Informatics often relies on getting more done with less: fewer hardware resources, less IT staff, fewer lines of code. As a capacity-building exercise focused on rapid development of high-performance geoinformatics software, the National Snow and Ice Data Center (NSIDC) built a prototype metadata brokering system using a new JVM language, modern database engines and virtualized or cloud computing resources. The metadata brokering system was developed with the overarching goals of (i) demonstrating a technically viable product with as little development effort as possible, (ii) using very new yet very popular tools and technologies in order to get the most value from the least legacy-encumbered code bases, and (iii) being a high-performance system by using scalable subcomponents, and implementation patterns typically used in web architectures. We implemented the system using the Clojure programming language (an interactive, dynamic, Lisp-like JVM language), Redis (a fast in-memory key-value store) as both the data store for original XML metadata content and as the provider for the message queueing service, and ElasticSearch for its search and indexing capabilities to generate search results. On evaluating the results of the prototyping process, we believe that the technical choices did in fact allow us to do more for less, due to the expressive nature of the Clojure programming language and its easy interoperability with Java libraries, and the successful reuse or re-application of high performance products or designs. This presentation will describe the architecture of the metadata brokering system, cover the tools and techniques used, and describe lessons learned, conclusions, and potential next steps.

  19. Handling multiple metadata streams regarding digital learning material

    Roes, J.B.M.; Vuuren, J. van; Verbeij, N.; Nijstad, H.

    2010-01-01

    This paper presents the outcome of a study performed in the Netherlands on handling multiple metadata streams regarding digital learning material. The paper describes the present metadata architecture in the Netherlands, the present suppliers and users of metadata and digital learning materials. It

  20. From CLARIN Component Metadata to Linked Open Data

    Durco, M.; Windhouwer, Menzo

    2014-01-01

    In the European CLARIN infrastructure a growing number of resources are described with Component Metadata. In this paper we describe a transformation to make this metadata available as linked data. After this first step it becomes possible to connect the CLARIN Component Metadata with other valuable

  1. Metadata Exporter for Scientific Photography Management

    Staudigel, D.; English, B.; Delaney, R.; Staudigel, H.; Koppers, A.; Hart, S.

    2005-12-01

    Photographs have become an increasingly important medium, especially with the advent of digital cameras. It has become inexpensive to take photographs and quickly post them on a website. However informative photos may be, they still need to be displayed in a convenient way, and be cataloged in such a manner that makes them easily locatable. Managing the great number of photographs that digital cameras allow and creating a format for efficient dissemination of the information related to the photos is a tedious task. Products such as Apple's iPhoto have greatly eased the task of managing photographs, However, they often have limitations. Un-customizable metadata fields and poor metadata extraction tools limit their scientific usefulness. A solution to this persistent problem is a customizable metadata exporter. On the ALIA expedition, we successfully managed the thousands of digital photos we took. We did this with iPhoto and a version of the exporter that is now available to the public under the name "CustomHTMLExport" (http://www.versiontracker.com/dyn/moreinfo/macosx/27777), currently undergoing formal beta testing This software allows the use of customized metadata fields (including description, time, date, GPS data, etc.), which is exported along with the photo. It can also produce webpages with this data straight from iPhoto, in a much more flexible way than is already allowed. With this tool it becomes very easy to manage and distribute scientific photos.

  2. Metadata Aided Run Selection at ATLAS

    Buckingham, RM; The ATLAS collaboration; Tseng, JC-L; Viegas, F; Vinek, E

    2010-01-01

    Management of the large volume of data collected by any large scale sci- entific experiment requires the collection of coherent metadata quantities, which can be used by reconstruction or analysis programs and/or user in- terfaces, to pinpoint collections of data needed for specific purposes. In the ATLAS experiment at the LHC, we have collected metadata from systems storing non-event-wise data (Conditions) into a relational database. The Conditions metadata (COMA) database tables not only contain conditions known at the time of event recording, but also allow for the addition of conditions data collected as a result of later analysis of the data (such as improved measurements of beam conditions or assessments of data quality). A new web based interface called “runBrowser” makes these Conditions Metadata available as a Run based selection service. runBrowser, based on php and javascript, uses jQuery to present selection criteria and report results. It not only facilitates data selection by conditions at...

  3. Metadata aided run selection at ATLAS

    Buckingham, RM; The ATLAS collaboration; Tseng, JC-L; Viegas, F; Vinek, E

    2011-01-01

    Management of the large volume of data collected by any large scale scientific experiment requires the collection of coherent metadata quantities, which can be used by reconstruction or analysis programs and/or user interfaces, to pinpoint collections of data needed for specific purposes. In the ATLAS experiment at the LHC, we have collected metadata from systems storing non-event-wise data (Conditions) into a relational database. The Conditions metadata (COMA) database tables not only contain conditions known at the time of event recording, but also allow for the addition of conditions data collected as a result of later analysis of the data (such as improved measurements of beam conditions or assessments of data quality). A new web based interface called “runBrowser” makes these Conditions Metadata available as a Run based selection service. runBrowser, based on php and javascript, uses jQuery to present selection criteria and report results. It not only facilitates data selection by conditions attrib...

  4. Viewing and Editing Earth Science Metadata MOBE: Metadata Object Browser and Editor in Java

    Chase, A.; Helly, J.

    2002-12-01

    Metadata is an important, yet often neglected aspect of successful archival efforts. However, to generate robust, useful metadata is often a time consuming and tedious task. We have been approaching this problem from two directions: first by automating metadata creation, pulling from known sources of data, and in addition, what this (paper/poster?) details, developing friendly software for human interaction with the metadata. MOBE and COBE(Metadata Object Browser and Editor, and Canonical Object Browser and Editor respectively), are Java applications for editing and viewing metadata and digital objects. MOBE has already been designed and deployed, currently being integrated into other areas of the SIOExplorer project. COBE is in the design and development stage, being created with the same considerations in mind as those for MOBE. Metadata creation, viewing, data object creation, and data object viewing, when taken on a small scale are all relatively simple tasks. Computer science however, has an infamous reputation for transforming the simple into complex. As a system scales upwards to become more robust, new features arise and additional functionality is added to the software being written to manage the system. The software that emerges from such an evolution, though powerful, is often complex and difficult to use. With MOBE the focus is on a tool that does a small number of tasks very well. The result has been an application that enables users to manipulate metadata in an intuitive and effective way. This allows for a tool that serves its purpose without introducing additional cognitive load onto the user, an end goal we continue to pursue.

  5. An evaluation of internal event Level 1 PRA methods used in NUREG-1150

    Camp, A.L.; Cramond, W.R.

    1989-01-01

    As part of the effort to support NUREG-1150, Sandia National Laboratories and its subcontractors have developed innovative techniques for efficiently performing internal event Level I probabilistic risk assessments. This methodology is one of the alternatives for industry to use in performing individual plant evaluations in the future. While this new methodology was very successful, there are some areas where improvements can be made. This paper evaluates the strengths and weaknesses of the methodology and makes some important recommendations for modifications in order to provide insights to future users

  6. Scalable shared-memory multiprocessing

    Lenoski, Daniel E

    1995-01-01

    Dr. Lenoski and Dr. Weber have experience with leading-edge research and practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved with commercializing scalable shared-memory technology.

  7. Metadata Effectiveness in Internet Discovery: An Analysis of Digital Collection Metadata Elements and Internet Search Engine Keywords

    Yang, Le

    2016-01-01

    This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…

  8. Metadata Authoring with Versatility and Extensibility

    Pollack, Janine; Olsen, Lola

    2004-01-01

    NASA's Global Change Master Directory (GCMD) assists the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 13,800 data set descriptions in Directory Interchange Format (DIF) and 700 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information and direct links to the data, thus allowing researchers to discover data pertaining to a geographic location of interest, then quickly acquire those data. The GCMD strives to be the preferred data locator for world-wide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are attracting widespread usage; however, a need for tools that are portable, customizable and versatile still exists. With tool usage directly influencing metadata population, it has become apparent that new tools are needed to fill these voids. As a result, the GCMD has released a new authoring tool allowing for both web-based and stand-alone authoring of descriptions. Furthermore, this tool incorporates the ability to plug-and-play the metadata format of choice, offering users options of DIF, SERF, FGDC, ISO or any other defined standard. Allowing data holders to work with their preferred format, as well as an option of a stand-alone application or web-based environment, docBUlLDER will assist the scientific community in efficiently creating quality data and services metadata.

  9. Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time

    Ulbricht, Damian; Klump, Jens; Bertelmann, Roland

    2013-04-01

    Inspired by the wish to re-use research data a lot of work is done to bring data systems of the earth sciences together. Discovery metadata is disseminated to data portals to allow building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists that were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications - as DataCite [1] does - requires again a specific metadata schema and every new use context of the measured data may require yet another metadata schema sharing only a subset of information with the meta information already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam we established a solution to store file-based research data and describe these with an arbitrary number of metadata schemas. Core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc [2] "items". The eSciDoc content model allows assigning files to "items" and adding any number of metadata records to these "items". The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available through the internet worldwide. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes by providing temporary web links for external reviewers that do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP based web application that allows to describe data with any XML-based schema. It uses the eSciDoc infrastructures

  10. Making Interoperability Easier with the NASA Metadata Management Tool

    Shum, D.; Reese, M.; Pilone, D.; Mitchell, A. E.

    2016-12-01

    ISO 19115 has enabled interoperability amongst tools, yet many users find it hard to build ISO metadata for their collections because it can be large and overly flexible for their needs. The Metadata Management Tool (MMT), part of NASA's Earth Observing System Data and Information System (EOSDIS), offers users a modern, easy to use browser based tool to develop ISO compliant metadata. Through a simplified UI experience, metadata curators can create and edit collections without any understanding of the complex ISO-19115 format, while still generating compliant metadata. The MMT is also able to assess the completeness of collection level metadata by evaluating it against a variety of metadata standards. The tool provides users with clear guidance as to how to change their metadata in order to improve their quality and compliance. It is based on NASA's Unified Metadata Model for Collections (UMM-C) which is a simpler metadata model which can be cleanly mapped to ISO 19115. This allows metadata authors and curators to meet ISO compliance requirements faster and more accurately. The MMT and UMM-C have been developed in an agile fashion, with recurring end user tests and reviews to continually refine the tool, the model and the ISO mappings. This process is allowing for continual improvement and evolution to meet the community's needs.

  11. Efficient processing of MPEG-21 metadata in the binary domain

    Timmerer, Christian; Frank, Thomas; Hellwagner, Hermann; Heuer, Jörg; Hutter, Andreas

    2005-10-01

    XML-based metadata is widely adopted across the different communities and plenty of commercial and open source tools for processing and transforming are available on the market. However, all of these tools have one thing in common: they operate on plain text encoded metadata which may become a burden in constrained and streaming environments, i.e., when metadata needs to be processed together with multimedia content on the fly. In this paper we present an efficient approach for transforming such kind of metadata which are encoded using MPEG's Binary Format for Metadata (BiM) without additional en-/decoding overheads, i.e., within the binary domain. Therefore, we have developed an event-based push parser for BiM encoded metadata which transforms the metadata by a limited set of processing instructions - based on traditional XML transformation techniques - operating on bit patterns instead of cost-intensive string comparisons.

  12. Creating context for the experiment record. User-defined metadata: investigations into metadata usage in the LabTrove ELN.

    Willoughby, Cerys; Bird, Colin L; Coles, Simon J; Frey, Jeremy G

    2014-12-22

    The drive toward more transparency in research, the growing willingness to make data openly available, and the reuse of data to maximize the return on research investment all increase the importance of being able to find information and make links to the underlying data. The use of metadata in Electronic Laboratory Notebooks (ELNs) to curate experiment data is an essential ingredient for facilitating discovery. The University of Southampton has developed a Web browser-based ELN that enables users to add their own metadata to notebook entries. A survey of these notebooks was completed to assess user behavior and patterns of metadata usage within ELNs, while user perceptions and expectations were gathered through interviews and user-testing activities within the community. The findings indicate that while some groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users are making little attempts to use it, thereby endangering their ability to recover data in the future. A survey of patterns of metadata use in these notebooks, together with feedback from the user community, indicated that while a few groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users adopt a "minimum required" approach to metadata. To investigate whether the patterns of metadata use in LabTrove were unusual, a series of surveys were undertaken to investigate metadata usage in a variety of platforms supporting user-defined metadata. These surveys also provided the opportunity to investigate whether interface designs in these other environments might inform strategies for encouraging metadata creation and more effective use of metadata in LabTrove.

  13. Testing Metadata Existence of Web Map Services

    Jan Růžička

    2011-05-01

    Full Text Available For a general user is quite common to use data sources available on WWW. Almost all GIS software allow to use data sources available via Web Map Service (ISO/OGC standard interface. The opportunity to use different sources and combine them brings a lot of problems that were discussed many times on conferences or journal papers. One of the problem is based on non existence of metadata for published sources. The question was: were the discussions effective? The article is partly based on comparison of situation for metadata between years 2007 and 2010. Second part of the article is focused only on 2010 year situation. The paper is created in a context of research of intelligent map systems, that can be used for an automatic or a semi-automatic map creation or a map evaluation.

  14. Semantic web technologies for video surveillance metadata

    Poppe, Chris; Martens, Gaëtan; De Potter, Pieterjan; Van de Walle, Rik

    2012-01-01

    Video surveillance systems are growing in size and complexity. Such systems typically consist of integrated modules of different vendors to cope with the increasing demands on network and storage capacity, intelligent video analytics, picture quality, and enhanced visual interfaces. Within a surveillance system, relevant information (like technical details on the video sequences, or analysis results of the monitored environment) is described using metadata standards. However, different module...

  15. Mdmap: A Tool for Metadata Collection and Matching

    Rico Simke

    2014-10-01

    Full Text Available This paper describes a front-end for the semi-automatic collection, matching, and generation of bibliographic metadata obtained from different sources for use within a digitization architecture. The Library of a Billion Words project is building an infrastructure for digitizing text that requires high-quality bibliographic metadata, but currently only sparse metadata from digitized editions is available. The project’s approach is to collect metadata for each digitized item from as many sources as possible. An expert user can then use an intuitive front-end tool to choose matching metadata. The collected metadata are centrally displayed in an interactive grid view. The user can choose which metadata they want to assign to a certain edition, and export these data as MARCXML. This paper presents a new approach to bibliographic work and metadata correction. We try to achieve a high quality of the metadata by generating a large amount of metadata to choose from, as well as by giving librarians an intuitive tool to manage their data.

  16. Leveraging Metadata to Create Interactive Images... Today!

    Hurt, Robert L.; Squires, G. K.; Llamas, J.; Rosenthal, C.; Brinkworth, C.; Fay, J.

    2011-01-01

    The image gallery for NASA's Spitzer Space Telescope has been newly rebuilt to fully support the Astronomy Visualization Metadata (AVM) standard to create a new user experience both on the website and in other applications. We encapsulate all the key descriptive information for a public image, including color representations and astronomical and sky coordinates and make it accessible in a user-friendly form on the website, but also embed the same metadata within the image files themselves. Thus, images downloaded from the site will carry with them all their descriptive information. Real-world benefits include display of general metadata when such images are imported into image editing software (e.g. Photoshop) or image catalog software (e.g. iPhoto). More advanced support in Microsoft's WorldWide Telescope can open a tagged image after it has been downloaded and display it in its correct sky position, allowing comparison with observations from other observatories. An increasing number of software developers are implementing AVM support in applications and an online image archive for tagged images is under development at the Spitzer Science Center. Tagging images following the AVM offers ever-increasing benefits to public-friendly imagery in all its standard forms (JPEG, TIFF, PNG). The AVM standard is one part of the Virtual Astronomy Multimedia Project (VAMP); http://www.communicatingastronomy.org

  17. The future of event-level information repositories, indexing, and selection in ATLAS

    Barberis, D; The ATLAS collaboration; Dimitrov, G; Doherty, T; Gallas, E; Hrivnac, J; Malon, D; Nairz, A; Nowak, M; Quilty, D; Sorokoletov, R; Van Gemmeren, P; Zhang, Q

    2014-01-01

    ATLAS maintains a rich corpus of event-by-event information that provides a global view of virtually all of the billions of events the collaboration has seen or simulated, along with sufficient auxiliary information to navigate to and retrieve data for any event at any production processing stage. This unique resource has been employed for a range of purposes, from monitoring, statistics, anomaly detection, and integrity checking to event picking, subset selection, and sample extraction. Recent years of data-taking provide a foundation for assessment of how this resource has and has not been used in practice, of the uses for which it should be optimized, of how it should be deployed and provisioned for scalability to future data volumes, and of the areas in which enhancements to functionality would be most valuable. \

  18. Study on high-level waste geological disposal metadata model

    Ding Xiaobin; Wang Changhong; Zhu Hehua; Li Xiaojun

    2008-01-01

    This paper expatiated the concept of metadata and its researches within china and abroad, then explain why start the study on the metadata model of high-level nuclear waste deep geological disposal project. As reference to GML, the author first set up DML under the framework of digital underground space engineering. Based on DML, a standardized metadata employed in high-level nuclear waste deep geological disposal project is presented. Then, a Metadata Model with the utilization of internet is put forward. With the standardized data and CSW services, this model may solve the problem in the data sharing and exchanging of different data form A metadata editor is build up in order to search and maintain metadata based on this model. (authors)

  19. An emergent theory of digital library metadata enrich then filter

    Stevens, Brett

    2015-01-01

    An Emergent Theory of Digital Library Metadata is a reaction to the current digital library landscape that is being challenged with growing online collections and changing user expectations. The theory provides the conceptual underpinnings for a new approach which moves away from expert defined standardised metadata to a user driven approach with users as metadata co-creators. Moving away from definitive, authoritative, metadata to a system that reflects the diversity of users’ terminologies, it changes the current focus on metadata simplicity and efficiency to one of metadata enriching, which is a continuous and evolving process of data linking. From predefined description to information conceptualised, contextualised and filtered at the point of delivery. By presenting this shift, this book provides a coherent structure in which future technological developments can be considered.

  20. The role of metadata in managing large environmental science datasets. Proceedings

    Melton, R.B.; DeVaney, D.M. [eds.] [Pacific Northwest Lab., Richland, WA (United States); French, J. C. [Univ. of Virginia, (United States)

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  1. The NOAA OneStop System: From Well-Curated Metadata to Data Discovery

    McQuinn, E.; Jakositz, A.; Caldwell, A.; Delk, Z.; Neufeld, D.; Shapiro, J.; Partee, R.; Milan, A.

    2017-12-01

    The NOAA OneStop project is a pathfinder in the realm of enabling users to search for, discover, and access NOAA data. As the project continues along its path to maturity, it has become evident that three areas are of utmost importance to its success in the Earth science community: ensuring quality metadata, building a robust and scalable backend architecture, and keeping the user interface simple to use. Why is this the case? Because, simply put, we are dealing with all aspects of a Big Data problem: large volumes of disparate data needing to be quickly and easily processed and retrieved. In this presentation we discuss the three key aspects of OneStop architecture and how development in each area must be done through cross-team collaboration in order to succeed. We cover aspects of the web-based user interface and OneStop API and how metadata curators and software engineers have worked together to continually iterate on an ever-improving data discovery tool meant to be used by a variety of users searching across a broad assortment of data types.

  2. CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises

    Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.

    2011-12-01

    JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep sea submersibles. Generally, metadata are essential to identify data and samples were obtained. In JAMSTEC, cruise metadata include cruise information such as cruise ID, name of vessel, research theme, and diving information such as dive number, name of submersible and position of diving point. They are submitted by chief scientists of research cruises in the Microsoft Excel° spreadsheet format, and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via "JAMSTEC Data Site for Research Cruises" within two months after end of cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is that duplication efforts and asynchronous metadata across multiple distribution websites due to manual metadata entry into individual websites by administrators. The other is that differential data types or representation of metadata in each website. To solve those problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be connected from the data management database to several distribution websites. CMO is comprised of three components: an Extensible Markup Language (XML) database, an Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility for any change of metadata. Daily differential uptake of metadata from the data management database to the XML database is automatically processed via the EAI software. Some metadata are entered into the XML database using the web

  3. A web-based, dynamic metadata interface to MDSplus

    Gardner, Henry J.; Karia, Raju; Manduchi, Gabriele

    2008-01-01

    We introduce the concept of a Fusion Data Grid and discuss the management of metadata within such a Grid. We describe a prototype application which serves fusion data over the internet together with metadata information which can be flexibly created and modified over time. The application interfaces with the MDSplus data acquisition system and it has been designed to capture metadata which is generated by scientists from the post-processing of experimental data. The implementation of dynamic metadata tables using the Java programming language together with an object-relational mapping system, Hibernate, is described in the Appendix

  4. Scalable Techniques for Formal Verification

    Ray, Sandip

    2010-01-01

    This book presents state-of-the-art approaches to formal verification techniques to seamlessly integrate different formal verification methods within a single logical foundation. It should benefit researchers and practitioners looking to get a broad overview of the spectrum of formal verification techniques, as well as approaches to combining such techniques within a single framework. Coverage includes a range of case studies showing how such combination is fruitful in developing a scalable verification methodology for industrial designs. This book outlines both theoretical and practical issue

  5. Developing Scalable Information Security Systems

    Valery Konstantinovich Ablekov

    2013-06-01

    Full Text Available Existing physical security systems has wide range of lacks, including: high cost, a large number of vulnerabilities, problems of modification and support system. This paper covers an actual problem of developing systems without this list of drawbacks. The paper presents the architecture of the information security system, which operates through the network protocol TCP/IP, including the ability to connect different types of devices and integration with existing security systems. The main advantage is a significant increase in system reliability, scalability, both vertically and horizontally, with minimal cost of both financial and time resources.

  6. Demographic Predictors of Event-Level Associations between Alcohol Consumption and Sexual Behavior.

    Wells, Brooke E; Rendina, H Jonathon; Kelly, Brian C; Golub, Sarit A; Parsons, Jeffrey T

    2016-02-01

    Alcohol consumption is associated with sexual behavior and outcomes, though research indicates a variety of moderating factors, including demographic characteristics. To better target interventions aimed at alcohol-related sexual risk behavior, our analyses simultaneously examine demographic predictors of both day- and event-level associations between alcohol consumption and sexual behavior in a sample of young adults (N = 301) who are sexually active and consume alcohol. Young adults (aged 18-29) recruited using time-space sampling and incentivized snowball sampling completed a survey and a timeline follow-back calendar reporting alcohol consumption and sexual behavior in the past 30 days. On a given day, a greater number of drinks consumed was associated with higher likelihood of sex occurring, particularly for women and single participants. During a given sexual event, number of drinks consumed was not associated with condom use, nor did any demographic predictors predict that association. Findings highlight associations between alcohol and sexual behavior, though not between alcohol and sexual risk behavior, highlighting the need for additional research exploring the complex role of alcohol in sexual risk behavior and the need to develop prevention efforts to minimize the role of alcohol in the initiation of sexual encounters.

  7. Inheritance rules for Hierarchical Metadata Based on ISO 19115

    Zabala, A.; Masó, J.; Pons, X.

    2012-04-01

    Mainly, ISO19115 has been used to describe metadata for datasets and services. Furthermore, ISO19115 standard (as well as the new draft ISO19115-1) includes a conceptual model that allows to describe metadata at different levels of granularity structured in hierarchical levels, both in aggregated resources such as particularly series, datasets, and also in more disaggregated resources such as types of entities (feature type), types of attributes (attribute type), entities (feature instances) and attributes (attribute instances). In theory, to apply a complete metadata structure to all hierarchical levels of metadata, from the whole series to an individual feature attributes, is possible, but to store all metadata at all levels is completely impractical. An inheritance mechanism is needed to store each metadata and quality information at the optimum hierarchical level and to allow an ease and efficient documentation of metadata in both an Earth observation scenario such as a multi-satellite mission multiband imagery, as well as in a complex vector topographical map that includes several feature types separated in layers (e.g. administrative limits, contour lines, edification polygons, road lines, etc). Moreover, and due to the traditional split of maps in tiles due to map handling at detailed scales or due to the satellite characteristics, each of the previous thematic layers (e.g. 1:5000 roads for a country) or band (Landsat-5 TM cover of the Earth) are tiled on several parts (sheets or scenes respectively). According to hierarchy in ISO 19115, the definition of general metadata can be supplemented by spatially specific metadata that, when required, either inherits or overrides the general case (G.1.3). Annex H of this standard states that only metadata exceptions are defined at lower levels, so it is not necessary to generate the full registry of metadata for each level but to link particular values to the general value that they inherit. Conceptually the metadata

  8. Semantic Web: Metadata, Linked Data, Open Data

    Vanessa Russo

    2015-12-01

    Full Text Available What's the Semantic Web? What's the use? The inventor of the Web Tim Berners-Lee describes it as a research methodology able to take advantage of the network to its maximum capacity. This metadata system represents the innovative element through web 2.0 to web 3.0. In this context will try to understand what are the theoretical and informatic requirements of the Semantic Web. Finally will explain Linked Data applications to develop new tools for active citizenship.

  9. Information resource description creating and managing metadata

    Hider, Philip

    2012-01-01

    An overview of the field of information organization that examines resource description as both a product and process of the contemporary digital environment.This timely book employs the unifying mechanism of the semantic web and the resource description framework to integrate the various traditions and practices of information and knowledge organization. Uniquely, it covers both the domain-specific traditions and practices and the practices of the ?metadata movement' through a single lens ? that of resource description in the broadest, semantic web sense.This approach more readily accommodate

  10. Metadata: A user`s view

    Bretherton, F.P. [Univ. of Wisconsin, Madison, WI (United States); Singley, P.T. [Oak Ridge National Lab., TN (United States)

    1994-12-31

    An analysis is presented of the uses of metadata from four aspects of database operations: (1) search, query, retrieval, (2) ingest, quality control, processing, (3) application to application transfer; (4) storage, archive. Typical degrees of database functionality ranging from simple file retrieval to interdisciplinary global query with metadatabase-user dialog and involving many distributed autonomous databases, are ranked in approximate order of increasing sophistication of the required knowledge representation. An architecture is outlined for implementing such functionality in many different disciplinary domains utilizing a variety of off the shelf database management subsystems and processor software, each specialized to a different abstract data model.

  11. Multi-Unit Initiating Event Analysis for a Single-Unit Internal Events Level 1 PSA

    Kim, Dong San; Park, Jin Hee; Lim, Ho Gon [KAERI, Daejeon (Korea, Republic of)

    2016-05-15

    The Fukushima nuclear accident in 2011 highlighted the importance of considering the risks from multi-unit accidents at a site. The ASME/ANS probabilistic risk assessment (PRA) standard also includes some requirements related to multi-unit aspects, one of which (IE-B5) is as follows: 'For multi-unit sites with shared systems, DO NOT SUBSUME multi-unit initiating events if they impact mitigation capability [1].' However, the existing single-unit PSA models do not explicitly consider multi-unit initiating events and hence systems shared by multiple units (e.g., alternate AC diesel generator) are fully credited for the single unit and ignores the need for the shared systems by other units at the same site [2]. This paper describes the results of the multi-unit initiating event (IE) analysis performed as a part of the at-power internal events Level 1 probabilistic safety assessment (PSA) for an OPR1000 single unit ('reference unit'). In this study, a multi-unit initiating event analysis for a single-unit PSA was performed, and using the results, dual-unit LOOP initiating event was added to the existing PSA model for the reference unit (OPR1000 type). Event trees were developed for dual-unit LOOP and dual-unit SBO which can be transferred from dual- unit LOOP. Moreover, CCF basic events for 5 diesel generators were modelled. In case of simultaneous SBO occurrences in both units, this study compared two different assumptions on the availability of the AAC D/G. As a result, when dual-unit LOOP initiating event was added to the existing single-unit PSA model, the total CDF increased by 1∼ 2% depending on the probability that the AAC D/G is available to a specific unit in case of simultaneous SBO in both units.

  12. Scalable global grid catalogue for Run3 and beyond

    Martinez Pedreira, M.; Grigoras, C.; ALICE Collaboration

    2017-10-01

    The AliEn (ALICE Environment) file catalogue is a global unique namespace providing mapping between a UNIX-like logical name structure and the corresponding physical files distributed over 80 storage elements worldwide. Powerful search tools and hierarchical metadata information are integral parts of the system and are used by the Grid jobs as well as local users to store and access all files on the Grid storage elements. The catalogue has been in production since 2005 and over the past 11 years has grown to more than 2 billion logical file names. The backend is a set of distributed relational databases, ensuring smooth growth and fast access. Due to the anticipated fast future growth, we are looking for ways to enhance the performance and scalability by simplifying the catalogue schema while keeping the functionality intact. We investigated different backend solutions, such as distributed key value stores, as replacement for the relational database. This contribution covers the architectural changes in the system, together with the technology evaluation, benchmark results and conclusions.

  13. Metadata Laws, Journalism and Resistance in Australia

    Benedetta Brevini

    2017-03-01

    Full Text Available The intelligence leaks from Edward Snowden in 2013 unveiled the sophistication and extent of data collection by the United States’ National Security Agency and major global digital firms prompting domestic and international debates about the balance between security and privacy, openness and enclosure, accountability and secrecy. It is difficult not to see a clear connection with the Snowden leaks in the sharp acceleration of new national security legislations in Australia, a long term member of the Five Eyes Alliance. In October 2015, the Australian federal government passed controversial laws that require telecommunications companies to retain the metadata of their customers for a period of two years. The new acts pose serious threats for the profession of journalism as they enable government agencies to easily identify and pursue journalists’ sources. Bulk data collections of this type of information deter future whistleblowers from approaching journalists, making the performance of the latter’s democratic role a challenge. After situating this debate within the scholarly literature at the intersection between surveillance studies and communication studies, this article discusses the political context in which journalists are operating and working in Australia; assesses how metadata laws have affected journalism practices and addresses the possibility for resistance.

  14. SPASE, Metadata, and the Heliophysics Virtual Observatories

    Thieman, James; King, Todd; Roberts, Aaron

    2010-01-01

    To provide data search and access capability in the field of Heliophysics (the study of the Sun and its effects on the Solar System, especially the Earth) a number of Virtual Observatories (VO) have been established both via direct funding from the U.S. National Aeronautics and Space Administration (NASA) and through other funding agencies in the U.S. and worldwide. At least 15 systems can be labeled as Virtual Observatories in the Heliophysics community, 9 of them funded by NASA. The problem is that different metadata and data search approaches are used by these VO's and a search for data relevant to a particular research question can involve consulting with multiple VO's - needing to learn a different approach for finding and acquiring data for each. The Space Physics Archive Search and Extract (SPASE) project is intended to provide a common data model for Heliophysics data and therefore a common set of metadata for searches of the VO's. The SPASE Data Model has been developed through the common efforts of the Heliophysics Data and Model Consortium (HDMC) representatives over a number of years. We currently have released Version 2.1 of the Data Model. The advantages and disadvantages of the Data Model will be discussed along with the plans for the future. Recent changes requested by new members of the SPASE community indicate some of the directions for further development.

  15. Metadata Access Tool for Climate and Health

    Trtanji, J.

    2012-12-01

    The need for health information resources to support climate change adaptation and mitigation decisions is growing, both in the United States and around the world, as the manifestations of climate change become more evident and widespread. In many instances, these information resources are not specific to a changing climate, but have either been developed or are highly relevant for addressing health issues related to existing climate variability and weather extremes. To help address the need for more integrated data, the Interagency Cross-Cutting Group on Climate Change and Human Health, a working group of the U.S. Global Change Research Program, has developed the Metadata Access Tool for Climate and Health (MATCH). MATCH is a gateway to relevant information that can be used to solve problems at the nexus of climate science and public health by facilitating research, enabling scientific collaborations in a One Health approach, and promoting data stewardship that will enhance the quality and application of climate and health research. MATCH is a searchable clearinghouse of publicly available Federal metadata including monitoring and surveillance data sets, early warning systems, and tools for characterizing the health impacts of global climate change. Examples of relevant databases include the Centers for Disease Control and Prevention's Environmental Public Health Tracking System and NOAA's National Climate Data Center's national and state temperature and precipitation data. This presentation will introduce the audience to this new web-based geoportal and demonstrate its features and potential applications.

  16. Metadata as a means for correspondence on digital media

    Stouffs, R.; Kooistra, J.; Tuncer, B.

    2004-01-01

    Metadata derive their action from their association to data and from the relationship they maintain with this data. An interpretation of this action is that the metadata lays claim to the data collection to which it is associated, where the claim is successful if the data collection gains quality as

  17. Learning Object Metadata in a Web-Based Learning Environment

    Avgeriou, Paris; Koutoumanos, Anastasios; Retalis, Symeon; Papaspyrou, Nikolaos

    2000-01-01

    The plethora and variance of learning resources embedded in modern web-based learning environments require a mechanism to enable their structured administration. This goal can be achieved by defining metadata on them and constructing a system that manages the metadata in the context of the learning

  18. Shared Geospatial Metadata Repository for Ontario University Libraries: Collaborative Approaches

    Forward, Erin; Leahey, Amber; Trimble, Leanne

    2015-01-01

    Successfully providing access to special collections of digital geospatial data in academic libraries relies upon complete and accurate metadata. Creating and maintaining metadata using specialized standards is a formidable challenge for libraries. The Ontario Council of University Libraries' Scholars GeoPortal project, which created a shared…

  19. Developing Cyberinfrastructure Tools and Services for Metadata Quality Evaluation

    Mecum, B.; Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Powers, L. A.; Slaughter, P.

    2016-12-01

    Metadata and data quality are at the core of reusable and reproducible science. While great progress has been made over the years, much of the metadata collected only addresses data discovery, covering concepts such as titles and keywords. Improving metadata beyond the discoverability plateau means documenting detailed concepts within the data such as sampling protocols, instrumentation used, and variables measured. Given that metadata commonly do not describe their data at this level, how might we improve the state of things? Giving scientists and data managers easy to use tools to evaluate metadata quality that utilize community-driven recommendations is the key to producing high-quality metadata. To achieve this goal, we created a set of cyberinfrastructure tools and services that integrate with existing metadata and data curation workflows which can be used to improve metadata and data quality across the sciences. These tools work across metadata dialects (e.g., ISO19115, FGDC, EML, etc.) and can be used to assess aspects of quality beyond what is internal to the metadata such as the congruence between the metadata and the data it describes. The system makes use of a user-friendly mechanism for expressing a suite of checks as code in popular data science programming languages such as Python and R. This reduces the burden on scientists and data managers to learn yet another language. We demonstrated these services and tools in three ways. First, we evaluated a large corpus of datasets in the DataONE federation of data repositories against a metadata recommendation modeled after existing recommendations such as the LTER best practices and the Attribute Convention for Dataset Discovery (ACDD). Second, we showed how this service can be used to display metadata and data quality information to data producers during the data submission and metadata creation process, and to data consumers through data catalog search and access tools. Third, we showed how the centrally

  20. Making the Case for Embedded Metadata in Digital Images

    Smith, Kari R.; Saunders, Sarah; Kejser, U.B.

    2014-01-01

    This paper discusses the standards, methods, use cases, and opportunities for using embedded metadata in digital images. In this paper we explain the past and current work engaged with developing specifications, standards for embedding metadata of different types, and the practicalities of data...... exchange in heritage institutions and the culture sector. Our examples and findings support the case for embedded metadata in digital images and the opportunities for such use more broadly in non-heritage sectors as well. We encourage the adoption of embedded metadata by digital image content creators...... and curators as well as those developing software and hardware that support the creation or re-use of digital images. We conclude that the usability of born digital images as well as physical objects that are digitized can be extended and the files preserved more readily with embedded metadata....

  1. Managing ebook metadata in academic libraries taming the tiger

    Frederick, Donna E

    2016-01-01

    Managing ebook Metadata in Academic Libraries: Taming the Tiger tackles the topic of ebooks in academic libraries, a trend that has been welcomed by students, faculty, researchers, and library staff. However, at the same time, the reality of acquiring ebooks, making them discoverable, and managing them presents library staff with many new challenges. Traditional methods of cataloging and managing library resources are no longer relevant where the purchasing of ebooks in packages and demand driven acquisitions are the predominant models for acquiring new content. Most academic libraries have a complex metadata environment wherein multiple systems draw upon the same metadata for different purposes. This complexity makes the need for standards-based interoperable metadata more important than ever. In addition to complexity, the nature of the metadata environment itself typically varies slightly from library to library making it difficult to recommend a single set of practices and procedures which would be releva...

  2. Interpreting the ASTM 'content standard for digital geospatial metadata'

    Nebert, Douglas D.

    1996-01-01

    ASTM and the Federal Geographic Data Committee have developed a content standard for spatial metadata to facilitate documentation, discovery, and retrieval of digital spatial data using vendor-independent terminology. Spatial metadata elements are identifiable quality and content characteristics of a data set that can be tied to a geographic location or area. Several Office of Management and Budget Circulars and initiatives have been issued that specify improved cataloguing of and accessibility to federal data holdings. An Executive Order further requires the use of the metadata content standard to document digital spatial data sets. Collection and reporting of spatial metadata for field investigations performed for the federal government is an anticipated requirement. This paper provides an overview of the draft spatial metadata content standard and a description of how the standard could be applied to investigations collecting spatially-referenced field data.

  3. Making the Case for Embedded Metadata in Digital Images

    Smith, Kari R.; Saunders, Sarah; Kejser, U.B.

    2014-01-01

    exchange in heritage institutions and the culture sector. Our examples and findings support the case for embedded metadata in digital images and the opportunities for such use more broadly in non-heritage sectors as well. We encourage the adoption of embedded metadata by digital image content creators......This paper discusses the standards, methods, use cases, and opportunities for using embedded metadata in digital images. In this paper we explain the past and current work engaged with developing specifications, standards for embedding metadata of different types, and the practicalities of data...... and curators as well as those developing software and hardware that support the creation or re-use of digital images. We conclude that the usability of born digital images as well as physical objects that are digitized can be extended and the files preserved more readily with embedded metadata....

  4. Department of the Interior metadata implementation guide—Framework for developing the metadata component for data resource management

    Obuch, Raymond C.; Carlino, Jennifer; Zhang, Lin; Blythe, Jonathan; Dietrich, Christopher; Hawkinson, Christine

    2018-04-12

    The Department of the Interior (DOI) is a Federal agency with over 90,000 employees across 10 bureaus and 8 agency offices. Its primary mission is to protect and manage the Nation’s natural resources and cultural heritage; provide scientific and other information about those resources; and honor its trust responsibilities or special commitments to American Indians, Alaska Natives, and affiliated island communities. Data and information are critical in day-to-day operational decision making and scientific research. DOI is committed to creating, documenting, managing, and sharing high-quality data and metadata in and across its various programs that support its mission. Documenting data through metadata is essential in realizing the value of data as an enterprise asset. The completeness, consistency, and timeliness of metadata affect users’ ability to search for and discover the most relevant data for the intended purpose; and facilitates the interoperability and usability of these data among DOI bureaus and offices. Fully documented metadata describe data usability, quality, accuracy, provenance, and meaning.Across DOI, there are different maturity levels and phases of information and metadata management implementations. The Department has organized a committee consisting of bureau-level points-of-contacts to collaborate on the development of more consistent, standardized, and more effective metadata management practices and guidance to support this shared mission and the information needs of the Department. DOI’s metadata implementation plans establish key roles and responsibilities associated with metadata management processes, procedures, and a series of actions defined in three major metadata implementation phases including: (1) Getting started—Planning Phase, (2) Implementing and Maintaining Operational Metadata Management Phase, and (3) the Next Steps towards Improving Metadata Management Phase. DOI’s phased approach for metadata management addresses

  5. Scalable Performance Measurement and Analysis

    Gamblin, Todd [Univ. of North Carolina, Chapel Hill, NC (United States)

    2009-01-01

    Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Modern machines may contain 100,000 or more microprocessor cores, and the largest of these, IBM's Blue Gene/L, contains over 200,000 cores. Future systems are expected to support millions of concurrent tasks. In this dissertation, we focus on efficient techniques for measuring and analyzing the performance of applications running on very large parallel machines. Tuning the performance of large-scale applications can be a subtle and time-consuming task because application developers must measure and interpret data from many independent processes. While the volume of the raw data scales linearly with the number of tasks in the running system, the number of tasks is growing exponentially, and data for even small systems quickly becomes unmanageable. Transporting performance data from so many processes over a network can perturb application performance and make measurements inaccurate, and storing such data would require a prohibitive amount of space. Moreover, even if it were stored, analyzing the data would be extremely time-consuming. In this dissertation, we present novel methods for reducing performance data volume. The first draws on multi-scale wavelet techniques from signal processing to compress systemwide, time-varying load-balance data. The second uses statistical sampling to select a small subset of running processes to generate low-volume traces. A third approach combines sampling and wavelet compression to stratify performance data adaptively at run-time and to reduce further the cost of sampled tracing. We have integrated these approaches into Libra, a toolset for scalable load-balance analysis. We present Libra and show how it can be used to analyze data from large scientific applications scalably.

  6. The XML Metadata Editor of GFZ Data Services

    Ulbricht, Damian; Elger, Kirsten; Tesei, Telemaco; Trippanera, Daniele

    2017-04-01

    Following the FAIR data principles, research data should be Findable, Accessible, Interoperable and Reuseable. Publishing data under these principles requires to assign persistent identifiers to the data and to generate rich machine-actionable metadata. To increase the interoperability, metadata should include shared vocabularies and crosslink the newly published (meta)data and related material. However, structured metadata formats tend to be complex and are not intended to be generated by individual scientists. Software solutions are needed that support scientists in providing metadata describing their data. To facilitate data publication activities of 'GFZ Data Services', we programmed an XML metadata editor that assists scientists to create metadata in different schemata popular in the earth sciences (ISO19115, DIF, DataCite), while being at the same time usable by and understandable for scientists. Emphasis is placed on removing barriers, in particular the editor is publicly available on the internet without registration [1] and the scientists are not requested to provide information that may be generated automatically (e.g. the URL of a specific licence or the contact information of the metadata distributor). Metadata are stored in browser cookies and a copy can be saved to the local hard disk. To improve usability, form fields are translated into the scientific language, e.g. 'creators' of the DataCite schema are called 'authors'. To assist filling in the form, we make use of drop down menus for small vocabulary lists and offer a search facility for large thesauri. Explanations to form fields and definitions of vocabulary terms are provided in pop-up windows and a full documentation is available for download via the help menu. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems with different conventions to provide latitudes and longitudes. Currently, we are extending the metadata editor

  7. Taxonomic names, metadata, and the Semantic Web

    Roderic D. M. Page

    2006-01-01

    Full Text Available Life Science Identifiers (LSIDs offer an attractive solution to the problem of globally unique identifiers for digital objects in biology. However, I suggest that in the context of taxonomic names, the most compelling benefit of adopting these identifiers comes from the metadata associated with each LSID. By using existing vocabularies wherever possible, and using a simple vocabulary for taxonomy-specific concepts we can quickly capture the essential information about a taxonomic name in the Resource Description Framework (RDF format. This opens up the prospect of using technologies developed for the Semantic Web to add ``taxonomic intelligence" to biodiversity databases. This essay explores some of these ideas in the context of providing a taxonomic framework for the phylogenetic database TreeBASE.

  8. Evolution of the ATLAS Metadata Interface (AMI)

    Odier, Jerome; The ATLAS collaboration; Fulachier, Jerome; Lambert, Fabian

    2015-01-01

    The ATLAS Metadata Interface (AMI) can be considered to be a mature application because it has existed for at least 10 years. Over the years, the number of users and the number of functions provided for these users has increased. It has been necessary to adapt the hardware infrastructure in a seamless way so that the Quality of Service remains high. We will describe the evolution of the application from the initial one, using single server with a MySQL backend database, to the current state, where we use a cluster of Virtual Machines on the French Tier 1 Cloud at Lyon, an ORACLE database backend also at Lyon, with replication to CERN using ORACLE streams behind a back-up server.

  9. Semantic Metadata for Heterogeneous Spatial Planning Documents

    Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.

    2016-09-01

    Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

  10. SEMANTIC METADATA FOR HETEROGENEOUS SPATIAL PLANNING DOCUMENTS

    A. Iwaniak

    2016-09-01

    Full Text Available Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa. The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

  11. Requirements for Scalable Access Control and Security Management Architectures

    Keromytis, Angelos D; Smith, Jonathan M

    2005-01-01

    Maximizing local autonomy has led to a scalable Internet. Scalability and the capacity for distributed control have unfortunately not extended well to resource access control policies and mechanisms...

  12. Design and Implementation of a Metadata-rich File System

    Ames, S; Gokhale, M B; Maltzahn, C

    2010-01-19

    Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

  13. Adaptive format conversion for scalable video coding

    Wan, Wade K.; Lim, Jae S.

    2001-12-01

    The enhancement layer in many scalable coding algorithms is composed of residual coding information. There is another type of information that can be transmitted instead of (or in addition to) residual coding. Since the encoder has access to the original sequence, it can utilize adaptive format conversion (AFC) to generate the enhancement layer and transmit the different format conversion methods as enhancement data. This paper investigates the use of adaptive format conversion information as enhancement data in scalable video coding. Experimental results are shown for a wide range of base layer qualities and enhancement bitrates to determine when AFC can improve video scalability. Since the parameters needed for AFC are small compared to residual coding, AFC can provide video scalability at low enhancement layer bitrates that are not possible with residual coding. In addition, AFC can also be used in addition to residual coding to improve video scalability at higher enhancement layer bitrates. Adaptive format conversion has not been studied in detail, but many scalable applications may benefit from it. An example of an application that AFC is well-suited for is the migration path for digital television where AFC can provide immediate video scalability as well as assist future migrations.

  14. Metadata Creation, Management and Search System for your Scientific Data

    Devarakonda, R.; Palanisamy, G.

    2012-12-01

    Mercury Search Systems is a set of tools for creating, searching, and retrieving of biogeochemical metadata. Mercury toolset provides orders of magnitude improvements in search speed, support for any metadata format, integration with Google Maps for spatial queries, multi-facetted type search, search suggestions, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. Mercury's metadata editor provides a easy way for creating metadata and Mercury's search interface provides a single portal to search for data and information contained in disparate data management systems, each of which may use any metadata format including FGDC, ISO-19115, Dublin-Core, Darwin-Core, DIF, ECHO, and EML. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury is being used more than 14 different projects across 4 federal agencies. It was originally developed for NASA, with continuing development funded by NASA, USGS, and DOE for a consortium of projects. Mercury search won the NASA's Earth Science Data Systems Software Reuse Award in 2008. References: R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics DOI: 10.1007/s12145-010-0073-0, (2010);

  15. Improving Metadata Compliance for Earth Science Data Records

    Armstrong, E. M.; Chang, O.; Foster, D.

    2014-12-01

    One of the recurring challenges of creating earth science data records is to ensure a consistent level of metadata compliance at the granule level where important details of contents, provenance, producer, and data references are necessary to obtain a sufficient level of understanding. These details are important not just for individual data consumers but also for autonomous software systems. Two of the most popular metadata standards at the granule level are the Climate and Forecast (CF) Metadata Conventions and the Attribute Conventions for Dataset Discovery (ACDD). Many data producers have implemented one or both of these models including the Group for High Resolution Sea Surface Temperature (GHRSST) for their global SST products and the Ocean Biology Processing Group for NASA ocean color and SST products. While both the CF and ACDD models contain various level of metadata richness, the actual "required" attributes are quite small in number. Metadata at the granule level becomes much more useful when recommended or optional attributes are implemented that document spatial and temporal ranges, lineage and provenance, sources, keywords, and references etc. In this presentation we report on a new open source tool to check the compliance of netCDF and HDF5 granules to the CF and ACCD metadata models. The tool, written in Python, was originally implemented to support metadata compliance for netCDF records as part of the NOAA's Integrated Ocean Observing System. It outputs standardized scoring for metadata compliance for both CF and ACDD, produces an objective summary weight, and can be implemented for remote records via OPeNDAP calls. Originally a command-line tool, we have extended it to provide a user-friendly web interface. Reports on metadata testing are grouped in hierarchies that make it easier to track flaws and inconsistencies in the record. We have also extended it to support explicit metadata structures and semantic syntax for the GHRSST project that can be

  16. ALADDIN - enhancing applicability and scalability

    Roverso, Davide

    2001-02-01

    The ALADDIN project aims at the study and development of flexible, accurate, and reliable techniques and principles for computerised event classification and fault diagnosis for complex machinery and industrial processes. The main focus of the project is on advanced numerical techniques, such as wavelets, and empirical modelling with neural networks. This document reports on recent important advancements, which significantly widen the practical applicability of the developed principles, both in terms of flexibility of use, and in terms of scalability to large problem domains. In particular, two novel techniques are here described. The first, which we call Wavelet On- Line Pre-processing (WOLP), is aimed at extracting, on-line, relevant dynamic features from the process data streams. This technique allows a system a greater flexibility in detecting and processing transients at a range of different time scales. The second technique, which we call Autonomous Recursive Task Decomposition (ARTD), is aimed at tackling the problem of constructing a classifier able to discriminate among a large number of different event/fault classes, which is often the case when the application domain is a complex industrial process. ARTD also allows for incremental application development (i.e. the incremental addition of new classes to an existing classifier, without the need of retraining the entire system), and for simplified application maintenance. The description of these novel techniques is complemented by reports of quantitative experiments that show in practice the extent of these improvements. (Author)

  17. Fast and scalable inequality joins

    Khayyat, Zuhair

    2016-09-07

    Inequality joins, which is to join relations with inequality conditions, are used in various applications. Optimizing joins has been the subject of intensive research ranging from efficient join algorithms such as sort-merge join, to the use of efficient indices such as (Formula presented.)-tree, (Formula presented.)-tree and Bitmap. However, inequality joins have received little attention and queries containing such joins are notably very slow. In this paper, we introduce fast inequality join algorithms based on sorted arrays and space-efficient bit-arrays. We further introduce a simple method to estimate the selectivity of inequality joins which is then used to optimize multiple predicate queries and multi-way joins. Moreover, we study an incremental inequality join algorithm to handle scenarios where data keeps changing. We have implemented a centralized version of these algorithms on top of PostgreSQL, a distributed version on top of Spark SQL, and an existing data cleaning system, Nadeef. By comparing our algorithms against well-known optimization techniques for inequality joins, we show our solution is more scalable and several orders of magnitude faster. © 2016 Springer-Verlag Berlin Heidelberg

  18. Embedded High Performance Scalable Computing Systems

    Ngo, David

    2003-01-01

    The Embedded High Performance Scalable Computing Systems (EHPSCS) program is a cooperative agreement between Sanders, A Lockheed Martin Company and DARPA that ran for three years, from Apr 1995 - Apr 1998...

  19. Ontology-based Metadata Portal for Unified Semantics

    National Aeronautics and Space Administration — The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS) will extend the prototype Ontology-Driven Interactive Search Environment for Earth Sciences...

  20. Distributed metadata in a high performance computing environment

    Bent, John M.; Faibish, Sorin; Zhang, Zhenhua; Liu, Xuezhao; Tang, Haiying

    2017-07-11

    A computer-executable method, system, and computer program product for managing meta-data in a distributed storage system, wherein the distributed storage system includes one or more burst buffers enabled to operate with a distributed key-value store, the co computer-executable method, system, and computer program product comprising receiving a request for meta-data associated with a block of data stored in a first burst buffer of the one or more burst buffers in the distributed storage system, wherein the meta data is associated with a key-value, determining which of the one or more burst buffers stores the requested metadata, and upon determination that a first burst buffer of the one or more burst buffers stores the requested metadata, locating the key-value in a portion of the distributed key-value store accessible from the first burst buffer.

  1. USGS 24k Digital Raster Graphic (DRG) Metadata

    Minnesota Department of Natural Resources — Metadata for the scanned USGS 24k Topograpic Map Series (also known as 24k Digital Raster Graphic). Each scanned map is represented by a polygon in the layer and the...

  2. Metadata and Service at the GFZ ISDC Portal

    Ritschel, B.

    2008-05-01

    The online service portal of the GFZ Potsdam Information System and Data Center (ISDC) is an access point for all manner of geoscientific geodata, its corresponding metadata, scientific documentation and software tools. At present almost 2000 national and international users and user groups have the opportunity to request Earth science data from a portfolio of 275 different products types and more than 20 Million single data files with an added volume of approximately 12 TByte. The majority of the data and information, the portal currently offers to the public, are global geomonitoring products such as satellite orbit and Earth gravity field data as well as geomagnetic and atmospheric data for the exploration. These products for Earths changing system are provided via state-of-the art retrieval techniques. The data product catalog system behind these techniques is based on the extensive usage of standardized metadata, which are describing the different geoscientific product types and data products in an uniform way. Where as all ISDC product types are specified by NASA's Directory Interchange Format (DIF), Version 9.0 Parent XML DIF metadata files, the individual data files are described by extended DIF metadata documents. Depending on the beginning of the scientific project, one part of data files are described by extended DIF, Version 6 metadata documents and the other part are specified by data Child XML DIF metadata documents. Both, the product type dependent parent DIF metadata documents and the data file dependent child DIF metadata documents are derived from a base-DIF.xsd xml schema file. The ISDC metadata philosophy defines a geoscientific product as a package consisting of mostly one or sometimes more than one data file plus one extended DIF metadata file. Because NASA's DIF metadata standard has been developed in order to specify a collection of data only, the extension of the DIF standard consists of new and specific attributes, which are necessary for

  3. The Use of Metadata Visualisation Assist Information Retrieval

    2007-10-01

    album title, the track length and the genre of music . Again, any of these pieces of information can be used to quickly search and locate specific...that person. Music files also have metadata tags, in a format called ID3. This usually contains information such as the artist, the song title, the...tracks, to provide more information about the entire music collection, or to find similar or diverse tracks within the collection. Metadata is

  4. Dealing with metadata quality: the legacy of digital library efforts

    Tani, Alice; Candela, Leonardo; Castelli, Donatella

    2013-01-01

    In this work, we elaborate on the meaning of metadata quality by surveying efforts and experiences matured in the digital library domain. In particular, an overview of the frameworks developed to characterize such a multi-faceted concept is presented. Moreover, the most common quality-related problems affecting metadata both during the creation and the aggregation phase are discussed together with the approaches, technologies and tools developed to mitigate them. This survey on digital librar...

  5. Resource-aware complexity scalability for mobile MPEG encoding

    Mietens, S.O.; With, de P.H.N.; Hentschel, C.; Panchanatan, S.; Vasudev, B.

    2004-01-01

    Complexity scalability attempts to scale the required resources of an algorithm with the chose quality settings, in order to broaden the application range. In this paper, we present complexity-scalable MPEG encoding of which the core processing modules are modified for scalability. Scalability is

  6. Forensic devices for activism: Metadata tracking and public proof

    Lonneke van der Velden

    2015-10-01

    Full Text Available The central topic of this paper is a mobile phone application, ‘InformaCam’, which turns metadata from a surveillance risk into a method for the production of public proof. InformaCam allows one to manage and delete metadata from images and videos in order to diminish surveillance risks related to online tracking. Furthermore, it structures and stores the metadata in such a way that the documentary material becomes better accommodated to evidentiary settings, if needed. In this paper I propose InformaCam should be interpreted as a ‘forensic device’. By using the conceptualization of forensics and work on socio-technical devices the paper discusses how InformaCam, through a range of interventions, rearranges metadata into a technology of evidence. InformaCam explicitly recognizes mobile phones as context aware, uses their sensors, and structures metadata in order to facilitate data analysis after images are captured. Through these modifications it invents a form of ‘sensory data forensics'. By treating data in this particular way, surveillance resistance does more than seeking awareness. It becomes engaged with investigatory practices. Considering the extent by which states conduct metadata surveillance, the project can be seen as a timely response to the unequal distribution of power over data.

  7. A Metadata Schema for Geospatial Resource Discovery Use Cases

    Darren Hardy

    2014-07-01

    Full Text Available We introduce a metadata schema that focuses on GIS discovery use cases for patrons in a research library setting. Text search, faceted refinement, and spatial search and relevancy are among GeoBlacklight's primary use cases for federated geospatial holdings. The schema supports a variety of GIS data types and enables contextual, collection-oriented discovery applications as well as traditional portal applications. One key limitation of GIS resource discovery is the general lack of normative metadata practices, which has led to a proliferation of metadata schemas and duplicate records. The ISO 19115/19139 and FGDC standards specify metadata formats, but are intricate, lengthy, and not focused on discovery. Moreover, they require sophisticated authoring environments and cataloging expertise. Geographic metadata standards target preservation and quality measure use cases, but they do not provide for simple inter-institutional sharing of metadata for discovery use cases. To this end, our schema reuses elements from Dublin Core and GeoRSS to leverage their normative semantics, community best practices, open-source software implementations, and extensive examples already deployed in discovery contexts such as web search and mapping. Finally, we discuss a Solr implementation of the schema using a "geo" extension to MODS.

  8. Using Metadata to Build Geographic Information Sharing Environment on Internet

    Chih-hong Sun

    1999-12-01

    Full Text Available Internet provides a convenient environment to share geographic information. Web GIS (Geographic Information System even provides users a direct access environment to geographic databases through Internet. However, the complexity of geographic data makes it difficult for users to understand the real content and the limitation of geographic information. In some cases, users may misuse the geographic data and make wrong decisions. Meanwhile, geographic data are distributed across various government agencies, academic institutes, and private organizations, which make it even more difficult for users to fully understand the content of these complex data. To overcome these difficulties, this research uses metadata as a guiding mechanism for users to fully understand the content and the limitation of geographic data. We introduce three metadata standards commonly used for geographic data and metadata authoring tools available in the US. We also review the current development of geographic metadata standard in Taiwan. Two metadata authoring tools are developed in this research, which will enable users to build their own geographic metadata easily.[Article content in Chinese

  9. Automated metadata--final project report

    Schissel, David

    2016-01-01

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research illustrating the general applicability of the project's toolkit. Feedback was very positive on the project's toolkit and the value of such automatic workflow documentation to the scientific endeavor.

  10. Automated metadata--final project report

    Schissel, David [General Atomics, San Diego, CA (United States)

    2016-04-01

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research illustrating the general applicability of the project’s toolkit. Feedback was very positive on the project’s toolkit and the value of such automatic workflow documentation to the scientific endeavor.

  11. Educational Rationale Metadata for Learning Objects

    Tom Carey

    2002-10-01

    Full Text Available Instructors searching for learning objects in online repositories will be guided in their choices by the content of the object, the characteristics of the learners addressed, and the learning process embodied in the object. We report here on a feasibility study for metadata to record process-oriented information about instructional approaches for learning objects, though a set of Educational Rationale [ER] tags which would allow authors to describe the critical elements in their design intent. The prototype ER tags describe activities which have been demonstrated to be of value in learning, and authors select the activities whose support was critical in their design decisions. The prototype ER tag set consists descriptors of the instructional approach used in the design, plus optional sub-elements for Comments, Importance and Features which implement the design intent. The tag set was tested by creators of four learning object modules, three intended for post-secondary learners and one for K-12 students and their families. In each case the creators reported that the ER tag set allowed them to express succinctly the key instructional approaches embedded in their designs. These results confirmed the overall feasibility of the ER tag approach as a means of capturing design intent from creators of learning objects. Much work remains to be done before a usable ER tag set could be specified, including evaluating the impact of ER tags during design to improve instructional quality of learning objects.

  12. Evolving Metadata in NASA Earth Science Data Systems

    Mitchell, A.; Cechini, M. F.; Walter, J.

    2011-12-01

    NASA's Earth Observing System (EOS) is a coordinated series of satellites for long term global observations. NASA's Earth Observing System Data and Information System (EOSDIS) is a petabyte-scale archive of environmental data that supports global climate change research by providing end-to-end services from EOS instrument data collection to science data processing to full access to EOS and other earth science data. On a daily basis, the EOSDIS ingests, processes, archives and distributes over 3 terabytes of data from NASA's Earth Science missions representing over 3500 data products ranging from various types of science disciplines. EOSDIS is currently comprised of 12 discipline specific data centers that are collocated with centers of science discipline expertise. Metadata is used in all aspects of NASA's Earth Science data lifecycle from the initial measurement gathering to the accessing of data products. Missions use metadata in their science data products when describing information such as the instrument/sensor, operational plan, and geographically region. Acting as the curator of the data products, data centers employ metadata for preservation, access and manipulation of data. EOSDIS provides a centralized metadata repository called the Earth Observing System (EOS) ClearingHouse (ECHO) for data discovery and access via a service-oriented-architecture (SOA) between data centers and science data users. ECHO receives inventory metadata from data centers who generate metadata files that complies with the ECHO Metadata Model. NASA's Earth Science Data and Information System (ESDIS) Project established a Tiger Team to study and make recommendations regarding the adoption of the international metadata standard ISO 19115 in EOSDIS. The result was a technical report recommending an evolution of NASA data systems towards a consistent application of ISO 19115 and related standards including the creation of a NASA-specific convention for core ISO 19115 elements. Part of

  13. The Global Streamflow Indices and Metadata Archive (GSIM – Part 1: The production of a daily streamflow archive and metadata

    H. X. Do

    2018-04-01

    Full Text Available This is the first part of a two-paper series presenting the Global Streamflow Indices and Metadata archive (GSIM, a worldwide collection of metadata and indices derived from more than 35 000 daily streamflow time series. This paper focuses on the compilation of the daily streamflow time series based on 12 free-to-access streamflow databases (seven national databases and five international collections. It also describes the development of three metadata products (freely available at https://doi.pangaea.de/10.1594/PANGAEA.887477: (1 a GSIM catalogue collating basic metadata associated with each time series, (2 catchment boundaries for the contributing area of each gauge, and (3 catchment metadata extracted from 12 gridded global data products representing essential properties such as land cover type, soil type, and climate and topographic characteristics. The quality of the delineated catchment boundary is also made available and should be consulted in GSIM application. The second paper in the series then explores production and analysis of streamflow indices. Having collated an unprecedented number of stations and associated metadata, GSIM can be used to advance large-scale hydrological research and improve understanding of the global water cycle.

  14. The Global Streamflow Indices and Metadata Archive (GSIM) - Part 1: The production of a daily streamflow archive and metadata

    Do, Hong Xuan; Gudmundsson, Lukas; Leonard, Michael; Westra, Seth

    2018-04-01

    This is the first part of a two-paper series presenting the Global Streamflow Indices and Metadata archive (GSIM), a worldwide collection of metadata and indices derived from more than 35 000 daily streamflow time series. This paper focuses on the compilation of the daily streamflow time series based on 12 free-to-access streamflow databases (seven national databases and five international collections). It also describes the development of three metadata products (freely available at https://doi.pangaea.de/10.1594/PANGAEA.887477" target="_blank">https://doi.pangaea.de/10.1594/PANGAEA.887477): (1) a GSIM catalogue collating basic metadata associated with each time series, (2) catchment boundaries for the contributing area of each gauge, and (3) catchment metadata extracted from 12 gridded global data products representing essential properties such as land cover type, soil type, and climate and topographic characteristics. The quality of the delineated catchment boundary is also made available and should be consulted in GSIM application. The second paper in the series then explores production and analysis of streamflow indices. Having collated an unprecedented number of stations and associated metadata, GSIM can be used to advance large-scale hydrological research and improve understanding of the global water cycle.

  15. Using XML to encode TMA DES metadata

    Oliver Lyttleton

    2011-01-01

    Full Text Available Background: The Tissue Microarray Data Exchange Specification (TMA DES is an XML specification for encoding TMA experiment data. While TMA DES data is encoded in XML, the files that describe its syntax, structure, and semantics are not. The DTD format is used to describe the syntax and structure of TMA DES, and the ISO 11179 format is used to define the semantics of TMA DES. However, XML Schema can be used in place of DTDs, and another XML encoded format, RDF, can be used in place of ISO 11179. Encoding all TMA DES data and metadata in XML would simplify the development and usage of programs which validate and parse TMA DES data. XML Schema has advantages over DTDs such as support for data types, and a more powerful means of specifying constraints on data values. An advantage of RDF encoded in XML over ISO 11179 is that XML defines rules for encoding data, whereas ISO 11179 does not. Materials and Methods: We created an XML Schema version of the TMA DES DTD. We wrote a program that converted ISO 11179 definitions to RDF encoded in XML, and used it to convert the TMA DES ISO 11179 definitions to RDF. Results: We validated a sample TMA DES XML file that was supplied with the publication that originally specified TMA DES using our XML Schema. We successfully validated the RDF produced by our ISO 11179 converter with the W3C RDF validation service. Conclusions: All TMA DES data could be encoded using XML, which simplifies its processing. XML Schema allows datatypes and valid value ranges to be specified for CDEs, which enables a wider range of error checking to be performed using XML Schemas than could be performed using DTDs.

  16. Using XML to encode TMA DES metadata.

    Lyttleton, Oliver; Wright, Alexander; Treanor, Darren; Lewis, Paul

    2011-01-01

    The Tissue Microarray Data Exchange Specification (TMA DES) is an XML specification for encoding TMA experiment data. While TMA DES data is encoded in XML, the files that describe its syntax, structure, and semantics are not. The DTD format is used to describe the syntax and structure of TMA DES, and the ISO 11179 format is used to define the semantics of TMA DES. However, XML Schema can be used in place of DTDs, and another XML encoded format, RDF, can be used in place of ISO 11179. Encoding all TMA DES data and metadata in XML would simplify the development and usage of programs which validate and parse TMA DES data. XML Schema has advantages over DTDs such as support for data types, and a more powerful means of specifying constraints on data values. An advantage of RDF encoded in XML over ISO 11179 is that XML defines rules for encoding data, whereas ISO 11179 does not. We created an XML Schema version of the TMA DES DTD. We wrote a program that converted ISO 11179 definitions to RDF encoded in XML, and used it to convert the TMA DES ISO 11179 definitions to RDF. We validated a sample TMA DES XML file that was supplied with the publication that originally specified TMA DES using our XML Schema. We successfully validated the RDF produced by our ISO 11179 converter with the W3C RDF validation service. All TMA DES data could be encoded using XML, which simplifies its processing. XML Schema allows datatypes and valid value ranges to be specified for CDEs, which enables a wider range of error checking to be performed using XML Schemas than could be performed using DTDs.

  17. Using XML to encode TMA DES metadata

    Lyttleton, Oliver; Wright, Alexander; Treanor, Darren; Lewis, Paul

    2011-01-01

    Background: The Tissue Microarray Data Exchange Specification (TMA DES) is an XML specification for encoding TMA experiment data. While TMA DES data is encoded in XML, the files that describe its syntax, structure, and semantics are not. The DTD format is used to describe the syntax and structure of TMA DES, and the ISO 11179 format is used to define the semantics of TMA DES. However, XML Schema can be used in place of DTDs, and another XML encoded format, RDF, can be used in place of ISO 11179. Encoding all TMA DES data and metadata in XML would simplify the development and usage of programs which validate and parse TMA DES data. XML Schema has advantages over DTDs such as support for data types, and a more powerful means of specifying constraints on data values. An advantage of RDF encoded in XML over ISO 11179 is that XML defines rules for encoding data, whereas ISO 11179 does not. Materials and Methods: We created an XML Schema version of the TMA DES DTD. We wrote a program that converted ISO 11179 definitions to RDF encoded in XML, and used it to convert the TMA DES ISO 11179 definitions to RDF. Results: We validated a sample TMA DES XML file that was supplied with the publication that originally specified TMA DES using our XML Schema. We successfully validated the RDF produced by our ISO 11179 converter with the W3C RDF validation service. Conclusions: All TMA DES data could be encoded using XML, which simplifies its processing. XML Schema allows datatypes and valid value ranges to be specified for CDEs, which enables a wider range of error checking to be performed using XML Schemas than could be performed using DTDs. PMID:21969921

  18. The Concept of Business Model Scalability

    Lund, Morten; Nielsen, Christian

    2018-01-01

    -term pro table business. However, the main message of this article is that while providing a good value proposition may help the rm ‘get by’, the really successful businesses of today are those able to reach the sweet-spot of business model scalability. Design/Methodology/Approach: The article is based...... on a ve-year longitudinal action research project of over 90 companies that participated in the International Center for Innovation project aimed at building 10 global network-based business models. Findings: This article introduces and discusses the term scalability from a company-level perspective......Purpose: The purpose of the article is to de ne what scalable business models are. Central to the contemporary understanding of business models is the value proposition towards the customer and the hypotheses generated about delivering value to the customer which become a good foundation for a long...

  19. Declarative and Scalable Selection for Map Visualizations

    Kefaloukos, Pimin Konstantin Balic

    and is itself a source and cause of prolific data creation. This calls for scalable map processing techniques that can handle the data volume and which play well with the predominant data models on the Web. (4) Maps are now consumed around the clock by a global audience. While historical maps were singleuser......-defined constraints as well as custom objectives. The purpose of the language is to derive a target multi-scale database from a source database according to holistic specifications. (b) The Glossy SQL compiler allows Glossy SQL to be scalably executed in a spatial analytics system, such as a spatial relational......, there are indications that the method is scalable for databases that contain millions of records, especially if the target language of the compiler is substituted by a cluster-ready variant of SQL. While several realistic use cases for maps have been implemented in CVL, additional non-geographic data visualization uses...

  20. Scalable Density-Based Subspace Clustering

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan

    2011-01-01

    For knowledge discovery in high dimensional databases, subspace clustering detects clusters in arbitrary subspace projections. Scalability is a crucial issue, as the number of possible projections is exponential in the number of dimensions. We propose a scalable density-based subspace clustering...... method that steers mining to few selected subspace clusters. Our novel steering technique reduces subspace processing by identifying and clustering promising subspaces and their combinations directly. Thereby, it narrows down the search space while maintaining accuracy. Thorough experiments on real...... and synthetic databases show that steering is efficient and scalable, with high quality results. For future work, our steering paradigm for density-based subspace clustering opens research potential for speeding up other subspace clustering approaches as well....

  1. Mercury- Distributed Metadata Management, Data Discovery and Access System

    Palanisamy, Giri; Wilson, Bruce E.; Devarakonda, Ranjeet; Green, James M.

    2007-12-01

    Mercury is a federated metadata harvesting, search and retrieval tool based on both open source and ORNL- developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports various metadata standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115 (under development). Mercury provides a single portal to information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury supports various projects including: ORNL DAAC, NBII, DADDI, LBA, NARSTO, CDIAC, OCEAN, I3N, IAI, ESIP and ARM. The new Mercury system is based on a Service Oriented Architecture and supports various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. This system also provides various search services including: RSS, Geo-RSS, OpenSearch, Web Services and Portlets. Other features include: Filtering and dynamic sorting of search results, book-markable search results, save, retrieve, and modify search criteria.

  2. Enhancing Scalability of Sparse Direct Methods

    Li, Xiaoye S.; Demmel, James; Grigori, Laura; Gu, Ming; Xia, Jianlin; Jardin, Steve; Sovinec, Carl; Lee, Lie-Quan

    2007-01-01

    TOPS is providing high-performance, scalable sparse direct solvers, which have had significant impacts on the SciDAC applications, including fusion simulation (CEMM), accelerator modeling (COMPASS), as well as many other mission-critical applications in DOE and elsewhere. Our recent developments have been focusing on new techniques to overcome scalability bottleneck of direct methods, in both time and memory. These include parallelizing symbolic analysis phase and developing linear-complexity sparse factorization methods. The new techniques will make sparse direct methods more widely usable in large 3D simulations on highly-parallel petascale computers

  3. Software performance and scalability a quantitative approach

    Liu, Henry H

    2009-01-01

    Praise from the Reviewers:"The practicality of the subject in a real-world situation distinguishes this book from othersavailable on the market."—Professor Behrouz Far, University of Calgary"This book could replace the computer organization texts now in use that every CS and CpEstudent must take. . . . It is much needed, well written, and thoughtful."—Professor Larry Bernstein, Stevens Institute of TechnologyA distinctive, educational text onsoftware performance and scalabilityThis is the first book to take a quantitative approach to the subject of software performance and scalability

  4. From Digital Disruption to Business Model Scalability

    Nielsen, Christian; Lund, Morten; Thomsen, Peter Poulsen

    2017-01-01

    This article discusses the terms disruption, digital disruption, business models and business model scalability. It illustrates how managers should be using these terms for the benefit of their business by developing business models capable of achieving exponentially increasing returns to scale...... will seldom lead to business model scalability capable of competing with digital disruption(s)....... as a response to digital disruption. A series of case studies illustrate that besides frequent existing messages in the business literature relating to the importance of creating agile businesses, both in growing and declining economies, as well as hard to copy value propositions or value propositions that take...

  5. Content-Aware Scalability-Type Selection for Rate Adaptation of Scalable Video

    Tekalp A Murat

    2007-01-01

    Full Text Available Scalable video coders provide different scaling options, such as temporal, spatial, and SNR scalabilities, where rate reduction by discarding enhancement layers of different scalability-type results in different kinds and/or levels of visual distortion depend on the content and bitrate. This dependency between scalability type, video content, and bitrate is not well investigated in the literature. To this effect, we first propose an objective function that quantifies flatness, blockiness, blurriness, and temporal jerkiness artifacts caused by rate reduction by spatial size, frame rate, and quantization parameter scaling. Next, the weights of this objective function are determined for different content (shot types and different bitrates using a training procedure with subjective evaluation. Finally, a method is proposed for choosing the best scaling type for each temporal segment that results in minimum visual distortion according to this objective function given the content type of temporal segments. Two subjective tests have been performed to validate the proposed procedure for content-aware selection of the best scalability type on soccer videos. Soccer videos scaled from 600 kbps to 100 kbps by the proposed content-aware selection of scalability type have been found visually superior to those that are scaled using a single scalability option over the whole sequence.

  6. Meta-Data Objects as the Basis for System Evolution

    Estrella, Florida; Tóth, N; Kovács, Z; Le Goff, J M; Clatchey, Richard Mc; Toth, Norbert; Kovacs, Zsolt; Goff, Jean-Marie Le

    2001-01-01

    One of the main factors driving object-oriented software development in the Web- age is the need for systems to evolve as user requirements change. A crucial factor in the creation of adaptable systems dealing with changing requirements is the suitability of the underlying technology in allowing the evolution of the system. A reflective system utilizes an open architecture where implicit system aspects are reified to become explicit first-class (meta-data) objects. These implicit system aspects are often fundamental structures which are inaccessible and immutable, and their reification as meta-data objects can serve as the basis for changes and extensions to the system, making it self- describing. To address the evolvability issue, this paper proposes a reflective architecture based on two orthogonal abstractions - model abstraction and information abstraction. In this architecture the modeling abstractions allow for the separation of the description meta-data from the system aspects they represent so that th...

  7. Statistical Data Processing with R – Metadata Driven Approach

    Rudi SELJAK

    2016-06-01

    Full Text Available In recent years the Statistical Office of the Republic of Slovenia has put a lot of effort into re-designing its statistical process. We replaced the classical stove-pipe oriented production system with general software solutions, based on the metadata driven approach. This means that one general program code, which is parametrized with process metadata, is used for data processing for a particular survey. Currently, the general program code is entirely based on SAS macros, but in the future we would like to explore how successfully statistical software R can be used for this approach. Paper describes the metadata driven principle for data validation, generic software solution and main issues connected with the use of statistical software R for this approach.

  8. A Generic Metadata Editor Supporting System Using Drupal CMS

    Pan, J.; Banks, N. G.; Leggott, M.

    2011-12-01

    Metadata handling is a key factor in preserving and reusing scientific data. In recent years, standardized structural metadata has become widely used in Geoscience communities. However, there exist many different standards in Geosciences, such as the current version of the Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC CSDGM), the Ecological Markup Language (EML), the Geography Markup Language (GML), and the emerging ISO 19115 and related standards. In addition, there are many different subsets within the Geoscience subdomain such as the Biological Profile of the FGDC (CSDGM), or for geopolitical regions, such as the European Profile or the North American Profile in the ISO standards. It is therefore desirable to have a software foundation to support metadata creation and editing for multiple standards and profiles, without re-inventing the wheels. We have developed a software module as a generic, flexible software system to do just that: to facilitate the support for multiple metadata standards and profiles. The software consists of a set of modules for the Drupal Content Management System (CMS), with minimal inter-dependencies to other Drupal modules. There are two steps in using the system's metadata functions. First, an administrator can use the system to design a user form, based on an XML schema and its instances. The form definition is named and stored in the Drupal database as a XML blob content. Second, users in an editor role can then use the persisted XML definition to render an actual metadata entry form, for creating or editing a metadata record. Behind the scenes, the form definition XML is transformed into a PHP array, which is then rendered via Drupal Form API. When the form is submitted the posted values are used to modify a metadata record. Drupal hooks can be used to perform custom processing on metadata record before and after submission. It is trivial to store the metadata record as an actual XML file

  9. In Interactive, Web-Based Approach to Metadata Authoring

    Pollack, Janine; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    NASA's Global Change Master Directory (GCMD) serves a growing number of users by assisting the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 8000 data set descriptions in Directory Interchange Format (DIF) and 200 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information, thus allowing researchers to discover data pertaining to a particular geographic location, as well as subject of interest. The GCMD strives to be the preeminent data locator for world-wide directory level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are not currently attracting. widespread usage. With usage being the prime indicator of utility, it has become apparent that current tools must be improved. As a result, the GCMD has released a new suite of web-based authoring tools that enable a user to create new data and service entries, as well as modify existing data entries. With these tools, a more interactive approach to metadata authoring is taken, as they feature a visual "checklist" of data/service fields that automatically update when a field is completed. In this way, the user can quickly gauge which of the required and optional fields have not been populated. With the release of these tools, the Earth science community will be further assisted in efficiently creating quality data and services metadata. Keywords: metadata, Earth science, metadata authoring tools

  10. SM4AM: A Semantic Metamodel for Analytical Metadata

    Varga, Jovan; Romero, Oscar; Pedersen, Torben Bach

    2014-01-01

    Next generation BI systems emerge as platforms where traditional BI tools meet semi-structured and unstructured data coming from the Web. In these settings, the user-centric orientation represents a key characteristic for the acceptance and wide usage by numerous and diverse end users in their data....... We present SM4AM, a Semantic Metamodel for Analytical Metadata created as an RDF formalization of the Analytical Metadata artifacts needed for user assistance exploitation purposes in next generation BI systems. We consider the Linked Data initiative and its relevance for user assistance...

  11. The Benefits and Future of Standards: Metadata and Beyond

    Stracke, Christian M.

    This article discusses the benefits and future of standards and presents the generic multi-dimensional Reference Model. First the importance and the tasks of interoperability as well as quality development and their relationship are analyzed. Especially in e-Learning their connection and interdependence is evident: Interoperability is one basic requirement for quality development. In this paper, it is shown how standards and specifications are supporting these crucial issues. The upcoming ISO metadata standard MLR (Metadata for Learning Resource) will be introduced and used as example for identifying the requirements and needs for future standardization. In conclusion a vision of the challenges and potentials for e-Learning standardization is outlined.

  12. Using scalable vector graphics to evolve art

    den Heijer, E.; Eiben, A. E.

    2016-01-01

    In this paper, we describe our investigations of the use of scalable vector graphics as a genotype representation in evolutionary art. We describe the technical aspects of using SVG in evolutionary art, and explain our custom, SVG specific operators initialisation, mutation and crossover. We perform

  13. Scalable fast multipole accelerated vortex methods

    Hu, Qi; Gumerov, Nail A.; Yokota, Rio; Barba, Lorena A.; Duraiswami, Ramani

    2014-01-01

    -node communication and load balance efficiently, with only a small parallel construction overhead. This algorithm can scale to large-sized clusters showing both strong and weak scalability. Careful error and timing trade-off analysis are also performed for the cutoff

  14. Scalable Domain Decomposed Monte Carlo Particle Transport

    O' Brien, Matthew Joseph [Univ. of California, Davis, CA (United States)

    2013-12-05

    In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation.

  15. Scalable Open Source Smart Grid Simulator (SGSim)

    Ebeid, Emad Samuel Malki; Jacobsen, Rune Hylsberg; Stefanni, Francesco

    2017-01-01

    . This paper presents an open source smart grid simulator (SGSim). The simulator is based on open source SystemC Network Simulation Library (SCNSL) and aims to model scalable smart grid applications. SGSim has been tested under different smart grid scenarios that contain hundreds of thousands of households...

  16. Cooperative Scalable Moving Continuous Query Processing

    Li, Xiaohui; Karras, Panagiotis; Jensen, Christian S.

    2012-01-01

    of the global view and handle the majority of the workload. Meanwhile, moving clients, having basic memory and computation resources, handle small portions of the workload. This model is further enhanced by dynamic region allocation and grid size adjustment mechanisms that reduce the communication...... and computation cost for both servers and clients. An experimental study demonstrates that our approaches offer better scalability than competitors...

  17. Scalable optical switches for computing applications

    White, I.H.; Aw, E.T.; Williams, K.A.; Wang, Haibo; Wonfor, A.; Penty, R.V.

    2009-01-01

    A scalable photonic interconnection network architecture is proposed whereby a Clos network is populated with broadcast-and-select stages. This enables the efficient exploitation of an emerging class of photonic integrated switch fabric. A low distortion space switch technology based on recently

  18. Linked data for libraries, archives and museums how to clean, link and publish your metadata

    Hooland, Seth van

    2014-01-01

    This highly practical handbook teaches you how to unlock the value of your existing metadata through cleaning, reconciliation, enrichment and linking and how to streamline the process of new metadata creation. Libraries, archives and museums are facing up to the challenge of providing access to fast growing collections whilst managing cuts to budgets. Key to this is the creation, linking and publishing of good quality metadata as Linked Data that will allow their collections to be discovered, accessed and disseminated in a sustainable manner. This highly practical handbook teaches you how to unlock the value of your existing metadata through cleaning, reconciliation, enrichment and linking and how to streamline the process of new metadata creation. Metadata experts Seth van Hooland and Ruben Verborgh introduce the key concepts of metadata standards and Linked Data and how they can be practically applied to existing metadata, giving readers the tools and understanding to achieve maximum results with limited re...

  19. Dyniqx: a novel meta-search engine for metadata based cross search

    Zhu, Jianhan; Song, Dawei; Eisenstadt, Marc; Barladeanu, Cristi; Rüger, Stefan

    2008-01-01

    The effect of metadata in collection fusion has not been sufficiently studied. In response to this, we present a novel meta-search engine called Dyniqx for metadata based cross search. Dyniqx exploits the availability of metadata in academic search services such as PubMed and Google Scholar etc for fusing search results from heterogeneous search engines. In addition, metadata from these search engines are used for generating dynamic query controls such as sliders and tick boxes etc which are ...

  20. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format

    Ismail, Mahmoud; Philbin, James

    2015-01-01

    The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that only need metadata manipulation, such as deidentification and study migration. Most picture archiving and communication system use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes...

  1. Collaborative Metadata Curation in Support of NASA Earth Science Data Stewardship

    Sisco, Adam W.; Bugbee, Kaylin; le Roux, Jeanne; Staton, Patrick; Freitag, Brian; Dixon, Valerie

    2018-01-01

    Growing collection of NASA Earth science data is archived and distributed by EOSDIS’s 12 Distributed Active Archive Centers (DAACs). Each collection and granule is described by a metadata record housed in the Common Metadata Repository (CMR). Multiple metadata standards are in use, and core elements of each are mapped to and from a common model – the Unified Metadata Model (UMM). Work done by the Analysis and Review of CMR (ARC) Team.

  2. Metadata capture in an electronic notebook: How to make it as simple as possible?

    Menzel, Julia

    2015-09-01

    Full Text Available In the last few years electronic laboratory notebooks (ELNs have become popular. ELNs offer the great possibility to capture metadata automatically. Due to the high documentation effort metadata documentation is neglected in science. To close the gap between good data documentation and high documentation effort for the scientists a first user-friendly solution to capture metadata in an easy way was developed.At first, different protocols for the Western Blot were collected within the Collaborative Research Center 1002 and analyzed. Together with existing metadata standards identified in a literature search a first version of the metadata scheme was developed. Secondly, the metadata scheme was customized for future users including the implementation of default values for automated metadata documentation.Twelve protocols for the Western Blot were used to construct one standard protocol with ten different experimental steps. Three already existing metadata standards were used as models to construct the first version of the metadata scheme consisting of 133 data fields in ten experimental steps. Through a revision with future users the final metadata scheme was shortened to 90 items in three experimental steps. Using individualized default values 51.1% of the metadata can be captured with present values in the ELN.This lowers the data documentation effort. At the same time, researcher could benefit by providing standardized metadata for data sharing and re-use.

  3. Alcohol Mixed with Energy Drink Use as an Event-Level Predictor of Physical and Verbal Aggression in Bar Conflicts.

    Miller, Kathleen E; Quigley, Brian M; Eliseo-Arras, Rebecca K; Ball, Natalie J

    2016-01-01

    Young adult use of alcohol mixed with caffeinated energy drinks (AmEDs) has been globally linked with increased odds of interpersonal aggression, compared with the use of alcohol alone. However, no prior research has linked these behaviors at the event level in bar drinking situations. The present study assessed whether AmED use is associated with the perpetration of verbal and physical aggression in bar conflicts at the event level. In Fall 2014, a community sample of 175 young adult AmED users (55% female) completed a web survey describing a recent conflict experienced while drinking in a bar. Use of both AmED and non-AmED alcoholic drinks in the incident were assessed, allowing calculation of our main predictor variable, the proportion of AmEDs consumed (AmED/total drinks consumed). To measure perpetration of aggression, participants reported on the occurrence of 6 verbal and 6 physical acts during the bar conflict incident. Linear regression analyses showed that the proportion of AmEDs consumed predicted scores for perpetration of both verbal aggression (β = 0.16, p bar environments, and total number of drinks. Results of this study suggest that in alcohol-related bar conflicts, higher levels of young adult AmED use are associated with higher levels of aggression perpetration than alcohol use alone and that the elevated risk is not attributable to individual differences between AmED users and nonusers or to contextual differences in bar drinking settings. While future research is needed to identify motivations, dosages, and sequencing issues associated with AmED use, these beverages should be considered a potential risk factor in the escalation of aggressive bar conflicts. Copyright © 2016 by the Research Society on Alcoholism.

  4. Trident: scalable compute archives: workflows, visualization, and analysis

    Gopu, Arvind; Hayashi, Soichi; Young, Michael D.; Kotulla, Ralf; Henschel, Robert; Harbeck, Daniel

    2016-08-01

    The Astronomy scientific community has embraced Big Data processing challenges, e.g. associated with time-domain astronomy, and come up with a variety of novel and efficient data processing solutions. However, data processing is only a small part of the Big Data challenge. Efficient knowledge discovery and scientific advancement in the Big Data era requires new and equally efficient tools: modern user interfaces for searching, identifying and viewing data online without direct access to the data; tracking of data provenance; searching, plotting and analyzing metadata; interactive visual analysis, especially of (time-dependent) image data; and the ability to execute pipelines on supercomputing and cloud resources with minimal user overhead or expertise even to novice computing users. The Trident project at Indiana University offers a comprehensive web and cloud-based microservice software suite that enables the straight forward deployment of highly customized Scalable Compute Archive (SCA) systems; including extensive visualization and analysis capabilities, with minimal amount of additional coding. Trident seamlessly scales up or down in terms of data volumes and computational needs, and allows feature sets within a web user interface to be quickly adapted to meet individual project requirements. Domain experts only have to provide code or business logic about handling/visualizing their domain's data products and about executing their pipelines and application work flows. Trident's microservices architecture is made up of light-weight services connected by a REST API and/or a message bus; a web interface elements are built using NodeJS, AngularJS, and HighCharts JavaScript libraries among others while backend services are written in NodeJS, PHP/Zend, and Python. The software suite currently consists of (1) a simple work flow execution framework to integrate, deploy, and execute pipelines and applications (2) a progress service to monitor work flows and sub

  5. Metadata Harvesting in Regional Digital Libraries in the PIONIER Network

    Mazurek, Cezary; Stroinski, Maciej; Werla, Marcin; Weglarz, Jan

    2006-01-01

    Purpose: The paper aims to present the concept of the functionality of metadata harvesting for regional digital libraries, based on the OAI-PMH protocol. This functionality is a part of regional digital libraries platform created in Poland. The platform was required to reach one of main objectives of the Polish PIONIER Programme--to enrich the…

  6. Metadata Quality Improvement : DASISH deliverable 5.2A

    L'Hours, Hervé; Offersgaard, Lene; Wittenberg, M.; Wloka, Bartholomäus

    2014-01-01

    The aim of this task was to analyse and compare the different metadata strategies of CLARIN, DARIAH and CESSDA, and to identify possibilities of cross-fertilization to take profit from each other solutions where possible. To have a better understanding in which stages of the research lifecycle

  7. Competence Based Educational Metadata for Supporting Lifelong Competence Development Programmes

    Sampson, Demetrios; Fytros, Demetrios

    2008-01-01

    Sampson, D., & Fytros, D. (2008). Competence Based Educational Metadata for Supporting Lifelong Competence Development Programmes. In P. Diaz, Kinshuk, I. Aedo & E. Mora (Eds.), Proceedings of the 8th IEEE International Conference on Advanced Learning Technologies (ICALT 2008), pp. 288-292. July,

  8. Metadata Schema Used in OCLC Sampled Web Pages

    Fei Yu

    2005-12-01

    Full Text Available The tremendous growth of Web resources has made information organization and retrieval more and more difficult. As one approach to this problem, metadata schemas have been developed to characterize Web resources. However, many questions have been raised about the use of metadata schemas such as which metadata schemas have been used on the Web? How did they describe Web accessible information? What is the distribution of these metadata schemas among Web pages? Do certain schemas dominate the others? To address these issues, this study analyzed 16,383 Web pages with meta tags extracted from 200,000 OCLC sampled Web pages in 2000. It found that only 8.19% Web pages used meta tags; description tags, keyword tags, and Dublin Core tags were the only three schemas used in the Web pages. This article revealed the use of meta tags in terms of their function distribution, syntax characteristics, granularity of the Web pages, and the length distribution and word number distribution of both description and keywords tags.

  9. A metadata schema for data objects in clinical research.

    Canham, Steve; Ohmann, Christian

    2016-11-24

    A large number of stakeholders have accepted the need for greater transparency in clinical research and, in the context of various initiatives and systems, have developed a diverse and expanding number of repositories for storing the data and documents created by clinical studies (collectively known as data objects). To make the best use of such resources, we assert that it is also necessary for stakeholders to agree and deploy a simple, consistent metadata scheme. The relevant data objects and their likely storage are described, and the requirements for metadata to support data sharing in clinical research are identified. Issues concerning persistent identifiers, for both studies and data objects, are explored. A scheme is proposed that is based on the DataCite standard, with extensions to cover the needs of clinical researchers, specifically to provide (a) study identification data, including links to clinical trial registries; (b) data object characteristics and identifiers; and (c) data covering location, ownership and access to the data object. The components of the metadata scheme are described. The metadata schema is proposed as a natural extension of a widely agreed standard to fill a gap not tackled by other standards related to clinical research (e.g., Clinical Data Interchange Standards Consortium, Biomedical Research Integrated Domain Group). The proposal could be integrated with, but is not dependent on, other moves to better structure data in clinical research.

  10. Standardizing metadata and taxonomic identification in metabarcoding studies

    Tedersoo, Leho; Ramirez, Kelly; Nilsson, R; Kaljuvee, Aivi; Koljalg, Urmas; Abarenkov, Kessy

    2015-01-01

    High-throughput sequencing-based metabarcoding studies produce vast amounts of ecological data, but a lack of consensus on standardization of metadata and how to refer to the species recovered severely hampers reanalysis and comparisons among studies. Here we propose an automated workflow covering

  11. The evolution of chondrichthyan research through a metadata ...

    We compiled metadata from Sharks Down Under (1991) and the two Sharks International conferences (2010 and 2014), spanning 23 years. Analysis of the data highlighted taxonomic biases towards charismatic species, a declining number of studies in fundamental science such as those related to taxonomy and basic life ...

  12. Transforming and enhancing metadata for enduser discovery: a case study

    Edward M. Corrado

    2014-05-01

    The Libraries’ workflow and portions of code will be shared; issues and challenges involved will be discussed. While this case study is specific to Binghamton University Libraries, examples of strategies used at other institutions will also be introduced. This paper should be useful to anyone interested in describing large quantities of photographs or other materials with preexisting embedded metadata.

  13. Training and Best Practice Guidelines: Implications for Metadata Creation

    Chuttur, Mohammad Y.

    2012-01-01

    In response to the rapid development of digital libraries over the past decade, researchers have focused on the use of metadata as an effective means to support resource discovery within online repositories. With the increasing involvement of libraries in digitization projects and the growing number of institutional repositories, it is anticipated…

  14. MMI's Metadata and Vocabulary Solutions: 10 Years and Growing

    Graybeal, J.; Gayanilo, F.; Rueda-Velasquez, C. A.

    2014-12-01

    The Marine Metadata Interoperability project (http://marinemetadata.org) held its public opening at AGU's 2004 Fall Meeting. For 10 years since that debut, the MMI guidance and vocabulary sites have served over 100,000 visitors, with 525 community members and continuous Steering Committee leadership. Originally funded by the National Science Foundation, over the years multiple organizations have supported the MMI mission: "Our goal is to support collaborative research in the marine science domain, by simplifying the incredibly complex world of metadata into specific, straightforward guidance. MMI encourages scientists and data managers at all levels to apply good metadata practices from the start of a project, by providing the best guidance and resources for data management, and developing advanced metadata tools and services needed by the community." Now hosted by the Harte Research Institute at Texas A&M University at Corpus Christi, MMI continues to provide guidance and services to the community, and is planning for marine science and technology needs for the next 10 years. In this presentation we will highlight our major accomplishments, describe our recent achievements and imminent goals, and propose a vision for improving marine data interoperability for the next 10 years, including Ontology Registry and Repository (http://mmisw.org/orr) advancements and applications (http://mmisw.org/cfsn).

  15. Metafier - a Tool for Annotating and Structuring Building Metadata

    Holmegaard, Emil; Johansen, Aslak; Kjærgaard, Mikkel Baun

    2017-01-01

    in achieving this goal, but often they work as silos. Improving at scale the energy performance of buildings depends on applications breaking these silos and being portable among buildings. To enable portable building applications, the building instrumentation should be supported by a metadata layer...

  16. Big Earth Data Initiative: Metadata Improvement: Case Studies

    Kozimor, John; Habermann, Ted; Farley, John

    2016-01-01

    Big Earth Data Initiative (BEDI) The Big Earth Data Initiative (BEDI) invests in standardizing and optimizing the collection, management and delivery of U.S. Government's civil Earth observation data to improve discovery, access use, and understanding of Earth observations by the broader user community. Complete and consistent standard metadata helps address all three goals.

  17. Provenance metadata gathering and cataloguing of EFIT++ code execution

    Lupelli, I.; Muir, D.G.; Appel, L.; Akers, R.; Carr, M.; Abreu, P.

    2015-01-01

    Highlights: • An approach for automatic gathering of provenance metadata has been presented. • A provenance metadata catalogue has been created. • The overhead in the code runtime is less than 10%. • The metadata/data size ratio is about ∼20%. • A visualization interface based on Gephi, has been presented. - Abstract: Journal publications, as the final product of research activity, are the result of an extensive complex modeling and data analysis effort. It is of paramount importance, therefore, to capture the origins and derivation of the published data in order to achieve high levels of scientific reproducibility, transparency, internal and external data reuse and dissemination. The consequence of the modern research paradigm is that high performance computing and data management systems, together with metadata cataloguing, have become crucial elements within the nuclear fusion scientific data lifecycle. This paper describes an approach to the task of automatically gathering and cataloguing provenance metadata, currently under development and testing at Culham Center for Fusion Energy. The approach is being applied to a machine-agnostic code that calculates the axisymmetric equilibrium force balance in tokamaks, EFIT++, as a proof of principle test. The proposed approach avoids any code instrumentation or modification. It is based on the observation and monitoring of input preparation, workflow and code execution, system calls, log file data collection and interaction with the version control system. Pre-processing, post-processing, and data export and storage are monitored during the code runtime. Input data signals are captured using a data distribution platform called IDAM. The final objective of the catalogue is to create a complete description of the modeling activity, including user comments, and the relationship between data output, the main experimental database and the execution environment. For an intershot or post-pulse analysis (∼1000

  18. Provenance metadata gathering and cataloguing of EFIT++ code execution

    Lupelli, I., E-mail: ivan.lupelli@ccfe.ac.uk [CCFE, Culham Science Centre, Abingdon, Oxon OX14 3DB (United Kingdom); Muir, D.G.; Appel, L.; Akers, R.; Carr, M. [CCFE, Culham Science Centre, Abingdon, Oxon OX14 3DB (United Kingdom); Abreu, P. [Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa (Portugal)

    2015-10-15

    Highlights: • An approach for automatic gathering of provenance metadata has been presented. • A provenance metadata catalogue has been created. • The overhead in the code runtime is less than 10%. • The metadata/data size ratio is about ∼20%. • A visualization interface based on Gephi, has been presented. - Abstract: Journal publications, as the final product of research activity, are the result of an extensive complex modeling and data analysis effort. It is of paramount importance, therefore, to capture the origins and derivation of the published data in order to achieve high levels of scientific reproducibility, transparency, internal and external data reuse and dissemination. The consequence of the modern research paradigm is that high performance computing and data management systems, together with metadata cataloguing, have become crucial elements within the nuclear fusion scientific data lifecycle. This paper describes an approach to the task of automatically gathering and cataloguing provenance metadata, currently under development and testing at Culham Center for Fusion Energy. The approach is being applied to a machine-agnostic code that calculates the axisymmetric equilibrium force balance in tokamaks, EFIT++, as a proof of principle test. The proposed approach avoids any code instrumentation or modification. It is based on the observation and monitoring of input preparation, workflow and code execution, system calls, log file data collection and interaction with the version control system. Pre-processing, post-processing, and data export and storage are monitored during the code runtime. Input data signals are captured using a data distribution platform called IDAM. The final objective of the catalogue is to create a complete description of the modeling activity, including user comments, and the relationship between data output, the main experimental database and the execution environment. For an intershot or post-pulse analysis (∼1000

  19. A Metadata Standard for Hydroinformatic Data Conforming to International Standards

    Notay, Vikram; Carstens, Georg; Lehfeldt, Rainer

    2017-04-01

    The affordable availability of computing power and digital storage has been a boon for the scientific community. The hydroinformatics community has also benefitted from the so-called digital revolution, which has enabled the tackling of more and more complex physical phenomena using hydroinformatic models, instruments, sensors, etc. With models getting more and more complex, computational domains getting larger and the resolution of computational grids and measurement data getting finer, a large amount of data is generated and consumed in any hydroinformatics related project. The ubiquitous availability of internet also contributes to this phenomenon with data being collected through sensor networks connected to telecommunications networks and the internet long before the term Internet of Things existed. Although generally good, this exponential increase in the number of available datasets gives rise to the need to describe this data in a standardised way to not only be able to get a quick overview about the data but to also facilitate interoperability of data from different sources. The Federal Waterways Engineering and Research Institute (BAW) is a federal authority of the German Federal Ministry of Transport and Digital Infrastructure. BAW acts as a consultant for the safe and efficient operation of the German waterways. As part of its consultation role, BAW operates a number of physical and numerical models for sections of inland and marine waterways. In order to uniformly describe the data produced and consumed by these models throughout BAW and to ensure interoperability with other federal and state institutes on the one hand and with EU countries on the other, a metadata profile for hydroinformatic data has been developed at BAW. The metadata profile is composed in its entirety using the ISO 19115 international standard for metadata related to geographic information. Due to the widespread use of the ISO 19115 standard in the existing geodata infrastructure

  20. ONEMercury: Towards Automatic Annotation of Earth Science Metadata

    Tuarob, S.; Pouchard, L. C.; Noy, N.; Horsburgh, J. S.; Palanisamy, G.

    2012-12-01

    Earth sciences have become more data-intensive, requiring access to heterogeneous data collected from multiple places, times, and thematic scales. For example, research on climate change may involve exploring and analyzing observational data such as the migration of animals and temperature shifts across the earth, as well as various model-observation inter-comparison studies. Recently, DataONE, a federated data network built to facilitate access to and preservation of environmental and ecological data, has come to exist. ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for discovering and accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple data repositories and makes it searchable via a common search interface built upon cutting edge search engine technology, allowing users to interact with the system, intelligently filter the search results on the fly, and fetch the data from distributed data sources. Linking data from heterogeneous sources always has a cost. A problem that ONEMercury faces is the different levels of annotation in the harvested metadata records. Poorly annotated records tend to be missed during the search process as they lack meaningful keywords. Furthermore, such records would not be compatible with the advanced search functionality offered by ONEMercury as the interface requires a metadata record be semantically annotated. The explosion of the number of metadata records harvested from an increasing number of data repositories makes it impossible to annotate the harvested records manually, urging the need for a tool capable of automatically annotating poorly curated metadata records. In this paper, we propose a topic-model (TM) based approach for automatic metadata annotation. Our approach mines topics in the set of well annotated records and suggests keywords for poorly annotated records based on topic similarity. We utilize the

  1. ASDC Collaborations and Processes to Ensure Quality Metadata and Consistent Data Availability

    Trapasso, T. J.

    2017-12-01

    With the introduction of new tools, faster computing, and less expensive storage, increased volumes of data are expected to be managed with existing or fewer resources. Metadata management is becoming a heightened challenge from the increase in data volume, resulting in more metadata records needed to be curated for each product. To address metadata availability and completeness, NASA ESDIS has taken significant strides with the creation of the United Metadata Model (UMM) and Common Metadata Repository (CMR). These UMM helps address hurdles experienced by the increasing number of metadata dialects and the CMR provides a primary repository for metadata so that required metadata fields can be served through a growing number of tools and services. However, metadata quality remains an issue as metadata is not always inherent to the end-user. In response to these challenges, the NASA Atmospheric Science Data Center (ASDC) created the Collaboratory for quAlity Metadata Preservation (CAMP) and defined the Product Lifecycle Process (PLP) to work congruently. CAMP is unique in that it provides science team members a UI to directly supply metadata that is complete, compliant, and accurate for their data products. This replaces back-and-forth communication that often results in misinterpreted metadata. Upon review by ASDC staff, metadata is submitted to CMR for broader distribution through Earthdata. Further, approval of science team metadata in CAMP automatically triggers the ASDC PLP workflow to ensure appropriate services are applied throughout the product lifecycle. This presentation will review the design elements of CAMP and PLP as well as demonstrate interfaces to each. It will show the benefits that CAMP and PLP provide to the ASDC that could potentially benefit additional NASA Earth Science Data and Information System (ESDIS) Distributed Active Archive Centers (DAACs).

  2. GSKY: A scalable distributed geospatial data server on the cloud

    Rozas Larraondo, Pablo; Pringle, Sean; Antony, Joseph; Evans, Ben

    2017-04-01

    Earth systems, environmental and geophysical datasets are an extremely valuable sources of information about the state and evolution of the Earth. Being able to combine information coming from different geospatial collections is in increasing demand by the scientific community, and requires managing and manipulating data with different formats and performing operations such as map reprojections, resampling and other transformations. Due to the large data volume inherent in these collections, storing multiple copies of them is unfeasible and so such data manipulation must be performed on-the-fly using efficient, high performance techniques. Ideally this should be performed using a trusted data service and common system libraries to ensure wide use and reproducibility. Recent developments in distributed computing based on dynamic access to significant cloud infrastructure opens the door for such new ways of processing geospatial data on demand. The National Computational Infrastructure (NCI), hosted at the Australian National University (ANU), has over 10 Petabytes of nationally significant research data collections. Some of these collections, which comprise a variety of observed and modelled geospatial data, are now made available via a highly distributed geospatial data server, called GSKY (pronounced [jee-skee]). GSKY supports on demand processing of large geospatial data products such as satellite earth observation data as well as numerical weather products, allowing interactive exploration and analysis of the data. It dynamically and efficiently distributes the required computations among cloud nodes providing a scalable analysis framework that can adapt to serve large number of concurrent users. Typical geospatial workflows handling different file formats and data types, or blending data in different coordinate projections and spatio-temporal resolutions, is handled transparently by GSKY. This is achieved by decoupling the data ingestion and indexing process as

  3. Scalable Algorithms for Adaptive Statistical Designs

    Robert Oehmke

    2000-01-01

    Full Text Available We present a scalable, high-performance solution to multidimensional recurrences that arise in adaptive statistical designs. Adaptive designs are an important class of learning algorithms for a stochastic environment, and we focus on the problem of optimally assigning patients to treatments in clinical trials. While adaptive designs have significant ethical and cost advantages, they are rarely utilized because of the complexity of optimizing and analyzing them. Computational challenges include massive memory requirements, few calculations per memory access, and multiply-nested loops with dynamic indices. We analyze the effects of various parallelization options, and while standard approaches do not work well, with effort an efficient, highly scalable program can be developed. This allows us to solve problems thousands of times more complex than those solved previously, which helps make adaptive designs practical. Further, our work applies to many other problems involving neighbor recurrences, such as generalized string matching.

  4. Scalable Packet Classification with Hash Tables

    Wang, Pi-Chung

    In the last decade, the technique of packet classification has been widely deployed in various network devices, including routers, firewalls and network intrusion detection systems. In this work, we improve the performance of packet classification by using multiple hash tables. The existing hash-based algorithms have superior scalability with respect to the required space; however, their search performance may not be comparable to other algorithms. To improve the search performance, we propose a tuple reordering algorithm to minimize the number of accessed hash tables with the aid of bitmaps. We also use pre-computation to ensure the accuracy of our search procedure. Performance evaluation based on both real and synthetic filter databases shows that our scheme is effective and scalable and the pre-computation cost is moderate.

  5. Scalable fabrication of perovskite solar cells

    Li, Zhen; Klein, Talysa R.; Kim, Dong Hoe; Yang, Mengjin; Berry, Joseph J.; van Hest, Maikel F. A. M.; Zhu, Kai

    2018-03-27

    Perovskite materials use earth-abundant elements, have low formation energies for deposition and are compatible with roll-to-roll and other high-volume manufacturing techniques. These features make perovskite solar cells (PSCs) suitable for terawatt-scale energy production with low production costs and low capital expenditure. Demonstrations of performance comparable to that of other thin-film photovoltaics (PVs) and improvements in laboratory-scale cell stability have recently made scale up of this PV technology an intense area of research focus. Here, we review recent progress and challenges in scaling up PSCs and related efforts to enable the terawatt-scale manufacturing and deployment of this PV technology. We discuss common device and module architectures, scalable deposition methods and progress in the scalable deposition of perovskite and charge-transport layers. We also provide an overview of device and module stability, module-level characterization techniques and techno-economic analyses of perovskite PV modules.

  6. Scalable Atomistic Simulation Algorithms for Materials Research

    Aiichiro Nakano

    2002-01-01

    Full Text Available A suite of scalable atomistic simulation programs has been developed for materials research based on space-time multiresolution algorithms. Design and analysis of parallel algorithms are presented for molecular dynamics (MD simulations and quantum-mechanical (QM calculations based on the density functional theory. Performance tests have been carried out on 1,088-processor Cray T3E and 1,280-processor IBM SP3 computers. The linear-scaling algorithms have enabled 6.44-billion-atom MD and 111,000-atom QM calculations on 1,024 SP3 processors with parallel efficiency well over 90%. production-quality programs also feature wavelet-based computational-space decomposition for adaptive load balancing, spacefilling-curve-based adaptive data compression with user-defined error bound for scalable I/O, and octree-based fast visibility culling for immersive and interactive visualization of massive simulation data.

  7. Scalable manufacturing processes with soft materials

    White, Edward; Case, Jennifer; Kramer, Rebecca

    2014-01-01

    The emerging field of soft robotics will benefit greatly from new scalable manufacturing techniques for responsive materials. Currently, most of soft robotic examples are fabricated one-at-a-time, using techniques borrowed from lithography and 3D printing to fabricate molds. This limits both the maximum and minimum size of robots that can be fabricated, and hinders batch production, which is critical to gain wider acceptance for soft robotic systems. We have identified electrical structures, ...

  8. Architecture Knowledge for Evaluating Scalable Databases

    2015-01-16

    Architecture Knowledge for Evaluating Scalable Databases 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) Nurgaliev... Scala , Erlang, Javascript Cursor-based queries Supported, Not Supported JOIN queries Supported, Not Supported Complex data types Lists, maps, sets...is therefore needed, using technology such as machine learning to extract content from product documentation. The terminology used in the database

  9. Randomized Algorithms for Scalable Machine Learning

    Kleiner, Ariel Jacob

    2012-01-01

    Many existing procedures in machine learning and statistics are computationally intractable in the setting of large-scale data. As a result, the advent of rapidly increasing dataset sizes, which should be a boon yielding improved statistical performance, instead severely blunts the usefulness of a variety of existing inferential methods. In this work, we use randomness to ameliorate this lack of scalability by reducing complex, computationally difficult inferential problems to larger sets o...

  10. Bitcoin-NG: A Scalable Blockchain Protocol

    Eyal, Ittay; Gencer, Adem Efe; Sirer, Emin Gun; van Renesse, Robbert

    2015-01-01

    Cryptocurrencies, based on and led by Bitcoin, have shown promise as infrastructure for pseudonymous online payments, cheap remittance, trustless digital asset exchange, and smart contracts. However, Bitcoin-derived blockchain protocols have inherent scalability limits that trade-off between throughput and latency and withhold the realization of this potential. This paper presents Bitcoin-NG, a new blockchain protocol designed to scale. Based on Bitcoin's blockchain protocol, Bitcoin-NG is By...

  11. Scuba: scalable kernel-based gene prioritization.

    Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio

    2018-01-25

    The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge, however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large scale predictions are required. Importantly, it is able to efficiently deal both with a large amount of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba integrates also a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba .

  12. A scalable distributed RRT for motion planning

    Jacobs, Sam Ade

    2013-05-01

    Rapidly-exploring Random Tree (RRT), like other sampling-based motion planning methods, has been very successful in solving motion planning problems. Even so, sampling-based planners cannot solve all problems of interest efficiently, so attention is increasingly turning to parallelizing them. However, one challenge in parallelizing RRT is the global computation and communication overhead of nearest neighbor search, a key operation in RRTs. This is a critical issue as it limits the scalability of previous algorithms. We present two parallel algorithms to address this problem. The first algorithm extends existing work by introducing a parameter that adjusts how much local computation is done before a global update. The second algorithm radially subdivides the configuration space into regions, constructs a portion of the tree in each region in parallel, and connects the subtrees,i removing cycles if they exist. By subdividing the space, we increase computation locality enabling a scalable result. We show that our approaches are scalable. We present results demonstrating almost linear scaling to hundreds of processors on a Linux cluster and a Cray XE6 machine. © 2013 IEEE.

  13. DISP: Optimizations towards Scalable MPI Startup

    Fu, Huansong [Florida State University, Tallahassee; Pophale, Swaroop S [ORNL; Gorentla Venkata, Manjunath [ORNL; Yu, Weikuan [Florida State University, Tallahassee

    2016-01-01

    Despite the popularity of MPI for high performance computing, the startup of MPI programs faces a scalability challenge as both the execution time and memory consumption increase drastically at scale. We have examined this problem using the collective modules of Cheetah and Tuned in Open MPI as representative implementations. Previous improvements for collectives have focused on algorithmic advances and hardware off-load. In this paper, we examine the startup cost of the collective module within a communicator and explore various techniques to improve its efficiency and scalability. Accordingly, we have developed a new scalable startup scheme with three internal techniques, namely Delayed Initialization, Module Sharing and Prediction-based Topology Setup (DISP). Our DISP scheme greatly benefits the collective initialization of the Cheetah module. At the same time, it helps boost the performance of non-collective initialization in the Tuned module. We evaluate the performance of our implementation on Titan supercomputer at ORNL with up to 4096 processes. The results show that our delayed initialization can speed up the startup of Tuned and Cheetah by an average of 32.0% and 29.2%, respectively, our module sharing can reduce the memory consumption of Tuned and Cheetah by up to 24.1% and 83.5%, respectively, and our prediction-based topology setup can speed up the startup of Cheetah by up to 80%.

  14. A scalable distributed RRT for motion planning

    Jacobs, Sam Ade; Stradford, Nicholas; Rodriguez, Cesar; Thomas, Shawna; Amato, Nancy M.

    2013-01-01

    Rapidly-exploring Random Tree (RRT), like other sampling-based motion planning methods, has been very successful in solving motion planning problems. Even so, sampling-based planners cannot solve all problems of interest efficiently, so attention is increasingly turning to parallelizing them. However, one challenge in parallelizing RRT is the global computation and communication overhead of nearest neighbor search, a key operation in RRTs. This is a critical issue as it limits the scalability of previous algorithms. We present two parallel algorithms to address this problem. The first algorithm extends existing work by introducing a parameter that adjusts how much local computation is done before a global update. The second algorithm radially subdivides the configuration space into regions, constructs a portion of the tree in each region in parallel, and connects the subtrees,i removing cycles if they exist. By subdividing the space, we increase computation locality enabling a scalable result. We show that our approaches are scalable. We present results demonstrating almost linear scaling to hundreds of processors on a Linux cluster and a Cray XE6 machine. © 2013 IEEE.

  15. Scalable robotic biofabrication of tissue spheroids

    Mehesz, A Nagy; Hajdu, Z; Visconti, R P; Markwald, R R; Mironov, V; Brown, J; Beaver, W; Da Silva, J V L

    2011-01-01

    Development of methods for scalable biofabrication of uniformly sized tissue spheroids is essential for tissue spheroid-based bioprinting of large size tissue and organ constructs. The most recent scalable technique for tissue spheroid fabrication employs a micromolded recessed template prepared in a non-adhesive hydrogel, wherein the cells loaded into the template self-assemble into tissue spheroids due to gravitational force. In this study, we present an improved version of this technique. A new mold was designed to enable generation of 61 microrecessions in each well of a 96-well plate. The microrecessions were seeded with cells using an EpMotion 5070 automated pipetting machine. After 48 h of incubation, tissue spheroids formed at the bottom of each microrecession. To assess the quality of constructs generated using this technology, 600 tissue spheroids made by this method were compared with 600 spheroids generated by the conventional hanging drop method. These analyses showed that tissue spheroids fabricated by the micromolded method are more uniform in diameter. Thus, use of micromolded recessions in a non-adhesive hydrogel, combined with automated cell seeding, is a reliable method for scalable robotic fabrication of uniform-sized tissue spheroids.

  16. Scalable robotic biofabrication of tissue spheroids

    Mehesz, A Nagy; Hajdu, Z; Visconti, R P; Markwald, R R; Mironov, V [Advanced Tissue Biofabrication Center, Department of Regenerative Medicine and Cell Biology, Medical University of South Carolina, Charleston, SC (United States); Brown, J [Department of Mechanical Engineering, Clemson University, Clemson, SC (United States); Beaver, W [York Technical College, Rock Hill, SC (United States); Da Silva, J V L, E-mail: mironovv@musc.edu [Renato Archer Information Technology Center-CTI, Campinas (Brazil)

    2011-06-15

    Development of methods for scalable biofabrication of uniformly sized tissue spheroids is essential for tissue spheroid-based bioprinting of large size tissue and organ constructs. The most recent scalable technique for tissue spheroid fabrication employs a micromolded recessed template prepared in a non-adhesive hydrogel, wherein the cells loaded into the template self-assemble into tissue spheroids due to gravitational force. In this study, we present an improved version of this technique. A new mold was designed to enable generation of 61 microrecessions in each well of a 96-well plate. The microrecessions were seeded with cells using an EpMotion 5070 automated pipetting machine. After 48 h of incubation, tissue spheroids formed at the bottom of each microrecession. To assess the quality of constructs generated using this technology, 600 tissue spheroids made by this method were compared with 600 spheroids generated by the conventional hanging drop method. These analyses showed that tissue spheroids fabricated by the micromolded method are more uniform in diameter. Thus, use of micromolded recessions in a non-adhesive hydrogel, combined with automated cell seeding, is a reliable method for scalable robotic fabrication of uniform-sized tissue spheroids.

  17. Treating metadata as annotations: separating the content markup from the content

    Fredrik Paulsson

    2007-11-01

    Full Text Available The use of digital learning resources creates an increasing need for semantic metadata, describing the whole resource, as well as parts of resources. Traditionally, schemas such as Text Encoding Initiative (TEI have been used to add semantic markup for parts of resources. This is not sufficient for use in a ”metadata ecology”, where metadata is distributed, coherent to different Application Profiles, and added by different actors. A new methodology, where metadata is “pointed in” as annotations, using XPointers, and RDF is proposed. A suggestion for how such infrastructure can be implemented, using existing open standards for metadata, and for the web is presented. We argue that such methodology and infrastructure is necessary to realize the decentralized metadata infrastructure needed for a “metadata ecology".

  18. Numeric Analysis for Relationship-Aware Scalable Streaming Scheme

    Heung Ki Lee

    2014-01-01

    Full Text Available Frequent packet loss of media data is a critical problem that degrades the quality of streaming services over mobile networks. Packet loss invalidates frames containing lost packets and other related frames at the same time. Indirect loss caused by losing packets decreases the quality of streaming. A scalable streaming service can decrease the amount of dropped multimedia resulting from a single packet loss. Content providers typically divide one large media stream into several layers through a scalable streaming service and then provide each scalable layer to the user depending on the mobile network. Also, a scalable streaming service makes it possible to decode partial multimedia data depending on the relationship between frames and layers. Therefore, a scalable streaming service provides a way to decrease the wasted multimedia data when one packet is lost. However, the hierarchical structure between frames and layers of scalable streams determines the service quality of the scalable streaming service. Even if whole packets of layers are transmitted successfully, they cannot be decoded as a result of the absence of reference frames and layers. Therefore, the complicated relationship between frames and layers in a scalable stream increases the volume of abandoned layers. For providing a high-quality scalable streaming service, we choose a proper relationship between scalable layers as well as the amount of transmitted multimedia data depending on the network situation. We prove that a simple scalable scheme outperforms a complicated scheme in an error-prone network. We suggest an adaptive set-top box (AdaptiveSTB to lower the dependency between scalable layers in a scalable stream. Also, we provide a numerical model to obtain the indirect loss of multimedia data and apply it to various multimedia streams. Our AdaptiveSTB enhances the quality of a scalable streaming service by removing indirect loss.

  19. Scalable Earth-observation Analytics for Geoscientists: Spacetime Extensions to the Array Database SciDB

    Appel, Marius; Lahn, Florian; Pebesma, Edzer; Buytaert, Wouter; Moulds, Simon

    2016-04-01

    Today's amount of freely available data requires scientists to spend large parts of their work on data management. This is especially true in environmental sciences when working with large remote sensing datasets, such as obtained from earth-observation satellites like the Sentinel fleet. Many frameworks like SpatialHadoop or Apache Spark address the scalability but target programmers rather than data analysts, and are not dedicated to imagery or array data. In this work, we use the open-source data management and analytics system SciDB to bring large earth-observation datasets closer to analysts. Its underlying data representation as multidimensional arrays fits naturally to earth-observation datasets, distributes storage and computational load over multiple instances by multidimensional chunking, and also enables efficient time-series based analyses, which is usually difficult using file- or tile-based approaches. Existing interfaces to R and Python furthermore allow for scalable analytics with relatively little learning effort. However, interfacing SciDB and file-based earth-observation datasets that come as tiled temporal snapshots requires a lot of manual bookkeeping during ingestion, and SciDB natively only supports loading data from CSV-like and custom binary formatted files, which currently limits its practical use in earth-observation analytics. To make it easier to work with large multi-temporal datasets in SciDB, we developed software tools that enrich SciDB with earth observation metadata and allow working with commonly used file formats: (i) the SciDB extension library scidb4geo simplifies working with spatiotemporal arrays by adding relevant metadata to the database and (ii) the Geospatial Data Abstraction Library (GDAL) driver implementation scidb4gdal allows to ingest and export remote sensing imagery from and to a large number of file formats. Using added metadata on temporal resolution and coverage, the GDAL driver supports time-based ingestion of

  20. Scalable and balanced dynamic hybrid data assimilation

    Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa

    2017-04-01

    Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is calling for many parallel and totally independent model runs, all of them

  1. Content-aware network storage system supporting metadata retrieval

    Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun

    2008-12-01

    Nowadays, content-based network storage has become the hot research spot of academy and corporation[1]. In order to solve the problem of hit rate decline causing by migration and achieve the content-based query, we exploit a new content-aware storage system which supports metadata retrieval to improve the query performance. Firstly, we extend the SCSI command descriptor block to enable system understand those self-defined query requests. Secondly, the extracted metadata is encoded by extensible markup language to improve the universality. Thirdly, according to the demand of information lifecycle management (ILM), we store those data in different storage level and use corresponding query strategy to retrieval them. Fourthly, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through friendly user interface. Finally, the experiments indicate that the retrieval strategy and sort algorithm have enhanced the retrieval efficiency and precision.

  2. Leveraging Python to improve ebook metadata selection, ingest, and management

    Kelly Thompson

    2017-10-01

    Full Text Available Libraries face many challenges in managing descriptive metadata for ebooks, including quality control, completeness of coverage, and ongoing management. The recent emergence of library management systems that automatically provide descriptive metadata for e-resources activated in system knowledge bases means that ebook management models are moving toward both greater efficiency and more complex implementation and maintenance choices. Automated and data-driven processes for ebook management have always been desirable, but in the current environment, they become necessary. In addition to initial selection of a record source, automation can be applied to quality control processes and ongoing maintenance in order to keep manual, eyes-on work to a minimum while providing the best possible discovery and access. In this article, we describe how we are using Python scripts to address these challenges.

  3. Data Bookkeeping Service 3 - Providing event metadata in CMS

    Giffels, Manuel; Riley, Daniel

    2014-01-01

    The Data Bookkeeping Service 3 provides a catalog of event metadata for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It comprises all necessary information for tracking datasets, their processing history and associations between runs, files and datasets, on a large scale of about $200,000$ datasets and more than $40$ million files, which adds up in around $700$ GB of metadata. The DBS is an essential part of the CMS Data Management and Workload Management (DMWM) systems, all kind of data-processing like Monte Carlo production, processing of recorded event data as well as physics analysis done by the users are heavily relying on the information stored in DBS.

  4. ATLAS Metadata Infrastructure Evolution for Run 2 and Beyond

    van Gemmeren, Peter; The ATLAS collaboration; Malon, David; Vaniachine, Alexandre

    2015-01-01

    ATLAS developed and employed for Run 1 of the Large Hadron Collider a sophisticated infrastructure for metadata handling in event processing jobs. This infrastructure profits from a rich feature set provided by the ATLAS execution control framework, including standardized interfaces and invocation mechanisms for tools and services, segregation of transient data stores with concomitant object lifetime management, and mechanisms for handling occurrences asynchronous to the control framework’s state machine transitions. This metadata infrastructure is evolving and being extended for Run 2 to allow its use and reuse in downstream physics analyses, analyses that may or may not utilize the ATLAS control framework. At the same time, multiprocessing versions of the control framework and the requirements of future multithreaded frameworks are leading to redesign of components that use an incident-handling approach to asynchrony. The increased use of scatter-gather architectures, both local and distributed, requires ...

  5. Indexing of ATLAS data management and analysis system metadata

    Grigoryeva, Maria; The ATLAS collaboration

    2017-01-01

    This manuscript is devoted to the development of the system to manage metainformation of modern HENP experiments. The main purpose of the system is to provide scientists with transparent access to the actual and historical metadata related to data analysis, processing and modeling. The system design addresses the following goals : providing a flexible and fast search for metadata on various combinations of keywords, generating aggregated reports, categorized according to selected parameters, such as the studied physical process, scientific topic, physical group, etc. The article presents the architecture of the developed indexing and search system, as well as the results of performance tests. The comparison of the query execution speed within the developed system and in case of querying the original relational databases showed that the developed system provides results faster. Also the new system allows much more complex search requests, than the original storages.

  6. Conditions and configuration metadata for the ATLAS experiment

    Gallas, E J; Pachal, K E; Tseng, J C L; Albrand, S; Fulachier, J; Lambert, F; Zhang, Q

    2012-01-01

    In the ATLAS experiment, a system called COMA (Conditions/Configuration Metadata for ATLAS), has been developed to make globally important run-level metadata more readily accessible. It is based on a relational database storing directly extracted, refined, reduced, and derived information from system-specific database sources as well as information from non-database sources. This information facilitates a variety of unique dynamic interfaces and provides information to enhance the functionality of other systems. This presentation will give an overview of the components of the COMA system, enumerate its diverse data sources, and give examples of some of the interfaces it facilitates. We list important principles behind COMA schema and interface design, and how features of these principles create coherence and eliminate redundancy among the components of the overall system. In addition, we elucidate how interface logging data has been used to refine COMA content and improve the value and performance of end-user reports and browsers.

  7. Conditions and configuration metadata for the ATLAS experiment

    Gallas, E J; Albrand, S; Fulachier, J; Lambert, F; Pachal, K E; Tseng, J C L; Zhang, Q

    2012-01-01

    In the ATLAS experiment, a system called COMA (Conditions/Configuration Metadata for ATLAS), has been developed to make globally important run-level metadata more readily accessible. It is based on a relational database storing directly extracted, refined, reduced, and derived information from system-specific database sources as well as information from non-database sources. This information facilitates a variety of unique dynamic interfaces and provides information to enhance the functionality of other systems. This presentation will give an overview of the components of the COMA system, enumerate its diverse data sources, and give examples of some of the interfaces it facilitates. We list important principles behind COMA schema and interface design, and how features of these principles create coherence and eliminate redundancy among the components of the overall system. In addition, we elucidate how interface logging data has been used to refine COMA content and improve the value and performance of end-user...

  8. A case for user-generated sensor metadata

    Nüst, Daniel

    2015-04-01

    Cheap and easy to use sensing technology and new developments in ICT towards a global network of sensors and actuators promise previously unthought of changes for our understanding of the environment. Large professional as well as amateur sensor networks exist, and they are used for specific yet diverse applications across domains such as hydrology, meteorology or early warning systems. However the impact this "abundance of sensors" had so far is somewhat disappointing. There is a gap between (community-driven) sensor networks that could provide very useful data and the users of the data. In our presentation, we argue this is due to a lack of metadata which allows determining the fitness of use of a dataset. Syntactic or semantic interoperability for sensor webs have made great progress and continue to be an active field of research, yet they often are quite complex, which is of course due to the complexity of the problem at hand. But still, we see the most generic information to determine fitness for use is a dataset's provenance, because it allows users to make up their own minds independently from existing classification schemes for data quality. In this work we will make the case how curated user-contributed metadata has the potential to improve this situation. This especially applies for scenarios in which an observed property is applicable in different domains, and for set-ups where the understanding about metadata concepts and (meta-)data quality differs between data provider and user. On the one hand a citizen does not understand the ISO provenance metadata. On the other hand a researcher might find issues in publicly accessible time series published by citizens, which the latter might not be aware of or care about. Because users will have to determine fitness for use for each application on their own anyway, we suggest an online collaboration platform for user-generated metadata based on an extremely simplified data model. In the most basic fashion

  9. Data Bookkeeping Service 3 - Providing Event Metadata in CMS

    Giffels, Manuel [CERN; Guo, Y. [Fermilab; Riley, Daniel [Cornell U.

    2014-01-01

    The Data Bookkeeping Service 3 provides a catalog of event metadata for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It comprises all necessary information for tracking datasets, their processing history and associations between runs, files and datasets, on a large scale of about 200, 000 datasets and more than 40 million files, which adds up in around 700 GB of metadata. The DBS is an essential part of the CMS Data Management and Workload Management (DMWM) systems [1], all kind of data-processing like Monte Carlo production, processing of recorded event data as well as physics analysis done by the users are heavily relying on the information stored in DBS.

  10. An institutional repository initiative and issues concerning metadata

    BAYRAM, Özlem; ATILGAN, Doğan; ARSLANTEKİN, Sacit

    2006-01-01

    Ankara University has become one of the fist open access initiatives in Turkey. Ankara University Open Access Program (AUO) was formed as part of the Open Access project (http://acikarsiv.ankara.edu.tr ) and supported by the University with an example of an open access institutional repository. As for the further step, the system will require the metadata tools to enable international recognization. According to Budapest Open Access Initiative, as suggested two strategies for open access t...

  11. Standardizing metadata and taxonomic identification in metabarcoding studies.

    Tedersoo, Leho; Ramirez, Kelly S; Nilsson, R Henrik; Kaljuvee, Aivi; Kõljalg, Urmas; Abarenkov, Kessy

    2015-01-01

    High-throughput sequencing-based metabarcoding studies produce vast amounts of ecological data, but a lack of consensus on standardization of metadata and how to refer to the species recovered severely hampers reanalysis and comparisons among studies. Here we propose an automated workflow covering data submission, compression, storage and public access to allow easy data retrieval and inter-study communication. Such standardized and readily accessible datasets facilitate data management, taxonomic comparisons and compilation of global metastudies.

  12. Automated Atmospheric Composition Dataset Level Metadata Discovery. Difficulties and Surprises

    Strub, R. F.; Falke, S. R.; Kempler, S.; Fialkowski, E.; Goussev, O.; Lynnes, C.

    2015-12-01

    The Atmospheric Composition Portal (ACP) is an aggregator and curator of information related to remotely sensed atmospheric composition data and analysis. It uses existing tools and technologies and, where needed, enhances those capabilities to provide interoperable access, tools, and contextual guidance for scientists and value-adding organizations using remotely sensed atmospheric composition data. The initial focus is on Essential Climate Variables identified by the Global Climate Observing System - CH4, CO, CO2, NO2, O3, SO2 and aerosols. This poster addresses our efforts in building the ACP Data Table, an interface to help discover and understand remotely sensed data that are related to atmospheric composition science and applications. We harvested GCMD, CWIC, GEOSS metadata catalogs using machine to machine technologies - OpenSearch, Web Services. We also manually investigated the plethora of CEOS data providers portals and other catalogs where that data might be aggregated. This poster is our experience of the excellence, variety, and challenges we encountered.Conclusions:1.The significant benefits that the major catalogs provide are their machine to machine tools like OpenSearch and Web Services rather than any GUI usability improvements due to the large amount of data in their catalog.2.There is a trend at the large catalogs towards simulating small data provider portals through advanced services. 3.Populating metadata catalogs using ISO19115 is too complex for users to do in a consistent way, difficult to parse visually or with XML libraries, and too complex for Java XML binders like CASTOR.4.The ability to search for Ids first and then for data (GCMD and ECHO) is better for machine to machine operations rather than the timeouts experienced when returning the entire metadata entry at once. 5.Metadata harvest and export activities between the major catalogs has led to a significant amount of duplication. (This is currently being addressed) 6.Most (if not

  13. Improving Earth Science Metadata: Modernizing ncISO

    O'Brien, K.; Schweitzer, R.; Neufeld, D.; Burger, E. F.; Signell, R. P.; Arms, S. C.; Wilcox, K.

    2016-12-01

    ncISO is a package of tools developed at NOAA's National Center for Environmental Information (NCEI) that facilitates the generation of ISO 19115-2 metadata from NetCDF data sources. The tool currently exists in two iterations: a command line utility and a web-accessible service within the THREDDS Data Server (TDS). Several projects, including NOAA's Unified Access Framework (UAF), depend upon ncISO to generate the ISO-compliant metadata from their data holdings and use the resulting information to populate discovery tools such as NCEI's ESRI Geoportal and NOAA's data.noaa.gov CKAN system. In addition to generating ISO 19115-2 metadata, the tool calculates a rubric score based on how well the dataset follows the Attribute Conventions for Dataset Discovery (ACDD). The result of this rubric calculation, along with information about what has been included and what is missing is displayed in an HTML document generated by the ncISO software package. Recently ncISO has fallen behind in terms of supporting updates to conventions such updates to the ACDD. With the blessing of the original programmer, NOAA's UAF has been working to modernize the ncISO software base. In addition to upgrading ncISO to utilize version1.3 of the ACDD, we have been working with partners at Unidata and IOOS to unify the tool's code base. In essence, we are merging the command line capabilities into the same software that will now be used by the TDS service, allowing easier updates when conventions such as ACDD are updated in the future. In this presentation, we will discuss the work the UAF project has done to support updated conventions within ncISO, as well as describe how the updated tool is helping to improve metadata throughout the earth and ocean sciences.

  14. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format.

    Ismail, Mahmoud; Philbin, James

    2015-04-01

    The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that only need metadata manipulation, such as deidentification and study migration. Most picture archiving and communication system use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce the metadata processing time. A set of experiments are performed that update the metadata of a set of DICOM studies for deidentification and migration. The studies are stored in both the traditional single frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies' metadata in MSD format than in SFD format because the bulk data is separated in MSD and is not retrieved from the storage system. In addition, it is space efficient to store the deidentified studies in MSD format as it shares the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata.

  15. ORGANIZATION OF DIGITAL RESOURCES IN REPEC THROUGH REDIF METADATA

    Salvador Enrique Vazquez Moctezuma

    2018-06-01

    Full Text Available Introduction: The disciplinary repository RePEc (Research Papers in Economics provides access to a wide range of preprints, journal articles, books, book chapters and software about economic and administrative sciences. This repository adds bibliographic records produced by different universities, institutes, editors and authors that work collaboratively following the norms of the documentary organization. Objective: In this paper, mainly, we identify and analyze the functioning of RePEc, which includes the organization of the files, which is characterized using the protocol Guildford and metadata ReDIF (Research Documentation Information Format templates own for the documentary description. Methodology: Part of this research was studied theoretically in the literature; another part was carried out by observing a series of features visible on the RePEc website and in the archives of a journal that collaborates in this repository. Results: The repository is a decentralized collaborative project and it also provides several services derived from the metadata analysis. Conclusions: We conclude that the ReDIF templates and the Guildford communication protocol are key elements for organizing records in RePEc, and there is a similarity with the Dublin Core metadata

  16. Embedding Metadata and Other Semantics in Word Processing Documents

    Peter Sefton

    2009-10-01

    Full Text Available This paper describes a technique for embedding document metadata, and potentially other semantic references inline in word processing documents, which the authors have implemented with the help of a software development team. Several assumptions underly the approach; It must be available across computing platforms and work with both Microsoft Word (because of its user base and OpenOffice.org (because of its free availability. Further the application needs to be acceptable to and usable by users, so the initial implementation covers only small number of features, which will only be extended after user-testing. Within these constraints the system provides a mechanism for encoding not only simple metadata, but for inferring hierarchical relationships between metadata elements from a ‘flat’ word processing file.The paper includes links to open source code implementing the techniques as part of a broader suite of tools for academic writing. This addresses tools and software, semantic web and data curation, integrating curation into research workflows and will provide a platform for integrating work on ontologies, vocabularies and folksonomies into word processing tools.

  17. The ATLAS Eventlndex: data flow and inclusion of other metadata

    Barberis, D.; Cárdenas Zárate, S. E.; Favareto, A.; Fernandez Casani, A.; Gallas, E. J.; Garcia Montoro, C.; Gonzalez de la Hoz, S.; Hrivnac, J.; Malon, D.; Prokoshin, F.; Salt, J.; Sanchez, J.; Toebbicke, R.; Yuan, R.; ATLAS Collaboration

    2016-10-01

    The ATLAS EventIndex is the catalogue of the event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event as well as trigger decision information. The main use case for the EventIndex is event picking, as well as data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalogue AMI and the Rucio data management system and information on production jobs from the ATLAS production system. The ATLAS production system is also used for the collection of event information from the Grid jobs. EventIndex developments started in 2012 and in the middle of 2015 the system was commissioned and started collecting event metadata, as a part of ATLAS Distributed Computing operations.

  18. Metabolonote: A wiki-based database for managing hierarchical metadata of metabolome analyses

    Takeshi eAra

    2015-04-01

    Full Text Available Metabolomics—technology for comprehensive detection of small molecules in an organism—lags behind the other omics in terms of publication and dissemination of experimental data. Among the reasons for this are difficulty precisely recording information about complicated analytical experiments (metadata, existence of various databases with their own metadata descriptions, and low reusability of the published data, resulting in submitters (the researchers who generate the data being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called TogoMD, with an ID system that is required for unique access to each level of the tree-structured metadata such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data and permission to attach related information to the metadata provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers' understanding and use of data, but also submitters' motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitates the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata for analyzed data obtained from 35 biological species are published currently. Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/.

  19. Programming Scala Scalability = Functional Programming + Objects

    Wampler, Dean

    2009-01-01

    Learn how to be more productive with Scala, a new multi-paradigm language for the Java Virtual Machine (JVM) that integrates features of both object-oriented and functional programming. With this book, you'll discover why Scala is ideal for highly scalable, component-based applications that support concurrency and distribution. Programming Scala clearly explains the advantages of Scala as a JVM language. You'll learn how to leverage the wealth of Java class libraries to meet the practical needs of enterprise and Internet projects more easily. Packed with code examples, this book provides us

  20. Tip-Based Nanofabrication for Scalable Manufacturing

    Huan Hu

    2017-03-01

    Full Text Available Tip-based nanofabrication (TBN is a family of emerging nanofabrication techniques that use a nanometer scale tip to fabricate nanostructures. In this review, we first introduce the history of the TBN and the technology development. We then briefly review various TBN techniques that use different physical or chemical mechanisms to fabricate features and discuss some of the state-of-the-art techniques. Subsequently, we focus on those TBN methods that have demonstrated potential to scale up the manufacturing throughput. Finally, we discuss several research directions that are essential for making TBN a scalable nano-manufacturing technology.

  1. Tip-Based Nanofabrication for Scalable Manufacturing

    Hu, Huan; Somnath, Suhas

    2017-01-01

    Tip-based nanofabrication (TBN) is a family of emerging nanofabrication techniques that use a nanometer scale tip to fabricate nanostructures. Here in this review, we first introduce the history of the TBN and the technology development. We then briefly review various TBN techniques that use different physical or chemical mechanisms to fabricate features and discuss some of the state-of-the-art techniques. Subsequently, we focus on those TBN methods that have demonstrated potential to scale up the manufacturing throughput. Finally, we discuss several research directions that are essential for making TBN a scalable nano-manufacturing technology.

  2. Towards a Scalable, Biomimetic, Antibacterial Coating

    Dickson, Mary Nora

    Corneal afflictions are the second leading cause of blindness worldwide. When a corneal transplant is unavailable or contraindicated, an artificial cornea device is the only chance to save sight. Bacterial or fungal biofilm build up on artificial cornea devices can lead to serious complications including the need for systemic antibiotic treatment and even explantation. As a result, much emphasis has been placed on anti-adhesion chemical coatings and antibiotic leeching coatings. These methods are not long-lasting, and microorganisms can eventually circumvent these measures. Thus, I have developed a surface topographical antimicrobial coating. Various surface structures including rough surfaces, superhydrophobic surfaces, and the natural surfaces of insects' wings and sharks' skin are promising anti-biofilm candidates, however none meet the criteria necessary for implementation on the surface of an artificial cornea device. In this thesis I: 1) developed scalable fabrication protocols for a library of biomimetic nanostructure polymer surfaces 2) assessed the potential these for poly(methyl methacrylate) nanopillars to kill or prevent formation of biofilm by E. coli bacteria and species of Pseudomonas and Staphylococcus bacteria and improved upon a proposed mechanism for the rupture of Gram-negative bacterial cell walls 3) developed a scalable, commercially viable method for producing antibacterial nanopillars on a curved, PMMA artificial cornea device and 4) developed scalable fabrication protocols for implantation of antibacterial nanopatterned surfaces on the surfaces of thermoplastic polyurethane materials, commonly used in catheter tubings. This project constitutes a first step towards fabrication of the first entirely PMMA artificial cornea device. The major finding of this work is that by precisely controlling the topography of a polymer surface at the nano-scale, we can kill adherent bacteria and prevent biofilm formation of certain pathogenic bacteria

  3. Scalable Optical-Fiber Communication Networks

    Chow, Edward T.; Peterson, John C.

    1993-01-01

    Scalable arbitrary fiber extension network (SAFEnet) is conceptual fiber-optic communication network passing digital signals among variety of computers and input/output devices at rates from 200 Mb/s to more than 100 Gb/s. Intended for use with very-high-speed computers and other data-processing and communication systems in which message-passing delays must be kept short. Inherent flexibility makes it possible to match performance of network to computers by optimizing configuration of interconnections. In addition, interconnections made redundant to provide tolerance to faults.

  4. Scalable Tensor Factorizations with Missing Data

    Acar, Evrim; Dunlavy, Daniel M.; Kolda, Tamara G.

    2010-01-01

    of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multi-way arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP...... is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram...

  5. Scalable and Anonymous Group Communication with MTor

    Lin Dong

    2016-04-01

    Full Text Available This paper presents MTor, a low-latency anonymous group communication system. We construct MTor as an extension to Tor, allowing the construction of multi-source multicast trees on top of the existing Tor infrastructure. MTor does not depend on an external service to broker the group communication, and avoids central points of failure and trust. MTor’s substantial bandwidth savings and graceful scalability enable new classes of anonymous applications that are currently too bandwidth-intensive to be viable through traditional unicast Tor communication-e.g., group file transfer, collaborative editing, streaming video, and real-time audio conferencing.

  6. Grassmann Averages for Scalable Robust PCA

    Hauberg, Søren; Feragen, Aasa; Black, Michael J.

    2014-01-01

    As the collection of large datasets becomes increasingly automated, the occurrence of outliers will increase—“big data” implies “big outliers”. While principal component analysis (PCA) is often used to reduce the size of data, and scalable solutions exist, it is well-known that outliers can...... to vectors (subspaces) or elements of vectors; we focus on the latter and use a trimmed average. The resulting Trimmed Grassmann Average (TGA) is particularly appropriate for computer vision because it is robust to pixel outliers. The algorithm has low computational complexity and minimal memory requirements...

  7. Metadata Wizard: an easy-to-use tool for creating FGDC-CSDGM metadata for geospatial datasets in ESRI ArcGIS Desktop

    Ignizio, Drew A.; O'Donnell, Michael S.; Talbert, Colin B.

    2014-01-01

    Creating compliant metadata for scientific data products is mandated for all federal Geographic Information Systems professionals and is a best practice for members of the geospatial data community. However, the complexity of the The Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata, the limited availability of easy-to-use tools, and recent changes in the ESRI software environment continue to make metadata creation a challenge. Staff at the U.S. Geological Survey Fort Collins Science Center have developed a Python toolbox for ESRI ArcDesktop to facilitate a semi-automated workflow to create and update metadata records in ESRI’s 10.x software. The U.S. Geological Survey Metadata Wizard tool automatically populates several metadata elements: the spatial reference, spatial extent, geospatial presentation format, vector feature count or raster column/row count, native system/processing environment, and the metadata creation date. Once the software auto-populates these elements, users can easily add attribute definitions and other relevant information in a simple Graphical User Interface. The tool, which offers a simple design free of esoteric metadata language, has the potential to save many government and non-government organizations a significant amount of time and costs by facilitating the development of The Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata compliant metadata for ESRI software users. A working version of the tool is now available for ESRI ArcDesktop, version 10.0, 10.1, and 10.2 (downloadable at http:/www.sciencebase.gov/metadatawizard).

  8. Data catalog project—A browsable, searchable, metadata system

    Stillerman, Joshua; Fredian, Thomas; Greenwald, Martin; Manduchi, Gabriele

    2016-01-01

    Modern experiments are typically conducted by large, extended groups, where researchers rely on other team members to produce much of the data they use. The experiments record very large numbers of measurements that can be difficult for users to find, access and understand. We are developing a system for users to annotate their data products with structured metadata, providing data consumers with a discoverable, browsable data index. Machine understandable metadata captures the underlying semantics of the recorded data, which can then be consumed by both programs, and interactively by users. Collaborators can use these metadata to select and understand recorded measurements. The data catalog project is a data dictionary and index which enables users to record general descriptive metadata, use cases and rendering information as well as providing them a transparent data access mechanism (URI). Users describe their diagnostic including references, text descriptions, units, labels, example data instances, author contact information and data access URIs. The list of possible attribute labels is extensible, but limiting the vocabulary of names increases the utility of the system. The data catalog is focused on the data products and complements process-based systems like the Metadata Ontology Provenance project [Greenwald, 2012; Schissel, 2015]. This system can be coupled with MDSplus to provide a simple platform for data driven display and analysis programs. Sites which use MDSplus can describe tree branches, and if desired create ‘processed data trees’ with homogeneous node structures for measurements. Sites not currently using MDSplus can either use the database to reference local data stores, or construct an MDSplus tree whose leaves reference the local data store. A data catalog system can provide a useful roadmap of data acquired from experiments or simulations making it easier for researchers to find and access important data and understand the meaning of the

  9. Data catalog project—A browsable, searchable, metadata system

    Stillerman, Joshua, E-mail: jas@psfc.mit.edu [MIT Plasma Science and Fusion Center, Cambridge, MA (United States); Fredian, Thomas; Greenwald, Martin [MIT Plasma Science and Fusion Center, Cambridge, MA (United States); Manduchi, Gabriele [Consorzio RFX, Euratom-ENEA Association, Corso Stati Uniti 4, Padova 35127 (Italy)

    2016-11-15

    Modern experiments are typically conducted by large, extended groups, where researchers rely on other team members to produce much of the data they use. The experiments record very large numbers of measurements that can be difficult for users to find, access and understand. We are developing a system for users to annotate their data products with structured metadata, providing data consumers with a discoverable, browsable data index. Machine understandable metadata captures the underlying semantics of the recorded data, which can then be consumed by both programs, and interactively by users. Collaborators can use these metadata to select and understand recorded measurements. The data catalog project is a data dictionary and index which enables users to record general descriptive metadata, use cases and rendering information as well as providing them a transparent data access mechanism (URI). Users describe their diagnostic including references, text descriptions, units, labels, example data instances, author contact information and data access URIs. The list of possible attribute labels is extensible, but limiting the vocabulary of names increases the utility of the system. The data catalog is focused on the data products and complements process-based systems like the Metadata Ontology Provenance project [Greenwald, 2012; Schissel, 2015]. This system can be coupled with MDSplus to provide a simple platform for data driven display and analysis programs. Sites which use MDSplus can describe tree branches, and if desired create ‘processed data trees’ with homogeneous node structures for measurements. Sites not currently using MDSplus can either use the database to reference local data stores, or construct an MDSplus tree whose leaves reference the local data store. A data catalog system can provide a useful roadmap of data acquired from experiments or simulations making it easier for researchers to find and access important data and understand the meaning of the

  10. ncISO Facilitating Metadata and Scientific Data Discovery

    Neufeld, D.; Habermann, T.

    2011-12-01

    Increasing the usability and availability climate and oceanographic datasets for environmental research requires improved metadata and tools to rapidly locate and access relevant information for an area of interest. Because of the distributed nature of most environmental geospatial data, a common approach is to use catalog services that support queries on metadata harvested from remote map and data services. A key component to effectively using these catalog services is the availability of high quality metadata associated with the underlying data sets. In this presentation, we examine the use of ncISO, and Geoportal as open source tools that can be used to document and facilitate access to ocean and climate data available from Thematic Realtime Environmental Distributed Data Services (THREDDS) data services. Many atmospheric and oceanographic spatial data sets are stored in the Network Common Data Format (netCDF) and served through the Unidata THREDDS Data Server (TDS). NetCDF and THREDDS are becoming increasingly accepted in both the scientific and geographic research communities as demonstrated by the recent adoption of netCDF as an Open Geospatial Consortium (OGC) standard. One important source for ocean and atmospheric based data sets is NOAA's Unified Access Framework (UAF) which serves over 3000 gridded data sets from across NOAA and NOAA-affiliated partners. Due to the large number of datasets, browsing the data holdings to locate data is impractical. Working with Unidata, we have created a new service for the TDS called "ncISO", which allows automatic generation of ISO 19115-2 metadata from attributes and variables in TDS datasets. The ncISO metadata records can be harvested by catalog services such as ESSI-labs GI-Cat catalog service, and ESRI's Geoportal which supports query through a number of services, including OpenSearch and Catalog Services for the Web (CSW). ESRI's Geoportal Server provides a number of user friendly search capabilities for end users

  11. Scalability Optimization of Seamless Positioning Service

    Juraj Machaj

    2016-01-01

    Full Text Available Recently positioning services are getting more attention not only within research community but also from service providers. From the service providers point of view positioning service that will be able to work seamlessly in all environments, for example, indoor, dense urban, and rural, has a huge potential to open new markets. However, such system does not only need to provide accurate position estimates but have to be scalable and resistant to fake positioning requests. In the previous works we have proposed a modular system, which is able to provide seamless positioning in various environments. The system automatically selects optimal positioning module based on available radio signals. The system currently consists of three positioning modules—GPS, GSM based positioning, and Wi-Fi based positioning. In this paper we will propose algorithm which will reduce time needed for position estimation and thus allow higher scalability of the modular system and thus allow providing positioning services to higher amount of users. Such improvement is extremely important, for real world application where large number of users will require position estimates, since positioning error is affected by response time of the positioning server.

  12. Highly scalable Ab initio genomic motif identification

    Marchand, Benoit; Bajic, Vladimir B.; Kaushik, Dinesh

    2011-01-01

    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motiffinding simulations in a few hours while the original serial code would have needed decades of execution time. Copyright 2011 ACM.

  13. Algorithmic psychometrics and the scalable subject.

    Stark, Luke

    2018-04-01

    Recent public controversies, ranging from the 2014 Facebook 'emotional contagion' study to psychographic data profiling by Cambridge Analytica in the 2016 American presidential election, Brexit referendum and elsewhere, signal watershed moments in which the intersecting trajectories of psychology and computer science have become matters of public concern. The entangled history of these two fields grounds the application of applied psychological techniques to digital technologies, and an investment in applying calculability to human subjectivity. Today, a quantifiable psychological subject position has been translated, via 'big data' sets and algorithmic analysis, into a model subject amenable to classification through digital media platforms. I term this position the 'scalable subject', arguing it has been shaped and made legible by algorithmic psychometrics - a broad set of affordances in digital platforms shaped by psychology and the behavioral sciences. In describing the contours of this 'scalable subject', this paper highlights the urgent need for renewed attention from STS scholars on the psy sciences, and on a computational politics attentive to psychology, emotional expression, and sociality via digital media.

  14. Scalable Simulation of Electromagnetic Hybrid Codes

    Perumalla, Kalyan S.; Fujimoto, Richard; Karimabadi, Dr. Homa

    2006-01-01

    New discrete-event formulations of physics simulation models are emerging that can outperform models based on traditional time-stepped techniques. Detailed simulation of the Earth's magnetosphere, for example, requires execution of sub-models that are at widely differing timescales. In contrast to time-stepped simulation which requires tightly coupled updates to entire system state at regular time intervals, the new discrete event simulation (DES) approaches help evolve the states of sub-models on relatively independent timescales. However, parallel execution of DES-based models raises challenges with respect to their scalability and performance. One of the key challenges is to improve the computation granularity to offset synchronization and communication overheads within and across processors. Our previous work was limited in scalability and runtime performance due to the parallelization challenges. Here we report on optimizations we performed on DES-based plasma simulation models to improve parallel performance. The net result is the capability to simulate hybrid particle-in-cell (PIC) models with over 2 billion ion particles using 512 processors on supercomputing platforms

  15. Towards Scalable Graph Computation on Mobile Devices.

    Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng

    2014-10-01

    Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach.

  16. Scalable fast multipole accelerated vortex methods

    Hu, Qi

    2014-05-01

    The fast multipole method (FMM) is often used to accelerate the calculation of particle interactions in particle-based methods to simulate incompressible flows. To evaluate the most time-consuming kernels - the Biot-Savart equation and stretching term of the vorticity equation, we mathematically reformulated it so that only two Laplace scalar potentials are used instead of six. This automatically ensuring divergence-free far-field computation. Based on this formulation, we developed a new FMM-based vortex method on heterogeneous architectures, which distributed the work between multicore CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm uses new data structures which can dynamically manage inter-node communication and load balance efficiently, with only a small parallel construction overhead. This algorithm can scale to large-sized clusters showing both strong and weak scalability. Careful error and timing trade-off analysis are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching calculation for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.

  17. Computational scalability of large size image dissemination

    Kooper, Rob; Bajcsy, Peter

    2011-01-01

    We have investigated the computational scalability of image pyramid building needed for dissemination of very large image data. The sources of large images include high resolution microscopes and telescopes, remote sensing and airborne imaging, and high resolution scanners. The term 'large' is understood from a user perspective which means either larger than a display size or larger than a memory/disk to hold the image data. The application drivers for our work are digitization projects such as the Lincoln Papers project (each image scan is about 100-150MB or about 5000x8000 pixels with the total number to be around 200,000) and the UIUC library scanning project for historical maps from 17th and 18th century (smaller number but larger images). The goal of our work is understand computational scalability of the web-based dissemination using image pyramids for these large image scans, as well as the preservation aspects of the data. We report our computational benchmarks for (a) building image pyramids to be disseminated using the Microsoft Seadragon library, (b) a computation execution approach using hyper-threading to generate image pyramids and to utilize the underlying hardware, and (c) an image pyramid preservation approach using various hard drive configurations of Redundant Array of Independent Disks (RAID) drives for input/output operations. The benchmarks are obtained with a map (334.61 MB, JPEG format, 17591x15014 pixels). The discussion combines the speed and preservation objectives.

  18. Towards Scalable Graph Computation on Mobile Devices

    Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng

    2015-01-01

    Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach. PMID:25859564

  19. Big data integration: scalability and sustainability

    Zhang, Zhang

    2016-01-26

    Integration of various types of omics data is critically indispensable for addressing most important and complex biological questions. In the era of big data, however, data integration becomes increasingly tedious, time-consuming and expensive, posing a significant obstacle to fully exploit the wealth of big biological data. Here we propose a scalable and sustainable architecture that integrates big omics data through community-contributed modules. Community modules are contributed and maintained by different committed groups and each module corresponds to a specific data type, deals with data collection, processing and visualization, and delivers data on-demand via web services. Based on this community-based architecture, we build Information Commons for Rice (IC4R; http://ic4r.org), a rice knowledgebase that integrates a variety of rice omics data from multiple community modules, including genome-wide expression profiles derived entirely from RNA-Seq data, resequencing-based genomic variations obtained from re-sequencing data of thousands of rice varieties, plant homologous genes covering multiple diverse plant species, post-translational modifications, rice-related literatures, and community annotations. Taken together, such architecture achieves integration of different types of data from multiple community-contributed modules and accordingly features scalable, sustainable and collaborative integration of big data as well as low costs for database update and maintenance, thus helpful for building IC4R into a comprehensive knowledgebase covering all aspects of rice data and beneficial for both basic and translational researches.

  20. A Novel Architecture of Metadata Management System Based on Intelligent Cache

    SONG Baoyan; ZHAO Hongwei; WANG Yan; GAO Nan; XU Jin

    2006-01-01

    This paper introduces a novel architecture of metadata management system based on intelligent cache called Metadata Intelligent Cache Controller (MICC). By using an intelligent cache to control the metadata system, MICC can deal with different scenarios such as splitting and merging of queries into sub-queries for available metadata sets in local, in order to reduce access time of remote queries. Application can find results patially from local cache and the remaining portion of the metadata that can be fetched from remote locations. Using the existing metadata, it can not only enhance the fault tolerance and load balancing of system effectively, but also improve the efficiency of access while ensuring the access quality.

  1. An Assessment of the Evolving Common Metadata Repository Standards for Airborne Field Campaigns

    Northup, E. A.; Chen, G.; Early, A. B.; Beach, A. L., III; Walter, J.; Conover, H.

    2016-12-01

    The NASA Earth Venture Program has led to a dramatic increase in airborne observations, requiring updated data management practices with clearly defined data standards and protocols for metadata. While the current data management practices demonstrate some success in serving airborne science team data user needs, existing metadata models and standards such as NASA's Unified Metadata Model (UMM) for Collections (UMM-C) present challenges with respect to accommodating certain features of airborne science metadata. UMM is the model implemented in the Common Metadata Repository (CMR), which catalogs all metadata records for NASA's Earth Observing System Data and Information System (EOSDIS). One example of these challenges is with representation of spatial and temporal metadata. In addition, many airborne missions target a particular geophysical event, such as a developing hurricane. In such cases, metadata about the event is also important for understanding the data. While coverage of satellite missions is highly predictable based on orbit characteristics, airborne missions feature complicated flight patterns where measurements can be spatially and temporally discontinuous. Therefore, existing metadata models will need to be expanded for airborne measurements and sampling strategies. An Airborne Metadata Working Group was established under the auspices of NASA's Earth Science Data Systems Working Group (ESDSWG) to identify specific features of airborne metadata that can not be currently represented in the UMM and to develop new recommendations. The group includes representation from airborne data users and providers. This presentation will discuss the challenges and recommendations in an effort to demonstrate how airborne metadata curation/management can be improved to streamline data ingest and discoverability to a broader user community.

  2. Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.

    Dave, Jaydev K; Gingold, Eric L

    2013-01-01

    The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract from these files information from the structured elements in the DICOM metadata relevant to exposure. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.

  3. Improving Access to NASA Earth Science Data through Collaborative Metadata Curation

    Sisco, A. W.; Bugbee, K.; Shum, D.; Baynes, K.; Dixon, V.; Ramachandran, R.

    2017-12-01

    The NASA-developed Common Metadata Repository (CMR) is a high-performance metadata system that currently catalogs over 375 million Earth science metadata records. It serves as the authoritative metadata management system of NASA's Earth Observing System Data and Information System (EOSDIS), enabling NASA Earth science data to be discovered and accessed by a worldwide user community. The size of the EOSDIS data archive is steadily increasing, and the ability to manage and query this archive depends on the input of high quality metadata to the CMR. Metadata that does not provide adequate descriptive information diminishes the CMR's ability to effectively find and serve data to users. To address this issue, an innovative and collaborative review process is underway to systematically improve the completeness, consistency, and accuracy of metadata for approximately 7,000 data sets archived by NASA's twelve EOSDIS data centers, or Distributed Active Archive Centers (DAACs). The process involves automated and manual metadata assessment of both collection and granule records by a team of Earth science data specialists at NASA Marshall Space Flight Center. The team communicates results to DAAC personnel, who then make revisions and reingest improved metadata into the CMR. Implementation of this process relies on a network of interdisciplinary collaborators leveraging a variety of communication platforms and long-range planning strategies. Curating metadata at this scale and resolving metadata issues through community consensus improves the CMR's ability to serve current and future users and also introduces best practices for stewarding the next generation of Earth Observing System data. This presentation will detail the metadata curation process, its outcomes thus far, and also share the status of ongoing curation activities.

  4. DESIGN AND PRACTICE ON METADATA SERVICE SYSTEM OF SURVEYING AND MAPPING RESULTS BASED ON GEONETWORK

    Z. Zha

    2012-08-01

    Full Text Available Based on the analysis and research on the current geographic information sharing and metadata service,we design, develop and deploy a distributed metadata service system based on GeoNetwork covering more than 30 nodes in provincial units of China.. By identifying the advantages of GeoNetwork, we design a distributed metadata service system of national surveying and mapping results. It consists of 31 network nodes, a central node and a portal. Network nodes are the direct system metadata source, and are distributed arround the country. Each network node maintains a metadata service system, responsible for metadata uploading and management. The central node harvests metadata from network nodes using OGC CSW 2.0.2 standard interface. The portal shows all metadata in the central node, provides users with a variety of methods and interface for metadata search or querying. It also provides management capabilities on connecting the central node and the network nodes together. There are defects with GeoNetwork too. Accordingly, we made improvement and optimization on big-amount metadata uploading, synchronization and concurrent access. For metadata uploading and synchronization, by carefully analysis the database and index operation logs, we successfully avoid the performance bottlenecks. And with a batch operation and dynamic memory management solution, data throughput and system performance are significantly improved; For concurrent access, , through a request coding and results cache solution, query performance is greatly improved. To smoothly respond to huge concurrent requests, a web cluster solution is deployed. This paper also gives an experiment analysis and compares the system performance before and after improvement and optimization. Design and practical results have been applied in national metadata service system of surveying and mapping results. It proved that the improved GeoNetwork service architecture can effectively adaptive for

  5. Flexible Authoring of Metadata for Learning : Assembling forms from a declarative data and view model

    Enoksson, Fredrik

    2011-01-01

    With the vast amount of information in various formats that is produced today it becomes necessary for consumers ofthis information to be able to judge if it is relevant for them. One way to enable that is to provide information abouteach piece of information, i.e. provide metadata. When metadata is to be edited by a human being, a metadata editorneeds to be provided. This thesis describes the design and practical use of a configuration mechanism for metadataeditors called annotation profiles...

  6. Defense Virtual Library: Technical Metadata for the Long-Term Management of Digital Materials: Preliminary Guidelines

    Flynn, Marcy

    2002-01-01

    ... of the digital materials being preserved. This report, prepared by Silver Image Management (SIM), proposes technical metadata elements appropriate for digital objects in the Defense Virtual Library...

  7. A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs

    Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.

    2013-12-01

    The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata is difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs and thus it is desirable to have federated search interfaces capable of unified search queries across multiple sources. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and ACADIS Arctic Data Explorer (ADE) use a common infrastructure which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an Open Source search platform. Solr applies indexes to specific fields of the metadata which in this instance optimizes queries containing keywords, spatial bounds and temporal ranges. NSIDC metadata is reused by both search interfaces but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort and ultimately lowers costs for research. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and will identify creation and use of Open Source libraries.

  8. Improving Scientific Metadata Interoperability And Data Discoverability using OAI-PMH

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James M.; Wilson, Bruce E.

    2010-12-01

    While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth Science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow and the comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. However, there are a number of different protocols for harvesting metadata, with some challenges for ensuring that updates are propagated and for collaborations with repositories using differing metadata standards. The Open Archive Initiative Protocol for Metadata Handling (OAI-PMH) is a standard that is seeing increased use as a means for exchanging structured metadata. OAI-PMH implementations must support Dublin Core as a metadata standard, with other metadata formats as optional. We have developed tools which enable our structured search tool (Mercury; http://mercury.ornl.gov) to consume metadata from OAI-PMH services in any of the metadata formats we support (Dublin Core, Darwin Core, FCDC CSDGM, GCMD DIF, EML, and ISO 19115/19137). We are also making ORNL DAAC metadata available through OAI-PMH for other metadata tools to utilize, such as the NASA Global Change Master Directory, GCMD). This paper describes Mercury capabilities with multiple metadata formats, in general, and, more specifically, the results of our OAI-PMH implementations and

  9. The Theory and Implementation for Metadata in Digital Library/Museum

    Hsueh-hua Chen

    1998-12-01

    Full Text Available Digital Libraries and Museums (DL/M have become one of the important research issues of Library and Information Science as well as other related fields. This paper describes the basic concepts of DL/M and briefly introduces the development of Taiwan Digital Museum Project. Based on the features of various collections, wediscuss how to maintain, to manage and to exchange metadata, especially from the viewpoint of users. We propose the draft of metadata, MICI (Metadata Interchange for Chinese Information , developed by ROSS (Resources Organization and SearchingSpecification team. Finally, current problems and future development of metadata will be touched.[Article content in Chinese

  10. Scalable conditional induction variables (CIV) analysis

    Oancea, Cosmin Eugen; Rauchwerger, Lawrence

    2015-01-01

    parallelizing compiler and evaluated its impact on five Fortran benchmarks. We have found that that there are many important loops using CIV subscripts and that our analysis can lead to their scalable parallelization. This in turn has led to the parallelization of the benchmark programs they appear in.......Subscripts using induction variables that cannot be expressed as a formula in terms of the enclosing-loop indices appear in the low-level implementation of common programming abstractions such as filter, or stack operations and pose significant challenges to automatic parallelization. Because...... the complexity of such induction variables is often due to their conditional evaluation across the iteration space of loops we name them Conditional Induction Variables (CIV). This paper presents a flow-sensitive technique that summarizes both such CIV-based and affine subscripts to program level, using the same...

  11. Scalable Faceted Ranking in Tagging Systems

    Orlicki, José I.; Alvarez-Hamelin, J. Ignacio; Fierens, Pablo I.

    Nowadays, web collaborative tagging systems which allow users to upload, comment on and recommend contents, are growing. Such systems can be represented as graphs where nodes correspond to users and tagged-links to recommendations. In this paper we analyze the problem of computing a ranking of users with respect to a facet described as a set of tags. A straightforward solution is to compute a PageRank-like algorithm on a facet-related graph, but it is not feasible for online computation. We propose an alternative: (i) a ranking for each tag is computed offline on the basis of tag-related subgraphs; (ii) a faceted order is generated online by merging rankings corresponding to all the tags in the facet. Based on the graph analysis of YouTube and Flickr, we show that step (i) is scalable. We also present efficient algorithms for step (ii), which are evaluated by comparing their results with two gold standards.

  12. A graph algebra for scalable visual analytics.

    Shaverdian, Anna A; Zhou, Hao; Michailidis, George; Jagadish, Hosagrahar V

    2012-01-01

    Visual analytics (VA), which combines analytical techniques with advanced visualization features, is fast becoming a standard tool for extracting information from graph data. Researchers have developed many tools for this purpose, suggesting a need for formal methods to guide these tools' creation. Increased data demands on computing requires redesigning VA tools to consider performance and reliability in the context of analysis of exascale datasets. Furthermore, visual analysts need a way to document their analyses for reuse and results justification. A VA graph framework encapsulated in a graph algebra helps address these needs. Its atomic operators include selection and aggregation. The framework employs a visual operator and supports dynamic attributes of data to enable scalable visual exploration of data.

  13. Parallel scalability of Hartree-Fock calculations

    Chow, Edmond; Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.

    2015-03-01

    Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree-Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.

  14. iSIGHT-FD scalability test report.

    Clay, Robert L.; Shneider, Max S.

    2008-07-01

    The engineering analysis community at Sandia National Laboratories uses a number of internal and commercial software codes and tools, including mesh generators, preprocessors, mesh manipulators, simulation codes, post-processors, and visualization packages. We define an analysis workflow as the execution of an ordered, logical sequence of these tools. Various forms of analysis (and in particular, methodologies that use multiple function evaluations or samples) involve executing parameterized variations of these workflows. As part of the DART project, we are evaluating various commercial workflow management systems, including iSIGHT-FD from Engineous. This report documents the results of a scalability test that was driven by DAKOTA and conducted on a parallel computer (Thunderbird). The purpose of this experiment was to examine the suitability and performance of iSIGHT-FD for large-scale, parameterized analysis workflows. As the results indicate, we found iSIGHT-FD to be suitable for this type of application.

  15. Scalable group level probabilistic sparse factor analysis

    Hinrich, Jesper Løve; Nielsen, Søren Føns Vind; Riis, Nicolai Andre Brogaard

    2017-01-01

    Many data-driven approaches exist to extract neural representations of functional magnetic resonance imaging (fMRI) data, but most of them lack a proper probabilistic formulation. We propose a scalable group level probabilistic sparse factor analysis (psFA) allowing spatially sparse maps, component...... pruning using automatic relevance determination (ARD) and subject specific heteroscedastic spatial noise modeling. For task-based and resting state fMRI, we show that the sparsity constraint gives rise to components similar to those obtained by group independent component analysis. The noise modeling...... shows that noise is reduced in areas typically associated with activation by the experimental design. The psFA model identifies sparse components and the probabilistic setting provides a natural way to handle parameter uncertainties. The variational Bayesian framework easily extends to more complex...

  16. Scalable on-chip quantum state tomography

    Titchener, James G.; Gräfe, Markus; Heilmann, René; Solntsev, Alexander S.; Szameit, Alexander; Sukhorukov, Andrey A.

    2018-03-01

    Quantum information systems are on a path to vastly exceed the complexity of any classical device. The number of entangled qubits in quantum devices is rapidly increasing, and the information required to fully describe these systems scales exponentially with qubit number. This scaling is the key benefit of quantum systems, however it also presents a severe challenge. To characterize such systems typically requires an exponentially long sequence of different measurements, becoming highly resource demanding for large numbers of qubits. Here we propose and demonstrate a novel and scalable method for characterizing quantum systems based on expanding a multi-photon state to larger dimensionality. We establish that the complexity of this new measurement technique only scales linearly with the number of qubits, while providing a tomographically complete set of data without a need for reconfigurability. We experimentally demonstrate an integrated photonic chip capable of measuring two- and three-photon quantum states with statistical reconstruction fidelity of 99.71%.

  17. A versatile scalable PET processing system

    Dong, H.; Weisenberger, A.; McKisson, J.; Wenze, Xi; Cuevas, C.; Wilson, J.; Zukerman, L.

    2011-01-01

    Positron Emission Tomography (PET) historically has major clinical and preclinical applications in cancerous oncology, neurology, and cardiovascular diseases. Recently, in a new direction, an application specific PET system is being developed at Thomas Jefferson National Accelerator Facility (Jefferson Lab) in collaboration with Duke University, University of Maryland at Baltimore (UMAB), and West Virginia University (WVU) targeted for plant eco-physiology research. The new plant imaging PET system is versatile and scalable such that it could adapt to several plant imaging needs - imaging many important plant organs including leaves, roots, and stems. The mechanical arrangement of the detectors is designed to accommodate the unpredictable and random distribution in space of the plant organs without requiring the plant be disturbed. Prototyping such a system requires a new data acquisition system (DAQ) and data processing system which are adaptable to the requirements of these unique and versatile detectors.

  18. The Concept of Business Model Scalability

    Nielsen, Christian; Lund, Morten

    2015-01-01

    The power of business models lies in their ability to visualize and clarify how firms’ may configure their value creation processes. Among the key aspects of business model thinking are a focus on what the customer values, how this value is best delivered to the customer and how strategic partners...... are leveraged in this value creation, delivery and realization exercise. Central to the mainstream understanding of business models is the value proposition towards the customer and the hypothesis generated is that if the firm delivers to the customer what he/she requires, then there is a good foundation...... for a long-term profitable business. However, the message conveyed in this article is that while providing a good value proposition may help the firm ‘get by’, the really successful businesses of today are those able to reach the sweet-spot of business model scalability. This article introduces and discusses...

  19. The scalable coherent interface, IEEE P1596

    Gustavson, D.B.

    1990-01-01

    IEEE P1596, the scalable coherent interface (formerly known as SuperBus) is based on experience gained while developing Fastbus (ANSI/IEEE 960--1986, IEC 935), Futurebus (IEEE P896.x) and other modern 32-bit buses. SCI goals include a minimum bandwidth of 1 GByte/sec per processor in multiprocessor systems with thousands of processors; efficient support of a coherent distributed-cache image of distributed shared memory; support for repeaters which interface to existing or future buses; and support for inexpensive small rings as well as for general switched interconnections like Banyan, Omega, or crossbar networks. This paper presents a summary of current directions, reports the status of the work in progress, and suggests some applications in data acquisition and physics

  20. BASSET: Scalable Gateway Finder in Large Graphs

    Tong, H; Papadimitriou, S; Faloutsos, C; Yu, P S; Eliassi-Rad, T

    2010-11-03

    Given a social network, who is the best person to introduce you to, say, Chris Ferguson, the poker champion? Or, given a network of people and skills, who is the best person to help you learn about, say, wavelets? The goal is to find a small group of 'gateways': persons who are close enough to us, as well as close enough to the target (person, or skill) or, in other words, are crucial in connecting us to the target. The main contributions are the following: (a) we show how to formulate this problem precisely; (b) we show that it is sub-modular and thus it can be solved near-optimally; (c) we give fast, scalable algorithms to find such gateways. Experiments on real data sets validate the effectiveness and efficiency of the proposed methods, achieving up to 6,000,000x speedup.

  1. Scalable quantum search using trapped ions

    Ivanov, S. S.; Ivanov, P. A.; Linington, I. E.; Vitanov, N. V.

    2010-01-01

    We propose a scalable implementation of Grover's quantum search algorithm in a trapped-ion quantum information processor. The system is initialized in an entangled Dicke state by using adiabatic techniques. The inversion-about-average and oracle operators take the form of single off-resonant laser pulses. This is made possible by utilizing the physical symmetries of the trapped-ion linear crystal. The physical realization of the algorithm represents a dramatic simplification: each logical iteration (oracle and inversion about average) requires only two physical interaction steps, in contrast to the large number of concatenated gates required by previous approaches. This not only facilitates the implementation but also increases the overall fidelity of the algorithm.

  2. Scalable graphene aptasensors for drug quantification

    Vishnubhotla, Ramya; Ping, Jinglei; Gao, Zhaoli; Lee, Abigail; Saouaf, Olivia; Vrudhula, Amey; Johnson, A. T. Charlie

    2017-11-01

    Simpler and more rapid approaches for therapeutic drug-level monitoring are highly desirable to enable use at the point-of-care. We have developed an all-electronic approach for detection of the HIV drug tenofovir based on scalable fabrication of arrays of graphene field-effect transistors (GFETs) functionalized with a commercially available DNA aptamer. The shift in the Dirac voltage of the GFETs varied systematically with the concentration of tenofovir in deionized water, with a detection limit less than 1 ng/mL. Tests against a set of negative controls confirmed the specificity of the sensor response. This approach offers the potential for further development into a rapid and convenient point-of-care tool with clinically relevant performance.

  3. Revision of IRIS/IDA Seismic Station Metadata

    Xu, W.; Davis, P.; Auerbach, D.; Klimczak, E.

    2017-12-01

    Trustworthy data quality assurance has always been one of the goals of seismic network operators and data management centers. This task is considerably complex and evolving due to the huge quantities as well as the rapidly changing characteristics and complexities of seismic data. Published metadata usually reflect instrument response characteristics and their accuracies, which includes zero frequency sensitivity for both seismometer and data logger as well as other, frequency-dependent elements. In this work, we are mainly focused studying the variation of the seismometer sensitivity with time of IRIS/IDA seismic recording systems with a goal to improve the metadata accuracy for the history of the network. There are several ways to measure the accuracy of seismometer sensitivity for the seismic stations in service. An effective practice recently developed is to collocate a reference seismometer in proximity to verify the in-situ sensors' calibration. For those stations with a secondary broadband seismometer, IRIS' MUSTANG metric computation system introduced a transfer function metric to reflect two sensors' gain ratios in the microseism frequency band. In addition, a simulation approach based on M2 tidal measurements has been proposed and proven to be effective. In this work, we compare and analyze the results from three different methods, and concluded that the collocated-sensor method is most stable and reliable with the minimum uncertainties all the time. However, for epochs without both the collocated sensor and secondary seismometer, we rely on the analysis results from tide method. For the data since 1992 on IDA stations, we computed over 600 revised seismometer sensitivities for all the IRIS/IDA network calibration epochs. Hopefully further revision procedures will help to guarantee that the data is accurately reflected by the metadata of these stations.

  4. Scalable Transactions for Web Applications in the Cloud

    Zhou, W.; Pierre, G.E.O.; Chi, C.-H.

    2009-01-01

    Cloud Computing platforms provide scalability and high availability properties for web applications but they sacrifice data consistency at the same time. However, many applications cannot afford any data inconsistency. We present a scalable transaction manager for NoSQL cloud database services to

  5. New Complexity Scalable MPEG Encoding Techniques for Mobile Applications

    Stephan Mietens

    2004-03-01

    Full Text Available Complexity scalability offers the advantage of one-time design of video applications for a large product family, including mobile devices, without the need of redesigning the applications on the algorithmic level to meet the requirements of the different products. In this paper, we present complexity scalable MPEG encoding having core modules with modifications for scalability. The interdependencies of the scalable modules and the system performance are evaluated. Experimental results show scalability giving a smooth change in complexity and corresponding video quality. Scalability is basically achieved by varying the number of computed DCT coefficients and the number of evaluated motion vectors but other modules are designed such they scale with the previous parameters. In the experiments using the “Stefan” sequence, the elapsed execution time of the scalable encoder, reflecting the computational complexity, can be gradually reduced to roughly 50% of its original execution time. The video quality scales between 20 dB and 48 dB PSNR with unity quantizer setting, and between 21.5 dB and 38.5 dB PSNR for different sequences targeting 1500 kbps. The implemented encoder and the scalability techniques can be successfully applied in mobile systems based on MPEG video compression.

  6. Building scalable apps with Redis and Node.js

    Johanan, Joshua

    2014-01-01

    If the phrase scalability sounds alien to you, then this is an ideal book for you. You will not need much Node.js experience as each framework is demonstrated in a way that requires no previous knowledge of the framework. You will be building scalable Node.js applications in no time! Knowledge of JavaScript is required.

  7. Fourier transform based scalable image quality measure.

    Narwaria, Manish; Lin, Weisi; McLoughlin, Ian; Emmanuel, Sabu; Chia, Liang-Tien

    2012-08-01

    We present a new image quality assessment (IQA) algorithm based on the phase and magnitude of the 2D (twodimensional) Discrete Fourier Transform (DFT). The basic idea is to compare the phase and magnitude of the reference and distorted images to compute the quality score. However, it is well known that the Human Visual Systems (HVSs) sensitivity to different frequency components is not the same. We accommodate this fact via a simple yet effective strategy of nonuniform binning of the frequency components. This process also leads to reduced space representation of the image thereby enabling the reduced-reference (RR) prospects of the proposed scheme. We employ linear regression to integrate the effects of the changes in phase and magnitude. In this way, the required weights are determined via proper training and hence more convincing and effective. Lastly, using the fact that phase usually conveys more information than magnitude, we use only the phase for RR quality assessment. This provides the crucial advantage of further reduction in the required amount of reference image information. The proposed method is therefore further scalable for RR scenarios. We report extensive experimental results using a total of 9 publicly available databases: 7 image (with a total of 3832 distorted images with diverse distortions) and 2 video databases (totally 228 distorted videos). These show that the proposed method is overall better than several of the existing fullreference (FR) algorithms and two RR algorithms. Additionally, there is a graceful degradation in prediction performance as the amount of reference image information is reduced thereby confirming its scalability prospects. To enable comparisons and future study, a Matlab implementation of the proposed algorithm is available at http://www.ntu.edu.sg/home/wslin/reduced_phase.rar.

  8. Improving diabetes medication adherence: successful, scalable interventions

    Zullig LL

    2015-01-01

    Full Text Available Leah L Zullig,1,2 Walid F Gellad,3,4 Jivan Moaddeb,2,5 Matthew J Crowley,1,2 William Shrank,6 Bradi B Granger,7 Christopher B Granger,8 Troy Trygstad,9 Larry Z Liu,10 Hayden B Bosworth1,2,7,11 1Center for Health Services Research in Primary Care, Durham Veterans Affairs Medical Center, Durham, NC, USA; 2Department of Medicine, Duke University, Durham, NC, USA; 3Center for Health Equity Research and Promotion, Pittsburgh Veterans Affairs Medical Center, Pittsburgh, PA, USA; 4Division of General Internal Medicine, University of Pittsburgh, Pittsburgh, PA, USA; 5Institute for Genome Sciences and Policy, Duke University, Durham, NC, USA; 6CVS Caremark Corporation; 7School of Nursing, Duke University, Durham, NC, USA; 8Department of Medicine, Division of Cardiology, Duke University School of Medicine, Durham, NC, USA; 9North Carolina Community Care Networks, Raleigh, NC, USA; 10Pfizer, Inc., and Weill Medical College of Cornell University, New York, NY, USA; 11Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC, USA Abstract: Effective medications are a cornerstone of prevention and disease treatment, yet only about half of patients take their medications as prescribed, resulting in a common and costly public health challenge for the US healthcare system. Since poor medication adherence is a complex problem with many contributing causes, there is no one universal solution. This paper describes interventions that were not only effective in improving medication adherence among patients with diabetes, but were also potentially scalable (ie, easy to implement to a large population. We identify key characteristics that make these interventions effective and scalable. This information is intended to inform healthcare systems seeking proven, low resource, cost-effective solutions to improve medication adherence. Keywords: medication adherence, diabetes mellitus, chronic disease, dissemination research

  9. A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.

    Kothari, Cartik R; Payne, Philip R O

    2015-01-01

    In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.

  10. Radiological dose and metadata management; Radiologisches Dosis- und Metadatenmanagement

    Walz, M.; Madsack, B. [TUeV SUeD Life Service GmbH, Aerztliche Stelle fuer Qualitaetssicherung in der Radiologie, Nuklearmedizin und Strahlentherapie Hessen, Frankfurt (Germany); Kolodziej, M. [INFINITT Europe GmbH, Frankfurt/M (Germany)

    2016-12-15

    This article describes the features of management systems currently available in Germany for extraction, registration and evaluation of metadata from radiological examinations, particularly in the digital imaging and communications in medicine (DICOM) environment. In addition, the probable relevant developments in this area concerning radiation protection legislation, terminology, standardization and information technology are presented. (orig.) [German] Dieser Artikel stellt die aktuell in Deutschland verfuegbaren Funktionen von Managementsystemen zur Erfassung und Auswertung von Metadaten zu radiologischen Untersuchungen insbesondere im DICOM-Umfeld (Digital Imaging and Communications in Medicine) vor. Ausserdem werden die in diesem Bereich voraussichtlich relevanten Entwicklungen von Strahlenschutzgesetzgebung ueber Terminologie und Standardisierung bis zu informationstechnischen Aspekten dargestellt. (orig.)

  11. Twenty-first century metadata operations challenges, opportunities, directions

    Lee Eden, Bradford

    2014-01-01

    It has long been apparent to academic library administrators that the current technical services operations within libraries need to be redirected and refocused in terms of both format priorities and human resources. A number of developments and directions have made this reorganization imperative, many of which have been accelerated by the current economic crisis. All of the chapters detail some aspect of technical services reorganization due to downsizing and/or reallocation of human resources, retooling professional and support staff in higher level duties and/or non-MARC metadata, ""value-a

  12. Generation of Multiple Metadata Formats from a Geospatial Data Repository

    Hudspeth, W. B.; Benedict, K. K.; Scott, S.

    2012-12-01

    The Earth Data Analysis Center (EDAC) at the University of New Mexico is partnering with the CYBERShARE and Environmental Health Group from the Center for Environmental Resource Management (CERM), located at the University of Texas, El Paso (UTEP), the Biodiversity Institute at the University of Kansas (KU), and the New Mexico Geo- Epidemiology Research Network (GERN) to provide a technical infrastructure that enables investigation of a variety of climate-driven human/environmental systems. Two significant goals of this NASA-funded project are: a) to increase the use of NASA Earth observational data at EDAC by various modeling communities through enabling better discovery, access, and use of relevant information, and b) to expose these communities to the benefits of provenance for improving understanding and usability of heterogeneous data sources and derived model products. To realize these goals, EDAC has leveraged the core capabilities of its Geographic Storage, Transformation, and Retrieval Engine (Gstore) platform, developed with support of the NSF EPSCoR Program. The Gstore geospatial services platform provides general purpose web services based upon the REST service model, and is capable of data discovery, access, and publication functions, metadata delivery functions, data transformation, and auto-generated OGC services for those data products that can support those services. Central to the NASA ACCESS project is the delivery of geospatial metadata in a variety of formats, including ISO 19115-2/19139, FGDC CSDGM, and the Proof Markup Language (PML). This presentation details the extraction and persistence of relevant metadata in the Gstore data store, and their transformation into multiple metadata formats that are increasingly utilized by the geospatial community to document not only core library catalog elements (e.g. title, abstract, publication data, geographic extent, projection information, and database elements), but also the processing steps used to

  13. Latest developments for the IAGOS database: Interoperability and metadata

    Boulanger, Damien; Gautron, Benoit; Thouret, Valérie; Schultz, Martin; van Velthoven, Peter; Broetz, Bjoern; Rauthe-Schöch, Armin; Brissebrat, Guillaume

    2014-05-01

    In-service Aircraft for a Global Observing System (IAGOS, http://www.iagos.org) aims at the provision of long-term, frequent, regular, accurate, and spatially resolved in situ observations of the atmospheric composition. IAGOS observation systems are deployed on a fleet of commercial aircraft. The IAGOS database is an essential part of the global atmospheric monitoring network. Data access is handled by open access policy based on the submission of research requests which are reviewed by the PIs. Users can access the data through the following web sites: http://www.iagos.fr or http://www.pole-ether.fr as the IAGOS database is part of the French atmospheric chemistry data centre ETHER (CNES and CNRS). The database is in continuous development and improvement. In the framework of the IGAS project (IAGOS for GMES/COPERNICUS Atmospheric Service), major achievements will be reached, such as metadata and format standardisation in order to interoperate with international portals and other databases, QA/QC procedures and traceability, CARIBIC (Civil Aircraft for the Regular Investigation of the Atmosphere Based on an Instrument Container) data integration within the central database, and the real-time data transmission. IGAS work package 2 aims at providing the IAGOS data to users in a standardized format including the necessary metadata and information on data processing, data quality and uncertainties. We are currently redefining and standardizing the IAGOS metadata for interoperable use within GMES/Copernicus. The metadata are compliant with the ISO 19115, INSPIRE and NetCDF-CF conventions. IAGOS data will be provided to users in NetCDF or NASA Ames format. We also are implementing interoperability between all the involved IAGOS data services, including the central IAGOS database, the former MOZAIC and CARIBIC databases, Aircraft Research DLR database and the Jülich WCS web application JOIN (Jülich OWS Interface) which combines model outputs with in situ data for

  14. Improve data integration performance by employing metadata management utility

    Wei, M.; Sung, A.H. [New Mexico Petroleum Recovery Research Center, Socorro, NM (United States)

    2005-07-01

    This conference paper explored ways of integrating petroleum and exploration data obtained from different sources in order to provide more comprehensive data for various analysis purposes and to improve the integrity and consistency of integrated data. This paper proposes a methodology to enhance oil and gas industry data integration performance by cooperating data management utilities in Microsoft SQL Server database management system (DBMS) for small scale data integration without support of commercial software. By semi-automatically capturing metadata, data sources are investigated in detail, data quality problems are partially cleansed, and the performance of data integration is improved. 20 refs., 7 tabs., 1 fig.

  15. Evolution of the architecture of the ATLAS Metadata Interface (AMI)

    Odier, J.; Aidel, O.; Albrand, S.; Fulachier, J.; Lambert, F.

    2015-12-01

    The ATLAS Metadata Interface (AMI) is now a mature application. Over the years, the number of users and the number of provided functions has dramatically increased. It is necessary to adapt the hardware infrastructure in a seamless way so that the quality of service re - mains high. We describe the AMI evolution since its beginning being served by a single MySQL backend database server to the current state having a cluster of virtual machines at French Tier1, an Oracle database at Lyon with complementary replication to the Oracle DB at CERN and AMI back-up server.

  16. Evolution of the Architecture of the ATLAS Metadata Interface (AMI)

    Odier, Jerome; The ATLAS collaboration; Fulachier, Jerome; Lambert, Fabian

    2015-01-01

    The ATLAS Metadata Interface (AMI) is now a mature application. Over the years, the number of users and the number of provided functions has dramatically increased. It is necessary to adapt the hardware infrastructure in a seamless way so that the quality of service remains high. We describe the evolution from the beginning of the application life, using one server with a MySQL backend database, to the current state in which a cluster of virtual machines on the French Tier 1 cloud at Lyon, an Oracle database also at Lyon, with replication to Oracle at CERN and a back-up server are used.

  17. Scalable and Media Aware Adaptive Video Streaming over Wireless Networks

    Béatrice Pesquet-Popescu

    2008-07-01

    Full Text Available This paper proposes an advanced video streaming system based on scalable video coding in order to optimize resource utilization in wireless networks with retransmission mechanisms at radio protocol level. The key component of this system is a packet scheduling algorithm which operates on the different substreams of a main scalable video stream and which is implemented in a so-called media aware network element. The concerned type of transport channel is a dedicated channel subject to parameters (bitrate, loss rate variations on the long run. Moreover, we propose a combined scalability approach in which common temporal and SNR scalability features can be used jointly with a partitioning of the image into regions of interest. Simulation results show that our approach provides substantial quality gain compared to classical packet transmission methods and they demonstrate how ROI coding combined with SNR scalability allows to improve again the visual quality.

  18. Towards Precise Metadata-set for Discovering 3D Geospatial Models in Geo-portals

    Zamyadi, A.; Pouliot, J.; Bédard, Y.

    2013-09-01

    Accessing 3D geospatial models, eventually at no cost and for unrestricted use, is certainly an important issue as they become popular among participatory communities, consultants, and officials. Various geo-portals, mainly established for 2D resources, have tried to provide access to existing 3D resources such as digital elevation model, LIDAR or classic topographic data. Describing the content of data, metadata is a key component of data discovery in geo-portals. An inventory of seven online geo-portals and commercial catalogues shows that the metadata referring to 3D information is very different from one geo-portal to another as well as for similar 3D resources in the same geo-portal. The inventory considered 971 data resources affiliated with elevation. 51% of them were from three geo-portals running at Canadian federal and municipal levels whose metadata resources did not consider 3D model by any definition. Regarding the remaining 49% which refer to 3D models, different definition of terms and metadata were found, resulting in confusion and misinterpretation. The overall assessment of these geo-portals clearly shows that the provided metadata do not integrate specific and common information about 3D geospatial models. Accordingly, the main objective of this research is to improve 3D geospatial model discovery in geo-portals by adding a specific metadata-set. Based on the knowledge and current practices on 3D modeling, and 3D data acquisition and management, a set of metadata is proposed to increase its suitability for 3D geospatial models. This metadata-set enables the definition of genuine classes, fields, and code-lists for a 3D metadata profile. The main structure of the proposal contains 21 metadata classes. These classes are classified in three packages as General and Complementary on contextual and structural information, and Availability on the transition from storage to delivery format. The proposed metadata set is compared with Canadian Geospatial

  19. openPDS: protecting the privacy of metadata through SafeAnswers.

    Yves-Alexandre de Montjoye

    Full Text Available The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' location, phone call logs, or web-searches, is collected and used intensively by organizations and big data researchers. Metadata has however yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management is preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1 we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties. It has been implemented in two field studies; (2 we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one. It allows services to ask questions whose answers are calculated against the metadata instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be directly shared individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research.

  20. Metadata Design in the New PDS4 Standards - Something for Everybody

    Raugh, Anne C.; Hughes, John S.

    2015-11-01

    The Planetary Data System (PDS) archives, supports, and distributes data of diverse targets, from diverse sources, to diverse users. One of the core problems addressed by the PDS4 data standard redesign was that of metadata - how to accommodate the increasingly sophisticated demands of search interfaces, analytical software, and observational documentation into label standards without imposing limits and constraints that would impinge on the quality or quantity of metadata that any particular observer or team could supply. And yet, as an archive, PDS must have detailed documentation for the metadata in the labels it supports, or the institutional knowledge encoded into those attributes will be lost - putting the data at risk.The PDS4 metadata solution is based on a three-step approach. First, it is built on two key ISO standards: ISO 11179 "Information Technology - Metadata Registries", which provides a common framework and vocabulary for defining metadata attributes; and ISO 14721 "Space Data and Information Transfer Systems - Open Archival Information System (OAIS) Reference Model", which provides the framework for the information architecture that enforces the object-oriented paradigm for metadata modeling. Second, PDS has defined a hierarchical system that allows it to divide its metadata universe into namespaces ("data dictionaries", conceptually), and more importantly to delegate stewardship for a single namespace to a local authority. This means that a mission can develop its own data model with a high degree of autonomy and effectively extend the PDS model to accommodate its own metadata needs within the common ISO 11179 framework. Finally, within a single namespace - even the core PDS namespace - existing metadata structures can be extended and new structures added to the model as new needs are identifiedThis poster illustrates the PDS4 approach to metadata management and highlights the expected return on the development investment for PDS, users and data

  1. Using Semantic Web technologies for the generation of domain-specific templates to support clinical study metadata standards.

    Jiang, Guoqian; Evans, Julie; Endle, Cory M; Solbrig, Harold R; Chute, Christopher G

    2016-01-01

    The Biomedical Research Integrated Domain Group (BRIDG) model is a formal domain analysis model for protocol-driven biomedical research, and serves as a semantic foundation for application and message development in the standards developing organizations (SDOs). The increasing sophistication and complexity of the BRIDG model requires new approaches to the management and utilization of the underlying semantics to harmonize domain-specific standards. The objective of this study is to develop and evaluate a Semantic Web-based approach that integrates the BRIDG model with ISO 21090 data types to generate domain-specific templates to support clinical study metadata standards development. We developed a template generation and visualization system based on an open source Resource Description Framework (RDF) store backend, a SmartGWT-based web user interface, and a "mind map" based tool for the visualization of generated domain-specific templates. We also developed a RESTful Web Service informed by the Clinical Information Modeling Initiative (CIMI) reference model for access to the generated domain-specific templates. A preliminary usability study is performed and all reviewers (n = 3) had very positive responses for the evaluation questions in terms of the usability and the capability of meeting the system requirements (with the average score of 4.6). Semantic Web technologies provide a scalable infrastructure and have great potential to enable computable semantic interoperability of models in the intersection of health care and clinical research.

  2. ARIADNE: a tracking system for relationships in LHCb metadata

    Shapoval, I; Clemencic, M; Cattaneo, M

    2014-01-01

    The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and databases states to architecture specificators and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time dependent geometry and conditions data, and the LHCb software, which is the data processing applications (used for simulation, high level triggering, reconstruction and analysis of physics data). The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process. It means that relationships between a CondDB state and LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in the LHCb production, varying from unexpected application crashes to incorrect data processing results. In this paper we present Ariadne – a generic metadata relationships tracking system based on the novel NoSQL Neo4j graph database. Its aim is to track and analyze many thousands of evolving relationships for cases such as the one described above, and several others, which would otherwise remain unmanaged and potentially harmful. The highlights of the paper include the system's implementation and management details, infrastructure needed for running it, security issues, first experience of usage in the LHCb production and potential of the system to be applied to a wider set of LHCb tasks.

  3. ARIADNE: a Tracking System for Relationships in LHCb Metadata

    Shapoval, I.; Clemencic, M.; Cattaneo, M.

    2014-06-01

    The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and databases states to architecture specificators and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time dependent geometry and conditions data, and the LHCb software, which is the data processing applications (used for simulation, high level triggering, reconstruction and analysis of physics data). The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process. It means that relationships between a CondDB state and LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in the LHCb production, varying from unexpected application crashes to incorrect data processing results. In this paper we present Ariadne - a generic metadata relationships tracking system based on the novel NoSQL Neo4j graph database. Its aim is to track and analyze many thousands of evolving relationships for cases such as the one described above, and several others, which would otherwise remain unmanaged and potentially harmful. The highlights of the paper include the system's implementation and management details, infrastructure needed for running it, security issues, first experience of usage in the LHCb production and potential of the system to be applied to a wider set of LHCb tasks.

  4. DataNet: A flexible metadata overlay over file resources

    CERN. Geneva

    2014-01-01

    Managing and sharing data stored in files results in a challenge due to data amounts produced by various scientific experiments [1]. While solutions such as Globus Online [2] focus on file transfer and synchronization, in this work we propose an additional layer of metadata over file resources which helps to categorize and structure the data, as well as to make it efficient in integration with web-based research gateways. A basic concept of the proposed solution [3] is a data model consisting of entities built from primitive types such as numbers, texts and also from files and relationships among different entities. This allows for building complex data structure definitions and mix metadata and file data into a single model tailored for a given scientific field. A data model becomes actionable after being deployed as a data repository which is done automatically by the proposed framework by using one of the available PaaS (platform-as-a-service) platforms and is exposed to the world as a REST service, which...

  5. Metadata In, Library Out. A Simple, Robust Digital Library System

    Tonio Loewald

    2010-06-01

    Full Text Available Tired of being held hostage to expensive systems that did not meet our needs, the University of Alabama Libraries developed an XML schema-agnostic, light-weight digital library delivery system based on the principles of "Keep It Simple, Stupid!" Metadata and derivatives reside in openly accessible web directories, which support the development of web agents and new usability software, as well as modification and complete retrieval at any time. The file name structure is echoed in the file system structure, enabling the delivery software to make inferences about relationships, sequencing, and complex object structure without having to encapsulate files in complex metadata schemas. The web delivery system, Acumen, is built of PHP, JSON, JavaScript and HTML5, using MySQL to support fielded searching. Recognizing that spreadsheets are more user-friendly than XML, an accompanying widget, Archivists Utility, transforms spreadsheets into MODS based on rules selected by the user. Acumen, Archivists Utility, and all supporting software scripts will be made available as open source.

  6. Inconsistencies between Academic E-Book Platforms: A Comparison of Metadata and Search Results

    Wiersma, Gabrielle; Tovstiadi, Esta

    2017-01-01

    This article presents the results of a study of academic e-books that compared the metadata and search results from major academic e-book platforms. The authors collected data and performed a series of test searches designed to produce the same result regardless of platform. Testing, however, revealed metadata-related errors and significant…

  7. Document Classification in Support of Automated Metadata Extraction Form Heterogeneous Collections

    Flynn, Paul K.

    2014-01-01

    A number of federal agencies, universities, laboratories, and companies are placing their documents online and making them searchable via metadata fields such as author, title, and publishing organization. To enable this, every document in the collection must be catalogued using the metadata fields. Though time consuming, the task of identifying…

  8. iLOG: A Framework for Automatic Annotation of Learning Objects with Empirical Usage Metadata

    Miller, L. D.; Soh, Leen-Kiat; Samal, Ashok; Nugent, Gwen

    2012-01-01

    Learning objects (LOs) are digital or non-digital entities used for learning, education or training commonly stored in repositories searchable by their associated metadata. Unfortunately, based on the current standards, such metadata is often missing or incorrectly entered making search difficult or impossible. In this paper, we investigate…

  9. An Assistant for Loading Learning Object Metadata: An Ontology Based Approach

    Casali, Ana; Deco, Claudia; Romano, Agustín; Tomé, Guillermo

    2013-01-01

    In the last years, the development of different Repositories of Learning Objects has been increased. Users can retrieve these resources for reuse and personalization through searches in web repositories. The importance of high quality metadata is key for a successful retrieval. Learning Objects are described with metadata usually in the standard…

  10. OAI-PMH repositories : quality issues regarding metadata and protocol compliance, tutorial 1

    CERN. Geneva; Cole, Tim

    2005-01-01

    This tutorial will provide an overview of emerging guidelines and best practices for OAI data providers and how they relate to expectations and needs of service providers. The audience should already be familiar with OAI protocol basics and have at least some experience with either data provider or service provider implementations. The speakers will present both protocol compliance best practices and general recommendations for creating and disseminating high-quality "shareable metadata". Protocol best practices discussion will include coverage of OAI identifiers, date-stamps, deleted records, sets, resumption tokens, about containers, branding, errors conditions, HTTP server issues, and repository lifecycle issues. Discussion of what makes for good, shareable metadata will cover topics including character encoding, namespace and XML schema issues, metadata crosswalk issues, support of multiple metadata formats, general metadata authoring recommendations, specific recommendations for use of Dublin Core elemen...

  11. Oracle database performance and scalability a quantitative approach

    Liu, Henry H

    2011-01-01

    A data-driven, fact-based, quantitative text on Oracle performance and scalability With database concepts and theories clearly explained in Oracle's context, readers quickly learn how to fully leverage Oracle's performance and scalability capabilities at every stage of designing and developing an Oracle-based enterprise application. The book is based on the author's more than ten years of experience working with Oracle, and is filled with dependable, tested, and proven performance optimization techniques. Oracle Database Performance and Scalability is divided into four parts that enable reader

  12. Scalable-to-lossless transform domain distributed video coding

    Huang, Xin; Ukhanova, Ann; Veselov, Anton

    2010-01-01

    Distributed video coding (DVC) is a novel approach providing new features as low complexity encoding by mainly exploiting the source statistics at the decoder based on the availability of decoder side information. In this paper, scalable-tolossless DVC is presented based on extending a lossy Tran...... codec provides frame by frame encoding. Comparing the lossless coding efficiency, the proposed scalable-to-lossless TDWZ video codec can save up to 5%-13% bits compared to JPEG LS and H.264 Intra frame lossless coding and do so as a scalable-to-lossless coding....

  13. Design issues for numerical libraries on scalable multicore architectures

    Heroux, M A

    2008-01-01

    Future generations of scalable computers will rely on multicore nodes for a significant portion of overall system performance. At present, most applications and libraries cannot exploit multiple cores beyond running addition MPI processes per node. In this paper we discuss important multicore architecture issues, programming models, algorithms requirements and software design related to effective use of scalable multicore computers. In particular, we focus on important issues for library research and development, making recommendations for how to effectively develop libraries for future scalable computer systems

  14. Scalable inference for stochastic block models

    Peng, Chengbin

    2017-12-08

    Community detection in graphs is widely used in social and biological networks, and the stochastic block model is a powerful probabilistic tool for describing graphs with community structures. However, in the era of "big data," traditional inference algorithms for such a model are increasingly limited due to their high time complexity and poor scalability. In this paper, we propose a multi-stage maximum likelihood approach to recover the latent parameters of the stochastic block model, in time linear with respect to the number of edges. We also propose a parallel algorithm based on message passing. Our algorithm can overlap communication and computation, providing speedup without compromising accuracy as the number of processors grows. For example, to process a real-world graph with about 1.3 million nodes and 10 million edges, our algorithm requires about 6 seconds on 64 cores of a contemporary commodity Linux cluster. Experiments demonstrate that the algorithm can produce high quality results on both benchmark and real-world graphs. An example of finding more meaningful communities is illustrated consequently in comparison with a popular modularity maximization algorithm.

  15. CODA: A scalable, distributed data acquisition system

    Watson, W.A. III; Chen, J.; Heyes, G.; Jastrzembski, E.; Quarrie, D.

    1994-01-01

    A new data acquisition system has been designed for physics experiments scheduled to run at CEBAF starting in the summer of 1994. This system runs on Unix workstations connected via ethernet, FDDI, or other network hardware to multiple intelligent front end crates -- VME, CAMAC or FASTBUS. CAMAC crates may either contain intelligent processors, or may be interfaced to VME. The system is modular and scalable, from a single front end crate and one workstation linked by ethernet, to as may as 32 clusters of front end crates ultimately connected via a high speed network to a set of analysis workstations. The system includes an extensible, device independent slow controls package with drivers for CAMAC, VME, and high voltage crates, as well as a link to CEBAF accelerator controls. All distributed processes are managed by standard remote procedure calls propagating change-of-state requests, or reading and writing program variables. Custom components may be easily integrated. The system is portable to any front end processor running the VxWorks real-time kernel, and to most workstations supplying a few standard facilities such as rsh and X-windows, and Motif and socket libraries. Sample implementations exist for 2 Unix workstation families connected via ethernet or FDDI to VME (with interfaces to FASTBUS or CAMAC), and via ethernet to FASTBUS or CAMAC

  16. Ancestors protocol for scalable key management

    Dieter Gollmann

    2010-06-01

    Full Text Available Group key management is an important functional building block for secure multicast architecture. Thereby, it has been extensively studied in the literature. The main proposed protocol is Adaptive Clustering for Scalable Group Key Management (ASGK. According to ASGK protocol, the multicast group is divided into clusters, where each cluster consists of areas of members. Each cluster uses its own Traffic Encryption Key (TEK. These clusters are updated periodically depending on the dynamism of the members during the secure session. The modified protocol has been proposed based on ASGK with some modifications to balance the number of affected members and the encryption/decryption overhead with any number of the areas when a member joins or leaves the group. This modified protocol is called Ancestors protocol. According to Ancestors protocol, every area receives the dynamism of the members from its parents. The main objective of the modified protocol is to reduce the number of affected members during the leaving and joining members, then 1 affects n overhead would be reduced. A comparative study has been done between ASGK protocol and the modified protocol. According to the comparative results, it found that the modified protocol is always outperforming the ASGK protocol.

  17. Scalable Combinatorial Tools for Health Disparities Research

    Michael A. Langston

    2014-10-01

    Full Text Available Despite staggering investments made in unraveling the human genome, current estimates suggest that as much as 90% of the variance in cancer and chronic diseases can be attributed to factors outside an individual’s genetic endowment, particularly to environmental exposures experienced across his or her life course. New analytical approaches are clearly required as investigators turn to complicated systems theory and ecological, place-based and life-history perspectives in order to understand more clearly the relationships between social determinants, environmental exposures and health disparities. While traditional data analysis techniques remain foundational to health disparities research, they are easily overwhelmed by the ever-increasing size and heterogeneity of available data needed to illuminate latent gene x environment interactions. This has prompted the adaptation and application of scalable combinatorial methods, many from genome science research, to the study of population health. Most of these powerful tools are algorithmically sophisticated, highly automated and mathematically abstract. Their utility motivates the main theme of this paper, which is to describe real applications of innovative transdisciplinary models and analyses in an effort to help move the research community closer toward identifying the causal mechanisms and associated environmental contexts underlying health disparities. The public health exposome is used as a contemporary focus for addressing the complex nature of this subject.

  18. Percolator: Scalable Pattern Discovery in Dynamic Graphs

    Choudhury, Sutanay; Purohit, Sumit; Lin, Peng; Wu, Yinghui; Holder, Lawrence B.; Agarwal, Khushbu

    2018-02-06

    We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify a small fraction of patterns and their in- stances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. We demonstrate a) the feasibility of incremental pattern mining by walking through each component of Percolator, b) the efficiency and scalability of Percolator over the sheer size of real-world dynamic graphs, and c) how the user-friendly GUI of Percolator inter- acts with users to support keyword-based queries that detect, browse and inspect trending patterns. We also demonstrate two user cases of Percolator, in social media trend analysis and academic collaboration analysis, respectively.

  19. Scalable Notch Antenna System for Multiport Applications

    Abdurrahim Toktas

    2016-01-01

    Full Text Available A novel and compact scalable antenna system is designed for multiport applications. The basic design is built on a square patch with an electrical size of 0.82λ0×0.82λ0 (at 2.4 GHz on a dielectric substrate. The design consists of four symmetrical and orthogonal triangular notches with circular feeding slots at the corners of the common patch. The 4-port antenna can be simply rearranged to 8-port and 12-port systems. The operating band of the system can be tuned by scaling (S the size of the system while fixing the thickness of the substrate. The antenna system with S: 1/1 in size of 103.5×103.5 mm2 operates at the frequency band of 2.3–3.0 GHz. By scaling the antenna with S: 1/2.3, a system of 45×45 mm2 is achieved, and thus the operating band is tuned to 4.7–6.1 GHz with the same scattering characteristic. A parametric study is also conducted to investigate the effects of changing the notch dimensions. The performance of the antenna is verified in terms of the antenna characteristics as well as diversity and multiplexing parameters. The antenna system can be tuned by scaling so that it is applicable to the multiport WLAN, WIMAX, and LTE devices with port upgradability.

  20. Scalable conditional induction variables (CIV) analysis

    Oancea, Cosmin E.

    2015-02-01

    Subscripts using induction variables that cannot be expressed as a formula in terms of the enclosing-loop indices appear in the low-level implementation of common programming abstractions such as Alter, or stack operations and pose significant challenges to automatic parallelization. Because the complexity of such induction variables is often due to their conditional evaluation across the iteration space of loops we name them Conditional Induction Variables (CIV). This paper presents a flow-sensitive technique that summarizes both such CIV-based and affine subscripts to program level, using the same representation. Our technique requires no modifications of our dependence tests, which is agnostic to the original shape of the subscripts, and is more powerful than previously reported dependence tests that rely on the pairwise disambiguation of read-write references. We have implemented the CIV analysis in our parallelizing compiler and evaluated its impact on five Fortran benchmarks. We have found that that there are many important loops using CIV subscripts and that our analysis can lead to their scalable parallelization. This in turn has led to the parallelization of the benchmark programs they appear in.

  1. A Programmable, Scalable-Throughput Interleaver

    E. J. C. Rijshouwer

    2010-01-01

    Full Text Available The interleaver stages of digital communication standards show a surprisingly large variation in throughput, state sizes, and permutation functions. Furthermore, data rates for 4G standards such as LTE-Advanced will exceed typical baseband clock frequencies of handheld devices. Multistream operation for Software Defined Radio and iterative decoding algorithms will call for ever higher interleave data rates. Our interleave machine is built around 8 single-port SRAM banks and can be programmed to generate up to 8 addresses every clock cycle. The scalable architecture combines SIMD and VLIW concepts with an efficient resolution of bank conflicts. A wide range of cellular, connectivity, and broadcast interleavers have been mapped on this machine, with throughputs up to more than 0.5 Gsymbol/second. Although it was designed for channel interleaving, the application domain of the interleaver extends also to Turbo interleaving. The presented configuration of the architecture is designed as a part of a programmable outer receiver on a prototype board. It offers (near universal programmability to enable the implementation of new interleavers. The interleaver measures 2.09 mm2 in 65 nm CMOS (including memories and proves functional on silicon.

  2. A Programmable, Scalable-Throughput Interleaver

    Rijshouwer EJC

    2010-01-01

    Full Text Available The interleaver stages of digital communication standards show a surprisingly large variation in throughput, state sizes, and permutation functions. Furthermore, data rates for 4G standards such as LTE-Advanced will exceed typical baseband clock frequencies of handheld devices. Multistream operation for Software Defined Radio and iterative decoding algorithms will call for ever higher interleave data rates. Our interleave machine is built around 8 single-port SRAM banks and can be programmed to generate up to 8 addresses every clock cycle. The scalable architecture combines SIMD and VLIW concepts with an efficient resolution of bank conflicts. A wide range of cellular, connectivity, and broadcast interleavers have been mapped on this machine, with throughputs up to more than 0.5 Gsymbol/second. Although it was designed for channel interleaving, the application domain of the interleaver extends also to Turbo interleaving. The presented configuration of the architecture is designed as a part of a programmable outer receiver on a prototype board. It offers (near universal programmability to enable the implementation of new interleavers. The interleaver measures 2.09 m in 65 nm CMOS (including memories and proves functional on silicon.

  3. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the NERIES EC project main objectives is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer to the users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable users` activities such as discovering, aggregating, describing and sharing datasets to obtain a decrease in the replication of similar data queries towards the network, exempting the data centers to guess and create useful pre-packed products. We`ve started to transfer this task more and more towards the users community, where the users` composed data products could be extensively re-used. The main link to the data is represented by a centralized webservice (SeismoLink) acting like a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs, that identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (eg EMSC-QuakeML earthquakes catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map based portlet which allows the dynamic composition of a data product, binding seismic event`s parameters with a set of seismic stations. The requested data is collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on-demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  4. A Semantically Enabled Metadata Repository for Solar Irradiance Data Products

    Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.

    2014-12-01

    The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in Atmospheric and Space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric, and climate. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are be made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as a RDF triplestore, a collection of instances of subject-object-predicate data entities identifiable with a URI. This capability coupled with SPARQL over HTTP read access enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source semantic web application, to manage and create new ontologies and populate repository content. A variety of ontologies were used in

  5. Oceanotron, Scalable Server for Marine Observations

    Loubrieu, T.; Bregent, S.; Blower, J. D.; Griffiths, G.

    2013-12-01

    Ifremer, French marine institute, is deeply involved in data management for different ocean in-situ observation programs (ARGO, OceanSites, GOSUD, ...) or other European programs aiming at networking ocean in-situ observation data repositories (myOcean, seaDataNet, Emodnet). To capitalize the effort for implementing advance data dissemination services (visualization, download with subsetting) for these programs and generally speaking water-column observations repositories, Ifremer decided to develop the oceanotron server (2010). Knowing the diversity of data repository formats (RDBMS, netCDF, ODV, ...) and the temperamental nature of the standard interoperability interface profiles (OGC/WMS, OGC/WFS, OGC/SOS, OpeNDAP, ...), the server is designed to manage plugins: - StorageUnits : which enable to read specific data repository formats (netCDF/OceanSites, RDBMS schema, ODV binary format). - FrontDesks : which get external requests and send results for interoperable protocols (OGC/WMS, OGC/SOS, OpenDAP). In between a third type of plugin may be inserted: - TransformationUnits : which enable ocean business related transformation of the features (for example conversion of vertical coordinates from pressure in dB to meters under sea surface). The server is released under open-source license so that partners can develop their own plugins. Within MyOcean project, University of Reading has plugged a WMS implementation as an oceanotron frontdesk. The modules are connected together by sharing the same information model for marine observations (or sampling features: vertical profiles, point series and trajectories), dataset metadata and queries. The shared information model is based on OGC/Observation & Measurement and Unidata/Common Data Model initiatives. The model is implemented in java (http://www.ifremer.fr/isi/oceanotron/javadoc/). This inner-interoperability level enables to capitalize ocean business expertise in software development without being indentured to

  6. ARIADNE: a Tracking System for Relationships in LHCb Metadata

    Shapoval, I; Cattaneo, M

    2014-01-01

    The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and databases states to architecture specificators and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time dependent geometry and conditions data, and the LHCb software, which is the data processing applications (used for simulation, high level triggering, reconstruction and analysis of physics data). The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process. It means that relationships between a CondDB state and LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in the LHCb production, varying from unexpected application crashes to incorrect data processing results. In this paper we present Ari...

  7. Metadata behind the Interoperability of Wireless Sensor Networks

    Miguel Angel Manso Callejo

    2009-05-01

    Full Text Available Wireless Sensor Networks (WSNs produce changes of status that are frequent, dynamic and unpredictable, and cannot be represented using a linear cause-effect approach. Consequently, a new approach is needed to handle these changes in order to support dynamic interoperability. Our approach is to introduce the notion of context as an explicit representation of changes of a WSN status inferred from metadata elements, which in turn, leads towards a decision-making process about how to maintain dynamic interoperability. This paper describes the developed context model to represent and reason over different WSN status based on four types of contexts, which have been identified as sensing, node, network and organisational contexts. The reasoning has been addressed by developing contextualising and bridges rules. As a result, we were able to demonstrate how contextualising rules have been used to reason on changes of WSN status as a first step towards maintaining dynamic interoperability.

  8. QualityML: a dictionary for quality metadata encoding

    Ninyerola, Miquel; Sevillano, Eva; Serral, Ivette; Pons, Xavier; Zabala, Alaitz; Bastin, Lucy; Masó, Joan

    2014-05-01

    The scenario of rapidly growing geodata catalogues requires tools focused on facilitate users the choice of products. Having quality fields populated in metadata allow the users to rank and then select the best fit-for-purpose products. In this direction, we have developed the QualityML (http://qualityml.geoviqua.org), a dictionary that contains hierarchically structured concepts to precisely define and relate quality levels: from quality classes to quality measurements. Generically, a quality element is the path that goes from the higher level (quality class) to the lowest levels (statistics or quality metrics). This path is used to encode quality of datasets in the corresponding metadata schemas. The benefits of having encoded quality, in the case of data producers, are related with improvements in their product discovery and better transmission of their characteristics. In the case of data users, particularly decision-makers, they would find quality and uncertainty measures to take the best decisions as well as perform dataset intercomparison. Also it allows other components (such as visualization, discovery, or comparison tools) to be quality-aware and interoperable. On one hand, the QualityML is a profile of the ISO geospatial metadata standards providing a set of rules for precisely documenting quality indicator parameters that is structured in 6 levels. On the other hand, QualityML includes semantics and vocabularies for the quality concepts. Whenever possible, if uses statistic expressions from the UncertML dictionary (http://www.uncertml.org) encoding. However it also extends UncertML to provide list of alternative metrics that are commonly used to quantify quality. A specific example, based on a temperature dataset, is shown below. The annual mean temperature map has been validated with independent in-situ measurements to obtain a global error of 0.5 ° C. Level 0: Quality class (e.g., Thematic accuracy) Level 1: Quality indicator (e.g., Quantitative

  9. Metadata and their impact on processes in Building Information Modeling

    Vladimir Nyvlt

    2014-04-01

    Full Text Available Building Information Modeling (BIM itself contains huge potential, how to increase effectiveness of every project in its all life cycle. It means from initial investment plan through project and building-up activities to long-term usage and property maintenance and finally demolition. Knowledge Management or better say Knowledge Sharing covers two sets of tools, managerial and technological. Manager`s needs are real expectations and desires of final users in terms of how could they benefit from managing long-term projects, covering whole life cycle in terms of sparing investment money and other resources. Technology employed can help BIM processes to support and deliver these benefits to users. How to use this technology for data and metadata collection, storage and sharing, which processes may these new technologies deploy. We will touch how to cover optimized processes proposal for better and smooth support of knowledge sharing within project time-scale, and covering all its life cycle.

  10. ATLAS File and Dataset Metadata Collection and Use

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment including software release management, tracking of reconstructed event sizes and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example the total number of events contained in dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  11. Enhancing Media Personalization by Extracting Similarity Knowledge from Metadata

    Butkus, Andrius

    be seen as a cognitive foundation for modeling concepts. Conceptual Spaces is applied in this thesis to analyze media in terms of its dimensions and knowledge domains, which in return defines properties and concepts. One of the most important domains in terms of describing media is the emotional one...... only “more of the same” type of content which does not necessarily lead to the meaningful personalization. Another way to approach similarity is to find a similar underlying meaning in the content. Aspects of meaning in media can be represented using Gardenfors Conceptual Spaces theory, which can......) using Latent Semantic Analysis (one of the unsupervised machine learning techniques). It presents three separate cases to illustrate the similarity knowledge extraction from the metadata, where the emotional components in each case represents different abstraction levels – genres, synopsis and lyrics...

  12. PROGRAM SYSTEM AND INFORMATION METADATA BANK OF TERTIARY PROTEIN STRUCTURES

    T. A. Nikitin

    2013-01-01

    Full Text Available The article deals with the architecture of metadata storage model for check results of three-dimensional protein structures. Concept database model was built. The service and procedure of database update as well as data transformation algorithms for protein structures and their quality were presented. Most important information about entries and their submission forms to store, access, and delivery to users were highlighted. Software suite was developed for the implementation of functional tasks using Java programming language in the NetBeans v.7.0 environment and JQL to query and interact with the database JavaDB. The service was tested and results have shown system effectiveness while protein structures filtration.

  13. Visualizing and Validating Metadata Traceability within the CDISC Standards.

    Hume, Sam; Sarnikar, Surendra; Becnel, Lauren; Bennett, Dorine

    2017-01-01

    The Food & Drug Administration has begun requiring that electronic submissions of regulated clinical studies utilize the Clinical Data Information Standards Consortium data standards. Within regulated clinical research, traceability is a requirement and indicates that the analysis results can be traced back to the original source data. Current solutions for clinical research data traceability are limited in terms of querying, validation and visualization capabilities. This paper describes (1) the development of metadata models to support computable traceability and traceability visualizations that are compatible with industry data standards for the regulated clinical research domain, (2) adaptation of graph traversal algorithms to make them capable of identifying traceability gaps and validating traceability across the clinical research data lifecycle, and (3) development of a traceability query capability for retrieval and visualization of traceability information.

  14. A Testbed for Highly-Scalable Mission Critical Information Systems

    Birman, Kenneth P

    2005-01-01

    ... systems in a networked environment. Headed by Professor Ken Birman, the project is exploring a novel fusion of classical protocols for reliable multicast communication with a new style of peer-to-peer protocol called scalable "gossip...

  15. Scalable Partitioning Algorithms for FPGAs With Heterogeneous Resources

    Selvakkumaran, Navaratnasothie; Ranjan, Abhishek; Raje, Salil; Karypis, George

    2004-01-01

    As FPGA densities increase, partitioning-based FPGA placement approaches are becoming increasingly important as they can be used to provide high-quality and computationally scalable placement solutions...

  16. Phonion: Practical Protection of Metadata in Telephony Networks

    Heuser Stephan

    2017-01-01

    Full Text Available The majority of people across the globe rely on telephony networks as their primary means of communication. As such, many of the most sensitive personal, corporate and government related communications pass through these systems every day. Unsurprisingly, such connections are subject to a wide range of attacks. Of increasing concern is the use of metadata contained in Call Detail Records (CDRs, which contain source, destination, start time and duration of a call. This information is potentially dangerous as the very act of two parties communicating can reveal significant details about their relationship and put them in the focus of targeted observation or surveillance, which is highly critical especially for journalists and activists. To address this problem, we develop the Phonion architecture to frustrate such attacks by separating call setup functions from call delivery. Specifically, Phonion allows users to preemptively establish call circuits across multiple providers and technologies before dialing into the circuit and does not require constant Internet connectivity. Since no single carrier can determine the ultimate destination of the call, it provides unlinkability for its users and helps them to avoid passive surveillance. We define and discuss a range of adversary classes and analyze why current obfuscation technologies fail to protect users against such metadata attacks. In our extensive evaluation we further analyze advanced anonymity technologies (e.g., VoIP over Tor, which do not preserve our functional requirements for high voice quality in the absence of constant broadband Internet connectivity and compatibility with landline and feature phones. Phonion is the first practical system to provide guarantees of unlinkable communication against a range of practical adversaries in telephony systems.

  17. SOL: A Library for Scalable Online Learning Algorithms

    Wu, Yue; Hoi, Steven C. H.; Liu, Chenghao; Lu, Jing; Sahoo, Doyen; Yu, Nenghai

    2016-01-01

    SOL is an open-source library for scalable online learning algorithms, and is particularly suitable for learning with high-dimensional data. The library provides a family of regular and sparse online learning algorithms for large-scale binary and multi-class classification tasks with high efficiency, scalability, portability, and extensibility. SOL was implemented in C++, and provided with a collection of easy-to-use command-line tools, python wrappers and library calls for users and develope...

  18. Modular Universal Scalable Ion-trap Quantum Computer

    2016-06-02

    SECURITY CLASSIFICATION OF: The main goal of the original MUSIQC proposal was to construct and demonstrate a modular and universally- expandable ion...Distribution Unlimited UU UU UU UU 02-06-2016 1-Aug-2010 31-Jan-2016 Final Report: Modular Universal Scalable Ion-trap Quantum Computer The views...P.O. Box 12211 Research Triangle Park, NC 27709-2211 Ion trap quantum computation, scalable modular architectures REPORT DOCUMENTATION PAGE 11

  19. Architectures and Applications for Scalable Quantum Information Systems

    2007-01-01

    Gershenfeld and I. Chuang. Quantum computing with molecules. Scientific American, June 1998. [16] A. Globus, D. Bailey, J. Han, R. Jaffe, C. Levit , R...AFRL-IF-RS-TR-2007-12 Final Technical Report January 2007 ARCHITECTURES AND APPLICATIONS FOR SCALABLE QUANTUM INFORMATION SYSTEMS...NUMBER 5b. GRANT NUMBER FA8750-01-2-0521 4. TITLE AND SUBTITLE ARCHITECTURES AND APPLICATIONS FOR SCALABLE QUANTUM INFORMATION SYSTEMS 5c

  20. On the scalability of LISP and advanced overlaid services

    Coras, Florin

    2015-01-01

    In just four decades the Internet has gone from a lab experiment to a worldwide, business critical infrastructure that caters to the communication needs of almost a half of the Earth's population. With these figures on its side, arguing against the Internet's scalability would seem rather unwise. However, the Internet's organic growth is far from finished and, as billions of new devices are expected to be joined in the not so distant future, scalability, or lack thereof, is commonly believed ...

  1. Scalable, full-colour and controllable chromotropic plasmonic printing

    Xue, Jiancai; Zhou, Zhang-Kai; Wei, Zhiqiang; Su, Rongbin; Lai, Juan; Li, Juntao; Li, Chao; Zhang, Tengwei; Wang, Xue-Hua

    2015-01-01

    Plasmonic colour printing has drawn wide attention as a promising candidate for the next-generation colour-printing technology. However, an efficient approach to realize full colour and scalable fabrication is still lacking, which prevents plasmonic colour printing from practical applications. Here we present a scalable and full-colour plasmonic printing approach by combining conjugate twin-phase modulation with a plasmonic broadband absorber. More importantly, our approach also demonstrates ...

  2. Responsive, Flexible and Scalable Broader Impacts (Invited)

    Decharon, A.; Companion, C.; Steinman, M.

    2010-12-01

    In many educator professional development workshops, scientists present content in a slideshow-type format and field questions afterwards. Drawbacks of this approach include: inability to begin the lecture with content that is responsive to audience needs; lack of flexible access to specific material within the linear presentation; and “Q&A” sessions are not easily scalable to broader audiences. Often this type of traditional interaction provides little direct benefit to the scientists. The Centers for Ocean Sciences Education Excellence - Ocean Systems (COSEE-OS) applies the technique of concept mapping with demonstrated effectiveness in helping scientists and educators “get on the same page” (deCharon et al., 2009). A key aspect is scientist professional development geared towards improving face-to-face and online communication with non-scientists. COSEE-OS promotes scientist-educator collaboration, tests the application of scientist-educator maps in new contexts through webinars, and is piloting the expansion of maps as long-lived resources for the broader community. Collaboration - COSEE-OS has developed and tested a workshop model bringing scientists and educators together in a peer-oriented process, often clarifying common misconceptions. Scientist-educator teams develop online concept maps that are hyperlinked to “assets” (i.e., images, videos, news) and are responsive to the needs of non-scientist audiences. In workshop evaluations, 91% of educators said that the process of concept mapping helped them think through science topics and 89% said that concept mapping helped build a bridge of communication with scientists (n=53). Application - After developing a concept map, with COSEE-OS staff assistance, scientists are invited to give webinar presentations that include live “Q&A” sessions. The webinars extend the reach of scientist-created concept maps to new contexts, both geographically and topically (e.g., oil spill), with a relatively small

  3. Describing Geospatial Assets in the Web of Data: A Metadata Management Scenario

    Cristiano Fugazza

    2016-12-01

    Full Text Available Metadata management is an essential enabling factor for geospatial assets because discovery, retrieval, and actual usage of the latter are tightly bound to the quality of these descriptions. Unfortunately, the multi-faceted landscape of metadata formats, requirements, and conventions makes it difficult to identify editing tools that can be easily tailored to the specificities of a given project, workgroup, and Community of Practice. Our solution is a template-driven metadata editing tool that can be customised to any XML-based schema. Its output is constituted by standards-compliant metadata records that also have a semantics-aware counterpart eliciting novel exploitation techniques. Moreover, external data sources can easily be plugged in to provide autocompletion functionalities on the basis of the data structures made available on the Web of Data. Beside presenting the essentials on customisation of the editor by means of two use cases, we extend the methodology to the whole life cycle of geospatial metadata. We demonstrate the novel capabilities enabled by RDF-based metadata representation with respect to traditional metadata management in the geospatial domain.

  4. Development of health information search engine based on metadata and ontology.

    Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae

    2014-04-01

    The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced a better search result compared to existing search engines. Health information search engine based on metadata and ontology will provide reliable health information to both information producer and information consumers.

  5. Stop the Bleeding: the Development of a Tool to Streamline NASA Earth Science Metadata Curation Efforts

    le Roux, J.; Baker, A.; Caltagirone, S.; Bugbee, K.

    2017-12-01

    The Common Metadata Repository (CMR) is a high-performance, high-quality repository for Earth science metadata records, and serves as the primary way to search NASA's growing 17.5 petabytes of Earth science data holdings. Released in 2015, CMR has the capability to support several different metadata standards already being utilized by NASA's combined network of Earth science data providers, or Distributed Active Archive Centers (DAACs). The Analysis and Review of CMR (ARC) Team located at Marshall Space Flight Center is working to improve the quality of records already in CMR with the goal of making records optimal for search and discovery. This effort entails a combination of automated and manual review, where each NASA record in CMR is checked for completeness, accuracy, and consistency. This effort is highly collaborative in nature, requiring communication and transparency of findings amongst NASA personnel, DAACs, the CMR team and other metadata curation teams. Through the evolution of this project it has become apparent that there is a need to document and report findings, as well as track metadata improvements in a more efficient manner. The ARC team has collaborated with Element 84 in order to develop a metadata curation tool to meet these needs. In this presentation, we will provide an overview of this metadata curation tool and its current capabilities. Challenges and future plans for the tool will also be discussed.

  6. Microscopic Characterization of Scalable Coherent Rydberg Superatoms

    Johannes Zeiher

    2015-08-01

    Full Text Available Strong interactions can amplify quantum effects such that they become important on macroscopic scales. Controlling these coherently on a single-particle level is essential for the tailored preparation of strongly correlated quantum systems and opens up new prospects for quantum technologies. Rydberg atoms offer such strong interactions, which lead to extreme nonlinearities in laser-coupled atomic ensembles. As a result, multiple excitation of a micrometer-sized cloud can be blocked while the light-matter coupling becomes collectively enhanced. The resulting two-level system, often called a “superatom,” is a valuable resource for quantum information, providing a collective qubit. Here, we report on the preparation of 2 orders of magnitude scalable superatoms utilizing the large interaction strength provided by Rydberg atoms combined with precise control of an ensemble of ultracold atoms in an optical lattice. The latter is achieved with sub-shot-noise precision by local manipulation of a two-dimensional Mott insulator. We microscopically confirm the superatom picture by in situ detection of the Rydberg excitations and observe the characteristic square-root scaling of the optical coupling with the number of atoms. Enabled by the full control over the atomic sample, including the motional degrees of freedom, we infer the overlap of the produced many-body state with a W state from the observed Rabi oscillations and deduce the presence of entanglement. Finally, we investigate the breakdown of the superatom picture when two Rydberg excitations are present in the system, which leads to dephasing and a loss of coherence.

  7. Myria: Scalable Analytics as a Service

    Howe, B.; Halperin, D.; Whitaker, A.

    2014-12-01

    At the UW eScience Institute, we're working to empower non-experts, especially in the sciences, to write and use data-parallel algorithms. To this end, we are building Myria, a web-based platform for scalable analytics and data-parallel programming. Myria's internal model of computation is the relational algebra extended with iteration, such that every program is inherently data-parallel, just as every query in a database is inherently data-parallel. But unlike databases, iteration is a first class concept, allowing us to express machine learning tasks, graph traversal tasks, and more. Programs can be expressed in a number of languages and can be executed on a number of execution environments, but we emphasize a particular language called MyriaL that supports both imperative and declarative styles and a particular execution engine called MyriaX that uses an in-memory column-oriented representation and asynchronous iteration. We deliver Myria over the web as a service, providing an editor, performance analysis tools, and catalog browsing features in a single environment. We find that this web-based "delivery vector" is critical in reaching non-experts: they are insulated from irrelevant effort technical work associated with installation, configuration, and resource management. The MyriaX backend, one of several execution runtimes we support, is a main-memory, column-oriented, RDBMS-on-the-worker system that supports cyclic data flows as a first-class citizen and has been shown to outperform competitive systems on 100-machine cluster sizes. I will describe the Myria system, give a demo, and present some new results in large-scale oceanographic microbiology.

  8. DEVELOPMENT OF A METADATA MANAGEMENT SYSTEM FOR AN INTERDISCIPLINARY RESEARCH PROJECT

    C. Curdt

    2012-07-01

    Full Text Available In every interdisciplinary, long-term research project it is essential to manage and archive all heterogeneous research data, produced by the project participants during the project funding. This has to include sustainable storage, description with metadata, easy and secure provision, back up, and visualisation of all data. To ensure the accurate description of all project data with corresponding metadata, the design and implementation of a metadata management system is a significant duty. Thus, the sustainable use and search of all research results during and after the end of the project is particularly dependent on the implementation of a metadata management system. Therefore, this paper will describe the practical experiences gained during the development of a scientific research data management system (called the TR32DB including the corresponding metadata management system for the multidisciplinary research project Transregional Collaborative Research Centre 32 (CRC/TR32 'Patterns in Soil-Vegetation-Atmosphere Systems'. The entire system was developed according to the requirements of the funding agency, the user and project requirements, as well as according to recent standards and principles. The TR32DB is basically a combination of data storage, database, and web-interface. The metadata management system was designed, realized, and implemented to describe and access all project data via accurate metadata. Since the quantity and sort of descriptive metadata depends on the kind of data, a user-friendly multi-level approach was chosen to cover these requirements. Thus, the self-developed CRC/TR32 metadata framework is designed. It is a combination of general, CRC/TR32 specific, as well as data type specific properties.

  9. The ANSS Station Information System: A Centralized Station Metadata Repository for Populating, Managing and Distributing Seismic Station Metadata

    Thomas, V. I.; Yu, E.; Acharya, P.; Jaramillo, J.; Chowdhury, F.

    2015-12-01

    Maintaining and archiving accurate site metadata is critical for seismic network operations. The Advanced National Seismic System (ANSS) Station Information System (SIS) is a repository of seismic network field equipment, equipment response, and other site information. Currently, there are 187 different sensor models and 114 data-logger models in SIS. SIS has a web-based user interface that allows network operators to enter information about seismic equipment and assign response parameters to it. It allows users to log entries for sites, equipment, and data streams. Users can also track when equipment is installed, updated, and/or removed from sites. When seismic equipment configurations change for a site, SIS computes the overall gain of a data channel by combining the response parameters of the underlying hardware components. Users can then distribute this metadata in standardized formats such as FDSN StationXML or dataless SEED. One powerful advantage of SIS is that existing data in the repository can be leveraged: e.g., new instruments can be assigned response parameters from the Incorporated Research Institutions for Seismology (IRIS) Nominal Response Library (NRL), or from a similar instrument already in the inventory, thereby reducing the amount of time needed to determine parameters when new equipment (or models) are introduced into a network. SIS is also useful for managing field equipment that does not produce seismic data (eg power systems, telemetry devices or GPS receivers) and gives the network operator a comprehensive view of site field work. SIS allows users to generate field logs to document activities and inventory at sites. Thus, operators can also use SIS reporting capabilities to improve planning and maintenance of the network. Queries such as how many sensors of a certain model are installed or what pieces of equipment have active problem reports are just a few examples of the type of information that is available to SIS users.

  10. Optimizing Nanoelectrode Arrays for Scalable Intracellular Electrophysiology.

    Abbott, Jeffrey; Ye, Tianyang; Ham, Donhee; Park, Hongkun

    2018-03-20

    , clarifying how the nanoelectrode attains intracellular access. This understanding will be translated into a circuit model for the nanobio interface, which we will then use to lay out the strategies for improving the interface. The intracellular interface of the nanoelectrode is currently inferior to that of the patch clamp electrode; reaching this benchmark will be an exciting challenge that involves optimization of electrode geometries, materials, chemical modifications, electroporation protocols, and recording/stimulation electronics, as we describe in the Account. Another important theme of this Account, beyond the optimization of the individual nanoelectrode-cell interface, is the scalability of the nanoscale electrodes. We will discuss this theme using a recent development from our groups as an example, where an array of ca. 1000 nanoelectrode pixels fabricated on a CMOS integrated circuit chip performs parallel intracellular recording from a few hundreds of cardiomyocytes, which marks a new milestone in electrophysiology.

  11. Laplacian embedded regression for scalable manifold regularization.

    Chen, Lin; Tsang, Ivor W; Xu, Dong

    2012-06-01

    world data sets show the effectiveness and scalability of the proposed framework.

  12. Linked Metadata - lightweight semantics for data integration (Invited)

    Hendler, J. A.

    2013-12-01

    The "Linked Open Data" cloud (http://linkeddata.org) is currently used to show how the linking of datasets, supported by SPARQL endpoints, is creating a growing set of linked data assets. This linked data space has been growing rapidly, and the last version collected is estimated to have had over 35 billion 'triples.' As impressive as this may sound, there is an inherent flaw in the way the linked data story is conceived. The idea is that all of the data is represented in a linked format (generally RDF) and applications will essentially query this cloud and provide mashup capabilities between the various kinds of data that are found. The view of linking in the cloud is fairly simple -links are provided by either shared URIs or by URIs that are asserted to be owl:sameAs. This view of the linking, which primarily focuses on shared objects and subjects in RDF's subject-predicate-object representation, misses a critical aspect of Semantic Web technology. Given triples such as * A:person1 foaf:knows A:person2 * B:person3 foaf:knows B:person4 * C:person5 foaf:name 'John Doe' this view would not consider them linked (barring other assertions) even though they share a common vocabulary. In fact, we get significant clues that there are commonalities in these data items from the shared namespaces and predicates, even if the traditional 'graph' view of RDF doesn't appear to join on these. Thus, it is the linking of the data descriptions, whether as metadata or other vocabularies, that provides the linking in these cases. This observation is crucial to scientific data integration where the size of the datasets, or even the individual relationships within them, can be quite large. (Note that this is not restricted to scientific data - search engines, social networks, and massive multiuser games also create huge amounts of data.) To convert all the triples into RDF and provide individual links is often unnecessary, and is both time and space intensive. Those looking to do on the

  13. Metadata-Driven SOA-Based Application for Facilitation of Real-Time Data Warehousing

    Pintar, Damir; Vranić, Mihaela; Skočir, Zoran

    Service-oriented architecture (SOA) has already been widely recognized as an effective paradigm for achieving integration of diverse information systems. SOA-based applications can cross boundaries of platforms, operation systems and proprietary data standards, commonly through the usage of Web Services technology. On the other side, metadata is also commonly referred to as a potential integration tool given the fact that standardized metadata objects can provide useful information about specifics of unknown information systems with which one has interest in communicating with, using an approach commonly called "model-based integration". This paper presents the result of research regarding possible synergy between those two integration facilitators. This is accomplished with a vertical example of a metadata-driven SOA-based business process that provides ETL (Extraction, Transformation and Loading) and metadata services to a data warehousing system in need of a real-time ETL support.

  14. Metadata and network API aspects of a framework for storing and retrieving civil infrastructure monitoring data

    Wong, John-Michael; Stojadinovic, Bozidar

    2005-05-01

    A framework has been defined for storing and retrieving civil infrastructure monitoring data over a network. The framework consists of two primary components: metadata and network communications. The metadata component provides the descriptions and data definitions necessary for cataloging and searching monitoring data. The communications component provides Java classes for remotely accessing the data. Packages of Enterprise JavaBeans and data handling utility classes are written to use the underlying metadata information to build real-time monitoring applications. The utility of the framework was evaluated using wireless accelerometers on a shaking table earthquake simulation test of a reinforced concrete bridge column. The NEESgrid data and metadata repository services were used as a backend storage implementation. A web interface was created to demonstrate the utility of the data model and provides an example health monitoring application.

  15. Tags and self-organisation: a metadata ecology for learning resources in a multilingual context

    Vuorikari, Riina Hannuli

    2010-01-01

    Vuorikari, R. (2009). Tags and self-organisation: a metadata ecology for learning resources in a multilingual context. Doctoral thesis. November, 13, 2009, Heerlen, The Netherlands: Open University of the Netherlands, CELSTEC.

  16. Tags and self-organisation: a metadata ecology for learning resources in a multilingual context

    Vuorikari, Riina

    2009-01-01

    Vuorikari, R. (2009). Tags and self-organisation: a metadata ecology for learning resources in a multilingual context. Doctoral thesis. November, 13, 2009, Heerlen, The Netherlands: Open University of the Netherlands, CELSTEC.

  17. Making Information Visible, Accessible, and Understandable: Meta-Data and Registries

    Robinson, Clay

    2007-01-01

    ... the interoperability, discovery, and utility of data assets throughout the Department of Defense (DoD). Proper use and understanding of metadata can substantially enhance the utility of data by making it more visible, accessible, and understandable...

  18. Preserving Geological Samples and Metadata from Polar Regions

    Grunow, A.; Sjunneskog, C. M.

    2011-12-01

    The Office of Polar Programs at the National Science Foundation (NSF-OPP) has long recognized the value of preserving earth science collections due to the inherent logistical challenges and financial costs of collecting geological samples from Polar Regions. NSF-OPP established two national facilities to make Antarctic geological samples and drill cores openly and freely available for research. The Antarctic Marine Geology Research Facility (AMGRF) at Florida State University was established in 1963 and archives Antarctic marine sediment cores, dredge samples and smear slides along with ship logs. The United States Polar Rock Repository (USPRR) at Ohio State University was established in 2003 and archives polar rock samples, marine dredges, unconsolidated materials and terrestrial cores, along with associated materials such as field notes, maps, raw analytical data, paleomagnetic cores, thin sections, microfossil mounts, microslides and residues. The existence of the AMGRF and USPRR helps to minimize redundant sample collecting, lessen the environmental impact of doing polar field work, facilitates field logistics planning and complies with the data sharing requirement of the Antarctic Treaty. USPRR acquires collections through donations from institutions and scientists and then makes these samples available as no-cost loans for research, education and museum exhibits. The AMGRF acquires sediment cores from US based and international collaboration drilling projects in Antarctica. Destructive research techniques are allowed on the loaned samples and loan requests are accepted from any accredited scientific institution in the world. Currently, the USPRR has more than 22,000 cataloged rock samples available to scientists from around the world. All cataloged samples are relabeled with a USPRR number, weighed, photographed and measured for magnetic susceptibility. Many aspects of the sample metadata are included in the database, e.g. geographical location, sample

  19. A document centric metadata registration tool constructing earth environmental data infrastructure

    Ichino, M.; Kinutani, H.; Ono, M.; Shimizu, T.; Yoshikawa, M.; Masuda, K.; Fukuda, K.; Kawamoto, H.

    2009-12-01

    DIAS (Data Integration and Analysis System) is one of GEOSS activities in Japan. It is also a leading part of the GEOSS task with the same name defined in GEOSS Ten Year Implementation Plan. The main mission of DIAS is to construct data infrastructure that can effectively integrate earth environmental data such as observation data, numerical model outputs, and socio-economic data provided from the fields of climate, water cycle, ecosystem, ocean, biodiversity and agriculture. Some of DIAS's data products are available at the following web site of http://www.jamstec.go.jp/e/medid/dias. Most of earth environmental data commonly have spatial and temporal attributes such as the covering geographic scope or the created date. The metadata standards including these common attributes are published by the geographic information technical committee (TC211) in ISO (the International Organization for Standardization) as specifications of ISO 19115:2003 and 19139:2007. Accordingly, DIAS metadata is developed with basing on ISO/TC211 metadata standards. From the viewpoint of data users, metadata is useful not only for data retrieval and analysis but also for interoperability and information sharing among experts, beginners and nonprofessionals. On the other hand, from the viewpoint of data providers, two problems were pointed out after discussions. One is that data providers prefer to minimize another tasks and spending time for creating metadata. Another is that data providers want to manage and publish documents to explain their data sets more comprehensively. Because of solving these problems, we have been developing a document centric metadata registration tool. The features of our tool are that the generated documents are available instantly and there is no extra cost for data providers to generate metadata. Also, this tool is developed as a Web application. So, this tool does not demand any software for data providers if they have a web-browser. The interface of the tool

  20. Automated Creation of Datamarts from a Clinical Data Warehouse, Driven by an Active Metadata Repository

    Rogerson, Charles L.; Kohlmiller, Paul H.; Stutman, Harris

    1998-01-01

    A methodology and toolkit are described which enable the automated metadata-driven creation of datamarts from clinical data warehouses. The software uses schema-to-schema transformation driven by an active metadata repository. Tools for assessing datamart data quality are described, as well as methods for assessing the feasibility of implementing specific datamarts. A methodology for data remediation and the re-engineering of operational data capture is described.

  1. New Tools to Document and Manage Data/Metadata: Example NGEE Arctic and ARM

    Crow, M. C.; Devarakonda, R.; Killeffer, T.; Hook, L.; Boden, T.; Wullschleger, S.

    2017-12-01

    Tools used for documenting, archiving, cataloging, and searching data are critical pieces of informatics. This poster describes tools being used in several projects at Oak Ridge National Laboratory (ORNL), with a focus on the U.S. Department of Energy's Next Generation Ecosystem Experiment in the Arctic (NGEE Arctic) and Atmospheric Radiation Measurements (ARM) project, and their usage at different stages of the data lifecycle. The Online Metadata Editor (OME) is used for the documentation and archival stages while a Data Search tool supports indexing, cataloging, and searching. The NGEE Arctic OME Tool [1] provides a method by which researchers can upload their data and provide original metadata with each upload while adhering to standard metadata formats. The tool is built upon a Java SPRING framework to parse user input into, and from, XML output. Many aspects of the tool require use of a relational database including encrypted user-login, auto-fill functionality for predefined sites and plots, and file reference storage and sorting. The Data Search Tool conveniently displays each data record in a thumbnail containing the title, source, and date range, and features a quick view of the metadata associated with that record, as well as a direct link to the data. The search box incorporates autocomplete capabilities for search terms and sorted keyword filters are available on the side of the page, including a map for geo-searching. These tools are supported by the Mercury [2] consortium (funded by DOE, NASA, USGS, and ARM) and developed and managed at Oak Ridge National Laboratory. Mercury is a set of tools for collecting, searching, and retrieving metadata and data. Mercury collects metadata from contributing project servers, then indexes the metadata to make it searchable using Apache Solr, and provides access to retrieve it from the web page. Metadata standards that Mercury supports include: XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115.

  2. Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins

    Lambert, F.; Odier, J.; Fulachier, J.; ATLAS Collaboration

    2017-10-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system, therefore the service must guarantee a high level of availability. We describe our monitoring and administration systems, and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand.

  3. Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins.

    AUTHOR|(SzGeCERN)637120; The ATLAS collaboration; Odier, Jerome; Fulachier, Jerome

    2017-01-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system, therefore the service must guarantee a high level of availability. We describe our monitoring and administration systems, and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand.

  4. Making Information Visible, Accessible, and Understandable: Meta-Data and Registries

    2007-07-01

    the data created, the length of play time, album name, and the genre. Without resource metadata, portable digital music players would not be so...notion of a catalog card in a library. An example of metadata is the description of a music file specifying the creator, the artist that performed the song...describe struc- ture and formatting which are critical to interoperability and the management of databases. Going back to the portable music player example

  5. Automating the Extraction of Metadata from Archaeological Data Using iRods Rules

    David Walling

    2011-10-01

    Full Text Available The Texas Advanced Computing Center and the Institute for Classical Archaeology at the University of Texas at Austin developed a method that uses iRods rules and a Jython script to automate the extraction of metadata from digital archaeological data. The first step was to create a record-keeping system to classify the data. The record-keeping system employs file and directory hierarchy naming conventions designed specifically to maintain the relationship between the data objects and map the archaeological documentation process. The metadata implicit in the record-keeping system is automatically extracted upon ingest, combined with additional sources of metadata, and stored alongside the data in the iRods preservation environment. This method enables a more organized workflow for the researchers, helps them archive their data close to the moment of data creation, and avoids error prone manual metadata input. We describe the types of metadata extracted and provide technical details of the extraction process and storage of the data and metadata.

  6. Survey data and metadata modelling using document-oriented NoSQL

    Rahmatuti Maghfiroh, Lutfi; Gusti Bagus Baskara Nugraha, I.

    2018-03-01

    Survey data that are collected from year to year have metadata change. However it need to be stored integratedly to get statistical data faster and easier. Data warehouse (DW) can be used to solve this limitation. However there is a change of variables in every period that can not be accommodated by DW. Traditional DW can not handle variable change via Slowly Changing Dimension (SCD). Previous research handle the change of variables in DW to manage metadata by using multiversion DW (MVDW). MVDW is designed using relational model. Some researches also found that developing nonrelational model in NoSQL database has reading time faster than the relational model. Therefore, we propose changes to metadata management by using NoSQL. This study proposes a model DW to manage change and algorithms to retrieve data with metadata changes. Evaluation of the proposed models and algorithms result in that database with the proposed design can retrieve data with metadata changes properly. This paper has contribution in comprehensive data analysis with metadata changes (especially data survey) in integrated storage.

  7. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras

    Jaehoon Jung

    2016-06-01

    Full Text Available Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i generation of a three-dimensional (3D human model; (ii human object-based automatic scene calibration; and (iii metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system.

  8. A Comparative Study on Metadata Scheme of Chinese and American Open Data Platforms

    Yang Sinan

    2018-01-01

    Full Text Available [Purpose/significance] Open government data is conducive to the rational development and utilization of data resources. It can encourage social innovation and promote economic development. Besides, in order to ensure effective utilization and social increment of open government data, high-quality metadata schemes is necessary. [Method/process] Firstly, this paper analyzed the related research of open government data at home and abroad. Then, it investigated the open metadata schemes of some Chinese main local governments’ data platforms, and made a comparison with the metadata standard of American open government data. [Result/conclusion] This paper reveals that there are some disadvantages about Chinese local government open data affect the use effect of open data, which including that different governments use different data metadata schemes, the description of data set is too simple for further utilization and usually presented in HTML Web page format with lower machine-readable. Therefore, our government should come up with a standardized metadata schemes by drawing on the international mature and effective metadata standard, to ensure the social needs of high quality and high value data.

  9. Teknik Migrasi Data Lintas DBMS dengan Menggunakan Metadata

    Wahyu Hidayat

    2015-11-01

    Full Text Available Proses migrasi data biasanya dibutuhkan saat adanya perubahan sistem, format, atau tipe storage. Saat ini telah dikenal beberapa teknik dan kakas untuk melakukan migrasi data, misalnya CSV file, ODBC, SQLDump dan sebagainya. Sayangnya tidak semua teknik tersebut dapat diimplementasikan untuk migrasi data antara dua DBMS yang berbeda. Dalam penelitian ini dipaparkan sebuah teknik migrasi data yang dapat digunakan untuk migrasi data lintas DBMS. Teknik migrasi data yang dipaparkan memanfaatkan metadata yang ada di masing-masing DBMS. Proses migrasi data yang dipaparkan di sini melalui tiga tahap yaitu capture, convert dan construct. Sebuah prototype dibangun untuk menguji teknik migrasi data ini. Dengan menggunakan schema HR dilakukan uji coba migrasi data lintas DBMS antara Oracle dan MySQL. Dengan menggunakan teknik ini, migrasi data full-schema membutuhkan waktu rata-rata 20,43 detik dari DBMS Oracle ke MySQL dan 12,96 detik untuk skenario sebaliknya. Adapun untuk migrasi data parsial dibutuhkan waktu rata-rata 5,95 detik dari DBMS Oracle ke MySQL dan 2,19 detik untuk skenario sebaliknya.

  10. Structure constrained by metadata in networks of chess players.

    Almeira, Nahuel; Schaigorodsky, Ana L; Perotti, Juan I; Billoni, Orlando V

    2017-11-09

    Chess is an emblematic sport that stands out because of its age, popularity and complexity. It has served to study human behavior from the perspective of a wide number of disciplines, from cognitive skills such as memory and learning, to aspects like innovation and decision-making. Given that an extensive documentation of chess games played throughout history is available, it is possible to perform detailed and statistically significant studies about this sport. Here we use one of the most extensive chess databases in the world to construct two networks of chess players. One of the networks includes games that were played over-the-board and the other contains games played on the Internet. We study the main topological characteristics of the networks, such as degree distribution and correlations, transitivity and community structure. We complement the structural analysis by incorporating players' level of play as node metadata. Although both networks are topologically different, we show that in both cases players gather in communities according to their expertise and that an emergent rich-club structure, composed by the top-rated players, is also present.

  11. Data to Pictures to Data: Outreach Imaging Software and Metadata

    Levay, Z.

    2011-07-01

    A convergence between astronomy science and digital photography has enabled a steady stream of visually rich imagery from state-of-the-art data. The accessibility of hardware and software has facilitated an explosion of astronomical images for outreach, from space-based observatories, ground-based professional facilities and among the vibrant amateur astrophotography community. Producing imagery from science data involves a combination of custom software to understand FITS data (FITS Liberator), off-the-shelf, industry-standard software to composite multi-wavelength data and edit digital photographs (Adobe Photoshop), and application of photo/image-processing techniques. Some additional effort is needed to close the loop and enable this imagery to be conveniently available for various purposes beyond web and print publication. The metadata paradigms in digital photography are now complying with FITS and science software to carry information such as keyword tags and world coordinates, enabling these images to be usable in more sophisticated, imaginative ways exemplified by Sky in Google Earth and World Wide Telescope.

  12. File and metadata management for BESIII distributed computing

    Nicholson, C; Zheng, Y H; Lin, L; Deng, Z Y; Li, W D; Zhang, X M

    2012-01-01

    The BESIII experiment at the Institute of High Energy Physics (IHEP), Beijing, uses the high-luminosity BEPCII e + e − collider to study physics in the π-charm energy region around 3.7 GeV; BEPCII has produced the worlds largest samples of J/φ and φ’ events to date. An order of magnitude increase in the data sample size over the 2011-2012 data-taking period demanded a move from a very centralized to a distributed computing environment, as well as the development of an efficient file and metadata management system. While BESIII is on a smaller scale than some other HEP experiments, this poses particular challenges for its distributed computing and data management system. These constraints include limited resources and manpower, and low quality of network connections to IHEP. Drawing on the rich experience of the HEP community, a system has been developed which meets these constraints. The design and development of the BESIII distributed data management system, including its integration with other BESIII distributed computing components, such as job management, are presented here.

  13. Quality Scalability Compression on Single-Loop Solution in HEVC

    Mengmeng Zhang

    2014-01-01

    Full Text Available This paper proposes a quality scalable extension design for the upcoming high efficiency video coding (HEVC standard. In the proposed design, the single-loop decoder solution is extended into the proposed scalable scenario. A novel interlayer intra/interprediction is added to reduce the amount of bits representation by exploiting the correlation between coding layers. The experimental results indicate that the average Bjøntegaard delta rate decrease of 20.50% can be gained compared with the simulcast encoding. The proposed technique achieved 47.98% Bjøntegaard delta rate reduction compared with the scalable video coding extension of the H.264/AVC. Consequently, significant rate savings confirm that the proposed method achieves better performance.

  14. EUDAT B2FIND : A Cross-Discipline Metadata Service and Discovery Portal

    Widmann, Heinrich; Thiemann, Hannes

    2016-04-01

    The European Data Infrastructure (EUDAT) project aims at a pan-European environment that supports a variety of multiple research communities and individuals to manage the rising tide of scientific data by advanced data management technologies. This led to the establishment of the community-driven Collaborative Data Infrastructure that implements common data services and storage resources to tackle the basic requirements and the specific challenges of international and interdisciplinary research data management. The metadata service B2FIND plays a central role in this context by providing a simple and user-friendly discovery portal to find research data collections stored in EUDAT data centers or in other repositories. For this we store the diverse metadata collected from heterogeneous sources in a comprehensive joint metadata catalogue and make them searchable in an open data portal. The implemented metadata ingestion workflow consists of three steps. First the metadata records - provided either by various research communities or via other EUDAT services - are harvested. Afterwards the raw metadata records are converted and mapped to unified key-value dictionaries as specified by the B2FIND schema. The semantic mapping of the non-uniform, community specific metadata to homogenous structured datasets is hereby the most subtle and challenging task. To assure and improve the quality of the metadata this mapping process is accompanied by • iterative and intense exchange with the community representatives, • usage of controlled vocabularies and community specific ontologies and • formal and semantic validation. Finally the mapped and checked records are uploaded as datasets to the catalogue, which is based on the open source data portal software CKAN. CKAN provides a rich RESTful JSON API and uses SOLR for dataset indexing that enables users to query and search in the catalogue. The homogenization of the community specific data models and vocabularies enables not

  15. Scalable DeNoise-and-Forward in Bidirectional Relay Networks

    Sørensen, Jesper Hemming; Krigslund, Rasmus; Popovski, Petar

    2010-01-01

    In this paper a scalable relaying scheme is proposed based on an existing concept called DeNoise-and-Forward, DNF. We call it Scalable DNF, S-DNF, and it targets the scenario with multiple communication flows through a single common relay. The idea of the scheme is to combine packets at the relay...... in order to save transmissions. To ensure decodability at the end-nodes, a priori information about the content of the combined packets must be available. This is gathered during the initial transmissions to the relay. The trade-off between decodability and number of necessary transmissions is analysed...

  16. Metadata Quality in Institutional Repositories May be Improved by Addressing Staffing Issues

    Elizabeth Stovold

    2016-09-01

    Full Text Available A Review of: Moulaison, S. H., & Dykas, F. (2016. High-quality metadata and repository staffing: Perceptions of United States–based OpenDOAR participants. Cataloging & Classification Quarterly, 54(2, 101-116. http://dx.doi.org/10.1080/01639374.2015.1116480 Objective – To investigate the quality of institutional repository metadata, metadata practices, and identify barriers to quality. Design – Survey questionnaire. Setting – The OpenDOAR online registry of worldwide repositories. Subjects – A random sample of 50 from 358 administrators of institutional repositories in the United States of America listed in the OpenDOAR registry. Methods – The authors surveyed a random sample of administrators of American institutional repositories included in the OpenDOAR registry. The survey was distributed electronically. Recipients were asked to forward the email if they felt someone else was better suited to respond. There were questions about the demographics of the repository, the metadata creation environment, metadata quality, standards and practices, and obstacles to quality. Results were analyzed in Excel, and qualitative responses were coded by two researchers together. Main results – There was a 42% (n=21 response rate to the section on metadata quality, a 40% (n=20 response rate to the metadata creation section, and 40% (n=20 to the section on obstacles to quality. The majority of respondents rated their metadata quality as average (65%, n=13 or above average (30%, n=5. No one rated the quality as high or poor, while 10% (n=2 rated the quality as below average. The survey found that the majority of descriptive metadata was created by professional (84%, n=16 or paraprofessional (53%, n=10 library staff. Professional staff were commonly involved in creating administrative metadata, reviewing the metadata, and selecting standards and documentation. Department heads and advisory committees were also involved in standards and documentation

  17. Improvements to the Ontology-based Metadata Portal for Unified Semantics (OlyMPUS)

    Linsinbigler, M. A.; Gleason, J. L.; Huffer, E.

    2016-12-01

    The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support Earth Science data consumers and data providers, enabling the latter to register data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS complements the ODISEES' data discovery system with an intelligent tool to enable data producers to auto-generate semantically enhanced metadata and upload it to the metadata repository that drives ODISEES. Like ODISEES, the OlyMPUS metadata provisioning tool leverages robust semantics, a NoSQL database and query engine, an automated reasoning engine that performs first- and second-order deductive inferencing, and uses a controlled vocabulary to support data interoperability and automated analytics. The ODISEES data discovery portal leverages this metadata to provide a seamless data discovery and access experience for data consumers who are interested in comparing and contrasting the multiple Earth science data products available across NASA data centers. Olympus will support scientists' services and tools for performing complex analyses and identifying correlations and non-obvious relationships across all types of Earth System phenomena using the full spectrum of NASA Earth Science data available. By providing an intelligent discovery portal that supplies users - both human users and machines - with detailed information about data products, their contents and their structure, ODISEES will reduce the level of effort required to identify and prepare large volumes of data for analysis. This poster will explain how OlyMPUS leverages deductive reasoning and other technologies to create an integrated environment for generating and exploiting semantically rich metadata.

  18. Predicting age groups of Twitter users based on language and metadata features.

    Antonio A Morgan-Lopez

    Full Text Available Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was conducted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1 while the model containing only Twitter metadata features were least accurate (58% precision, 60% recall, and 57% F1 score. Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may

  19. Event-level analysis of alcohol consumption and condom use in partnership contexts among men who have sex with men and transgender women in Lima, Peru.

    Delgado, Jeanne R; Segura, Eddy R; Lake, Jordan E; Sanchez, Jorge; Lama, Javier R; Clark, Jesse L

    2017-01-01

    We explored the association between alcohol use and condomless receptive (CRAI) and insertive (CIAI) anal intercourse within partnership contexts of men who have sex with men (MSM) and transgender women (TGW) in Lima, Peru. From 2012-2014, we surveyed men and TGW (n=1607) who reported anal intercourse with ≥1 male or TGW. Alcohol use with up to 3 sexual partners during the prior 90days was evaluated. Bivariate and multivariate analyses used generalized estimating equations to assess event-level associations between alcohol use, CRAI, CIAI, and partnership characteristics while adjusting for participant clustering from multiple partners. Of 4774 sexual partnerships reported, 48% were casual, 34% primary, 10% anonymous, and 8% commercial. Alcohol use preceding sex was significantly (pPeru. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. Design for scalability in 3D computer graphics architectures

    Holten-Lund, Hans Erik

    2002-01-01

    This thesis describes useful methods and techniques for designing scalable hybrid parallel rendering architectures for 3D computer graphics. Various techniques for utilizing parallelism in a pipelines system are analyzed. During the Ph.D study a prototype 3D graphics architecture named Hybris has...

  1. Scalable storage for a DBMS using transparent distribution

    J.S. Karlsson; M.L. Kersten (Martin)

    1997-01-01

    textabstractScalable Distributed Data Structures (SDDSs) provide a self-managing and self-organizing data storage of potentially unbounded size. This stands in contrast to common distribution schemas deployed in conventional distributed DBMS. SDDSs, however, have mostly been used in synthetic

  2. Scalable force directed graph layout algorithms using fast multipole methods

    Yunis, Enas Abdulrahman; Yokota, Rio; Ahmadia, Aron

    2012-01-01

    We present an extension to ExaFMM, a Fast Multipole Method library, as a generalized approach for fast and scalable execution of the Force-Directed Graph Layout algorithm. The Force-Directed Graph Layout algorithm is a physics-based approach

  3. Cascaded column generation for scalable predictive demand side management

    Toersche, Hermen; Molderink, Albert; Hurink, Johann L.; Smit, Gerardus Johannes Maria

    2014-01-01

    We propose a nested Dantzig-Wolfe decomposition, combined with dynamic programming, for the distributed scheduling of a large heterogeneous fleet of residential appliances with nonlinear behavior. A cascaded column generation approach gives a scalable optimization strategy, provided that the problem

  4. Scalable power selection method for wireless mesh networks

    Olwal, TO

    2009-01-01

    Full Text Available This paper addresses the problem of a scalable dynamic power control (SDPC) for wireless mesh networks (WMNs) based on IEEE 802.11 standards. An SDPC model that accounts for architectural complexities witnessed in multiple radios and hops...

  5. Efficient Enhancement for Spatial Scalable Video Coding Transmission

    Mayada Khairy

    2017-01-01

    Full Text Available Scalable Video Coding (SVC is an international standard technique for video compression. It is an extension of H.264 Advanced Video Coding (AVC. In the encoding of video streams by SVC, it is suitable to employ the macroblock (MB mode because it affords superior coding efficiency. However, the exhaustive mode decision technique that is usually used for SVC increases the computational complexity, resulting in a longer encoding time (ET. Many other algorithms were proposed to solve this problem with imperfection of increasing transmission time (TT across the network. To minimize the ET and TT, this paper introduces four efficient algorithms based on spatial scalability. The algorithms utilize the mode-distribution correlation between the base layer (BL and enhancement layers (ELs and interpolation between the EL frames. The proposed algorithms are of two categories. Those of the first category are based on interlayer residual SVC spatial scalability. They employ two methods, namely, interlayer interpolation (ILIP and the interlayer base mode (ILBM method, and enable ET and TT savings of up to 69.3% and 83.6%, respectively. The algorithms of the second category are based on full-search SVC spatial scalability. They utilize two methods, namely, full interpolation (FIP and the full-base mode (FBM method, and enable ET and TT savings of up to 55.3% and 76.6%, respectively.

  6. Scalable Robust Principal Component Analysis Using Grassmann Averages

    Hauberg, Søren; Feragen, Aasa; Enficiaud, Raffi

    2016-01-01

    In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortu...

  7. A Scalable Smart Meter Data Generator Using Spark

    Iftikhar, Nadeem; Liu, Xiufeng; Danalachi, Sergiu

    2017-01-01

    Today, smart meters are being used worldwide. As a matter of fact smart meters produce large volumes of data. Thus, it is important for smart meter data management and analytics systems to process petabytes of data. Benchmarking and testing of these systems require scalable data, however, it can ...

  8. Scalability and efficiency of genetic algorithms for geometrical applications

    Dijk, van S.F.; Thierens, D.; Berg, de M.; Schoenauer, M.

    2000-01-01

    We study the scalability and efficiency of a GA that we developed earlier to solve the practical cartographic problem of labeling a map with point features. We argue that the special characteristics of our GA make that it fits in well with theoretical models predicting the optimal population size

  9. Scalable electro-photonic integration concept based on polymer waveguides

    Bosman, E.; Steenberge, G. van; Boersma, A.; Wiegersma, S.; Harmsma, P.J.; Karppinen, M.; Korhonen, T.; Offrein, B.J.; Dangel, R.; Daly, A.; Ortsiefer, M.; Justice, J.; Corbett, B.; Dorrestein, S.; Duis, J.

    2016-01-01

    A novel method for fabricating a single mode optical interconnection platform is presented. The method comprises the miniaturized assembly of optoelectronic single dies, the scalable fabrication of polymer single mode waveguides and the coupling to glass fiber arrays providing the I/O's. The low

  10. A Massively Scalable Architecture for Instant Messaging & Presence

    Schippers, Jorrit; Remke, Anne Katharina Ingrid; Punt, Henk; Wegdam, M.; Haverkort, Boudewijn R.H.M.; Thomas, N.; Bradley, J.; Knottenbelt, W.; Dingle, N.; Harder, U.

    2010-01-01

    This paper analyzes the scalability of Instant Messaging & Presence (IM&P) architectures. We take a queueing-based modelling and analysis approach to ��?nd the bottlenecks of the current IM&P architecture at the Dutch social network Hyves, as well as of alternative architectures. We use the

  11. Adolescent sexuality education: An appraisal of some scalable ...

    Adolescent sexuality education: An appraisal of some scalable interventions for the Nigerian context. VC Pam. Abstract. Most issues around sexual intercourse are highly sensitive topics in Nigeria. Despite the disturbingly high adolescent HIV prevalence and teenage pregnancy rate in Nigeria, sexuality education is ...

  12. Scalable multifunction RF system concepts for joint operations

    Otten, M.P.G.; Wit, J.J.M. de; Smits, F.M.A.; Rossum, W.L. van; Huizing, A.

    2010-01-01

    RF systems based on modular architectures have the potential of better re-use of technology, decreasing development time, and decreasing life cycle cost. Moreover, modular architectures provide scalability, allowing low cost upgrades and adaptability to different platforms. To achieve maximum

  13. Estimates of the Sampling Distribution of Scalability Coefficient H

    Van Onna, Marieke J. H.

    2004-01-01

    Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…

  14. Development of an open metadata schema for prospective clinical research (openPCR) in China.

    Xu, W; Guan, Z; Sun, J; Wang, Z; Geng, Y

    2014-01-01

    In China, deployment of electronic data capture (EDC) and clinical data management system (CDMS) for clinical research (CR) is in its very early stage, and about 90% of clinical studies collected and submitted clinical data manually. This work aims to build an open metadata schema for Prospective Clinical Research (openPCR) in China based on openEHR archetypes, in order to help Chinese researchers easily create specific data entry templates for registration, study design and clinical data collection. Singapore Framework for Dublin Core Application Profiles (DCAP) is used to develop openPCR and four steps such as defining the core functional requirements and deducing the core metadata items, developing archetype models, defining metadata terms and creating archetype records, and finally developing implementation syntax are followed. The core functional requirements are divided into three categories: requirements for research registration, requirements for trial design, and requirements for case report form (CRF). 74 metadata items are identified and their Chinese authority names are created. The minimum metadata set of openPCR includes 3 documents, 6 sections, 26 top level data groups, 32 lower data groups and 74 data elements. The top level container in openPCR is composed of public document, internal document and clinical document archetypes. A hierarchical structure of openPCR is established according to Data Structure of Electronic Health Record Architecture and Data Standard of China (Chinese EHR Standard). Metadata attributes are grouped into six parts: identification, definition, representation, relation, usage guides, and administration. OpenPCR is an open metadata schema based on research registration standards, standards of the Clinical Data Interchange Standards Consortium (CDISC) and Chinese healthcare related standards, and is to be publicly available throughout China. It considers future integration of EHR and CR by adopting data structure and data

  15. Evaluation of 3D printed anatomically scalable transfemoral prosthetic knee.

    Ramakrishnan, Tyagi; Schlafly, Millicent; Reed, Kyle B

    2017-07-01

    This case study compares a transfemoral amputee's gait while using the existing Ossur Total Knee 2000 and our novel 3D printed anatomically scalable transfemoral prosthetic knee. The anatomically scalable transfemoral prosthetic knee is 3D printed out of a carbon-fiber and nylon composite that has a gear-mesh coupling with a hard-stop weight-actuated locking mechanism aided by a cross-linked four-bar spring mechanism. This design can be scaled using anatomical dimensions of a human femur and tibia to have a unique fit for each user. The transfemoral amputee who was tested is high functioning and walked on the Computer Assisted Rehabilitation Environment (CAREN) at a self-selected pace. The motion capture and force data that was collected showed that there were distinct differences in the gait dynamics. The data was used to perform the Combined Gait Asymmetry Metric (CGAM), where the scores revealed that the overall asymmetry of the gait on the Ossur Total Knee was more asymmetric than the anatomically scalable transfemoral prosthetic knee. The anatomically scalable transfemoral prosthetic knee had higher peak knee flexion that caused a large step time asymmetry. This made walking on the anatomically scalable transfemoral prosthetic knee more strenuous due to the compensatory movements in adapting to the different dynamics. This can be overcome by tuning the cross-linked spring mechanism to emulate the dynamics of the subject better. The subject stated that the knee would be good for daily use and has the potential to be adapted as a running knee.

  16. Virtual Environments for Visualizing Structural Health Monitoring Sensor Networks, Data, and Metadata.

    Napolitano, Rebecca; Blyth, Anna; Glisic, Branko

    2018-01-16

    Visualization of sensor networks, data, and metadata is becoming one of the most pivotal aspects of the structural health monitoring (SHM) process. Without the ability to communicate efficiently and effectively between disparate groups working on a project, an SHM system can be underused, misunderstood, or even abandoned. For this reason, this work seeks to evaluate visualization techniques in the field, identify flaws in current practices, and devise a new method for visualizing and accessing SHM data and metadata in 3D. More precisely, the work presented here reflects a method and digital workflow for integrating SHM sensor networks, data, and metadata into a virtual reality environment by combining spherical imaging and informational modeling. Both intuitive and interactive, this method fosters communication on a project enabling diverse practitioners of SHM to efficiently consult and use the sensor networks, data, and metadata. The method is presented through its implementation on a case study, Streicker Bridge at Princeton University campus. To illustrate the efficiency of the new method, the time and data file size were compared to other potential methods used for visualizing and accessing SHM sensor networks, data, and metadata in 3D. Additionally, feedback from civil engineering students familiar with SHM is used for validation. Recommendations on how different groups working together on an SHM project can create SHM virtual environment and convey data to proper audiences, are also included.

  17. CCR+: Metadata Based Extended Personal Health Record Data Model Interoperable with the ASTM CCR Standard.

    Park, Yu Rang; Yoon, Young Jo; Jang, Tae Hun; Seo, Hwa Jeong; Kim, Ju Han

    2014-01-01

    Extension of the standard model while retaining compliance with it is a challenging issue because there is currently no method for semantically or syntactically verifying an extended data model. A metadata-based extended model, named CCR+, was designed and implemented to achieve interoperability between standard and extended models. Furthermore, a multilayered validation method was devised to validate the standard and extended models. The American Society for Testing and Materials (ASTM) Community Care Record (CCR) standard was selected to evaluate the CCR+ model; two CCR and one CCR+ XML files were evaluated. In total, 188 metadata were extracted from the ASTM CCR standard; these metadata are semantically interconnected and registered in the metadata registry. An extended-data-model-specific validation file was generated from these metadata. This file can be used in a smartphone application (Health Avatar CCR+) as a part of a multilayered validation. The new CCR+ model was successfully evaluated via a patient-centric exchange scenario involving multiple hospitals, with the results supporting both syntactic and semantic interoperability between the standard CCR and extended, CCR+, model. A feasible method for delivering an extended model that complies with the standard model is presented herein. There is a great need to extend static standard models such as the ASTM CCR in various domains: the methods presented here represent an important reference for achieving interoperability between standard and extended models.

  18. Separation of metadata and pixel data to speed DICOM tag morphing.

    Ismail, Mahmoud; Philbin, James

    2013-01-01

    The DICOM information model combines pixel data and metadata in single DICOM object. It is not possible to access the metadata separately from the pixel data. There are use cases where only metadata is accessed. The current DICOM object format increases the running time of those use cases. Tag morphing is one of those use cases. Tag morphing includes deletion, insertion or manipulation of one or more of the metadata attributes. It is typically used for order reconciliation on study acquisition or to localize the issuer of patient ID (IPID) and the patient ID attributes when data from one domain is transferred to a different domain. In this work, we propose using Multi-Series DICOM (MSD) objects, which separate metadata from pixel data and remove duplicate attributes, to reduce the time required for Tag Morphing. The time required to update a set of study attributes in each format is compared. The results show that the MSD format significantly reduces the time required for tag morphing.

  19. Geo-Enrichment and Semantic Enhancement of Metadata Sets to Augment Discovery in Geoportals

    Bernhard Vockner

    2014-03-01

    Full Text Available Geoportals are established to function as main gateways to find, evaluate, and start “using” geographic information. Still, current geoportal implementations face problems in optimizing the discovery process due to semantic heterogeneity issues, which leads to low recall and low precision in performing text-based searches. Therefore, we propose an enhanced semantic discovery approach that supports multilingualism and information domain context. Thus, we present workflow that enriches existing structured metadata with synonyms, toponyms, and translated terms derived from user-defined keywords based on multilingual thesauri and ontologies. To make the results easier and understandable, we also provide automated translation capabilities for the resource metadata to support the user in conceiving the thematic content of the descriptive metadata, even if it has been documented using a language the user is not familiar with. In addition, to text-enable spatial filtering capabilities, we add additional location name keywords to metadata sets. These are based on the existing bounding box and shall tweak discovery scores when performing single text line queries. In order to improve the user’s search experience, we tailor faceted search strategies presenting an enhanced query interface for geo-metadata discovery that are transparently leveraging the underlying thesauri and ontologies.

  20. Blind Cooperative Routing for Scalable and Energy-Efficient Internet of Things

    Bader, Ahmed; Alouini, Mohamed-Slim

    2016-01-01

    Multihop networking is promoted in this paper for energy-efficient and highly-scalable Internet of Things (IoT). Recognizing concerns related to the scalability of classical multihop routing and medium access techniques, the use of blind cooperation

  1. GeckoFTL: Scalable Flash Translation Techniques For Very Large Flash Devices

    Dayan, Niv; Bonnet, Philippe; Idreos, Stratos

    2016-01-01

    The volume of metadata needed by a flash translation layer (FTL) is proportional to the storage capacity of a flash device. Ideally, this metadata should reside in the device's integrated RAM to enable fast access. However, as flash devices scale to terabytes, the necessary volume of metadata...... thereby harming performance and device lifetime. In this paper, we identify a key component of the metadata called the Page Validity Bitmap (PVB) as the bottleneck. PVB is used by the garbage-collectors of state-of-the-art FTLs to keep track of which physical pages in the device are invalid. PVB...... constitutes 95% of the FTL's RAM-resident metadata, and recovering PVB after power fails takes a significant proportion of the overall recovery time. To solve this problem, we propose a page-associative FTL called GeckoFTL, whose central innovation is replacing PVB with a new data structure called Logarithmic...

  2. Temporal scalability comparison of the H.264/SVC and distributed video codec

    Huang, Xin; Ukhanova, Ann; Belyaev, Evgeny

    2009-01-01

    The problem of the multimedia scalable video streaming is a current topic of interest. There exist many methods for scalable video coding. This paper is focused on the scalable extension of H.264/AVC (H.264/SVC) and distributed video coding (DVC). The paper presents an efficiency comparison of SV...

  3. INSPIRE: Managing Metadata in a Global Digital Library for High-Energy Physics

    Martin Montull, Javier

    2011-01-01

    Four leading laboratories in the High-Energy Physics (HEP) field are collaborating to roll-out the next-generation scientific information portal: INSPIRE. The goal of this project is to replace the popular 40 year-old SPIRES database. INSPIRE already provides access to about 1 million records and includes services such as fulltext search, automatic keyword assignment, ingestion and automatic display of LaTeX, citation analysis, automatic author disambiguation, metadata harvesting, extraction of figures from fulltext and search in figure captions. In order to achieve high quality metadata both automatic processing and manual curation are needed. The different tools available in the system use modern web technologies to provide the curators of the maximum efficiency, while dealing with the MARC standard format. The project is under heavy development in order to provide new features including semantic analysis, crowdsourcing of metadata curation, user tagging, recommender systems, integration of OAIS standards a...

  4. Detection of Vandalism in Wikipedia using Metadata Features – Implementation in Simple English and Albanian sections

    Arsim Susuri

    2017-03-01

    Full Text Available In this paper, we evaluate a list of classifiers in order to use them in the detection of vandalism by focusing on metadata features. Our work is focused on two low resource data sets (Simple English and Albanian from Wikipedia. The aim of this research is to prove that this form of vandalism detection applied in one data set (language can be extended into another data set (language. Article views data sets in Wikipedia have been used rarely for the purpose of detecting vandalism. We will show the benefits of using article views data set with features from the article revisions data set with the aim of improving the detection of vandalism. The key advantage of using metadata features is that these metadata features are language independent and simple to extract because they require minimal processing. This paper shows that application of vandalism models across low resource languages is possible, and vandalism can be detected through view patterns of articles.

  5. Batch metadata assignment to archival photograph collections using facial recognition software

    Kyle Banerjee

    2013-07-01

    Full Text Available Useful metadata is essential to giving individual meaning and value within the context of a greater image collection as well as making them more discoverable. However, often little information is available about the photos themselves, so adding consistent metadata to large collections of digital and digitized photographs is a time consuming process requiring highly experienced staff. By using facial recognition software, staff can identify individuals more quickly and reliably. Knowledge of individuals in photos helps staff determine when and where photos are taken and also improves understanding of the subject matter. This article demonstrates simple techniques for using facial recognition software and command line tools to assign, modify, and read metadata for large archival photograph collections.

  6. The relevance of music information representation metadata from the perspective of expert users

    Camila Monteiro de Barros

    Full Text Available The general goal of this research was to verify which metadata elements of music information representation are relevant for its retrieval from the perspective of expert music users. Based on a bibliographical research, a comprehensive metadata set of music information representation was developed and transformed into a questionnaire for data collection, which was applied to students and professors of the Graduate Program in Music at the Federal University of Rio Grande do Sul. The results show that the most relevant information for expert music users is related to identification and authorship responsibilities. The respondents from Composition and Interpretative Practice areas agree with these results, while the respondents from Musicology/Ethnomusicology and Music Education areas also consider the metadata related to the historical context of composition relevant.

  7. System for Earth Sample Registration SESAR: Services for IGSN Registration and Sample Metadata Management

    Chan, S.; Lehnert, K. A.; Coleman, R. J.

    2011-12-01

    SESAR, the System for Earth Sample Registration, is an online registry for physical samples collected for Earth and environmental studies. SESAR generates and administers the International Geo Sample Number IGSN, a unique identifier for samples that is dramatically advancing interoperability amongst information systems for sample-based data. SESAR was developed to provide the complete range of registry services, including definition of IGSN syntax and metadata profiles, registration and validation of name spaces requested by users, tools for users to submit and manage sample metadata, validation of submitted metadata, generation and validation of the unique identifiers, archiving of sample metadata, and public or private access to the sample metadata catalog. With the development of SESAR v3, we placed particular emphasis on creating enhanced tools that make metadata submission easier and more efficient for users, and that provide superior functionality for users to manage metadata of their samples in their private workspace MySESAR. For example, SESAR v3 includes a module where users can generate custom spreadsheet templates to enter metadata for their samples, then upload these templates online for sample registration. Once the content of the template is uploaded, it is displayed online in an editable grid format. Validation rules are executed in real-time on the grid data to ensure data integrity. Other new features of SESAR v3 include the capability to transfer ownership of samples to other SESAR users, the ability to upload and store images and other files in a sample metadata profile, and the tracking of changes to sample metadata profiles. In the next version of SESAR (v3.5), we will further improve the discovery, sharing, registration of samples. For example, we are developing a more comprehensive suite of web services that will allow discovery and registration access to SESAR from external systems. Both batch and individual registrations will be possible

  8. Parallel file system with metadata distributed across partitioned key-value store c

    Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron

    2017-09-19

    Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).

  9. NCPP's Use of Standard Metadata to Promote Open and Transparent Climate Modeling

    Treshansky, A.; Barsugli, J. J.; Guentchev, G.; Rood, R. B.; DeLuca, C.

    2012-12-01

    The National Climate Predictions and Projections (NCPP) Platform is developing comprehensive regional and local information about the evolving climate to inform decision making and adaptation planning. This includes both creating and providing tools to create metadata about the models and processes used to create its derived data products. NCPP is using the Common Information Model (CIM), an ontology developed by a broad set of international partners in climate research, as its metadata language. This use of a standard ensures interoperability within the climate community as well as permitting access to the ecosystem of tools and services emerging alongside the CIM. The CIM itself is divided into a general-purpose (UML & XML) schema which structures metadata documents, and a project or community-specific (XML) Controlled Vocabulary (CV) which constraints the content of metadata documents. NCPP has already modified the CIM Schema to accommodate downscaling models, simulations, and experiments. NCPP is currently developing a CV for use by the downscaling community. Incorporating downscaling into the CIM will lead to several benefits: easy access to the existing CIM Documents describing CMIP5 models and simulations that are being downscaled, access to software tools that have been developed in order to search, manipulate, and visualize CIM metadata, and coordination with national and international efforts such as ES-DOC that are working to make climate model descriptions and datasets interoperable. Providing detailed metadata descriptions which include the full provenance of derived data products will contribute to making that data (and, the models and processes which generated that data) more open and transparent to the user community.

  10. New Tools to Document and Manage Data/Metadata: Example NGEE Arctic and UrbIS

    Crow, M. C.; Devarakonda, R.; Hook, L.; Killeffer, T.; Krassovski, M.; Boden, T.; King, A. W.; Wullschleger, S. D.

    2016-12-01

    Tools used for documenting, archiving, cataloging, and searching data are critical pieces of informatics. This discussion describes tools being used in two different projects at Oak Ridge National Laboratory (ORNL), but at different stages of the data lifecycle. The Metadata Entry and Data Search Tool is being used for the documentation, archival, and data discovery stages for the Next Generation Ecosystem Experiment - Arctic (NGEE Arctic) project while the Urban Information Systems (UrbIS) Data Catalog is being used to support indexing, cataloging, and searching. The NGEE Arctic Online Metadata Entry Tool [1] provides a method by which researchers can upload their data and provide original metadata with each upload. The tool is built upon a Java SPRING framework to parse user input into, and from, XML output. Many aspects of the tool require use of a relational database including encrypted user-login, auto-fill functionality for predefined sites and plots, and file reference storage and sorting. The UrbIS Data Catalog is a data discovery tool supported by the Mercury cataloging framework [2] which aims to compile urban environmental data from around the world into one location, and be searchable via a user-friendly interface. Each data record conveniently displays its title, source, and date range, and features: (1) a button for a quick view of the metadata, (2) a direct link to the data and, for some data sets, (3) a button for visualizing the data. The search box incorporates autocomplete capabilities for search terms and sorted keyword filters are available on the side of the page, including a map for searching by area. References: [1] Devarakonda, Ranjeet, et al. "Use of a metadata documentation and search tool for large data volumes: The NGEE arctic example." Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015. [2] Devarakonda, R., Palanisamy, G., Wilson, B. E., & Green, J. M. (2010). Mercury: reusable metadata management, data discovery

  11. Federated provenance of oceanographic research cruises: from metadata to data

    Thomas, Rob; Leadbetter, Adam; Shepherd, Adam

    2016-04-01

    The World Wide Web Consortium's Provenance Data Model and associated Semantic Web ontology (PROV-O) have created much interest in the Earth and Space Science Informatics community (Ma et al., 2014). Indeed, PROV-O has recently been posited as an upper ontology for the alignment of various data models (Cox, 2015). Similarly, PROV-O has been used as the building blocks of a data release lifecycle ontology (Leadbetter & Buck, 2015). In this presentation we show that the alignment between different local data descriptions of an oceanographic research cruise can be achieved through alignment with PROV-O and that descriptions of the funding bodies, organisations and researchers involved in a cruise and its associated data release lifecycle can be modelled within a PROV-O based environment. We show that, at a first-order, this approach is scalable by presenting results from three endpoints (the Biological and Chemical Oceanography Data Management Office at Woods Hole Oceanographic Institution, USA; the British Oceanographic Data Centre at the National Oceanography Centre, UK; and the Marine Institute, Ireland). Current advances in ontology engineering, provide pathways to resolving reasoning issues from varying perspectives on implementing PROV-O. This includes the use of the Information Object design pattern where such edge cases as research cruise scheduling efforts are considered. PROV-O describes only things which have happened, but the Information Object design pattern allows for the description of planned research cruises through its statement that the local data description is not the the entity itself (in this case the planned research cruise) and therefore the local data description itself can be described using the PROV-O model. In particular, we present the use of the data lifecycle ontology to show the connection between research cruise activities and their associated datasets, and the publication of those data sets online with Digital Object Identifiers and

  12. User interface development and metadata considerations for the Atmospheric Radiation Measurement (ARM) archive

    Singley, P. T.; Bell, J. D.; Daugherty, P. F.; Hubbs, C. A.; Tuggle, J. G.

    1993-01-01

    This paper will discuss user interface development and the structure and use of metadata for the Atmospheric Radiation Measurement (ARM) Archive. The ARM Archive, located at Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee, is the data repository for the U.S. Department of Energy's (DOE's) ARM Project. After a short description of the ARM Project and the ARM Archive's role, we will consider the philosophy and goals, constraints, and prototype implementation of the user interface for the archive. We will also describe the metadata that are stored at the archive and support the user interface.

  13. Towards a best practice of modeling unit of measure and related statistical metadata

    Grossmann, Wilfried

    2011-01-01

    Data and metadata exchange between organizations requires a common language for describing structure and content of statistical data and metadata. The SDMX consortium develops content oriented guidelines (COG) recommending harmonized cross-domain concepts and terminology to increase the efficiency of (meta-) data exchange. A recent challenge is a recommended code list for the unit of measure. Based on examples from SDMX sponsor organizations this paper analyses the diversity of ""unit of measure"" as used in practice, including potential breakdowns and interdependencies of the respective meta-

  14. Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins

    Lambert, Fabian; The ATLAS collaboration

    2016-01-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system, therefore the service must guarantee a high level of availability. We describe our monitoring system and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand. Moreover, we describe how to switch to a distant replica in case of downtime.

  15. Crowd-sourced BMS point matching and metadata maintenance with Babel

    Fürst, Jonathan; Chen, Kaifei; Katz, Randy H.

    2016-01-01

    Cyber-physical applications, deployed on top of Building Management Systems (BMS), promise energy saving and comfort improvement in non-residential buildings. Such applications are so far mainly deployed as research prototypes. The main roadblock to widespread adoption is the low quality of BMS...... systems. Such applications access sensors and actuators through BMS metadata in form of point labels. The naming of labels is however often inconsistent and incomplete. To tackle this problem, we introduce Babel, a crowd-sourced approach to the creation and maintenance of BMS metadata. In our system...

  16. A DDI3.2 Style for Data and Metadata Extracted from SAS

    Hoyle, Larry

    2014-01-01

    Earlier work by Wackerow and Hoyle has shown that DDI can be a useful medium for interchange of data and metadata among statistical packages. DDI 3.2 has new features which enhance this capability, such as the ability to use UserAttributePairs to represent custom attributes. The metadata from a statistical package can also be represented in DDI3.2 using several different styles – embedded in a StudyUnit, in a Resource Package, or in a set of Fragments. The DDI Documentation for a Fragment sta...

  17. Title, Description, and Subject are the Most Important Metadata Fields for Keyword Discoverability

    Laura Costello

    2016-09-01

    Full Text Available A Review of: Yang, L. (2016. Metadata effectiveness in internet discovery: An analysis of digital collection metadata elements and internet search engine keywords. College & Research Libraries, 77(1, 7-19. http://doi.org/10.5860/crl.77.1.7 Objective – To determine which metadata elements best facilitate discovery of digital collections. Design – Case study. Setting – A public research university serving over 32,000 graduate and undergraduate students in the Southwestern United States of America. Subjects – A sample of 22,559 keyword searches leading to the institution’s digital repository between August 1, 2013, and July 31, 2014. Methods – The author used Google Analytics to analyze 73,341 visits to the institution’s digital repository. He determined that 22,559 of these visits were due to keyword searches. Using Random Integer Generator, the author identified a random sample of 378 keyword searches. The author then matched the keywords with the Dublin Core and VRA Core metadata elements on the landing page in the digital repository to determine which metadata field had drawn the keyword searcher to that particular page. Many of these keywords matched to more than one metadata field, so the author also analyzed the metadata elements that generated unique keyword hits and those fields that were frequently matched together. Main Results – Title was the most matched metadata field with 279 matched keywords from searches. Description and Subject were also significant fields with 208 and 79 matches respectively. Slightly more than half of the results, 195 keywords, matched the institutional repository in one field only. Both Title and Description had significant match rates both independently and in conjunction with other elements, but Subject keywords were the sole match in only three of the sampled cases. Conclusion – The Dublin Core elements of Title, Description, and Subject were the most frequently matched fields in keyword

  18. OlyMPUS - The Ontology-based Metadata Portal for Unified Semantics

    Huffer, E.; Gleason, J. L.

    2015-12-01

    The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support data consumers and data providers, enabling the latter to register their data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS leverages the semantics and reasoning capabilities of ODISEES to provide data producers with a semi-automated interface for producing the semantically rich metadata needed to support ODISEES' data discovery and access services. It integrates the ODISEES metadata search system with multiple NASA data delivery tools to enable data consumers to create customized data sets for download to their computers, or for NASA Advanced Supercomputing (NAS) facility registered users, directly to NAS storage resources for access by applications running on NAS supercomputers. A core function of NASA's Earth Science Division is research and analysis that uses the full spectrum of data products available in NASA archives. Scientists need to perform complex analyses that identify correlations and non-obvious relationships across all types of Earth System phenomena. Comprehensive analytics are hindered, however, by the fact that many Earth science data products are disparate and hard to synthesize. Variations in how data are collected, processed, gridded, and stored, create challenges for data interoperability and synthesis, which are exacerbated by the sheer volume of available data. Robust, semantically rich metadata can support tools for data discovery and facilitate machine-to-machine transactions with services such as data subsetting, regridding, and reformatting. Such capabilities are critical to enabling the research activities integral to NASA's strategic plans. However, as metadata requirements increase and competing standards emerge

  19. ­The Geospatial Metadata Manager’s Toolbox: Three Techniques for Maintaining Records

    Bruce Godfrey

    2015-07-01

    Full Text Available Managing geospatial metadata records requires a range of techniques. At the University of Idaho Library, we have tens of thousands of records which need to be maintained as well as the addition of new records which need to be normalized and added to the collections. We show a graphical user interface (GUI tool that was developed to make simple modifications, a simple XSLT that operates on complex metadata, and a Python script with enables parallel processing to make maintenance tasks more efficient. Throughout, we compare these techniques and discuss when they may be useful.

  20. An Examination of the Adoption of Preservation Metadata in Cultural Heritage Institutions: An Exploratory Study Using Diffusion of Innovations Theory

    Alemneh, Daniel Gelaw

    2009-01-01

    Digital preservation is a significant challenge for cultural heritage institutions and other repositories of digital information resources. Recognizing the critical role of metadata in any successful digital preservation strategy, the Preservation Metadata Implementation Strategies (PREMIS) has been extremely influential on providing a "core" set…

  1. Towards an Interoperable Field Spectroscopy Metadata Standard with Extended Support for Marine Specific Applications

    Barbara A. Rasaiah

    2015-11-01

    Full Text Available This paper presents an approach to developing robust metadata standards for specific applications that serves to ensure a high level of reliability and interoperability for a spectroscopy dataset. The challenges of designing a metadata standard that meets the unique requirements of specific user communities are examined, including in situ measurement of reflectance underwater, using coral as a case in point. Metadata schema mappings from seven existing metadata standards demonstrate that they consistently fail to meet the needs of field spectroscopy scientists for general and specific applications (μ = 22%, σ = 32% conformance with the core metadata requirements and μ = 19%, σ = 18% for the special case of a benthic (e.g., coral reflectance metadataset. Issues such as field measurement methods, instrument calibration, and data representativeness for marine field spectroscopy campaigns are investigated within the context of submerged benthic measurements. The implication of semantics and syntax for a robust and flexible metadata standard are also considered. A hybrid standard that serves as a “best of breed” incorporating useful modules and parameters within the standards is proposed. This paper is Part 3 in a series of papers in this journal, examining the issues central to a metadata standard for field spectroscopy datasets. The results presented in this paper are an important step towards field spectroscopy metadata standards that address the specific needs of field spectroscopy data stakeholders while facilitating dataset documentation, quality assurance, discoverability and data exchange within large-scale information sharing platforms.

  2. OceanXtremes: Scalable Anomaly Detection in Oceanographic Time-Series

    Wilson, B. D.; Armstrong, E. M.; Chin, T. M.; Gill, K. M.; Greguska, F. R., III; Huang, T.; Jacob, J. C.; Quach, N.

    2016-12-01

    The oceanographic community must meet the challenge to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we are developing an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of 15 to 30-year ocean science datasets.Our parallel analytics engine is extending the NEXUS system and exploits multiple open-source technologies: Apache Cassandra as a distributed spatial "tile" cache, Apache Spark for in-memory parallel computation, and Apache Solr for spatial search and storing pre-computed tile statistics and other metadata. OceanXtremes provides these key capabilities: Parallel generation (Spark on a compute cluster) of 15 to 30-year Ocean Climatologies (e.g. sea surface temperature or SST) in hours or overnight, using simple pixel averages or customizable Gaussian-weighted "smoothing" over latitude, longitude, and time; Parallel pre-computation, tiling, and caching of anomaly fields (daily variables minus a chosen climatology) with pre-computed tile statistics; Parallel detection (over the time-series of tiles) of anomalies or phenomena by regional area-averages exceeding a specified threshold (e.g. high SST in El Nino or SST "blob" regions), or more complex, custom data mining algorithms; Shared discovery and exploration of ocean phenomena and anomalies (facet search using Solr), along with unexpected correlations between key measured variables; Scalable execution for all capabilities on a hybrid Cloud, using our on-premise OpenStack Cloud cluster or at Amazon. The key idea is that the parallel data-mining operations will be run "near" the ocean data archives (a local "network" hop) so that we can efficiently access the thousands of files making up a three decade time

  3. Event-Level Analysis of Anal Sex Roles and Sex Drug Use Among Gay and Bisexual Men in Vancouver, British Columbia, Canada.

    Rich, Ashleigh J; Lachowsky, Nathan J; Cui, Zishan; Sereda, Paul; Lal, Allan; Moore, David M; Hogg, Robert S; Roth, Eric A

    2016-08-01

    This study analyzed event-level partnership data from a computer-assisted survey of 719 gay and bisexual men (GBM) enrolled in the Momentum Health Study to delineate potential linkages between anal sex roles and the so-called "sex drugs," i.e., erectile dysfunction drugs (EDD), poppers, and crystal methamphetamine. Univariable and multivariable analyses using generalized linear mixed models with logit link function with sexual encounters (n = 2514) as the unit of analysis tested four hypotheses: (1) EDD are significantly associated with insertive anal sex roles, (2) poppers are significantly associated with receptive anal sex, (3) both poppers and EDD are significantly associated with anal sexual versatility, and (4) crystal methamphetamine is significantly associated with all anal sex roles. Data for survey respondents and their sexual partners allowed testing these hypotheses for both anal sex partners in the same encounter. Multivariable results supported the first three hypotheses. Crystal methamphetamine was significantly associated with all anal sex roles in the univariable models, but not significant in any multivariable ones. Other multivariable significant variables included attending group sex events, venue where first met, and self-described sexual orientation. Results indicate that GBM sex-drug use behavior features rational decision-making strategies linked to anal sex roles. They also suggest that more research on anal sex roles, particularly versatility, is needed, and that sexual behavior research can benefit from partnership analysis.

  4. Scientific visualization uncertainty, multifield, biomedical, and scalable visualization

    Chen, Min; Johnson, Christopher; Kaufman, Arie; Hagen, Hans

    2014-01-01

    Based on the seminar that took place in Dagstuhl, Germany in June 2011, this contributed volume studies the four important topics within the scientific visualization field: uncertainty visualization, multifield visualization, biomedical visualization and scalable visualization. • Uncertainty visualization deals with uncertain data from simulations or sampled data, uncertainty due to the mathematical processes operating on the data, and uncertainty in the visual representation, • Multifield visualization addresses the need to depict multiple data at individual locations and the combination of multiple datasets, • Biomedical is a vast field with select subtopics addressed from scanning methodologies to structural applications to biological applications, • Scalability in scientific visualization is critical as data grows and computational devices range from hand-held mobile devices to exascale computational platforms. Scientific Visualization will be useful to practitioners of scientific visualization, ...

  5. Scalable quantum memory in the ultrastrong coupling regime.

    Kyaw, T H; Felicetti, S; Romero, G; Solano, E; Kwek, L-C

    2015-03-02

    Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate to implement the scalable quantum computing architecture because of the presence of good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime, as a quantum memory device and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory, within experimentally feasible schemes. We are also convinced that our proposal might pave a way to realize a scalable quantum random-access memory due to its fast storage and readout performances.

  6. Fast & scalable pattern transfer via block copolymer nanolithography

    Li, Tao; Wang, Zhongli; Schulte, Lars

    2015-01-01

    A fully scalable and efficient pattern transfer process based on block copolymer (BCP) self-assembling directly on various substrates is demonstrated. PS-rich and PDMS-rich poly(styrene-b-dimethylsiloxane) (PS-b-PDMS) copolymers are used to give monolayer sphere morphology after spin-casting of s......A fully scalable and efficient pattern transfer process based on block copolymer (BCP) self-assembling directly on various substrates is demonstrated. PS-rich and PDMS-rich poly(styrene-b-dimethylsiloxane) (PS-b-PDMS) copolymers are used to give monolayer sphere morphology after spin...... on long range lateral order, including fabrication of substrates for catalysis, solar cells, sensors, ultrafiltration membranes and templating of semiconductors or metals....

  7. Semantic Models for Scalable Search in the Internet of Things

    Dennis Pfisterer

    2013-03-01

    Full Text Available The Internet of Things is anticipated to connect billions of embedded devices equipped with sensors to perceive their surroundings. Thereby, the state of the real world will be available online and in real-time and can be combined with other data and services in the Internet to realize novel applications such as Smart Cities, Smart Grids, or Smart Healthcare. This requires an open representation of sensor data and scalable search over data from diverse sources including sensors. In this paper we show how the Semantic Web technologies RDF (an open semantic data format and SPARQL (a query language for RDF-encoded data can be used to address those challenges. In particular, we describe how prediction models can be employed for scalable sensor search, how these prediction models can be encoded as RDF, and how the models can be queried by means of SPARQL.

  8. Scalability of DL_POLY on High Performance Computing Platform

    Mabule Samuel Mabakane

    2017-12-01

    Full Text Available This paper presents a case study on the scalability of several versions of the molecular dynamics code (DL_POLY performed on South Africa‘s Centre for High Performance Computing e1350 IBM Linux cluster, Sun system and Lengau supercomputers. Within this study different problem sizes were designed and the same chosen systems were employed in order to test the performance of DL_POLY using weak and strong scalability. It was found that the speed-up results for the small systems were better than large systems on both Ethernet and Infiniband network. However, simulations of large systems in DL_POLY performed well using Infiniband network on Lengau cluster as compared to e1350 and Sun supercomputer.

  9. On the Scalability of Time-predictable Chip-Multiprocessing

    Puffitsch, Wolfgang; Schoeberl, Martin

    2012-01-01

    Real-time systems need a time-predictable execution platform to be able to determine the worst-case execution time statically. In order to be time-predictable, several advanced processor features, such as out-of-order execution and other forms of speculation, have to be avoided. However, just using...... simple processors is not an option for embedded systems with high demands on computing power. In order to provide high performance and predictability we argue to use multiprocessor systems with a time-predictable memory interface. In this paper we present the scalability of a Java chip......-multiprocessor system that is designed to be time-predictable. Adding time-predictable caches is mandatory to achieve scalability with a shared memory multi-processor system. As Java bytecode retains information about the nature of memory accesses, it is possible to implement a memory hierarchy that takes...

  10. ATLAS Grid Data Processing: system evolution and scalability

    Golubkov, D; The ATLAS collaboration; Klimentov, A; Minaenko, A; Nevski, P; Vaniachine, A; Walker, R

    2012-01-01

    The production system for Grid Data Processing handles petascale ATLAS data reprocessing and Monte Carlo activities. The production system empowered further data processing steps on the Grid performed by dozens of ATLAS physics groups with coordinated access to computing resources worldwide, including additional resources sponsored by regional facilities. The system provides knowledge management of configuration parameters for massive data processing tasks, reproducibility of results, scalable database access, orchestrated workflow and performance monitoring, dynamic workload sharing, automated fault tolerance and petascale data integrity control. The system evolves to accommodate a growing number of users and new requirements from our contacts in ATLAS main areas: Trigger, Physics, Data Preparation and Software & Computing. To assure scalability, the next generation production system architecture development is in progress. We report on scaling up the production system for a growing number of users provi...

  11. NPTool: Towards Scalability and Reliability of Business Process Management

    Braghetto, Kelly Rosa; Ferreira, João Eduardo; Pu, Calton

    Currently one important challenge in business process management is provide at the same time scalability and reliability of business process executions. This difficulty becomes more accentuated when the execution control assumes complex countless business processes. This work presents NavigationPlanTool (NPTool), a tool to control the execution of business processes. NPTool is supported by Navigation Plan Definition Language (NPDL), a language for business processes specification that uses process algebra as formal foundation. NPTool implements the NPDL language as a SQL extension. The main contribution of this paper is a description of the NPTool showing how the process algebra features combined with a relational database model can be used to provide a scalable and reliable control in the execution of business processes. The next steps of NPTool include reuse of control-flow patterns and support to data flow management.

  12. Proof of Stake Blockchain: Performance and Scalability for Groupware Communications

    Spasovski, Jason; Eklund, Peter

    2017-01-01

    A blockchain is a distributed transaction ledger, a disruptive technology that creates new possibilities for digital ecosystems. The blockchain ecosystem maintains an immutable transaction record to support many types of digital services. This paper compares the performance and scalability of a web......-based groupware communication application using both non-blockchain and blockchain technologies. Scalability is measured where message load is synthesized over two typical communication topologies. The first is 1 to n network -- a typical client-server or star-topology with a central vertex (server) receiving all...... messages from the remaining n - 1 vertices (clients). The second is a more naturally occurring scale-free network topology, where multiple communication hubs are distributed throughout the network. System performance is tested with both blockchain and non-blockchain solutions using multiple cloud computing...

  13. Continuity-Aware Scheduling Algorithm for Scalable Video Streaming

    Atinat Palawan

    2016-05-01

    Full Text Available The consumer demand for retrieving and delivering visual content through consumer electronic devices has increased rapidly in recent years. The quality of video in packet networks is susceptible to certain traffic characteristics: average bandwidth availability, loss, delay and delay variation (jitter. This paper presents a scheduling algorithm that modifies the stream of scalable video to combat jitter. The algorithm provides unequal look-ahead by safeguarding the base layer (without the need for overhead of the scalable video. The results of the experiments show that our scheduling algorithm reduces the number of frames with a violated deadline and significantly improves the continuity of the video stream without compromising the average Y Peek Signal-to-Noise Ratio (PSNR.

  14. Scalable, full-colour and controllable chromotropic plasmonic printing

    Xue, Jiancai; Zhou, Zhang-Kai; Wei, Zhiqiang; Su, Rongbin; Lai, Juan; Li, Juntao; Li, Chao; Zhang, Tengwei; Wang, Xue-Hua

    2015-01-01

    Plasmonic colour printing has drawn wide attention as a promising candidate for the next-generation colour-printing technology. However, an efficient approach to realize full colour and scalable fabrication is still lacking, which prevents plasmonic colour printing from practical applications. Here we present a scalable and full-colour plasmonic printing approach by combining conjugate twin-phase modulation with a plasmonic broadband absorber. More importantly, our approach also demonstrates controllable chromotropic capability, that is, the ability of reversible colour transformations. This chromotropic capability affords enormous potentials in building functionalized prints for anticounterfeiting, special label, and high-density data encryption storage. With such excellent performances in functional colour applications, this colour-printing approach could pave the way for plasmonic colour printing in real-world commercial utilization. PMID:26567803

  15. A Scalable Heuristic for Viral Marketing Under the Tipping Model

    2013-09-01

    Flixster is a social media website that allows users to share reviews and other information about cinema . [35] It was extracted in Dec. 2010. – FourSquare...work of Reichman were developed independently . We also note that Reichman performs no experimental evaluation of the algorithm. A Scalable Heuristic...other dif- fusion models, such as the independent cascade model [21] and evolutionary graph theory [25] as well as probabilistic variants of the

  16. A Scalable Communication Architecture for Advanced Metering Infrastructure

    Ngo Hoang , Giang; Liquori , Luigi; Nguyen Chan , Hung

    2013-01-01

    Advanced Metering Infrastructure (AMI), seen as foundation for overall grid modernization, is an integration of many technologies that provides an intelligent connection between consumers and system operators [ami 2008]. One of the biggest challenge that AMI faces is to scalable collect and manage a huge amount of data from a large number of customers. In our paper, we address this challenge by introducing a mixed peer-to-peer (P2P) and client-server communication architecture for AMI in whic...

  17. Scalable Multi-group Key Management for Advanced Metering Infrastructure

    Benmalek , Mourad; Challal , Yacine; Bouabdallah , Abdelmadjid

    2015-01-01

    International audience; Advanced Metering Infrastructure (AMI) is composed of systems and networks to incorporate changes for modernizing the electricity grid, reduce peak loads, and meet energy efficiency targets. AMI is a privileged target for security attacks with potentially great damage against infrastructures and privacy. For this reason, Key Management has been identified as one of the most challenging topics in AMI development. In this paper, we propose a new Scalable multi-group key ...

  18. Economical and scalable synthesis of 6-amino-2-cyanobenzothiazole

    Jacob R. Hauser

    2016-09-01

    Full Text Available 2-Cyanobenzothiazoles (CBTs are useful building blocks for: 1 luciferin derivatives for bioluminescent imaging; and 2 handles for bioorthogonal ligations. A particularly versatile CBT is 6-amino-2-cyanobenzothiazole (ACBT, which has an amine handle for straight-forward derivatisation. Here we present an economical and scalable synthesis of ACBT based on a cyanation catalysed by 1,4-diazabicyclo[2.2.2]octane (DABCO, and discuss its advantages for scale-up over previously reported routes.

  19. Space Situational Awareness Data Processing Scalability Utilizing Google Cloud Services

    Greenly, D.; Duncan, M.; Wysack, J.; Flores, F.

    Space Situational Awareness (SSA) is a fundamental and critical component of current space operations. The term SSA encompasses the awareness, understanding and predictability of all objects in space. As the population of orbital space objects and debris increases, the number of collision avoidance maneuvers grows and prompts the need for accurate and timely process measures. The SSA mission continually evolves to near real-time assessment and analysis demanding the need for higher processing capabilities. By conventional methods, meeting these demands requires the integration of new hardware to keep pace with the growing complexity of maneuver planning algorithms. SpaceNav has implemented a highly scalable architecture that will track satellites and debris by utilizing powerful virtual machines on the Google Cloud Platform. SpaceNav algorithms for processing CDMs outpace conventional means. A robust processing environment for tracking data, collision avoidance maneuvers and various other aspects of SSA can be created and deleted on demand. Migrating SpaceNav tools and algorithms into the Google Cloud Platform will be discussed and the trials and tribulations involved. Information will be shared on how and why certain cloud products were used as well as integration techniques that were implemented. Key items to be presented are: 1.Scientific algorithms and SpaceNav tools integrated into a scalable architecture a) Maneuver Planning b) Parallel Processing c) Monte Carlo Simulations d) Optimization Algorithms e) SW Application Development/Integration into the Google Cloud Platform 2. Compute Engine Processing a) Application Engine Automated Processing b) Performance testing and Performance Scalability c) Cloud MySQL databases and Database Scalability d) Cloud Data Storage e) Redundancy and Availability

  20. Architectural Techniques to Enable Reliable and Scalable Memory Systems

    Nair, Prashant J.

    2017-01-01

    High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily. Today, memory reliability is seen as the key impediment towards using high-density devices, adopting new technologies, and even bui...

  1. Superlinearly scalable noise robustness of redundant coupled dynamical systems.

    Kohar, Vivek; Kia, Behnam; Lindner, John F; Ditto, William L

    2016-03-01

    We illustrate through theory and numerical simulations that redundant coupled dynamical systems can be extremely robust against local noise in comparison to uncoupled dynamical systems evolving in the same noisy environment. Previous studies have shown that the noise robustness of redundant coupled dynamical systems is linearly scalable and deviations due to noise can be minimized by increasing the number of coupled units. Here, we demonstrate that the noise robustness can actually be scaled superlinearly if some conditions are met and very high noise robustness can be realized with very few coupled units. We discuss these conditions and show that this superlinear scalability depends on the nonlinearity of the individual dynamical units. The phenomenon is demonstrated in discrete as well as continuous dynamical systems. This superlinear scalability not only provides us an opportunity to exploit the nonlinearity of physical systems without being bogged down by noise but may also help us in understanding the functional role of coupled redundancy found in many biological systems. Moreover, engineers can exploit superlinear noise suppression by starting a coupled system near (not necessarily at) the appropriate initial condition.

  2. Scalable force directed graph layout algorithms using fast multipole methods

    Yunis, Enas Abdulrahman

    2012-06-01

    We present an extension to ExaFMM, a Fast Multipole Method library, as a generalized approach for fast and scalable execution of the Force-Directed Graph Layout algorithm. The Force-Directed Graph Layout algorithm is a physics-based approach to graph layout that treats the vertices V as repelling charged particles with the edges E connecting them acting as springs. Traditionally, the amount of work required in applying the Force-Directed Graph Layout algorithm is O(|V|2 + |E|) using direct calculations and O(|V| log |V| + |E|) using truncation, filtering, and/or multi-level techniques. Correct application of the Fast Multipole Method allows us to maintain a lower complexity of O(|V| + |E|) while regaining most of the precision lost in other techniques. Solving layout problems for truly large graphs with millions of vertices still requires a scalable algorithm and implementation. We have been able to leverage the scalability and architectural adaptability of the ExaFMM library to create a Force-Directed Graph Layout implementation that runs efficiently on distributed multicore and multi-GPU architectures. © 2012 IEEE.

  3. The intergroup protocols: Scalable group communication for the internet

    Berket, Karlo [Univ. of California, Santa Barbara, CA (United States)

    2000-12-04

    Reliable group ordered delivery of multicast messages in a distributed system is a useful service that simplifies the programming of distributed applications. Such a service helps to maintain the consistency of replicated information and to coordinate the activities of the various processes. With the increasing popularity of the Internet, there is an increasing interest in scaling the protocols that provide this service to the environment of the Internet. The InterGroup protocol suite, described in this dissertation, provides such a service, and is intended for the environment of the Internet with scalability to large numbers of nodes and high latency links. The InterGroup protocols approach the scalability problem from various directions. They redefine the meaning of group membership, allow voluntary membership changes, add a receiver-oriented selection of delivery guarantees that permits heterogeneity of the receiver set, and provide a scalable reliability service. The InterGroup system comprises several components, executing at various sites within the system. Each component provides part of the services necessary to implement a group communication system for the wide-area. The components can be categorized as: (1) control hierarchy, (2) reliable multicast, (3) message distribution and delivery, and (4) process group membership. We have implemented a prototype of the InterGroup protocols in Java, and have tested the system performance in both local-area and wide-area networks.

  4. Scalable Video Coding with Interlayer Signal Decorrelation Techniques

    Yang Wenxian

    2007-01-01

    Full Text Available Scalability is one of the essential requirements in the compression of visual data for present-day multimedia communications and storage. The basic building block for providing the spatial scalability in the scalable video coding (SVC standard is the well-known Laplacian pyramid (LP. An LP achieves the multiscale representation of the video as a base-layer signal at lower resolution together with several enhancement-layer signals at successive higher resolutions. In this paper, we propose to improve the coding performance of the enhancement layers through efficient interlayer decorrelation techniques. We first show that, with nonbiorthogonal upsampling and downsampling filters, the base layer and the enhancement layers are correlated. We investigate two structures to reduce this correlation. The first structure updates the base-layer signal by subtracting from it the low-frequency component of the enhancement layer signal. The second structure modifies the prediction in order that the low-frequency component in the new enhancement layer is diminished. The second structure is integrated in the JSVM 4.0 codec with suitable modifications in the prediction modes. Experimental results with some standard test sequences demonstrate coding gains up to 1 dB for I pictures and up to 0.7 dB for both I and P pictures.

  5. Scalable Coverage Maintenance for Dense Wireless Sensor Networks

    Jun Lu

    2007-06-01

    Full Text Available Owing to numerous potential applications, wireless sensor networks have been attracting significant research effort recently. The critical challenge that wireless sensor networks often face is to sustain long-term operation on limited battery energy. Coverage maintenance schemes can effectively prolong network lifetime by selecting and employing a subset of sensors in the network to provide sufficient sensing coverage over a target region. We envision future wireless sensor networks composed of a vast number of miniaturized sensors in exceedingly high density. Therefore, the key issue of coverage maintenance for future sensor networks is the scalability to sensor deployment density. In this paper, we propose a novel coverage maintenance scheme, scalable coverage maintenance (SCOM, which is scalable to sensor deployment density in terms of communication overhead (i.e., number of transmitted and received beacons and computational complexity (i.e., time and space complexity. In addition, SCOM achieves high energy efficiency and load balancing over different sensors. We have validated our claims through both analysis and simulations.

  6. Scalability of Sustainable Business Models in Hybrid Organizations

    Adam Jabłoński

    2016-02-01

    Full Text Available The dynamics of change in modern business create new mechanisms for company management to determine their pursuit and the achievement of their high performance. This performance maintained over a long period of time becomes a source of ensuring business continuity by companies. An ontological being enabling the adoption of such assumptions is such a business model that has the ability to generate results in every possible market situation and, moreover, it has the features of permanent adaptability. A feature that describes the adaptability of the business model is its scalability. Being a factor ensuring more work and more efficient work with an increasing number of components, scalability can be applied to the concept of business models as the company’s ability to maintain similar or higher performance through it. Ensuring the company’s performance in the long term helps to build the so-called sustainable business model that often balances the objectives of stakeholders and shareholders, and that is created by the implemented principles of value-based management and corporate social responsibility. This perception of business paves the way for building hybrid organizations that integrate business activities with pro-social ones. The combination of an approach typical of hybrid organizations in designing and implementing sustainable business models pursuant to the scalability criterion seems interesting from the cognitive point of view. Today, hybrid organizations are great spaces for building effective and efficient mechanisms for dialogue between business and society. This requires the appropriate business model. The purpose of the paper is to present the conceptualization and operationalization of scalability of sustainable business models that determine the performance of a hybrid organization in the network environment. The paper presents the original concept of applying scalability in sustainable business models with detailed

  7. There's Trouble in Paradise: Problems with Educational Metadata Encountered during the MALTED Project.

    Monthienvichienchai, Rachada; Sasse, M. Angela; Wheeldon, Richard

    This paper investigates the usability of educational metadata schemas with respect to the case of the MALTED (Multimedia Authoring Language Teachers and Educational Developers) project at University College London (UCL). The project aims to facilitate authoring of multimedia materials for language learning by allowing teachers to share multimedia…

  8. The Value of Data and Metadata Standardization for Interoperability in Giovanni

    Smit, C.; Hegde, M.; Strub, R. F.; Bryant, K.; Li, A.; Petrenko, M.

    2017-12-01

    Giovanni (https://giovanni.gsfc.nasa.gov/giovanni/) is a data exploration and visualization tool at the NASA Goddard Earth Sciences Data Information Services Center (GES DISC). It has been around in one form or another for more than 15 years. Giovanni calculates simple statistics and produces 22 different visualizations for more than 1600 geophysical parameters from more than 90 satellite and model products. Giovanni relies on external data format standards to ensure interoperability, including the NetCDF CF Metadata Conventions. Unfortunately, these standards were insufficient to make Giovanni's internal data representation truly simple to use. Finding and working with dimensions can be convoluted with the CF Conventions. Furthermore, the CF Conventions are silent on machine-friendly descriptive metadata such as the parameter's source product and product version. In order to simplify analyzing disparate earth science data parameters in a unified way, we developed Giovanni's internal standard. First, the format standardizes parameter dimensions and variables so they can be easily found. Second, the format adds all the machine-friendly metadata Giovanni needs to present our parameters to users in a consistent and clear manner. At a glance, users can grasp all the pertinent information about parameters both during parameter selection and after visualization. This poster gives examples of how our metadata and data standards, both external and internal, have both simplified our code base and improved our users' experiences.

  9. A Metadata Model for E-Learning Coordination through Semantic Web Languages

    Elci, Atilla

    2005-01-01

    This paper reports on a study aiming to develop a metadata model for e-learning coordination based on semantic web languages. A survey of e-learning modes are done initially in order to identify content such as phases, activities, data schema, rules and relations, etc. relevant for a coordination model. In this respect, the study looks into the…

  10. A renaissance in library metadata? The importance of community collaboration in a digital world

    Sarah Bull

    2016-07-01

    Full Text Available This article summarizes a presentation given by Sarah Bull as part of the Association of Learned and Professional Society Publishers (ALPSP seminar ‘Setting the Standard’ in November 2015. Representing the library community at the wide-ranging seminar, Sarah was tasked with making the topic of library metadata an engaging and informative one for a largely publisher audience. With help from co-author Amanda Quimby, this article is an attempt to achieve the same aim! It covers the importance of library metadata and standards in the supply chain and also reflects on the role of the community in successful standards development and maintenance. Special emphasis is given to the importance of quality in e-book metadata and the need for publisher and library collaboration to improve discovery, usage and the student experience. The article details the University of Birmingham experience of e-book metadata from a workflow perspective to highlight the complex integration issues which remain between content procurement and discovery.

  11. The Semantic Mapping of Archival Metadata to the CIDOC CRM Ontology

    Bountouri, Lina; Gergatsoulis, Manolis

    2011-01-01

    In this article we analyze the main semantics of archival description, expressed through Encoded Archival Description (EAD). Our main target is to map the semantics of EAD to the CIDOC Conceptual Reference Model (CIDOC CRM) ontology as part of a wider integration architecture of cultural heritage metadata. Through this analysis, it is concluded…

  12. That obscure object of desire: multimedia metadata on the Web, part 2

    F.-M. Nack (Frank); J.R. van Ossenbruggen (Jacco); L. Hardman (Lynda)

    2003-01-01

    textabstractThis article discusses the state of the art in metadata for audio-visual media in large semantic networks, such as the Semantic Web. Our discussion is predominantly motivated by the two most widely known approaches towards machine-processable and semantic-based content description,

  13. That obscure object of desire: multimedia metadata on the Web, part 1

    F.-M. Nack (Frank); J.R. van Ossenbruggen (Jacco); L. Hardman (Lynda)

    2003-01-01

    textabstractThis article discusses the state of the art in metadata for audio-visual media in large semantic networks, such as the Semantic Web. Our discussion is predominantly motivated by the two most widely known approaches towards machine-processable and semantic-based content description,

  14. Social Web Content Enhancement in a Distance Learning Environment: Intelligent Metadata Generation for Resources

    García-Floriano, Andrés; Ferreira-Santiago, Angel; Yáñez-Márquez, Cornelio; Camacho-Nieto, Oscar; Aldape-Pérez, Mario; Villuendas-Rey, Yenny

    2017-01-01

    Social networking potentially offers improved distance learning environments by enabling the exchange of resources between learners. The existence of properly classified content results in an enhanced distance learning experience in which appropriate materials can be retrieved efficiently; however, for this to happen, metadata needs to be present.…

  15. The open research system: a web-based metadata and data repository for collaborative research

    Charles M. Schweik; Alexander Stepanov; J. Morgan Grove

    2005-01-01

    Beginning in 1999, a web-based metadata and data repository we call the "open research system" (ORS) was designed and built to assist geographically distributed scientific research teams. The purpose of this innovation was to promote the open sharing of data within and across organizational lines and across geographic distances. As the use of the system...

  16. Open Access Metadata, Catalogers, and Vendors: The Future of Cataloging Records

    Flynn, Emily Alinder

    2013-01-01

    The open access (OA) movement is working to transform scholarly communication around the world, but this philosophy can also apply to metadata and cataloging records. While some notable, large academic libraries, such as Harvard University, the University of Michigan, and the University of Cambridge, released their cataloging records under OA…

  17. Preliminary study of technical terminology for the retrieval of scientific book metadata records

    Larsen, Birger; Lioma, Christina; Frommholz, Ingo

    2012-01-01

    Books only represented by brief metadata (book records) are particularly hard to retrieve. One way of improving their retrieval is by extracting retrieval enhancing features from them. This work focusses on scientific (physics) book records. We ask if their technical terminology can be used...

  18. Placing Music Artists and Songs in Time Using Editorial Metadata and Web Mining Techniques

    Bountouridis, D.; Veltkamp, R.C.; Balen, J.M.H. van

    2013-01-01

    This paper investigates the novel task of situating music artists and songs in time, thereby adding contextual information that typically correlates with an artist’s similarities, collaborations and influences. The proposed method makes use of editorial metadata in conjunction with web mining

  19. A metadata catalog for organization and systemization of fusion simulation data

    Greenwald, M.; Fredian, T.; Schissel, D.; Stillerman, J.

    2012-01-01

    Highlights: ► We find that modeling and simulation data need better systemization. ► Workflow, data provenance and relations among data items need to be captured. ► We have begun a design for a simulation metadata catalog that meets these needs. ► The catalog design also supports creation of science notebooks for simulation. - Abstract: Careful management of data and associated metadata is a critical part of any scientific enterprise. Unfortunately, most current fusion simulation efforts lack systematic, project-wide organization of their data. This paper describes an approach to managing simulation data through creation of a comprehensive metadata catalog, currently under development. The catalog is intended to document all past and current simulation activities (including data provenance); to enable global data location and to facilitate data access, analysis and visualization through uniform provision of metadata. The catalog will capture workflow, holding entries for each simulation activity including, at least, data importing and staging, data pre-processing and input preparation, code execution, data storage, post-processing and exporting. The overall aim is that between the catalog and the main data archive, the system would hold a complete and accessible description of the data, all of its attributes and the processes used to generate the data. The catalog will describe data collections, including those representing simulation workflows as well as any other useful groupings. Finally it would be populated with user supplied comments to explain the motivation and results of any activity documented by the catalog.

  20. Scaling the walls of discovery: using semantic metadata for integrative problem solving.

    Manning, Maurice; Aggarwal, Amit; Gao, Kevin; Tucker-Kellogg, Greg

    2009-03-01

    Current data integration approaches by bioinformaticians frequently involve extracting data from a wide variety of public and private data repositories, each with a unique vocabulary and schema, via scripts. These separate data sets must then be normalized through the tedious and lengthy process of resolving naming differences and collecting information into a single view. Attempts to consolidate such diverse data using data warehouses or federated queries add significant complexity and have shown limitations in flexibility. The alternative of complete semantic integration of data requires a massive, sustained effort in mapping data types and maintaining ontologies. We focused instead on creating a data architecture that leverages semantic mapping of experimental metadata, to support the rapid prototyping of scientific discovery applications with the twin goals of reducing architectural complexity while still leveraging semantic technologies to provide flexibility, efficiency and more fully characterized data relationships. A metadata ontology was developed to describe our discovery process. A metadata repository was then created by mapping metadata from existing data sources into this ontology, generating RDF triples to describe the entities. Finally an interface to the repository was designed which provided not only search and browse capabilities but complex query templates that aggregate data from both RDF and RDBMS sources. We describe how this approach (i) allows scientists to discover and link relevant data across diverse data sources and (ii) provides a platform for development of integrative informatics applications.

  1. Automating standards based metadata creation using free and open source GIS tools

    Ellull, C.D.; Tamash, N.; Xian, F.; Stuiver, H.J.; Rickles, P.

    2013-01-01

    The importance of understanding the quality of data used in any GIS operation should not be underestimated. Metadata (data about data) traditionally provides a description of this quality information, but it is frequently deemed as complex to create and maintain. Additionally, it is generally stored

  2. On the communication of scientific data: The Full-Metadata Format

    Riede, Moritz; Schueppel, Rico; Sylvester-Hvid, Kristian O.

    2010-01-01

    In this paper, we introduce a scientific format for text-based data files, which facilitates storing and communicating tabular data sets. The so-called Full-Metadata Format builds on the widely used INI-standard and is based on four principles: readable self-documentation, flexible structure, fail...

  3. Perception of intoxication in a field study of the night-time economy: Blood alcohol concentration, patron characteristics, and event-level predictors.

    Kaestle, Christine E; Droste, Nicolas; Peacock, Amy; Bruno, Raimondo; Miller, Peter

    2018-01-01

    Determine the relationship of subjective intoxication to blood alcohol concentration (BAC) and examine whether patron and event-level characteristics modify the relationship of BAC to subjective intoxication. An in-situ systematic random sample of alcohol consumers attending night-time entertainment districts between 10pm and 3am on Friday and Saturday nights in five Australian cities completed a brief interview (n=4628). Participants reported age, sex, and pre-drinking, energy drink, tobacco, illicit stimulant and other illicit drug use that night, and their subjective intoxication and BAC were assessed. Male and female drinkers displayed equally low sensitivity to the impact of alcohol consumption when self-assessing their intoxication (BAC only explained 19% of variance). The marginal effect of BAC was not constant. At low BAC, participants were somewhat sensitive to increases in alcohol consumption, but at higher BAC levels that modest sensitivity dissipated (actual BAC had less impact on self-assessed intoxication). The slope ultimately leveled out to be non-responsive to additional alcohol intake. Staying out late, pre-drinking, and being young introduced biases resulting in higher self-assessed intoxication regardless of actual BAC. Further, both energy drinks and stimulant use modified the association between BAC and perceived intoxication, resulting in more compressed changes in self-assessment as BAC varies up or down, indicating less ability to perceive differences in BAC level. The ability of intoxicated patrons to detect further intoxication is impaired. Co-consumption of energy drinks and/or stimulant drugs is associated with impaired intoxication judgment, creating an additional challenge for the responsible service and consumption of alcohol. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Advancements in Large-Scale Data/Metadata Management for Scientific Data.

    Guntupally, K.; Devarakonda, R.; Palanisamy, G.; Frame, M. T.

    2017-12-01

    Scientific data often comes with complex and diverse metadata which are critical for data discovery and users. The Online Metadata Editor (OME) tool, which was developed by an Oak Ridge National Laboratory team, effectively manages diverse scientific datasets across several federal data centers, such as DOE's Atmospheric Radiation Measurement (ARM) Data Center and USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L) project. This presentation will focus mainly on recent developments and future strategies for refining OME tool within these centers. The ARM OME is a standard based tool (https://www.archive.arm.gov/armome) that allows scientists to create and maintain metadata about their data products. The tool has been improved with new workflows that help metadata coordinators and submitting investigators to submit and review their data more efficiently. The ARM Data Center's newly upgraded Data Discovery Tool (http://www.archive.arm.gov/discovery) uses rich metadata generated by the OME to enable search and discovery of thousands of datasets, while also providing a citation generator and modern order-delivery techniques like Globus (using GridFTP), Dropbox and THREDDS. The Data Discovery Tool also supports incremental indexing, which allows users to find new data as and when they are added. The USGS CSAS&L search catalog employs a custom version of the OME (https://www1.usgs.gov/csas/ome), which has been upgraded with high-level Federal Geographic Data Committee (FGDC) validations and the ability to reserve and mint Digital Object Identifiers (DOIs). The USGS's Science Data Catalog (SDC) (https://data.usgs.gov/datacatalog) allows users to discover a myriad of science data holdings through a web portal. Recent major upgrades to the SDC and ARM Data Discovery Tool include improved harvesting performance and migration using new search software, such as Apache Solr 6.0 for serving up data/metadata to scientific communities. Our presentation will highlight

  5. Evolution of Web Services in EOSDIS: Search and Order Metadata Registry (ECHO)

    Mitchell, Andrew; Ramapriyan, Hampapuram; Lowe, Dawn

    2009-01-01

    During 2005 through 2008, NASA defined and implemented a major evolutionary change in it Earth Observing system Data and Information System (EOSDIS) to modernize its capabilities. This implementation was based on a vision for 2015 developed during 2005. The EOSDIS 2015 Vision emphasizes increased end-to-end data system efficiency and operability; increased data usability; improved support for end users; and decreased operations costs. One key feature of the Evolution plan was achieving higher operational maturity (ingest, reconciliation, search and order, performance, error handling) for the NASA s Earth Observing System Clearinghouse (ECHO). The ECHO system is an operational metadata registry through which the scientific community can easily discover and exchange NASA's Earth science data and services. ECHO contains metadata for 2,726 data collections comprising over 87 million individual data granules and 34 million browse images, consisting of NASA s EOSDIS Data Centers and the United States Geological Survey's Landsat Project holdings. ECHO is a middleware component based on a Service Oriented Architecture (SOA). The system is comprised of a set of infrastructure services that enable the fundamental SOA functions: publish, discover, and access Earth science resources. It also provides additional services such as user management, data access control, and order management. The ECHO system has a data registry and a services registry. The data registry enables organizations to publish EOS and other Earth-science related data holdings to a common metadata model. These holdings are described through metadata in terms of datasets (types of data) and granules (specific data items of those types). ECHO also supports browse images, which provide a visual representation of the data. The published metadata can be mapped to and from existing standards (e.g., FGDC, ISO 19115). With ECHO, users can find the metadata stored in the data registry and then access the data either

  6. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated based on prior domain knowledge, as well on a number of external inputs, and producing a range of outputs including datasets, studies and peer reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include for example, a private conversation following a lecture or during a social dinner; an opinion expressed concerning some significant event such as an earthquake or for example a satellite failure. In addition, other sources of information of grey literature are important public such as informal papers such as the arxiv deposit, reports and studies. The climate science community is not an exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This is to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals

  7. Integrating Semantic Information in Metadata Descriptions for a Geoscience-wide Resource Inventory.

    Zaslavsky, I.; Richard, S. M.; Gupta, A.; Valentine, D.; Whitenack, T.; Ozyurt, I. B.; Grethe, J. S.; Schachne, A.

    2016-12-01

    Integrating semantic information into legacy metadata catalogs is a challenging issue and so far has been mostly done on a limited scale. We present experience of CINERGI (Community Inventory of Earthcube Resources for Geoscience Interoperability), an NSF Earthcube Building Block project, in creating a large cross-disciplinary catalog of geoscience information resources to enable cross-domain discovery. The project developed a pipeline for automatically augmenting resource metadata, in particular generating keywords that describe metadata documents harvested from multiple geoscience information repositories or contributed by geoscientists through various channels including surveys and domain resource inventories. The pipeline examines available metadata descriptions using text parsing, vocabulary management and semantic annotation and graph navigation services of GeoSciGraph. GeoSciGraph, in turn, relies on a large cross-domain ontology of geoscience terms, which bridges several independently developed ontologies or taxonomies including SWEET, ENVO, YAGO, GeoSciML, GCMD, SWO, and CHEBI. The ontology content enables automatic extraction of keywords reflecting science domains, equipment used, geospatial features, measured properties, methods, processes, etc. We specifically focus on issues of cross-domain geoscience ontology creation, resolving several types of semantic conflicts among component ontologies or vocabularies, and constructing and managing facets for improved data discovery and navigation. The ontology and keyword generation rules are iteratively improved as pipeline results are presented to data managers for selective manual curation via a CINERGI Annotator user interface. We present lessons learned from applying CINERGI metadata augmentation pipeline to a number of federal agency and academic data registries, in the context of several use cases that require data discovery and integration across multiple earth science data catalogs of varying quality

  8. Using a linked data approach to aid development of a metadata portal to support Marine Strategy Framework Directive (MSFD) implementation

    Wood, Chris

    2016-04-01

    Under the Marine Strategy Framework Directive (MSFD), EU Member States are mandated to achieve or maintain 'Good Environmental Status' (GES) in their marine areas by 2020, through a series of Programme of Measures (PoMs). The Celtic Seas Partnership (CSP), an EU LIFE+ project, aims to support policy makers, special-interest groups, users of the marine environment, and other interested stakeholders on MSFD implementation in the Celtic Seas geographical area. As part of this support, a metadata portal has been built to provide a signposting service to datasets that are relevant to MSFD within the Celtic Seas. To ensure that the metadata has the widest possible reach, a linked data approach was employed to construct the database. Although the metadata are stored in a traditional RDBS, the metadata are exposed as linked data via the D2RQ platform, allowing virtual RDF graphs to be generated. SPARQL queries can be executed against the end-point allowing any user to manipulate the metadata. D2RQ's mapping language, based on turtle, was used to map a wide range of relevant ontologies to the metadata (e.g. The Provenance Ontology (prov-o), Ocean Data Ontology (odo), Dublin Core Elements and Terms (dc & dcterms), Friend of a Friend (foaf), and Geospatial ontologies (geo)) allowing users to browse the metadata, either via SPARQL queries or by using D2RQ's HTML interface. The metadata were further enhanced by mapping relevant parameters to the NERC Vocabulary Server, itself built on a SPARQL endpoint. Additionally, a custom web front-end was built to enable users to browse the metadata and express queries through an intuitive graphical user interface that requires no prior knowledge of SPARQL. As well as providing means to browse the data via MSFD-related parameters (Descriptor, Criteria, and Indicator), the metadata records include the dataset's country of origin, the list of organisations involved in the management of the data, and links to any relevant INSPIRE

  9. PERANCANGAN SISTEM METADATA UNTUK DATA WAREHOUSE DENGAN STUDI KASUS REVENUE TRACKING PADA PT. TELKOM DIVRE V JAWA TIMUR

    Yudhi Purwananto

    2004-07-01

    Full Text Available Normal 0 false false false IN X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Data warehouse merupakan media penyimpanan data dalam perusahaan yang diambil dari berbagai sistem dan dapat digunakan untuk berbagai keperluan seperti analisis dan pelaporan. Di PT Telkom Divre V Jawa Timur telah dibangun sebuah data warehouse yang disebut dengan Regional Database. Di Regional Database memerlukan sebuah komponen penting dalam data warehouse yaitu metadata. Definisi metadata secara sederhana adalah "data tentang data". Dalam penelitian ini dirancang sistem metadata dengan studi kasus Revenue Tracking sebagai komponen analisis dan pelaporan pada Regional Database. Metadata sangat perlu digunakan dalam pengelolaan dan memberikan informasi tentang data warehouse. Proses - proses di dalam data warehouse serta komponen - komponen yang berkaitan dengan data warehouse harus saling terintegrasi untuk mewujudkan karakteristik data warehouse yang subject-oriented, integrated, time-variant, dan non-volatile. Karena itu metadata juga harus memiliki kemampuan mempertukarkan informasi (exchange antar komponen dalam data warehouse tersebut. Web service digunakan sebagai mekanisme pertukaran ini. Web service menggunakan teknologi XML dan protokol HTTP dalam berkomunikasi. Dengan web service, setiap komponen

  10. VPLS: an effective technology for building scalable transparent LAN services

    Dong, Ximing; Yu, Shaohua

    2005-02-01

    Virtual Private LAN Service (VPLS) is generating considerable interest with enterprises and service providers as it offers multipoint transparent LAN service (TLS) over MPLS networks. This paper describes an effective technology - VPLS, which links virtual switch instances (VSIs) through MPLS to form an emulated Ethernet switch and build Scalable Transparent Lan Services. It first focuses on the architecture of VPLS with Ethernet bridging technique at the edge and MPLS at the core, then it tries to elucidate the data forwarding mechanism within VPLS domain, including learning and aging MAC addresses on a per LSP basis, flooding of unknown frames and replication for unknown, multicast, and broadcast frames. The loop-avoidance mechanism, known as split horizon forwarding, is also analyzed. Another important aspect of VPLS service is its basic operation, including autodiscovery and signaling, is discussed. From the perspective of efficiency and scalability the paper compares two important signaling mechanism, BGP and LDP, which are used to set up a PW between the PEs and bind the PWs to a particular VSI. With the extension of VPLS and the increase of full mesh of PWs between PE devices (n*(n-1)/2 PWs in all, a n2 complete problem), VPLS instance could have a large number of remote PE associations, resulting in an inefficient use of network bandwidth and system resources as the ingress PE has to replicate each frame and append MPLS labels for remote PE. So the latter part of this paper focuses on the scalability issue: the Hierarchical VPLS. Within the architecture of HVPLS, this paper addresses two ways to cope with a possibly large number of MAC addresses, which make VPLS operate more efficiently.

  11. Combined use of semantics and metadata to manage Research Data Life Cycle in Environmental Sciences

    Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Pertinez, Esther; Palacio, Aida

    2017-04-01

    The use of metadata to contextualize datasets is quite extended in Earth System Sciences. There are some initiatives and available tools to help data managers to choose the best metadata standard that fit their use cases, like the DCC Metadata Directory (http://www.dcc.ac.uk/resources/metadata-standards). In our use case, we have been gathering physical, chemical and biological data from a water reservoir since 2010. A well metadata definition is crucial not only to contextualize our own data but also to integrate datasets from other sources like satellites or meteorological agencies. That is why we have chosen EML (Ecological Metadata Language), which integrates many different elements to define a dataset, including the project context, instrumentation and parameters definition, and the software used to process, provide quality controls and include the publication details. Those metadata elements can contribute to help both human and machines to understand and process the dataset. However, the use of metadata is not enough to fully support the data life cycle, from the Data Management Plan definition to the Publication and Re-use. To do so, we need to define not only metadata and attributes but also the relationships between them, so semantics are needed. Ontologies, being a knowledge representation, can contribute to define the elements of a research data life cycle, including DMP, datasets, software, etc. They also can define how the different elements are related between them and how they interact. The first advantage of developing an ontology of a knowledge domain is that they provide a common vocabulary hierarchy (i.e. a conceptual schema) that can be used and standardized by all the agents interested in the domain (either humans or machines). This way of using ontologies is one of the basis of the Semantic Web, where ontologies are set to play a key role in establishing a common terminology between agents. To develop an ontology we are using a graphical tool

  12. A scalable lock-free hash table with open addressing

    Nielsen, Jesper Puge; Karlsson, Sven

    2016-01-01

    and concurrent operations without any locks. In this paper, we present a new fully lock-free open addressed hash table with a simpler design than prior published work. We split hash table insertions into two atomic phases: first inserting a value ignoring other concurrent operations, then in the second phase......Concurrent data structures synchronized with locks do not scale well with the number of threads. As more scalable alternatives, concurrent data structures and algorithms based on widely available, however advanced, atomic operations have been proposed. These data structures allow for correct...

  13. Highly Scalable Trip Grouping for Large Scale Collective Transportation Systems

    Gidofalvi, Gyozo; Pedersen, Torben Bach; Risch, Tore

    2008-01-01

    Transportation-related problems, like road congestion, parking, and pollution, are increasing in most cities. In order to reduce traffic, recent work has proposed methods for vehicle sharing, for example for sharing cabs by grouping "closeby" cab requests and thus minimizing transportation cost...... and utilizing cab space. However, the methods published so far do not scale to large data volumes, which is necessary to facilitate large-scale collective transportation systems, e.g., ride-sharing systems for large cities. This paper presents highly scalable trip grouping algorithms, which generalize previous...

  14. pcircle - A Suite of Scalable Parallel File System Tools

    2015-10-01

    Most of the software related to file system are written for conventional local file system, they are serialized and can't take advantage of the benefit of a large scale parallel file system. "pcircle" software builds on top of ubiquitous MPI in cluster computing environment and "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular - it implemented parallel data copy and parallel data checksumming, with advanced features such as async progress report, checkpoint and restart, as well as integrity checking.

  15. Scalable video on demand adaptive Internet-based distribution

    Zink, Michael

    2013-01-01

    In recent years, the proliferation of available video content and the popularity of the Internet have encouraged service providers to develop new ways of distributing content to clients. Increasing video scaling ratios and advanced digital signal processing techniques have led to Internet Video-on-Demand applications, but these currently lack efficiency and quality. Scalable Video on Demand: Adaptive Internet-based Distribution examines how current video compression and streaming can be used to deliver high-quality applications over the Internet. In addition to analysing the problems

  16. Scalable web services for the PSIPRED Protein Analysis Workbench.

    Buchan, Daniel W A; Minneci, Federico; Nugent, Tim C O; Bryson, Kevin; Jones, David T

    2013-07-01

    Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.

  17. A Scalable Architecture of a Structured LDPC Decoder

    Lee, Jason Kwok-San; Lee, Benjamin; Thorpe, Jeremy; Andrews, Kenneth; Dolinar, Sam; Hamkins, Jon

    2004-01-01

    We present a scalable decoding architecture for a certain class of structured LDPC codes. The codes are designed using a small (n,r) protograph that is replicated Z times to produce a decoding graph for a (Z x n, Z x r) code. Using this architecture, we have implemented a decoder for a (4096,2048) LDPC code on a Xilinx Virtex-II 2000 FPGA, and achieved decoding speeds of 31 Mbps with 10 fixed iterations. The implemented message-passing algorithm uses an optimized 3-bit non-uniform quantizer that operates with 0.2dB implementation loss relative to a floating point decoder.

  18. Interactive segmentation: a scalable superpixel-based method

    Mathieu, Bérengère; Crouzil, Alain; Puel, Jean-Baptiste

    2017-11-01

    This paper addresses the problem of interactive multiclass segmentation of images. We propose a fast and efficient new interactive segmentation method called superpixel α fusion (SαF). From a few strokes drawn by a user over an image, this method extracts relevant semantic objects. To get a fast calculation and an accurate segmentation, SαF uses superpixel oversegmentation and support vector machine classification. We compare SαF with competing algorithms by evaluating its performances on reference benchmarks. We also suggest four new datasets to evaluate the scalability of interactive segmentation methods, using images from some thousand to several million pixels. We conclude with two applications of SαF.

  19. Robust and scalable optical one-way quantum computation

    Wang Hefeng; Yang Chuiping; Nori, Franco

    2010-01-01

    We propose an efficient approach for deterministically generating scalable cluster states with photons. This approach involves unitary transformations performed on atoms coupled to optical cavities. Its operation cost scales linearly with the number of qubits in the cluster state, and photon qubits are encoded such that single-qubit operations can be easily implemented by using linear optics. Robust optical one-way quantum computation can be performed since cluster states can be stored in atoms and then transferred to photons that can be easily operated and measured. Therefore, this proposal could help in performing robust large-scale optical one-way quantum computation.

  20. Scalable Brain Network Construction on White Matter Fibers.

    Chung, Moo K; Adluru, Nagesh; Dalton, Kim M; Alexander, Andrew L; Davidson, Richard J

    2011-02-12

    DTI offers a unique opportunity to characterize the structural connectivity of the human brain non-invasively by tracing white matter fiber tracts. Whole brain tractography studies routinely generate up to half million tracts per brain, which serves as edges in an extremely large 3D graph with up to half million edges. Currently there is no agreed-upon method for constructing the brain structural network graphs out of large number of white matter tracts. In this paper, we present a scalable iterative framework called the ε-neighbor method for building a network graph and apply it to testing abnormal connectivity in autism.