WorldWideScience

Sample records for metadata table query

  1. COMPLEX QUERY AND METADATA

    OpenAIRE

    Nakatoh, Tetsuya; Omori, Keisuke; Yamada, Yasuhiro; Hirokawa, Sachio

    2003-01-01

    We are developing a search system, DAISEn, which integrates multiple search engines and generates a metasearch engine automatically. The target search engines of DAISEn are not general search engines, but search engines specialized in some area. Integration of such engines yields efficiency and quality. There are search engines of a new type which accept complex queries and return structured data. Integration of such search engines is much harder than that of simple search engines which accept ...

  2. Using Closure Tables to Enable Cross-Querying of Ontologies in Database-Driven Applications

    Science.gov (United States)

    Harris, Daniel R.; Henderson, Darren W.; Talbert, Jeffery C.

    2017-01-01

    We demonstrate that closure tables are an effective data structure for developing database-driven applications that query biomedical ontologies and that require cross-querying between multiple ontologies. A closure table stores all available paths within a tree, even those without a direct parent-child relationship; additionally, a node can have multiple ancestors which gives the foundation for supporting linkages between controlled ontologies. We augment the meta-data structure of the ICD9 and ICD10 ontologies included in i2b2, an open source query tool for identifying patient cohorts, to utilize a closure table. We describe our experiences in incorporating existing mappings between ontologies to enable clinical and health researchers to identify patient populations using the ontology that best matches their preference and expertise. PMID:28725879
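
    The closure-table idea described in this record can be sketched as follows. This is a minimal illustration, not the authors' i2b2 schema: the table name `closure`, its columns, and the concept names in `EDGES` are all hypothetical, and the hierarchy is assumed to be acyclic.

```python
import sqlite3


def build_closure(edges):
    """Return every (ancestor, descendant, depth) row for parent -> child edges.

    Assumes the hierarchy is acyclic. A node may have several parents.
    """
    children = {}
    nodes = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        nodes.update((parent, child))
    rows = set()
    for start in nodes:
        # Every node is stored as its own ancestor at depth 0.
        stack = [(start, 0)]
        while stack:
            node, depth = stack.pop()
            rows.add((start, node, depth))
            for child in children.get(node, []):
                stack.append((child, depth + 1))
    return sorted(rows)


# Hypothetical concept hierarchy; "cross_map" stands in for a second ontology
# that links to the same concept, giving "hypertension" two ancestors' lines.
EDGES = [
    ("diseases", "circulatory"),
    ("circulatory", "hypertension"),
    ("circulatory", "heart_failure"),
    ("cross_map", "hypertension"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE closure (ancestor TEXT, descendant TEXT, depth INTEGER)")
conn.executemany("INSERT INTO closure VALUES (?, ?, ?)", build_closure(EDGES))


def descendants(conn, ancestor):
    """All nodes below `ancestor`, found with one lookup and no recursion."""
    cur = conn.execute(
        "SELECT DISTINCT descendant FROM closure "
        "WHERE ancestor = ? AND depth > 0 ORDER BY descendant",
        (ancestor,),
    )
    return [row[0] for row in cur]


def ancestors(conn, node):
    """All nodes above `node`, including ancestors from a linked hierarchy."""
    cur = conn.execute(
        "SELECT DISTINCT ancestor FROM closure "
        "WHERE descendant = ? AND depth > 0 ORDER BY ancestor",
        (node,),
    )
    return [row[0] for row in cur]
```

    Because every path is stored explicitly, a subtree query is a single indexed lookup; the cost is extra rows, which grow with the depth of the hierarchy.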

  3. Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search.

    Science.gov (United States)

    Liu, Xianglong; Deng, Cheng; Lang, Bo; Tao, Dacheng; Li, Xuelong

    2016-02-01

    Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, few works study a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, multiple table search also lacks a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate this scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significantly outperform both ...

  4. Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

    Science.gov (United States)

    Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

    2018-01-01

    Mobile applications allow many users to access data without being limited by space and time. Over time, the data population of such an application will increase. Data access time becomes a problem once the data has reached tens of thousands to millions of records. The objective of this research is to maintain the performance of data execution for large data records. One way to maintain data access time performance is to apply a query optimization method. The optimization used in this research is the heuristic query optimization method. The built application is a mobile-based financial application using a MySQL database with a stored procedure. This application is used by more than one business entity in one database, thus enabling rapid data growth. In this stored procedure there is a query optimized using the heuristic method. Query optimization is performed on a “Select” query that involves more than one table with multiple clauses. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time calculation is also performed as the population of data in the database increases. The evaluation results show that data execution with heuristic query optimization is relatively faster than data execution without query optimization.

  5. Metadata

    CERN Document Server

    Zeng, Marcia Lei

    2016-01-01

    Metadata remains the solution for describing the explosively growing, complex world of digital information, and continues to be of paramount importance for information professionals. Providing a solid grounding in the variety and interrelationships among different metadata types, Zeng and Qin's thorough revision of their benchmark text offers a comprehensive look at the metadata schemas that exist in the world of library and information science and beyond, as well as the contexts in which they operate. Cementing its value as both an LIS text and a handy reference for professionals already in the field, this book: * Lays out the fundamentals of metadata, including principles of metadata, structures of metadata vocabularies, and metadata descriptions * Surveys metadata standards and their applications in distinct domains and for various communities of metadata practice * Examines metadata building blocks, from modelling to defining properties, and from designing application profiles to implementing value vocabu...

  6. Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs

    Science.gov (United States)

    Shin, Sung-Hyun; Moon, Yang-Sae; Kim, Jinho; Kim, Sang-Wook

    In recent years, a horizontal table with a large number of attributes is widely used in OLAP or e-business applications to analyze multidimensional data efficiently. For efficient storing and querying of horizontal tables, recent works have tried to transform a horizontal table to a traditional vertical table. Existing works, however, have the drawback of not considering an optimized PIVOT operation provided (or to be provided) in recent commercial RDBMSs. In this paper we propose a formal approach that exploits the optimized PIVOT operation of commercial RDBMSs for storing and querying of horizontal tables. To achieve this goal, we first provide an overall framework that stores and queries a horizontal table using an equivalent vertical table. Under the proposed framework, we then formally define 1) a method that stores a horizontal table in an equivalent vertical table and 2) a PIVOT operation that converts a stored vertical table to an equivalent horizontal view. Next, we propose a novel method that transforms a user-specified query on horizontal tables to an equivalent PIVOT-included query on vertical tables. In particular, by providing transformation rules for all five elementary operations in relational algebra as theorems, we prove our method is theoretically applicable to commercial RDBMSs. Experimental results show that, compared with the earlier work, our method reduces storage space significantly and also improves average performance by several orders of magnitude. These results indicate that our method provides an excellent framework to maximize performance in handling horizontal tables by exploiting the optimized PIVOT operation in commercial RDBMSs.
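
    The storage scheme in this record (a wide, sparse horizontal table kept as an equivalent vertical table and pivoted back on demand) can be sketched as below. This is only an illustration under stated assumptions: SQLite has no PIVOT operator, so the pivot is emulated with conditional aggregation, and the table `vertical`, its columns, and the attribute names are hypothetical rather than taken from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Vertical layout: one row per (object id, attribute, value) triple.
conn.execute("CREATE TABLE vertical (oid INTEGER, attr TEXT, val TEXT)")


def store_horizontal(conn, oid, row):
    """Store one wide row as triples, skipping NULL attributes entirely."""
    conn.executemany(
        "INSERT INTO vertical VALUES (?, ?, ?)",
        [(oid, attr, val) for attr, val in row.items() if val is not None],
    )


def pivot_query(attrs):
    """Build a SELECT that rebuilds the horizontal view from the vertical table.

    `attrs` must be trusted column names; they are spliced into the SQL text.
    """
    cols = ", ".join(
        f"MAX(CASE WHEN attr = '{a}' THEN val END) AS {a}" for a in attrs
    )
    return f"SELECT oid, {cols} FROM vertical GROUP BY oid ORDER BY oid"


ATTRS = ["color", "size", "weight"]  # hypothetical attributes
store_horizontal(conn, 1, {"color": "red", "size": None, "weight": "2kg"})
store_horizontal(conn, 2, {"color": None, "size": "L", "weight": None})

stored = conn.execute("SELECT COUNT(*) FROM vertical").fetchone()[0]
rows = conn.execute(pivot_query(ATTRS)).fetchall()
```

    Only three triples are stored for the six cells of the two wide rows, which is where the storage saving for sparse horizontal tables comes from; on an RDBMS with a native PIVOT operator, the generated query would use that operator instead of the `CASE` expressions.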

  7. Solving the problem of Trans-Genomic Query with alignment tables.

    Science.gov (United States)

    Parker, Douglass Stott; Hsiao, Ruey-Lung; Xing, Yi; Resch, Alissa M; Lee, Christopher J

    2008-01-01

    The trans-genomic query (TGQ) problem--enabling the free query of biological information, even across genomes--is a central challenge facing bioinformatics. Solutions to this problem can alter the nature of the field, moving it beyond the jungle of data integration and expanding the number and scope of questions that can be answered. An alignment table is a binary relationship on locations (sequence segments). An important special case of alignment tables are hit tables, i.e. tables of pairs of highly similar segments produced by alignment tools like BLAST. However, alignment tables also include general binary relationships, and can represent any useful connection between sequence locations. They can be curated, and provide a high-quality queryable backbone of connections between biological information. Alignment tables thus can be a natural foundation for TGQ, as they permit a central part of the TGQ problem to be reduced to purely technical problems involving tables of locations. Key challenges in implementing alignment tables include efficient representation and indexing of sequence locations. We define a location datatype that can be incorporated naturally into common off-the-shelf database systems. We also describe an implementation of alignment tables in BLASTGRES, an extension of the open-source POSTGRESQL database system that provides indexing and operators on locations required for querying alignment tables. This paper also reviews several successful large-scale applications of alignment tables for Trans-Genomic Query. Tables with millions of alignments have been used in queries about alternative splicing, an area of genomic analysis concerning the way in which a single gene can yield multiple transcripts. Comparative genomics is a large potential application area for TGQ and alignment tables.
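
    The notion of an alignment table as a binary relationship on locations can be illustrated with a small sketch. The `Location` class, the half-open coordinate convention, and the sequence names below are assumptions made for the example; they are not the BLASTGRES datatype or its operators, and the linear scan stands in for the indexing the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A sequence segment: sequence name plus a half-open interval [start, end)."""
    seq: str
    start: int
    end: int

    def overlaps(self, other):
        # Two segments overlap only on the same sequence with intersecting ranges.
        return (
            self.seq == other.seq
            and self.start < other.end
            and other.start < self.end
        )


def aligned_to(table, query):
    """Return the right-hand locations of all pairs whose left side overlaps `query`."""
    return [right for left, right in table if left.overlaps(query)]


# Hypothetical alignment table: pairs of locations across two genomes.
TABLE = [
    (Location("human_chr1", 100, 200), Location("mouse_chr4", 5000, 5100)),
    (Location("human_chr1", 150, 300), Location("mouse_chr4", 7000, 7150)),
    (Location("human_chr2", 100, 200), Location("mouse_chr9", 10, 110)),
]

hits = aligned_to(TABLE, Location("human_chr1", 180, 190))
```

    A production system would replace the scan with an interval index so that overlap queries stay fast on tables with millions of alignments.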

  8. Metadata

    CERN Document Server

    Pomerantz, Jeffrey

    2015-01-01

    When "metadata" became breaking news, appearing in stories about surveillance by the National Security Agency, many members of the public encountered this once-obscure term from information science for the first time. Should people be reassured that the NSA was "only" collecting metadata about phone calls -- information about the caller, the recipient, the time, the duration, the location -- and not recordings of the conversations themselves? Or does phone call metadata reveal more than it seems? In this book, Jeffrey Pomerantz offers an accessible and concise introduction to metadata. In the era of ubiquitous computing, metadata has become infrastructural, like the electrical grid or the highway system. We interact with it or generate it every day. It is not, Pomerantz tells us, just "data about data." It is a means by which the complexity of an object is represented in a simpler form. For example, the title, the author, and the cover art are metadata about a book. When metadata does its job well, it fades i...

  9. Using Common Table Expressions to Build a Scalable Boolean Query Generator for Clinical Data Warehouses

    Science.gov (United States)

    Harris, Daniel R.; Henderson, Darren W.; Kavuluru, Ramakanth; Stromberg, Arnold J.; Johnson, Todd R.

    2015-01-01

    We present a custom, Boolean query generator utilizing common-table expressions (CTEs) that is capable of scaling with big datasets. The generator maps user-defined Boolean queries, such as those interactively created in clinical-research and general-purpose healthcare tools, into SQL. We demonstrate the effectiveness of this generator by integrating our work into the Informatics for Integrating Biology and the Bedside (i2b2) query tool and show that it is capable of scaling. Our custom generator replaces and outperforms the default query generator found within the Clinical Research Chart (CRC) cell of i2b2. In our experiments, sixteen different types of i2b2 queries were identified by varying four constraints: date, frequency, exclusion criteria, and whether selected concepts occurred in the same encounter. We generated non-trivial, random Boolean queries based on these 16 types; the corresponding SQL queries produced by both generators were compared by execution times. The CTE-based solution significantly outperformed the default query generator and provided a much more consistent response time across all query types (M=2.03, SD=6.64 vs. M=75.82, SD=238.88 seconds). Without costly hardware upgrades, we provide a scalable solution based on CTEs with very promising empirical results centered on performance gains. The evaluation methodology used for this provides a means of profiling clinical data warehouse performance. PMID:25192572
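
    How a Boolean query might be mapped to SQL with common table expressions can be sketched as follows. This is an assumption-laden illustration, not the authors' generator and not the i2b2 CRC schema: the `facts` table, its columns, and the concept names are hypothetical, and only a simple "OR within a group, AND across groups" form is handled.

```python
import sqlite3


def to_cte_sql(groups):
    """Translate a Boolean query into one SQL statement built from CTEs.

    `groups` is a list of lists of concept names: concepts inside a group are
    OR-ed together, and the groups themselves are AND-ed (via INTERSECT).
    Returns the SQL text and the positional parameters in matching order.
    """
    ctes, names = [], []
    for i, concepts in enumerate(groups):
        name = f"g{i}"
        marks = ", ".join("?" for _ in concepts)
        ctes.append(
            f"{name} AS (SELECT DISTINCT patient_id FROM facts "
            f"WHERE concept IN ({marks}))"
        )
        names.append(name)
    body = " INTERSECT ".join(f"SELECT patient_id FROM {n}" for n in names)
    sql = "WITH " + ", ".join(ctes) + " " + body + " ORDER BY patient_id"
    params = [concept for group in groups for concept in group]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (patient_id INTEGER, concept TEXT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [(1, "diabetes"), (1, "statin"), (2, "diabetes"),
     (3, "hypertension"), (3, "statin")],
)

# (diabetes OR hypertension) AND statin
sql, params = to_cte_sql([["diabetes", "hypertension"], ["statin"]])
cohort = [row[0] for row in conn.execute(sql, params)]
```

    Each group becomes one named CTE, so the database can evaluate each criterion once and combine the results in a single statement. The constraints the abstract lists (date, frequency, exclusion, same encounter) are left out of this sketch.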

  10. Time-Dependent Networks as Models to Achieve Fast Exact Time-Table Queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Jacob, Rico

    2003-01-01

    We consider efficient algorithms for exact time-table queries, i.e. algorithms that find optimal itineraries for travelers using a train system. We propose to use time-dependent networks as a model and show advantages of this approach over space-time networks as models.
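
    The time-dependent model can be illustrated with a short earliest-arrival search: each station is a single node, and an edge is only usable if its departure is not earlier than the time at which the traveler reaches the station. This is a generic Dijkstra-style sketch under simplifying assumptions (no transfer times, times as minutes after midnight, hypothetical station names); it is not the authors' algorithm or data structure.

```python
import heapq


def earliest_arrival(timetable, source, target, start_time):
    """Earliest arrival time at `target`, or None if it cannot be reached.

    `timetable` maps a station to a list of (departure, arrival, next_station)
    connections. Transfer times are ignored in this sketch.
    """
    best = {source: start_time}
    heap = [(start_time, source)]
    while heap:
        time, station = heapq.heappop(heap)
        if station == target:
            return time
        if time > best.get(station, float("inf")):
            continue  # stale queue entry; a better arrival was already found
        for departure, arrival, nxt in timetable.get(station, []):
            # A connection is usable only if it leaves after we arrive.
            if departure >= time and arrival < best.get(nxt, float("inf")):
                best[nxt] = arrival
                heapq.heappush(heap, (arrival, nxt))
    return None


# Hypothetical timetable, times in minutes after midnight.
TIMETABLE = {
    "A": [(480, 540, "B"), (500, 520, "C")],
    "C": [(525, 535, "B")],
    "B": [(545, 600, "D"), (536, 590, "D")],
}

arrival = earliest_arrival(TIMETABLE, "A", "D", 470)
```

    In a space-time network, by contrast, every (station, time) event would be its own node, so the graph grows with the number of departures rather than with the number of stations.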

  11. Time-dependent Networks as Models to Achieve Fast Exact Time-table Queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Jacob, Rico

    2001-01-01

    We consider efficient algorithms for exact time-table queries, i.e. algorithms that find optimal itineraries. We propose to use time-dependent networks as a model and show advantages of this approach over space-time networks as models.

  12. Gel table - RPD | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ...d_main.zip File URL: ftp://ftp.biosciencedbc.jp/archive/rpd/LATEST/rpd_main.zip File size: 1 KB Simple searc...

  13. Developing a registration entry and query system within the scope of harmonizing of the orthophoto metadata with the international standards

    Science.gov (United States)

    Şahin, İ.; Alkış, Z.

    2013-10-01

    about the storage, management and presentation of the huge amounts of orthophoto images to the users must be started immediately. In this study, metadata components of the produced orthophotos compatible with the international standards have been defined, a relational database has been created to keep complete and accurate metadata, and a user interface has been developed to insert the metadata into the database. The developed software saves time when creating and querying the metadata.

  14. The CMS DBS Query Language

    CERN Document Server

    Kuznetsov, Valentin; Afaq, Anzar; Sekhri, Vijay; Guo, Yuyi; Lueking, Lee

    2009-01-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to the underlying database. We will describe the design of the query system, provid...
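
    The step of discovering join conditions from a graph representation of the schema can be sketched as a shortest-path search between tables. Everything concrete below is hypothetical: the tables, columns, and join conditions in `SCHEMA` are invented for illustration and are not the real DBS schema, and the ANTLR-based parsing stage is skipped entirely.

```python
from collections import deque

# Hypothetical schema graph: each edge carries the join condition between
# two tables. These names are illustrative only.
SCHEMA = {
    ("dataset", "block"): "block.dataset_id = dataset.id",
    ("block", "file"): "file.block_id = block.id",
    ("file", "run"): "run.file_id = file.id",
}


def join_path(schema, start, goal):
    """Breadth-first search for the shortest chain of tables from start to goal."""
    graph = {}
    for a, b in schema:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if goal not in prev:
        raise ValueError(f"no join path from {start} to {goal}")
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def build_sql(schema, select, where_table, condition):
    """Build a SELECT whose JOINs are discovered from the schema graph."""
    path = join_path(schema, select.split(".")[0], where_table)
    sql = f"SELECT {select} FROM {path[0]}"
    for a, b in zip(path, path[1:]):
        on = schema.get((a, b)) or schema[(b, a)]
        sql += f" JOIN {b} ON {on}"
    return f"{sql} WHERE {condition}"


query = build_sql(SCHEMA, "file.name", "dataset", "dataset.name = 'demo'")
```

    The user names only the column wanted and the filter; the intermediate `block` table and both join conditions are filled in from the graph, which is the kind of schema knowledge the query language is described as hiding.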

  15. Table of Cluster and Organism Species Number - Gclust Server | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Table of Cluster and Organism Species Number Data detail Data name Table of Cluster and Organism...resentative sequence ID of cluster, its length, the number of sequences contained in the cluster, organism s...pecies, the number of sequences belonging to the cluster for each of 95 organism ...

  16. cDNA library Table - KAIKOcDNA | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available KAIKOcDNA... cDNA library Table Data detail Data name cDNA library Table DOI 10.18908/lsdba.nbd...c00951-005 Description of data contents List of Bombyx mori cDNA libraries. Data file File name: kaiko_cdna_...iption Registered library name Registered name of the partial cDNA library Library synonym Another name for cDNA...

  17. cDNA table - RPD | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ...ile URL: ftp://ftp.biosciencedbc.jp/archive/rpd/LATEST/rpd_cdna.zip File size: 15 KB Simple search URL http:...

  18. OPTIMIZATION OF DISTRIBUTED QUERY USED IN SYNCHRONIZING DATA BETWEEN TABLES WITH DIFFERENT STRUCTURE

    Directory of Open Access Journals (Sweden)

    Demian Horia

    2010-07-01

    Full Text Available Replication can be used to improve local database performance and to improve the availability of applications. An application can access a local database rather than a central database at another site, which minimizes network traffic and lock escalation at the central database and achieves maximum performance for current insert, delete or update operations. The application can continue to function if the central database is down, or cannot be contacted due to a communication problem, power or hardware failure. This paper focuses on presenting a synchronization process between a central Microsoft SQL Server database and many remote site databases. One possible problem in replication can appear when the two databases organize their tables and structures differently.

  19. License - Nikkaji-InChI Mapping Table | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Nikkaji-InChI Mapping Table License License to Use This Database Last updated: 2015/05/22 You may use this database...ense specifies the license terms regarding the use of this database and the requirements you must follow in using this database.... The license for this database is specified in the Creative C...ommons Attribution 2.1 Japan . If you use data from this database, please be sure attribute this database as... . With regard to this database, you are licensed to: freely access part or whole of this database, and acqu

  20. Visualization of JPEG Metadata

    Science.gov (United States)

    Malik Mohamad, Kamaruddin; Deris, Mustafa Mat

    A JPEG image embeds much more information than just graphics. Visualizing its metadata would help digital forensic investigators view embedded data, including in corrupted images where no graphics can be displayed, and so assist in evidence collection for cases such as child pornography or steganography. Tools such as metadata readers, editors and extraction tools already exist, but they mostly focus on visualizing the attribute information of JPEG Exif. None of them visualizes metadata by consolidating the marker summary, header structure, Huffman table and quantization table in a single program. In this paper, metadata visualization is done by developing a program that is able to summarize all existing markers, the header structure, the Huffman table and the quantization table in a JPEG. The result shows that visualization of metadata makes the hidden information within a JPEG easier to view.
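
    A marker summary of the kind described here can be approximated by walking the segment headers of a JPEG stream. The sketch below is an independent illustration, not the authors' program: it reads only the headers that precede the first scan, stops at the SOS marker rather than decoding entropy-coded data, and is exercised on a tiny synthetic byte string instead of a real image.

```python
import struct

# Common JPEG marker codes (second byte after 0xFF).
NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
    0xC4: "DHT", 0xC0: "SOF0", 0xDA: "SOS", 0xD9: "EOI",
}


def list_markers(data):
    """Return (marker name, segment length) pairs up to and including SOS.

    The length is the value stored in the segment header, which counts the
    two length bytes themselves. SOI has no length field and is reported as 0.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    found = [("SOI", 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # corrupted or unexpected byte; stop rather than guess
        code = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        found.append((NAMES.get(code, f"0x{code:02X}"), length))
        if code == 0xDA:
            break  # entropy-coded scan data follows; this sketch stops here
        pos += 2 + length
    return found


# Synthetic stream: SOI, APP0 (2 payload bytes), DQT (1 payload byte),
# DHT (empty), SOS (empty), EOI. It is not a decodable image.
SAMPLE = (
    b"\xff\xd8"
    b"\xff\xe0\x00\x04AB"
    b"\xff\xdb\x00\x03Q"
    b"\xff\xc4\x00\x02"
    b"\xff\xda\x00\x02"
    b"\xff\xd9"
)

summary = list_markers(SAMPLE)
```

    Because the walk depends only on segment headers, it can still list the quantization (DQT) and Huffman (DHT) segments of a file whose image data is too damaged to display.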

  1. Metadata aided run selection at ATLAS

    International Nuclear Information System (INIS)

    Buckingham, R M; Gallas, E J; Tseng, J C-L; Viegas, F; Vinek, E

    2011-01-01

    Management of the large volume of data collected by any large scale scientific experiment requires the collection of coherent metadata quantities, which can be used by reconstruction or analysis programs and/or user interfaces, to pinpoint collections of data needed for specific purposes. In the ATLAS experiment at the LHC, we have collected metadata from systems storing non-event-wise data (Conditions) into a relational database. The Conditions metadata (COMA) database tables not only contain conditions known at the time of event recording, but also allow for the addition of conditions data collected as a result of later analysis of the data (such as improved measurements of beam conditions or assessments of data quality). A new web based interface called 'runBrowser' makes these Conditions Metadata available as a Run based selection service. runBrowser, based on PHP and JavaScript, uses jQuery to present selection criteria and report results. It not only facilitates data selection by conditions attributes, but also gives the user information at each stage about the relationship between the conditions chosen and the remaining conditions criteria available. When a set of COMA selections are complete, runBrowser produces a human readable report as well as an XML file in a standardized ATLAS format. This XML can be saved for later use or refinement in a future runBrowser session, shared with physics/detector groups, or used as input to ELSSI (event level Metadata browser) or other ATLAS run or event processing services.

  2. EquiX-A Search and Query Language for XML.

    Science.gov (United States)

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  3. Data, Metadata - Who Cares?

    Science.gov (United States)

    Baumann, Peter

    2013-04-01

    There is a traditional saying that metadata are understandable, semantic-rich, and searchable. Data, on the other hand, are big, with no accessible semantics, and just downloadable. Not only has this led to an imbalance of search support from a user perspective, but also underneath to a deep technology divide, often using relational databases for metadata and bespoke archive solutions for data. Our vision is that this barrier will be overcome, and data and metadata become searchable alike, leveraging the potential of semantic technologies in combination with scalability technologies. Ultimately, in this vision ad-hoc processing and filtering will no longer be distinguished, forming a uniformly accessible data universe. In the European EarthServer initiative, we work towards this vision by federating database-style raster query languages with metadata search and geo broker technology. We present the approach taken, how it can leverage OGC standards, the benefits envisaged, and first results.

  4. Mercury Toolset for Spatiotemporal Metadata

    Science.gov (United States)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris

    2010-06-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  5. Mercury Toolset for Spatiotemporal Metadata

    Science.gov (United States)

    Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James

    2010-01-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  6. BASINS Metadata

    Science.gov (United States)

    Metadata, or data about data, describes the content, quality, condition, and other characteristics of data. Geospatial metadata are critical to data discovery and serve as the fuel for the Geospatial One-Stop data portal.

  7. LHCb: Optimising query execution time in LHCb Bookkeeping System using partition pruning and partition wise joins

    CERN Multimedia

    Mathe, Z

    2013-01-01

    The LHCb experiment produces a huge amount of data which has associated metadata such as run number, data taking condition (detector status when the data was taken), simulation condition, etc. The data are stored in files, replicated on the Computing Grid around the world. The LHCb Bookkeeping System provides methods for retrieving datasets based on their metadata. The metadata is stored in a hybrid database model, which is a mixture of Relational and Hierarchical database models and is based on the Oracle Relational Database Management System (RDBMS). The database access has to be reliable and fast. In order to achieve a high timing performance, the tables are partitioned and the queries are executed in parallel. When large amounts of data are stored, partition pruning is essential for database performance, because it reduces the amount of data retrieved from the disk and optimises the resource utilisation. The research presented here focuses on the extended composite partitioning strategy such as rang...
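
    The effect of partition pruning can be shown with a toy model: only partitions whose key range can contain matching rows are scanned. The layout below (partitions keyed by run number, with hypothetical names and boundaries) is chosen purely for illustration and does not describe the Bookkeeping System's actual partitioning or Oracle's implementation.

```python
def prune(partitions, lo, hi):
    """Names of partitions whose half-open range [start, end) overlaps [lo, hi]."""
    return [name for name, start, end in partitions if start <= hi and lo < end]


def query_runs(storage, partitions, lo, hi):
    """Scan only the surviving partitions and return matching run numbers."""
    scanned = prune(partitions, lo, hi)
    matches = sorted(
        run for name in scanned for run in storage[name] if lo <= run <= hi
    )
    return scanned, matches


# Hypothetical partitions keyed by run number: (name, first run, end run).
PARTITIONS = [
    ("p0", 0, 1000),
    ("p1", 1000, 2000),
    ("p2", 2000, 3000),
]
STORAGE = {
    "p0": [10, 500, 999],
    "p1": [1000, 1500],
    "p2": [2500, 2999],
}

scanned, matches = query_runs(STORAGE, PARTITIONS, 900, 1200)
```

    Here the predicate on the run number rules out `p2` before any of its rows are read, which is the saving in disk access the abstract attributes to pruning.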

  8. Superfund Query

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Superfund Query allows users to retrieve data from the Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS) database.

  9. A Metadata-Rich File System

    Energy Technology Data Exchange (ETDEWEB)

    Ames, S; Gokhale, M B; Maltzahn, C

    2009-01-07

    Despite continual improvements in the performance and reliability of large scale file systems, the management of file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, metadata, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS includes Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

  10. Scribble query

    DEFF Research Database (Denmark)

    Nielsen, Matthias; Elmqvist, Niklas; Grønbæk, Kaj

    2016-01-01

    The wide availability of touch-enabled devices is a unique opportunity for visualization research to invent novel techniques to fluently explore, analyse, and understand complex and large-scale data. In this paper, we introduce Scribble Query, a novel interaction technique for fluid freehand scri...... scribbling (casual drawing) on touch-enabled devices to support interactive querying in data visualizations. Inspired by the low-entry yet rich interaction of touch drawing applications, a Scribble Query can be created with a single touch stroke yet have the expressiveness of multiple brushes (a...... visualization with Scribble Query. The studies suggest that Scribble Query has a low entry barrier facilitating easy adoption, casual and infrequent usage, and in one case, enabled live dissemination of findings by the domain expert to managers in the organization....

  11. Query responses

    Directory of Open Access Journals (Sweden)

    Paweł Łupkowski

    2017-05-01

    Full Text Available In this article we consider the phenomenon of answering a query with a query. Although such answers are common, no large scale, corpus-based characterization exists, with the exception of clarification requests. After briefly reviewing different theoretical approaches on this subject, we present a corpus study of query responses in the British National Corpus and develop a taxonomy for query responses. We point at a variety of response categories that have not been formalized in previous dialogue work, particularly those relevant to adversarial interaction. We show that different response categories have significantly different rates of subsequent answer provision. We provide a formal analysis of the response categories in the framework of KoS.

  12. Expression information data table (Strain List) of Drosophila GAL4 enhancer trap lines - GETDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us GETDB Expression information data table (Strain List) of Drosophila GAL4 enhancer trap lines... Data detail Data name Expression information data table (Strain List) of Drosophila GAL4 enhancer trap line...his Database Site Policy | Contact Us Expression information data table (Strain List) of Drosophila GAL4 enhancer trap lines - GETDB | LSDB Archive ...

  13. Clustering Table of the genome insert site of Drosophila GAL4 enhancer trap lines (Cluster List) - GETDB | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ster List) Data detail Data name Clustering Table of the genome insert site of Drosophila GAL4 enhancer trap...se Site Policy | Contact Us Clustering Table of the genome insert site of Drosophila GAL4 enhancer trap lines (Cluster List) - GETDB | LSDB Archive ... ...stering Table of the genome insert site of Drosophila GAL4 enhancer trap lines (Clu...switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us GETDB Clu

  14. Web development with jQuery

    CERN Document Server

    York, Richard

    2015-01-01

    Newly revised and updated resource on jQuery's many features and advantages. Web Development with jQuery offers a major update to the popular Beginning JavaScript and CSS Development with jQuery from 2009. More than half of the content is new or updated, and reflects recent innovations with regard to mobile applications, jQuery Mobile, and the spectrum of associated plugins. Readers can expect thorough revisions with expanded coverage of events, CSS, AJAX, animation, and drag and drop. New chapters bring developers up to date on popular features like jQuery UI, navigation, tables, interacti

  15. SM4MQ: A Semantic Model for Multidimensional Queries

    DEFF Research Database (Denmark)

    Varga, Jovan; Dobrokhotova, Ekaterina; Romero, Oscar

    2017-01-01

    On-Line Analytical Processing (OLAP) is a data analysis approach to support decision-making. On top of that, Exploratory OLAP is a novel initiative for the convergence of OLAP and the Semantic Web (SW) that enables the use of OLAP techniques on SW data. Moreover, OLAP approaches exploit different...... metadata artifacts (e.g., queries) to assist users with the analysis. However, modeling and sharing of most of these artifacts are typically overlooked. Thus, in this paper we focus on the query metadata artifact in the Exploratory OLAP context and propose an RDF-based vocabulary for its representation......, sharing, and reuse on the SW. As OLAP is based on the underlying multidimensional (MD) data model we denote such queries as MD queries and define SM4MQ: A Semantic Model for Multidimensional Queries. Furthermore, we propose a method to automate the exploitation of queries by means of SPARQL. We apply...

  16. Big Metadata, Smart Metadata, and Metadata Capital: Toward Greater Synergy Between Data Science and Metadata

    Directory of Open Access Journals (Sweden)

    Jane Greenberg

    2017-08-01

    Full Text Available Purpose: The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data science. Design/methodology/approach: This paper identifies factors that challenge metadata research in the digital ecosystem, defines metadata and data science, and presents the concepts big metadata, smart metadata, and metadata capital as part of a metadata lingua franca connecting to data science. Findings: The “utilitarian nature” and “historical and traditional views” of metadata are identified as two intersecting factors that have inhibited metadata research. Big metadata, smart metadata, and metadata capital are presented as part of a metadata lingua franca to help frame research in the data science research space. Research limitations: There are additional, intersecting factors to consider that likely inhibit metadata research, and other significant metadata concepts to explore. Practical implications: The immediate contribution of this work is that it may elicit response, critique, revision, or, more significantly, motivate research. The work presented can encourage more researchers to consider the significance of metadata as a research worthy topic within data science and the larger digital ecosystem. Originality/value: Although metadata research has not kept pace with other data science topics, there is little attention directed to this problem. This is surprising, given that metadata is essential for data science endeavors. This examination synthesizes original and prior scholarship to provide new grounding for metadata research in data science.

  17. Cytometry metadata in XML

    Science.gov (United States)

    Leif, Robert C.; Leif, Stephanie H.

    2016-04-01

    Introduction: The International Society for Advancement of Cytometry (ISAC) has created a standard for the Minimum Information about a Flow Cytometry Experiment (MIFlowCyt 1.0). CytometryML will serve as a common metadata standard for flow and image cytometry (digital microscopy). Methods: The MIFlowCyt data-types were created, as is the rest of CytometryML, in the XML Schema Definition Language (XSD 1.1). The datatypes are primarily based on the Flow Cytometry and the Digital Imaging and Communication (DICOM) standards. A small section of the code was formatted with standard HTML formatting elements (p, h1, h2, etc.). Results: 1) The part of MIFlowCyt that describes the Experimental Overview including the specimen and substantial parts of several other major elements has been implemented as CytometryML XML schemas (www.cytometryml.org). 2) The feasibility of using MIFlowCyt to provide the combination of an overview, table of contents, and/or an index of a scientific paper or a report has been demonstrated. Previously, a sample electronic publication, EPUB, was created that could contain both MIFlowCyt metadata as well as the binary data. Conclusions: The use of CytometryML technology together with XHTML5 and CSS permits the metadata to be directly formatted and together with the binary data to be stored in an EPUB container. This will facilitate: formatting, data mining, presentation, data verification, and inclusion in structured research, clinical, and regulatory documents, as well as demonstrate a publication's adherence to the MIFlowCyt standard, promote interoperability and should also result in the textual and numeric data being published using web technology without any change in composition.

  18. Design and Implementation of a Metadata-rich File System

    Energy Technology Data Exchange (ETDEWEB)

    Ames, S; Gokhale, M B; Maltzahn, C

    2010-01-19

    Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

  19. Table of 3D organ model IDs and organ names (PART-OF Tree) - BodyParts3D | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us BodyParts...ata file File name: partof_parts_list_e.txt (PART-OF Tree) File URL: ftp://ftp.biosciencedbc.jp/archive/bodyparts...3d/LATEST/partof_parts_list_e.txt File size: 58 KB Simple search URL http://togodb.biosciencedbc.jp/togodb/view/bodyparts...3d_partof_parts_list_e Data acquisition method - Data analysis ...atabase Site Policy | Contact Us Table of 3D organ model IDs and organ names (PART-OF Tree) - BodyParts3D | LSDB Archive ...

  20. Table of 3D organ model IDs and organ names (IS-A Tree) - BodyParts3D | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available switchLanguage; BLAST Search Image Search Home About Archive Update History Data List Contact us BodyParts...nce between 3D organ model IDs and organ names available in IS-A Tree. Data file File name: isa_parts..._list_e.txt (IS-A Tree) File URL: ftp://ftp.biosciencedbc.jp/archive/bodyparts3d/LATEST/isa_parts..._list_e.txt File size: 126 KB Simple search URL http://togodb.biosciencedbc.jp/togodb/view/bodyparts3d_isa_parts...| Contact Us Table of 3D organ model IDs and organ names (IS-A Tree) - BodyParts3D | LSDB Archive ...

  1. An Ensemble Approach for Expanding Queries

    Science.gov (United States)

    2012-11-01

    nephritis ” in query number 145. lupus nephritis ( nephritis OR lupus lupus OR glomerulonephritis mycophenolate OR mofetil glomerulonephritis OR... lupus cyclophosphamide membranous OR lupus OR nephritis OR syndrome diffuse OR lupus OR glomerulonephritis OR syndrome sle OR...document collection (Table 1). Table 1. Stop words. High frequency words Common English stop words but treatment normal him who after over

  2. Adding query privacy to robust DHTs

    DEFF Research Database (Denmark)

    Backes, Michael; Goldberg, Ian; Kate, Aniket

    2012-01-01

    Interest in anonymous communication over distributed hash tables (DHTs) has increased in recent years. However, almost all known solutions solely aim at achieving sender or requestor anonymity in DHT queries. In many application scenarios, it is crucial that the queried key remains secret from...... intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries. We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without...... compromising spam resistance. Although our OT-based approach can work over any DHT, we concentrate on robust DHTs that can tolerate Byzantine faults and resist spam. We choose the best-known robust DHT construction, and employ an efficient OT protocol well-suited for achieving our goal of obtaining query...

  3. SM4MQ: A Semantic Model for Multidimensional Queries

    DEFF Research Database (Denmark)

    Varga, Jovan; Dobrokhotova, Ekaterina; Romero, Oscar

    2017-01-01

    On-Line Analytical Processing (OLAP) is a data analysis approach to support decision-making. On top of that, Exploratory OLAP is a novel initiative for the convergence of OLAP and the Semantic Web (SW) that enables the use of OLAP techniques on SW data. Moreover, OLAP approaches exploit different...... metadata artifacts (e.g., queries) to assist users with the analysis. However, modeling and sharing of most of these artifacts are typically overlooked. Thus, in this paper we focus on the query metadata artifact in the Exploratory OLAP context and propose an RDF-based vocabulary for its representation...

  4. Metadata: A user's view

    Energy Technology Data Exchange (ETDEWEB)

    Bretherton, F.P. [Univ. of Wisconsin, Madison, WI (United States); Singley, P.T. [Oak Ridge National Lab., TN (United States)

    1994-12-31

    An analysis is presented of the uses of metadata from four aspects of database operations: (1) search, query, retrieval; (2) ingest, quality control, processing; (3) application to application transfer; (4) storage, archive. Typical degrees of database functionality, ranging from simple file retrieval to interdisciplinary global query with metadatabase-user dialog and involving many distributed autonomous databases, are ranked in approximate order of increasing sophistication of the required knowledge representation. An architecture is outlined for implementing such functionality in many different disciplinary domains utilizing a variety of off the shelf database management subsystems and processor software, each specialized to a different abstract data model.

  5. Approximate dictionary queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Gasieniec, Leszek

    1996-01-01

    Given a set of n binary strings of length m each, we consider the problem of answering d-queries. Given a binary query string of length m, a d-query is to report if there exists a string in the set within Hamming distance d of the query string. We present a data structure of size O(nm) supporting 1-queries in ti...
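
    To make the definition of a d-query concrete, the following is a minimal sketch of the naive baseline: a linear scan that compares the query against every stored string. It is not the data structure the abstract refers to; the function names `hamming` and `d_query` and the sample dictionary are illustrative assumptions.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def d_query(strings: Iterable[str], query: str, d: int) -> bool:
    """Naive d-query: True if any stored string is within Hamming distance d.

    This scans all n strings and costs O(nm) per query; it only illustrates
    the problem statement, not an efficient index.
    """
    return any(hamming(s, query) <= d for s in strings)


# Hypothetical dictionary of binary strings of length m = 4.
dictionary = ["0000", "0110", "1111"]
print(d_query(dictionary, "0111", 1))  # True: "0110" differs in one position
print(d_query(dictionary, "1001", 1))  # False: every string differs in >= 2 positions
```

    The scan touches every stored string on each query, which is the cost a dedicated data structure for d-queries aims to avoid.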

  6. Optimizing Temporal Queries

    DEFF Research Database (Denmark)

    Toman, David; Bowman, Ivan Thomas

    2003-01-01

    , these query languages are implemented by translating temporal queries into standard relational queries. However, the compiled queries are often quite cumbersome and expensive to execute even using state-of-the-art relational products. This paper presents an optimization technique that produces more efficient...... translated SQL queries by taking into account the properties of the encoding used for temporal attributes. For concreteness, this translation technique is presented in the context of SQL/TP; however, these techniques are also applicable to other temporal query languages....

  7. Geospatial metadata retrieval from web services

    Directory of Open Access Journals (Sweden)

    Ivanildo Barbosa

    Full Text Available Nowadays, producers of geospatial data in either raster or vector formats are able to make them available on the World Wide Web by deploying web services that enable users to access and query those contents even without specific software for geoprocessing. Several providers around the world have deployed instances of WMS (Web Map Service), WFS (Web Feature Service) and WCS (Web Coverage Service), all of them specified by the Open Geospatial Consortium (OGC). As a consequence, metadata about the available contents can be retrieved and compared with similar offline datasets from other sources. This paper presents a brief summary and describes the matching process between the specifications for OGC web services (WMS, WFS and WCS) and the specifications for metadata required by ISO 19115, adopted as reference for several national metadata profiles, including the Brazilian one. This process focuses on retrieving metadata about the identification and data quality packages and also indicates how to retrieve metadata related to other packages. Therefore, users are able to assess whether the provided contents fit their purposes.
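
    As an illustration of where such service metadata comes from, OGC services expose a GetCapabilities operation whose response describes the service and its contents. The sketch below only builds the request URL using the key-value-pair encoding; the endpoint `https://example.org/geoserver/ows` and the helper name `capabilities_url` are hypothetical, and mapping the response to ISO 19115 elements (the subject of the paper) is not shown.

```python
from urllib.parse import urlencode

SUPPORTED_SERVICES = {"WMS", "WFS", "WCS"}


def capabilities_url(endpoint: str, service: str) -> str:
    """Build a GetCapabilities request URL for an OGC web service."""
    if service not in SUPPORTED_SERVICES:
        raise ValueError(f"unsupported service: {service}")
    query = urlencode({"service": service, "request": "GetCapabilities"})
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in endpoint else "?"
    return f"{endpoint}{separator}{query}"


# Hypothetical endpoint, used only for illustration.
endpoint = "https://example.org/geoserver/ows"
for service in ("WMS", "WFS", "WCS"):
    print(capabilities_url(endpoint, service))
```

    The XML document returned for each URL would then be parsed for identification details such as titles, abstracts, and bounding boxes.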

  8. Adding Query Privacy to Robust DHTs

    DEFF Research Database (Denmark)

    Backes, Michael; Goldberg, Ian; Kate, Aniket

    2011-01-01

    Interest in anonymous communication over distributed hash tables (DHTs) has increased in recent years. However, almost all known solutions solely aim at achieving sender or requestor anonymity in DHT queries. In many application scenarios, it is crucial that the queried key remains secret from...... intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries. We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without...... compromising spam resistance. Although our OT-based approach can work over any DHT, we concentrate on communication over robust DHTs that can tolerate Byzantine faults and resist spam. We choose the best-known robust DHT construction, and employ an efficient OT protocol well-suited for achieving our goal...

  9. Finding Atmospheric Composition (AC) Metadata

    Science.gov (United States)

    Strub, Richard F.; Falke, Stefan; Fiakowski, Ed; Kempler, Steve; Lynnes, Chris; Goussev, Oleg

    2015-01-01

    The Atmospheric Composition Portal (ACP) is an aggregator and curator of information related to remotely sensed atmospheric composition data and analysis. It uses existing tools and technologies and, where needed, enhances those capabilities to provide interoperable access, tools, and contextual guidance for scientists and value-adding organizations using remotely sensed atmospheric composition data. The initial focus is on Essential Climate Variables identified by the Global Climate Observing System: CH4, CO, CO2, NO2, O3, SO2 and aerosols. This poster addresses our efforts in building the ACP Data Table, an interface to help discover and understand remotely sensed data that are related to atmospheric composition science and applications. We harvested the GCMD, CWIC, and GEOSS metadata catalogs using machine-to-machine technologies (OpenSearch, Web Services). We also manually investigated the plethora of CEOS data providers' portals and other catalogs where that data might be aggregated. This poster presents our experience of the excellence, variety, and challenges we encountered. Conclusions: 1. The significant benefits that the major catalogs provide are their machine-to-machine tools like OpenSearch and Web Services rather than any GUI usability improvements, due to the large amount of data in their catalogs. 2. There is a trend at the large catalogs towards simulating small data provider portals through advanced services. 3. Populating metadata catalogs using ISO 19115 is too complex for users to do in a consistent way, difficult to parse visually or with XML libraries, and too complex for Java XML binders like CASTOR. 4. The ability to search for IDs first and then for data (GCMD and ECHO) is better for machine-to-machine operations than the timeouts experienced when returning the entire metadata entry at once. 5. Metadata harvest and export activities between the major catalogs have led to a significant amount of duplication. (This is currently being addressed.) 6. Most (if not all

  10. USGIN ISO metadata profile

    Science.gov (United States)

    Richard, S. M.

    2011-12-01

    The USGIN project has drafted and is using a specification for use of ISO 19115/19/39 metadata, recommendations for simple metadata content, and a proposal for a URI scheme to identify resources using resolvable HTTP URIs (see http://lab.usgin.org/usgin-profiles). The principal target use case is a catalog in which resources can be registered and described by data providers for discovery by users. We are currently using the ESRI Geoportal (Open Source), with configuration files for the USGIN profile. The metadata offered by the catalog must provide sufficient content to guide search engines to locate requested resources, to describe the resource content, provenance, and quality so users can determine if the resource will serve for intended usage, and finally to enable human users and software clients to obtain or access the resource. In order to achieve an operational federated catalog system, provisions in the ISO specification must be restricted and usage clarified to reduce the heterogeneity of 'standard' metadata and service implementations such that a single client can search against different catalogs, and the metadata returned by catalogs can be parsed reliably to locate required information. Usage of the complex ISO 19139 XML schema allows for a great deal of structured metadata content, but the heterogeneity in approaches to content encoding has hampered development of sophisticated client software that can take advantage of the rich metadata; the lack of such clients in turn reduces motivation for metadata producers to produce content-rich metadata. If the only significant use of the detailed, structured metadata is to format into text for people to read, then the detailed information could be put in free text elements and be just as useful. In order for complex metadata encoding and content to be useful, there must be clear and unambiguous conventions on the encoding that are utilized by the community that wishes to take advantage of advanced metadata

  11. Metadata management staging system

    Energy Technology Data Exchange (ETDEWEB)

    2013-08-01

    A Django application providing a user interface for building a file and metadata management system. It is an evolution of our Node.js and CouchDB metadata management system. This version focuses on server functionality and uses a well-documented, rational, RESTful API for data access.

  12. Recommending Multidimensional Queries

    Science.gov (United States)

    Giacometti, Arnaud; Marcel, Patrick; Negre, Elsa

    Interactive analysis of a datacube, in which a user navigates a cube by launching a sequence of queries, is often tedious since the user may have no idea of what the forthcoming query should be in the current analysis. To better support this process, we propose in this paper to apply a Collaborative Work approach that leverages former explorations of the cube to recommend OLAP queries. The system that we have developed adapts Approximate String Matching, a technique popular in Information Retrieval, to match the current analysis with the former explorations and help suggest a query to the user. Our approach has been implemented with the open source Mondrian OLAP server to recommend MDX queries, and we have carried out some preliminary experiments that show its efficiency for generating effective query recommendations.
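
    One way to picture matching a current analysis against former explorations is to treat each session as a sequence of query identifiers and compare sequences by edit distance. The sketch below is a deliberately simplified assumption, not the system described in the abstract: the function names, the prefix-matching rule, and the sample log are all hypothetical.

```python
from typing import Hashable, Optional, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences, using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recommend(current: Sequence[str], log: Sequence[Sequence[str]]) -> Optional[str]:
    """Suggest the query that followed the most similar logged session prefix."""
    n = len(current)
    best_query, best_dist = None, None
    for session in log:
        if len(session) <= n:
            continue  # this session has no "next" query to offer
        dist = edit_distance(current, session[:n])
        if best_dist is None or dist < best_dist:
            best_query, best_dist = session[n], dist
    return best_query


# Hypothetical log of former explorations, as lists of query identifiers.
log = [["q1", "q2", "q3", "q4"], ["q1", "q5", "q6"]]
print(recommend(["q1", "q2"], log))  # q3
print(recommend(["q1", "q5"], log))  # q6
```

    In a real setting the comparison would operate on the queries themselves (for example, on their MDX structure) rather than on opaque identifiers.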

  13. Metadata management and semantics in microarray repositories.

    Science.gov (United States)

    Kocabaş, F; Can, T; Baykal, N

    2011-12-01

    The number of microarray and other high-throughput experiments on primary repositories keeps increasing, as do the size and complexity of the results in response to biomedical investigations. Initiatives have been started on standardization of content, object model, exchange format and ontology. However, there are backlogs and an inability to exchange data between microarray repositories, which indicate that there is a great need for a standard format and data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework), can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that the backlogs can be reduced and that exchange of information and asking of knowledge discovery questions can become possible with the use of this metadata framework.

  14. Harvesting NASA's Common Metadata Repository

    Science.gov (United States)

    Shum, D.; Mitchell, A. E.; Durbin, C.; Norton, J.

    2017-12-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.

  15. Indexing for summary queries

    DEFF Research Database (Denmark)

    Yi, Ke; Wang, Lu; Wei, Zhewei

    2014-01-01

    returned by reporting queries. In this article, we design indexing techniques that allow for extracting a statistical summary of all the records in the query. The summaries we support include frequent items, quantiles, and various sketches, all of which are of central importance in massive data analysis....... Our indexes require linear space and extract a summary with the optimal or near-optimal query cost. We illustrate the efficiency and usefulness of our designs through extensive experiments and a system demonstration....

  16. Flexible Query Answering Systems

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 12th International Conference on Flexible Query Answering Systems, FQAS 2017, held in London, UK, in June 2017. The 21 full papers presented in this book together with 4 short papers were carefully reviewed and selected from 43 submissions....... The papers cover the following topics: foundations of flexible querying; recommendation and ranking; technologies for flexible representations and querying; knowledge discovery and information/data retrieval; intuitionistic sets; and generalized net model....

  17. Unemployment Insurance Query (UIQ)

    Data.gov (United States)

    Social Security Administration — The Unemployment Insurance Query (UIQ) provides State Unemployment Insurance agencies real-time online access to SSA data. This includes SSN verification and Title...

  18. Mining Building Metadata by Data Stream Comparison

    DEFF Research Database (Denmark)

    Holmegaard, Emil; Kjærgaard, Mikkel Baun

    2016-01-01

    ways to annotate sensor and actuation points. This makes it difficult to create intuitive queries for retrieving data streams from points. Another problem is the amount of insufficient or missing metadata. We introduce Metafier, a tool for extracting metadata from comparing data streams. Metafier...... are Dynamic Time Warping (DTW), Empirical Mode Decomposition (EMD), and the differential coefficient. Two of the algorithms compare the slope of the data stream in the values. EMD finds similarities based on the frequency bands among the data stream. By using several algorithms the system is robust enough...... to handle data streams with only slightly similar patterns. We have evaluated Metafier with points and data from one building located in Denmark. We have evaluated Metafier with 903 points, and the overall accuracy, with only 3 known examples, was 94.71%. Furthermore we found that using DTW for mining...
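
    Of the algorithms named above, Dynamic Time Warping is the easiest to show compactly. The following is a minimal sketch of the textbook dynamic-programming formulation with absolute difference as the local cost; it is not Metafier's implementation, and the function name and sample streams are assumptions.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical sensor streams: the second is a time-stretched copy of the first.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

    Because the alignment may repeat samples, two streams with the same shape but different timing score as close, which is why DTW suits comparing sensor points that react to the same signal with a delay.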

  19. Mastering jQuery

    CERN Document Server

    Libby, Alex

    2015-01-01

    If you are a developer who is already familiar with using jQuery and wants to push your skill set further, then this book is for you. The book assumes an intermediate knowledge level of jQuery, JavaScript, HTML5, and CSS.

  20. Range-clustering queries

    DEFF Research Database (Denmark)

    Abrahamsen, Mikkel; de Berg, Mark; Buchin, Kevin

    2017-01-01

    an optimal k-clustering for S P ∩ Q. We obtain the following results. • We present a general method to compute a (1 + ϵ)-approximation to a range-clustering query, where ϵ > 0 is a parameter that can be specified as part of the query. Our method applies to a large class of clustering problems, including k...

  1. Query complexity in expectation

    NARCIS (Netherlands)

    Kaniewski, J.; Lee, T.; de Wolf, R.; Halldórsson, M.M.; Iwama, K.; Kobayashi, N.; Speckmann, B.

    2015-01-01

    We study the query complexity of computing a function f: {0,1}^n → R+ in expectation. This requires the algorithm on input x to output a nonnegative random variable whose expectation equals f(x), using as few queries to the input x as possible. We exactly characterize both the randomized and the quantum

  2. A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs

    Science.gov (United States)

    Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.

    2013-12-01

    The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata is difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs and thus it is desirable to have federated search interfaces capable of unified search queries across multiple sources. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and ACADIS Arctic Data Explorer (ADE) use a common infrastructure, which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an Open Source search platform. Solr applies indexes to specific fields of the metadata, which in this instance optimizes queries containing keywords, spatial bounds and temporal ranges. NSIDC metadata is reused by both search interfaces, but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort, which ultimately lowers costs for research. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and identify the creation and use of Open Source libraries.

  3. Querying Workflow Logs

    Directory of Open Access Journals (Sweden)

    Yan Tang

    2018-01-01

    A business process or workflow is an assembly of tasks that accomplishes a business goal. Business process management is the study of the design, configuration/implementation, enactment and monitoring, analysis, and re-design of workflows. The traditional methodology for the re-design and improvement of workflows relies on the well-known sequence of extract, transform, and load (ETL), data/process warehousing, and online analytical processing (OLAP) tools. In this paper, we study the ad hoc querying of process enactments for (data-centric) business processes, bypassing the traditional methodology for more flexibility in querying. We develop an algebraic query language based on “incident patterns” with four operators inspired by the Business Process Model and Notation (BPMN) representation, allowing the user to formulate ad hoc queries directly over workflow logs. A formal semantics of this query language, a preliminary query evaluation algorithm, and a group of elementary properties of the operators are provided.

  4. NAIP National Metadata

    Data.gov (United States)

    Farm Service Agency, Department of Agriculture — The NAIP National Metadata Map contains USGS Quarter Quad and NAIP Seamline boundaries for every year NAIP imagery has been collected. Clicking on the map also makes...

  5. ATLAS Metadata Task Force

    Energy Technology Data Exchange (ETDEWEB)

    ATLAS Collaboration; Costanzo, D.; Cranshaw, J.; Gadomski, S.; Jezequel, S.; Klimentov, A.; Lehmann Miotto, G.; Malon, D.; Mornacchi, G.; Nemethy, P.; Pauly, T.; von der Schmitt, H.; Barberis, D.; Gianotti, F.; Hinchliffe, I.; Mapelli, L.; Quarrie, D.; Stapnes, S.

    2007-04-04

    This document provides an overview of the metadata, which are needed to characterize ATLAS event data at different levels (a complete run, data streams within a run, luminosity blocks within a run, individual events).

  6. The RBV metadata catalog

    Science.gov (United States)

    Andre, Francois; Fleury, Laurence; Gaillardet, Jerome; Nord, Guillaume

    2015-04-01

    RBV (Réseau des Bassins Versants) is a French initiative to consolidate the national efforts made by more than 15 elementary observatories funded by various research institutions (CNRS, INRA, IRD, IRSTEA, Universities) that study river and drainage basins. The RBV Metadata Catalogue aims at giving a unified vision of the work produced by every observatory to both the members of the RBV network and any external person interested in this domain of research. Another goal is to share this information with other existing metadata portals. Metadata management is heterogeneous among observatories, ranging from absence to mature harvestable catalogues. Here, we explain the strategy used to design a state-of-the-art catalogue in this situation. The main features are as follows:
    - Multiple input methods: Metadata records in the catalogue can either be entered with the graphical user interface, harvested from an existing catalogue, or imported from an information system through simplified web services.
    - Hierarchical levels: Metadata records may describe either an observatory, one of its experimental sites, or a single dataset produced by one instrument.
    - Multilingualism: Metadata can be easily entered in several configurable languages.
    - Compliance with standards: The back-office part of the catalogue is based on a CSW metadata server (Geosource), which ensures ISO19115 compatibility and the ability to be harvested (globally or partially). Ongoing tasks focus on the use of SKOS thesauri and SensorML descriptions of the sensors.
    - Ergonomics: The user interface is built with the GWT Framework to offer a rich client application with fully ajaxified navigation.
    - Source code sharing: The work has led to the development of reusable components which can be used to quickly create new metadata forms in other GWT applications.
    You can visit the catalogue (http://portailrbv.sedoo.fr/) or contact us by email at rbv@sedoo.fr.

  7. Metadata Creation, Management and Search System for your Scientific Data

    Science.gov (United States)

    Devarakonda, R.; Palanisamy, G.

    2012-12-01

    Mercury Search Systems is a set of tools for creating, searching, and retrieving biogeochemical metadata. The Mercury toolset provides orders of magnitude improvements in search speed, support for any metadata format, integration with Google Maps for spatial queries, multi-faceted search, search suggestions, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. Mercury's metadata editor provides an easy way to create metadata, and Mercury's search interface provides a single portal to search for data and information contained in disparate data management systems, each of which may use any metadata format including FGDC, ISO-19115, Dublin-Core, Darwin-Core, DIF, ECHO, and EML. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury is being used by more than 14 different projects across 4 federal agencies. It was originally developed for NASA, with continuing development funded by NASA, USGS, and DOE for a consortium of projects. Mercury search won NASA's Earth Science Data Systems Software Reuse Award in 2008. References: R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics, DOI: 10.1007/s12145-010-0073-0 (2010).

  8. Populating and harvesting metadata in a virtual observatory

    Science.gov (United States)

    Walker, Raymond; King, Todd; Joy, Steven; Bargatze, Lee; Chi, Peter; Weygand, James

    Founded in 2007, the Virtual Magnetospheric Observatory (VMO) provides one-stop shopping for data and services useful in magnetospheric research. The VMO's purview includes ground-based observations as well as observations from spacecraft. The data and services for using and analyzing these data are found at laboratories distributed around the world. The VMO is itself a federated data system with branches at UCLA and the Goddard Space Flight Center (GSFC). These data can be connected by using a common data model. The VMO has selected the Space Physics Archive Search and Extract (SPASE) metadata standard for this purpose. SPASE metadata are collected and stored in distributed registries that are maintained along with the data at the location of the data provider. Populating the registries and extracting the metadata requested for a given study remain major challenges. In general there is little or no money available to data providers to create the metadata and populate the registries. We have taken a two-pronged approach to minimize the effort required to create the metadata and maintain the registries. The first part of the approach is human. We have appointed a group of domain experts called "X-Men". X-Men are experts in both magnetospheric physics and data management. They work closely with data providers to help them prepare the metadata and populate the registries. The second part of our approach is to develop a series of tools to populate and harvest information from the registries. We have developed SPASE editors for high-level metadata and adopted the NASA Planetary Data System's Rule Set approach, in which the science data are used to generate detailed-level SPASE metadata. Finally, we have developed a unique harvesting system to retrieve metadata from distributed registries in response to user queries.

  9. A programmatic view of metadata, metadata services, and metadata flow in ATLAS

    Science.gov (United States)

    Malon, D.; Albrand, S.; Gallas, E.; Stewart, G.

    2012-12-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS are considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and integrated view to physicists, and support both human use and programmatic access. In this paper we consider ATLAS metadata, metadata services, and metadata flow principally from the illustrative perspective of how disparate metadata are made available to executing jobs and, conversely, how metadata generated by such jobs are returned. We describe how metadata are read, how metadata are cached, and how metadata generated by jobs and the tasks of which they are a part are communicated, associated with data products, and preserved. We also discuss the principles that guide decision-making about metadata storage, replication, and access.

  10. DESIGN AND PRACTICE ON METADATA SERVICE SYSTEM OF SURVEYING AND MAPPING RESULTS BASED ON GEONETWORK

    Directory of Open Access Journals (Sweden)

    Z. Zha

    2012-08-01

    Based on analysis of and research on current geographic information sharing and metadata services, we design, develop and deploy a distributed metadata service system based on GeoNetwork covering more than 30 nodes in provincial units of China. By identifying the advantages of GeoNetwork, we design a distributed metadata service system of national surveying and mapping results. It consists of 31 network nodes, a central node and a portal. Network nodes are the direct system metadata source, and are distributed around the country. Each network node maintains a metadata service system, responsible for metadata uploading and management. The central node harvests metadata from network nodes using the OGC CSW 2.0.2 standard interface. The portal shows all metadata in the central node, and provides users with a variety of methods and interfaces for metadata search or querying. It also provides management capabilities for connecting the central node and the network nodes together. There are defects with GeoNetwork too. Accordingly, we improved and optimized large-volume metadata uploading, synchronization and concurrent access. For metadata uploading and synchronization, by carefully analyzing the database and index operation logs, we successfully avoid the performance bottlenecks. With a batch operation and dynamic memory management solution, data throughput and system performance are significantly improved. For concurrent access, through a request coding and results cache solution, query performance is greatly improved. To smoothly respond to huge concurrent requests, a web cluster solution is deployed. This paper also gives an experiment analysis and compares the system performance before and after improvement and optimization. Design and practical results have been applied in the national metadata service system of surveying and mapping results. It proved that the improved GeoNetwork service architecture can effectively adaptive for
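
    The abstract names a "request coding and results cache solution" without describing it. One plausible reading, sketched here purely as an assumption, is that each search request is reduced to a canonical key and results are stored under that key, so repeated or equivalent requests do not reach the catalogue again. The class name `QueryCache` and the `backend` callable are hypothetical.

```python
import hashlib
import json


class QueryCache:
    """Cache search results under a canonical encoding of the request."""

    def __init__(self, backend):
        self._backend = backend  # callable that runs the real search
        self._results = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def encode(request):
        # Sorting keys makes logically equal requests produce the same key.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def search(self, request):
        key = self.encode(request)
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = self._backend(request)
        return self._results[key]


calls = []


def slow_backend(request):
    # Stand-in for a catalogue query; records each real invocation.
    calls.append(request)
    return ["record-1", "record-2"]


cache = QueryCache(slow_backend)
first = cache.search({"q": "dem", "page": 1})
second = cache.search({"page": 1, "q": "dem"})  # same request, different key order
print(len(calls), cache.hits, cache.misses)
```

    A production cache would also need eviction and invalidation when metadata is re-harvested; both are left out of this sketch.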

  11. The Medical Query Language

    Science.gov (United States)

    Shusman, Daniel J.; Morgan, Mary M.; Zielstorff, Rita; Barnett, G. Octo

    1983-01-01

    The Medical Query Language (MQL) is an English-like query language with which a user with little or no training in programming or computer science can formulate and satisfy inquiries on data contained in his/her Standard MUMPS database. To date, major applications of MQL have been in the areas of quality assurance, medical research, and practice administration at sites using the COmputer STored Ambulatory Record (COSTAR) database system.

  12. PropBase Query Layer: a single portal to UK subsurface physical property databases

    Science.gov (United States)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2013-04-01

    Until recently, the delivery of geological information for industry and public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical properties data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from their own research data collection and also other, often commercially derived data sources. This can be voxelated to incorporate this data into the models to demonstrate property variation within the subsurface geometry. All property data held by BGS has for many years been stored in relational databases to ensure their long-term continuity. However these have, by necessity, complex structures; each database contains positional reference data and model information, and also metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets making extracting physical properties from these databases difficult. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and its presentation of complex data in simple, mostly denormalized, tables which combine information from multiple databases into a single system. The structure from each relational database is denormalized in a generalised structure, so that each dataset can be viewed together in a common format using a simple

  13. Storing, Browsing, Querying, and Sharing Data: the THREDDS Data Repository (TDR)

    Science.gov (United States)

    Wilson, A.; Lindholm, D.; Baltzer, T.

    2005-12-01

    OPeNDAP and gridftp. The modular structure will allow substitution of software components so that both simple and complex storage media can be integrated into the repository. It will also allow integration of different varieties of supporting software. For example, if replication is desired, replica management could be handled via a simple hash table or a complex solution such as Replica Locater Service (RLS). In order to ensure that metadata is available for all the data in the repository, the TDR will also generate THREDDS metadata when necessary. Users will be able to establish levels of access control to their metadata and data. Coupled with a THREDDS Data Server, both browsing via THREDDS catalogs and querying capabilities will be supported. This presentation will describe the motivating factors, current status, and future plans of the TDR. References: IDD: http://www.unidata.ucar.edu/content/software/idd/index.html THREDDS: http://www.unidata.ucar.edu/content/projects/THREDDS/tech/server/ServerStatus.html LEAD: http://lead.ou.edu/ RLS: http://www.isi.edu/~annc/papers/chervenakRLSjournal05.pdf

  14. Indexing of ATLAS data management and analysis system metadata

    CERN Document Server

    Grigoryeva, Maria; The ATLAS collaboration

    2017-01-01

    This manuscript is devoted to the development of the system to manage metainformation of modern HENP experiments. The main purpose of the system is to provide scientists with transparent access to the actual and historical metadata related to data analysis, processing and modeling. The system design addresses the following goals : providing a flexible and fast search for metadata on various combinations of keywords, generating aggregated reports, categorized according to selected parameters, such as the studied physical process, scientific topic, physical group, etc. The article presents the architecture of the developed indexing and search system, as well as the results of performance tests. The comparison of the query execution speed within the developed system and in case of querying the original relational databases showed that the developed system provides results faster. Also the new system allows much more complex search requests, than the original storages.

  15. jQuery Mobile

    CERN Document Server

    Reid, Jon

    2011-01-01

    Native apps have distinct advantages, but the future belongs to mobile web apps that function on a broad range of smartphones and tablets. Get started with jQuery Mobile, the touch-optimized framework for creating apps that look and behave consistently across many devices. This concise book provides HTML5, CSS3, and JavaScript code examples, screen shots, and step-by-step guidance to help you build a complete working app with jQuery Mobile. If you're already familiar with the jQuery JavaScript library, you can use your existing skills to build cross-platform mobile web apps right now. This b

  16. Code query by example

    Science.gov (United States)

    Vaucouleur, Sebastien

    2011-02-01

    We introduce code query by example for customisation of evolvable software products in general and of enterprise resource planning systems (ERPs) in particular. The concept is based on an initial empirical study on practices around ERP systems. We motivate our design choices based on those empirical results, and we show how the proposed solution helps with respect to the infamous upgrade problem: the conflict between the need for customisation and the need for upgrade of ERP systems. We further show how code query by example can be used as a form of lightweight static analysis, to detect automatically potential defects in large software products. Code query by example as a form of lightweight static analysis is particularly interesting in the context of ERP systems: it is often the case that programmers working in this field are not computer science specialists but more of domain experts. Hence, they require a simple language to express custom rules.

  17. Flexible Query Answering Systems

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 10th International Conference on Flexible Query Answering Systems, FQAS 2013, held in Granada, Spain, in September 2013. The 59 full papers included in this volume were carefully reviewed and selected from numerous submissions. The papers are organized in a general session train and a parallel special session track. The general session train covers the following topics: querying-answering systems; semantic technology; patterns and classification; personalization and recommender systems; searching and ranking; and Web and human-computer interaction. The special track covers some specific and, typically, newer fields, namely: environmental scanning for strategic early warning; generating linguistic descriptions of data; advances in fuzzy querying and fuzzy databases: theory and applications; fusion and ensemble techniques for on...

  18. Robust Optimization of Database Queries

    Indian Academy of Sciences (India)

    JAYANT

    2011-07-06

    Jul 6, 2011 ... join order [ ((S R) C) or ((R C) S) ? ] join techniques [ Nested-Loops or Sort-Merge or Hash ? ] ○ DBMS query optimizer identifies the optimal evaluation strategy: “query execution plan”. July 2011. Robust Query Optimization (IASc Mid-year Meeting). ...
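
    The excerpt lists join techniques an optimizer chooses between. As a rough illustration, here are two of them (nested-loops and hash) written as plain Python functions over lists of dictionaries. The tables and column names are made up for the sketch, and a real DBMS implements these operators very differently.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for r in right:  # build phase
        buckets[r[key]].append(r)
    # probe phase: one lookup per left row
    return [{**l, **r} for l in left for r in buckets.get(l[key], [])]


# Hypothetical tables standing in for S and R in the excerpt.
students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bo"}]
registrations = [
    {"sid": 1, "cid": "db"},
    {"sid": 1, "cid": "os"},
    {"sid": 3, "cid": "db"},
]

by_loops = nested_loop_join(students, registrations, "sid")
by_hash = hash_join(students, registrations, "sid")
print(by_loops == by_hash, len(by_hash))
```

    Both functions return the same rows; they differ only in cost, which is the kind of trade-off the optimizer weighs when it picks a query execution plan.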

  19. A Programmatic View of Metadata, Metadata Services, and Metadata Flow in ATLAS

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS is considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. Trigger information and data from the Large Hadron Collider itself provide cases in point, but examples abound. Metadata about logical or physics constructs, such as data-taking periods and runs and luminosity blocks and events and algorithms, often need to be mapped to deployment and production constructs, such as datasets and jobs and files and software versions, and vice versa. Metadata at one level of granularity may have implications at another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and ...

  20. A Programmatic View of Metadata, Metadata Services, and Metadata Flow in ATLAS

    CERN Document Server

    Malon, D; The ATLAS collaboration; Gallas, E; Stewart, G

    2012-01-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS are considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. Trigger information and data from the Large Hadron Collider itself provide cases in point, but examples abound. Metadata about logical or physics constructs, such as data-taking periods and runs and luminosity blocks and events and algorithms, often need to be mapped to deployment and production constructs, such as datasets and jobs and files and software versions, and vice versa. Metadata at one level of granularity may have implications at another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and integr...

  1. A programmatic view of metadata, metadata services, and metadata flow in ATLAS

    CERN Document Server

    Malon, D; The ATLAS collaboration; Albrand, S; Gallas, E; Stewart, G

    2012-01-01

    The volume and diversity of metadata in an experiment of the size and scope of ATLAS are considerable. Even the definition of metadata may seem context-dependent: data that are primary for one purpose may be metadata for another. Trigger information and data from the Large Hadron Collider itself provide cases in point, but examples abound. Metadata about logical or physics constructs, such as data-taking periods and runs and luminosity blocks and events and algorithms, often need to be mapped to deployment and production constructs, such as datasets and jobs and files and software versions, and vice versa. Metadata at one level of granularity may have implications at another. ATLAS metadata services must integrate and federate information from inhomogeneous sources and repositories, map metadata about logical or physics constructs to deployment and production constructs, provide a means to associate metadata at one level of granularity with processing or decision-making at another, offer a coherent and integr...

  2. Collective spatial keyword querying

    DEFF Research Database (Denmark)

    Cao, Xin; Cong, Gao; Jensen, Christian S.

    2011-01-01

    With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However...

  3. Flexible Query Answering Systems

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 10th International Conference on Flexible Query Answering Systems, FQAS 2013, held in Granada, Spain, in September 2013. The 59 full papers included in this volume were carefully reviewed and selected from numerous submissions. The papers are organized in a general session train and a parallel special session track. The general session train covers the following topics: querying-answering systems; semantic technology; patterns and classification; personalization and recommender systems; searching and ranking; and Web and human...

  4. Learning jQuery

    CERN Document Server

    Chaffer, Jonathan

    2013-01-01

    Step through each of the core concepts of the jQuery library, building an overall picture of its capabilities. Once you have thoroughly covered the basics, the book returns to each concept to cover more advanced examples and techniques. This book is for web designers who want to create interactive elements for their designs, and for developers who want to create the best user interface for their web applications. Basic JavaScript programming and knowledge of HTML and CSS is required. No knowledge of jQuery is assumed, nor is experience with any other JavaScript libraries.

  5. Medical Query Language

    OpenAIRE

    Morgan, Mary M.; Beaman, Peter D.; Shusman, Daniel J.; Hupp, Jon A.; Zielstorff, Rita D.; Barnett, G. Octo

    1981-01-01

    This paper describes the Medical Query Language (MQL), a “formal” language which enables unsophisticated users, having no background in programming or computer science, to express information retrieval and analysis questions of their data bases. MQL is designed to access any MUMPS data base. Most MQL applications to date have dealt with the COmputer STored Ambulatory Record (COSTAR) data base.

  6. Spatial Keyword Querying

    DEFF Research Database (Denmark)

    Cao, Xin; Chen, Lisi; Cong, Gao

    2012-01-01

    The web is increasingly being used by mobile users. In addition, it is increasingly becoming possible to accurately geo-position mobile users and web content. This development gives prominence to spatial web data management. Specifically, a spatial keyword query takes a user location and user...

  7. Approximating terminological queries

    NARCIS (Netherlands)

    Stuckenschmidt, Heiner; Van Harmelen, Frank

    2002-01-01

    Current proposals for languages to encode terminological knowledge in intelligent systems support logical reasoning for answering user queries about objects and classes. An application of these languages on the World Wide Web, however, is hampered by the limitations of logical reasoning in terms

  8. Learning via Query Synthesis

    KAUST Repository

    Alabdulmohsin, Ibrahim Mansour

    2017-05-07

    Active learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthesis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has, generally, focused on finite concept classes or toy problems, with a limited extension to real-world applications. In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The first algorithm is a maximum-determinant convex optimization method while the second algorithm is a Markovian method that relies on Khachiyan’s classical update formulas for solving linear programs. The general theme of these methods is to construct an ellipsoidal approximation of the version space and to synthesize queries, afterward, via spectral decomposition. Moreover, I also describe how these algorithms can be extended to other settings as well, such as pool-based active learning. Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can be used to render query synthesis algorithms ineffective is to implement a randomized response. In this thesis, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distribution provides reliable predictions with a high probability. This algorithm is, then, justified both theoretically and empirically. A second approach is to use a non-parametric classification method, such as similarity-based classification. In this

  9. QUERY SUPPORT FOR GMZ

    Directory of Open Access Journals (Sweden)

    A. Khandelwal

    2017-07-01

    Generic text-based compression models are simple and fast, but there are two issues that need to be addressed. They cannot leverage the structure that exists in data to achieve better compression, and there is an unnecessary decompression step before the user can actually use the data. To address these issues, we came up with GMZ, a lossless compression model aimed at achieving high compression ratios. The decision to design GMZ (Khandelwal and Rajan, 2017) exclusively for GML's Simple Features Profile (SFP) seems fair because of the high use of SFP in WFS and because it facilitates high optimisation of the compression model. This is an extension of our work on GMZ. In a typical server-client model such as Web Feature Service, the server is the primary creator and provider of GML, and therefore requires compression and query capabilities. On the other hand, the client is the primary consumer of GML, and therefore requires decompression and visualisation capabilities. In the first part of our work, we demonstrated compression using a Python script that can be plugged into a server architecture, and decompression and visualisation in a web browser using a Firefox addon. The focus of this work is to develop the already existing tools to provide query capability to the server. Our model provides the ability to decompress individual features in isolation, which is an essential requirement for querying in the compressed state. We construct an R-Tree index for spatial data and a custom index for non-spatial data and store these in a separate index file to avoid altering the compression model. This facilitates independent use of the compressed GMZ file, where the index can be constructed when required. The focus of this work is the bounding-box or range query commonly used in webGIS, with provision for other spatial and non-spatial queries. The decrement in compression ratios due to the new index file is in the range of 1–3 percent which is trivial considering
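
    The bounding-box (range) query mentioned above can be illustrated with a tiny per-feature index. The sketch below deliberately uses a linear scan rather than the R-Tree the authors describe, and the `IndexEntry` layout is an assumption made for illustration; it only shows which features a range query would select for individual decompression.

```python
from typing import NamedTuple, Tuple


class IndexEntry(NamedTuple):
    feature_id: str
    bbox: Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(a, b):
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(index, query_bbox):
    """Return ids of features whose bounding box meets the query box.

    A linear scan stands in for the R-Tree lookup; only the returned
    features would then need to be decompressed.
    """
    return [e.feature_id for e in index if intersects(e.bbox, query_bbox)]


index = [
    IndexEntry("a", (0.0, 0.0, 1.0, 1.0)),
    IndexEntry("b", (5.0, 5.0, 6.0, 6.0)),
    IndexEntry("c", (0.5, 0.5, 5.5, 5.5)),
]

selected = range_query(index, (0.9, 0.9, 2.0, 2.0))
print(selected)
```

    An R-Tree answers the same question without visiting every entry, which matters once the file holds many features.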

  10. KoralQuery -- A General Corpus Query Protocol

    DEFF Research Database (Denmark)

    Bingel, Joachim; Diewald, Nils

    2015-01-01

    The task-oriented and format-driven development of corpus query systems has led to the creation of numerous corpus query languages (QLs) that vary strongly in expressiveness and syntax. This is a severe impediment for the interoperability of corpus analysis systems, which lack a common protocol. In this paper, we present KoralQuery, a JSON-LD based general corpus query protocol, aiming to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that KoralQuery is built on, we exemplify the representation of corpus queries in the serialized...

  11. Google BigQuery analytics

    CERN Document Server

    Tigani, Jordan

    2014-01-01

    How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute engine, AppEngine datastore integration, and using GViz with Tableau to generate charts of query results. In addit

  12. Improving Access to NASA Earth Science Data through Collaborative Metadata Curation

    Science.gov (United States)

    Sisco, A. W.; Bugbee, K.; Shum, D.; Baynes, K.; Dixon, V.; Ramachandran, R.

    2017-12-01

    The NASA-developed Common Metadata Repository (CMR) is a high-performance metadata system that currently catalogs over 375 million Earth science metadata records. It serves as the authoritative metadata management system of NASA's Earth Observing System Data and Information System (EOSDIS), enabling NASA Earth science data to be discovered and accessed by a worldwide user community. The size of the EOSDIS data archive is steadily increasing, and the ability to manage and query this archive depends on the input of high quality metadata to the CMR. Metadata that does not provide adequate descriptive information diminishes the CMR's ability to effectively find and serve data to users. To address this issue, an innovative and collaborative review process is underway to systematically improve the completeness, consistency, and accuracy of metadata for approximately 7,000 data sets archived by NASA's twelve EOSDIS data centers, or Distributed Active Archive Centers (DAACs). The process involves automated and manual metadata assessment of both collection and granule records by a team of Earth science data specialists at NASA Marshall Space Flight Center. The team communicates results to DAAC personnel, who then make revisions and reingest improved metadata into the CMR. Implementation of this process relies on a network of interdisciplinary collaborators leveraging a variety of communication platforms and long-range planning strategies. Curating metadata at this scale and resolving metadata issues through community consensus improves the CMR's ability to serve current and future users and also introduces best practices for stewarding the next generation of Earth Observing System data. This presentation will detail the metadata curation process, its outcomes thus far, and also share the status of ongoing curation activities.

  13. Conceptual querying through ontologies

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik

    2009-01-01

    We present here an approach to conceptual querying where the aim is, given a collection of textual database objects or documents, to target an abstraction of the entire database content in terms of the concepts appearing in documents, rather than the documents in the collection. The approach is motivated by an obvious need for users to survey huge volumes of objects in query answers. An ontology formalism and a special notion of "instantiated ontology" are introduced. The latter is a structure reflecting the content in the document collection in that it is a restriction of a general world knowledge ontology to the concepts instantiated in the collection. The notion of ontology-based similarity is briefly described, language constructs for direct navigation and retrieval of concepts in the ontology are discussed and approaches to conceptual summarization are presented....

  14. Metadata and Service at the GFZ ISDC Portal

    Science.gov (United States)

    Ritschel, B.

    2008-05-01

    an explicit identification of single data files and the set-up of a comprehensive Earth science data catalog. The huge ISDC data catalog is realized by product-type-dependent tables filled with data-file-related metadata, which have relations to corresponding metadata tables. The parent DIF XML metadata documents describing the product types are stored and managed in ORACLE's XML storage structures. In order to improve the interoperability of the ISDC service portal, the existing proprietary catalog system will be extended by an ISO 19115 based web catalog service. In addition to this development, there is ISDC-related work concerning a semantic network of different kinds of metadata resources, such as standardized and non-standardized metadata documents and literature, as well as Web 2.0 user-generated information derived from tagging activities and social navigation data.

  15. From scarcity to bounty: how Galateas can turn your scarce short queries into gold

    NARCIS (Netherlands)

    Segond, F.; Barbu, E.; Barsanti, I.; Kovachev, B.; Lagos, N.; Trevisan, M.; Vald, E.

    2012-01-01

    With the growth of digital libraries and the digital library federation in addition to partially unstructured collections of documents such as web sites, a large set of vendors are offering engines for retrieving content and metadata via search requests by the end user (queries). In most cases these

  16. Metadata specification in a dynamic geometry software

    Science.gov (United States)

    Radaković, Davorka; Herceg, Đorđe

    2017-07-01

    Attributes in C# are a mechanism that provides association of declarative information with C# code such as classes, types, methods, properties, namespaces etc. Once defined and associated with a program entity, an attribute can be queried at run time. However, the attributes have certain restrictions which limit their application to representing complex metadata necessary for development of dynamic geometry software (DGS). We have devised a solution, independent of attributes, which was developed to overcome the limitations, while maintaining the functionality of attributes. Our solution covers a wide range of uses, from providing extensibility to a functional programming language and declaring new data types and operations, to being a foundation for runtime optimizations of expression tree evaluation, and helpful user interface features, such as code completion.

  17. Mastering jQuery mobile

    CERN Document Server

    Lambert, Chip

    2015-01-01

    You've started down the path of jQuery Mobile, now begin mastering some of jQuery Mobile's higher level topics. Go beyond jQuery Mobile's documentation and master one of the hottest mobile technologies out there. Previous JavaScript and PHP experience can help you get the most out of this book.

  18. Query optimization over crowdsourced data

    KAUST Repository

    Park, Hyunjung

    2013-08-26

    Deco is a comprehensive system for answering declarative queries posed over stored relational data together with data obtained on-demand from the crowd. In this paper we describe Deco's cost-based query optimizer, building on Deco's data model, query language, and query execution engine presented earlier. Deco's objective in query optimization is to find the best query plan to answer a query, in terms of estimated monetary cost. Deco's query semantics and plan execution strategies require several fundamental changes to traditional query optimization. Novel techniques incorporated into Deco's query optimizer include a cost model distinguishing between "free" existing data versus paid new data, a cardinality estimation algorithm coping with changes to the database state during query execution, and a plan enumeration algorithm maximizing reuse of common subplans in a setting that makes reuse challenging. We experimentally evaluate Deco's query optimizer, focusing on the accuracy of cost estimation and the efficiency of plan enumeration.
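
    The distinction between "free" existing data and paid new data can be illustrated with a minimal sketch. This is not Deco's actual cost model: the `Plan` class, its fields, the plan names, and the flat price per crowd answer are all hypothetical, chosen only to show how a plan might be picked by estimated monetary cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical query plan summary (not Deco's real data structure)."""
    name: str
    stored_tuples: int      # tuples already in the database, free to read
    crowd_fetches: float    # estimated number of new crowd answers needed
    price_per_fetch: float  # assumed flat payment per crowd answer


def estimated_cost(plan: Plan) -> float:
    # Existing data costs nothing; only newly requested crowd data is paid for.
    return plan.crowd_fetches * plan.price_per_fetch


def cheapest(plans):
    # Pick the plan with the lowest estimated monetary cost.
    return min(plans, key=estimated_cost)


plans = [
    Plan("fetch-then-filter", stored_tuples=40, crowd_fetches=60, price_per_fetch=0.05),
    Plan("filter-then-fetch", stored_tuples=40, crowd_fetches=25, price_per_fetch=0.05),
]
best = cheapest(plans)
print(best.name, round(estimated_cost(best), 2))
```

    Both hypothetical plans read the same stored tuples at no charge, so the choice is decided entirely by how many new crowd answers each is estimated to need.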

  19. A Geospatial Semantic Enrichment and Query Service for Geotagged Photographs

    Science.gov (United States)

    Ennis, Andrew; Nugent, Chris; Morrow, Philip; Chen, Liming; Ioannidis, George; Stan, Alexandru; Rachev, Preslav

    2015-01-01

    With the increasing abundance of technologies and smart devices, equipped with a multitude of sensors for sensing the environment around them, information creation and consumption has now become effortless. This, in particular, is the case for photographs, with vast amounts being created and shared every day. For example, at the time of this writing, Instagram users upload 70 million photographs a day. Nevertheless, it still remains a challenge to discover the “right” information for the appropriate purpose. This paper describes an approach to create semantic geospatial metadata for photographs, which can facilitate photograph search and discovery. To achieve this we have developed and implemented a semantic geospatial data model by which a photograph can be enriched with geospatial metadata extracted from several geospatial data sources based on the raw low-level geo-metadata from a smartphone photograph. We present the details of our method and implementation for searching and querying the semantic geospatial metadata repository to enable a user or third party system to find the information they are looking for. PMID:26205265

  20. A Geospatial Semantic Enrichment and Query Service for Geotagged Photographs

    Directory of Open Access Journals (Sweden)

    Andrew Ennis

    2015-07-01

    With the increasing abundance of technologies and smart devices, equipped with a multitude of sensors for sensing the environment around them, information creation and consumption has now become effortless. This, in particular, is the case for photographs, with vast amounts being created and shared every day. For example, at the time of this writing, Instagram users upload 70 million photographs a day. Nevertheless, it still remains a challenge to discover the “right” information for the appropriate purpose. This paper describes an approach to create semantic geospatial metadata for photographs, which can facilitate photograph search and discovery. To achieve this we have developed and implemented a semantic geospatial data model by which a photograph can be enriched with geospatial metadata extracted from several geospatial data sources based on the raw low-level geo-metadata from a smartphone photograph. We present the details of our method and implementation for searching and querying the semantic geospatial metadata repository to enable a user or third party system to find the information they are looking for.

  1. Instant Cassandra query language

    CERN Document Server

    Singh, Amresh

    2013-01-01

    Get to grips with a new technology, understand what it is and what it can do for you, and then get to work with the most important features and tasks. It's an Instant Starter guide. Instant Cassandra Query Language is great for those who are working with Cassandra databases and who want to either learn CQL to check data from the console or build serious applications using CQL. If you're looking for something that helps you get started with CQL in record time and you hate the idea of learning a new language syntax, then this book is for you.

  2. Flexible Community-driven Metadata with the Component Metadata Infrastructure

    NARCIS (Netherlands)

    Windhouwer, M.; Goosen, Twan; Mosutka, Jozef; Van Uytvanck, D.; Broeder, D.

    Many researchers, from the humanities and other domains, have a strong need to study resources in close detail. Nowadays more and more of these resources are available online. To make these resources discoverable, they are described with metadata. These metadata records are collected and made

  3. Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators

    Science.gov (United States)

    Mayernik, Matthew Stephen

    2011-01-01

    As digital data creation technologies become more prevalent, data and metadata management are necessary to make data available, usable, sharable, and storable. Researchers in many scientific settings, however, have little experience or expertise in data and metadata management. In this dissertation, I explore the everyday data and metadata…

  4. Federating Metadata Catalogs

    Science.gov (United States)

    Baru, C.; Lin, K.

    2009-04-01

    The Geosciences Network project (www.geongrid.org) has been developing cyberinfrastructure for data sharing in the Earth Science community based on a service-oriented architecture. The project defines a standard "software stack", which includes a standardized set of software modules and corresponding service interfaces. The system employs Grid certificates for distributed user authentication. The GEON Portal provides online access to these services via a set of portlets. This service-oriented approach has enabled the GEON network to easily expand to new sites and deploy the same infrastructure in new projects. To facilitate interoperation with other distributed geoinformatics environments, service standards are being defined and implemented for catalog services and federated search across distributed catalogs. The need arises because there may be multiple metadata catalogs in a distributed system, for example, for each institution, agency, geographic region, and/or country. Ideally, a geoinformatics user should be able to search across all such catalogs by making a single search request. In this paper, we describe our implementation for such a search capability across federated metadata catalogs in the GEON service-oriented architecture. The GEON catalog can be searched using spatial, temporal, and other metadata-based search criteria. The search can be invoked as a Web service and, thus, can be imbedded in any software application. 
The need for federated catalogs in GEON arises because, (i) GEON collaborators at the University of Hyderabad, India have deployed their own catalog, as part of the iGEON-India effort, to register information about local resources for broader access across the network, (ii) GEON collaborators in the GEO Grid (Global Earth Observations Grid) project at AIST, Japan have implemented a catalog for their ASTER data products, and (iii) we have recently deployed a search service to access all data products from the EarthScope project in the US

  5. Creating preservation metadata from XML-metadata profiles

    Science.gov (United States)

    Ulbricht, Damian; Bertelmann, Roland; Gebauer, Petra; Hasler, Tim; Klump, Jens; Kirchner, Ingo; Peters-Kottig, Wolfgang; Mettig, Nora; Rusch, Beate

    2014-05-01

    Registration of dataset DOIs at DataCite makes research data citable and comes with the obligation to keep data accessible in the future. In addition, many universities and research institutions measure data that is unique and not repeatable, like the data produced by an observational network, and they want to keep these data for future generations. In consequence, such data should be ingested in preservation systems that automatically care for file format changes. Open source preservation software that is developed along the definitions of the ISO OAIS reference model is available, but during ingest of data and metadata there are still problems to be solved. File format validation is difficult because format validators are not only remarkably slow; due to variety in file formats, different validators also return conflicting identification profiles for identical data. These conflicts are hard to resolve. Preservation systems have a deficit in the support of custom metadata. Furthermore, data producers are sometimes not aware that quality metadata is a key issue for the re-use of data. In the project EWIG, a university institute and a research institute work together with Zuse-Institute Berlin, which is acting as an infrastructure facility, to generate exemplary workflows for research data into OAIS compliant archives with emphasis on the geosciences. The Institute for Meteorology provides time series data from an urban monitoring network, whereas GFZ Potsdam delivers file-based data from research projects. To identify problems in existing preservation workflows, the technical work is complemented by interviews with data practitioners. Policies for handling data and metadata are developed. Furthermore, university teaching material is created to raise future scientists' awareness of research data management. As a testbed for ingest workflows, the digital preservation system Archivematica [1] is used. During the ingest process metadata is generated that is compliant to the

  6. From Questions to Queries

    Directory of Open Access Journals (Sweden)

    M. Drlík

    2007-12-01

    The extension of (Internet) databases forces everyone to become more familiar with techniques of data storage and retrieval because users’ success often depends on their ability to pose right questions and to be able to interpret their answers. University programs pay more attention to developing database programming skills than to data exploitation skills. To educate our students to become “database users”, the authors intensively exploit supportive tools simplifying the production of database elements such as tables, queries, forms, reports, web pages, and macros. Video sequences demonstrating “standard operations” for completing them have been prepared to enhance out-of-classroom learning. The use of SQL and other professional tools is reduced to the cases when the wizards are unable to generate the intended construct.

  7. CyberSKA Radio Imaging Metadata and VO Compliance Engineering

    Science.gov (United States)

    Anderson, K. R.; Rosolowsky, E.; Dowler, P.

    2013-10-01

    The CyberSKA project has written a specification for the metadata encapsulation of radio astronomy data products pursuant to insertion into the VO-compliant Common Archive Observation Model (CAOM) database hosted by the Canadian Astronomy Data Centre (CADC). This specification accommodates radio FITS Image and UV Visibility data, as well as pure CASA Tables Imaging and Visibility Measurement Sets. To extract and engineer radio metadata, we have authored two software packages: metaData (v0.5.0) and mddb (v1.3). Together, these Python packages can convert all the above stated data format types into concise FITS-like header files, engineer the metadata to conform to the CAOM data model, and then insert these engineered data into the CADC database, which subsequently becomes published through the Canadian Virtual Observatory. The metaData and mddb packages have, for the first time, published ALMA imaging data on VO services. Our ongoing work aims to integrate visibility data from ALMA and the SKA into VO services and to enable user-submitted radio data to move seamlessly into the Virtual Observatory.

  8. ATLAS Metadata Interface (AMI), a generic metadata framework

    CERN Document Server

    AUTHOR|(SzGeCERN)573735; The ATLAS collaboration; Odier, Jerome; Lambert, Fabian

    2017-01-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, JavaScript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  9. ATLAS Metadata Interface (AMI), a generic metadata framework

    Science.gov (United States)

    Fulachier, J.; Odier, J.; Lambert, F.; ATLAS Collaboration

    2017-10-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, JavaScript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  10. ATLAS Metadata Interface (AMI), a generic metadata framework

    CERN Document Server

    Fulachier, Jerome; The ATLAS collaboration

    2016-01-01

    The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. We briefly describe the architecture, the main services and the benefits of using AMI in big collaborations, especially for high energy physics. We focus on the recent improvements, for instance: the lightweight clients (Python, Javascript, C++), the new smart task server system and the Web 2.0 AMI framework for simplifying the development of metadata-oriented web interfaces.

  11. From Nested-Loop to Join Queries in OODB

    NARCIS (Netherlands)

    Steenhagen, H.J.; Apers, Peter M.G.; Blanken, Henk; de By, R.A.

    Most declarative SQL-like query languages for object-oriented database systems are orthogonal languages allowing for arbitrary nesting of expressions in the select-, from-, and where-clause. Expressions in the from-clause may be base tables as well as set-valued attributes. In this paper, we propose

  12. Research Issues in Mobile Querying

    DEFF Research Database (Denmark)

    Breunig, M.; Jensen, Christian Søndergaard; Klein, M.

    2004-01-01

    This document reports on key aspects of the discussions conducted within the working group. In particular, the document aims to offer a structured and somewhat digested summary of the group's discussions. The document first offers concepts that enable characterization of "mobile queries" as well as the types of systems that enable such queries. It explores the notion of context in mobile queries. The document ends with a few observations, mainly regarding challenges....

  13. Query optimization for graph analytics on linked data using SPARQL

    Energy Technology Data Exchange (ETDEWEB)

    Hong, Seokyong [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Lee, Sangkeun [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Lim, Seung-Hwan [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Sukumar, Sreenivas R. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Vatsavai, Ranga Raju [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2015-07-01

    Triplestores that support query languages such as SPARQL are emerging as the preferred and scalable solution to represent data and meta-data as massive heterogeneous graphs using Semantic Web standards. With increasing adoption, the desire to conduct graph-theoretic mining and exploratory analysis has also increased. Addressing that desire, this paper presents a solution that is the marriage of Graph Theory and the Semantic Web. We present software that can analyze Linked Data using graph operations such as counting triangles, finding eccentricity, testing connectedness, and computing PageRank directly on triple stores via the SPARQL interface. We describe the process of optimizing performance of the SPARQL-based implementation of such popular graph algorithms by reducing the space-overhead, simplifying iterative complexity and removing redundant computations by understanding query plans. Our optimized approach shows significant performance gains on triplestores hosted on stand-alone workstations as well as hardware-optimized scalable supercomputers such as the Cray XMT.
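
    One of the graph operations named above, counting triangles, can be sketched over a small set of triples. The sketch below runs in memory in plain Python rather than through a SPARQL endpoint, so it is not the authors' implementation; it assumes that edge direction and predicate are ignored, and the function name and sample triples are hypothetical.

```python
from itertools import combinations


def count_triangles(triples):
    """Count undirected triangles in a graph given as (subject, predicate, object)."""
    # Build an undirected adjacency map, ignoring predicates and self-loops.
    adjacency = {}
    for subject, _predicate, obj in triples:
        if subject == obj:
            continue
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)

    # For every node, count pairs of its neighbours that are themselves linked.
    closed_pairs = 0
    for neighbours in adjacency.values():
        for u, v in combinations(neighbours, 2):
            if v in adjacency[u]:
                closed_pairs += 1

    # Each triangle is seen once from each of its three corners.
    return closed_pairs // 3


triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "knows", "a"),
    ("c", "knows", "d"),
]
print(count_triangles(triples))
```

    Dividing by three at the end removes the repeated counts that arise because the same triangle is found from each of its vertices, a simple case of the kind of redundant computation the abstract mentions.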

  14. Developing the CUAHSI Metadata Profile

    Science.gov (United States)

    Piasecki, M.; Bermudez, L.; Islam, S.; Beran, B.

    2004-12-01

    The Hydrologic Information System (HIS), of the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI), has as one of its goals to improve access to large volume, high quality, and heterogeneous hydrologic data sets. This will be attained in part by adopting a community metadata profile to achieve consistent descriptions that will facilitate data discovery. However, common standards are quite general in nature and typically lack domain specific vocabularies, complicating the adoption of standards for specific communities. We will show and demonstrate the problems encountered in the process of adopting ISO standards to create a CUAHSI metadata profile. The final schema is expressed in a simple metadata format, Metadata Template File (MTF), to leverage metadata annotations/viewer tools already developed by the San Diego Supercomputer Center. The steps performed to create an MTF starting from ISO 19115:2003 are the following: 1) creation of ontologies using the Web Ontology Language (OWL) for ISO 19115:2003 and related ISO/TC 211 documents; 2) conceptualization in OWL of related hydrologic vocabularies such as NASA's Global Change Master Directory and units from the Hydrologic Handbook; 3) definition of the CUAHSI profile by importing and extending the previous ontologies; 4) explicit creation of the CUAHSI core set; 5) export of the core set to MTF; 6) definition of metadata blocks for arbitrary digital objects (e.g. time series vs static-spatial data) using ISO's methodology for feature cataloguing; and 7) export of metadata blocks to MTF.

  15. A NOVEL APPROACH OF INDEXING AND RETRIEVING SPATIAL POLYGONS FOR EFFICIENT SPATIAL REGION QUERIES

    Directory of Open Access Journals (Sweden)

    J. H. Zhao

    2017-10-01

    Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get a response in real time. Spatial indexes are usually used in this situation. In this paper, based on the k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in-memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of remote sensing image metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region queries, but also has good scalability. Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.

  16. A Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries

    Science.gov (United States)

    Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.

    2017-10-01

    Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get a response in real time. Spatial indexes are usually used in this situation. In this paper, based on the k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in-memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of remote sensing image metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region queries, but also has good scalability. Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.
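
    The abstract does not spell out the two steps of the query algorithm. A common pattern for polygon data is filter-and-refine: a cheap bounding-box test selects candidates, and an exact geometric test confirms them. The sketch below shows that pattern for a point lookup rather than a region query, without any k-d tree or distribution; it may differ from the authors' algorithm, and all function names and sample polygons are hypothetical.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_box(point, box):
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_polygon(point, polygon):
    """Exact test by ray casting: count edge crossings to the right of the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        xa, ya = polygon[i]
        xb, yb = polygon[(i + 1) % n]
        if (ya > y) != (yb > y):  # the edge spans the horizontal line through y
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def find_containing(point, polygons):
    # Step 1 (filter): keep polygons whose bounding box contains the point.
    candidates = [name for name, poly in polygons.items()
                  if in_box(point, bounding_box(poly))]
    # Step 2 (refine): run the exact test only on the surviving candidates.
    hits = [name for name in candidates if in_polygon(point, polygons[name])]
    return candidates, hits


polygons = {
    "triangle": [(0, 0), (4, 0), (0, 4)],
    "square": [(5, 5), (7, 5), (7, 7), (5, 7)],
}
print(find_containing((3, 3), polygons))
print(find_containing((1, 1), polygons))
```

    The point (3, 3) passes the bounding-box filter for the triangle but fails the exact test, which is the case the refine step exists to catch.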

  17. Smart query answering for marine sensor data.

    Science.gov (United States)

    Shahriar, Md Sumon; de Souza, Paulo; Timms, Greg

    2011-01-01

    We review existing query answering systems for sensor data. We then propose an extended query answering approach termed smart query, specifically for marine sensor data. The smart query answering system integrates pattern queries and continuous queries. The proposed smart query system considers both streaming data and historical data from marine sensor networks. The smart query also uses a query relaxation technique and semantics from domain knowledge as a recommender system. The proposed smart query is beneficial in building data and information systems for marine sensor networks.

  18. Smart Query Answering for Marine Sensor Data

    Directory of Open Access Journals (Sweden)

    Paulo de Souza

    2011-03-01

    We review existing query answering systems for sensor data. We then propose an extended query answering approach termed smart query, specifically for marine sensor data. The smart query answering system integrates pattern queries and continuous queries. The proposed smart query system considers both streaming data and historical data from marine sensor networks. The smart query also uses a query relaxation technique and semantics from domain knowledge as a recommender system. The proposed smart query is beneficial in building data and information systems for marine sensor networks.

  19. Metadata Dictionary Database: A Proposed Tool for Academic Library Metadata Management

    Science.gov (United States)

    Southwick, Silvia B.; Lampert, Cory

    2011-01-01

    This article proposes a metadata dictionary (MDD) be used as a tool for metadata management. The MDD is a repository of critical data necessary for managing metadata to create "shareable" digital collections. An operational definition of metadata management is provided. The authors explore activities involved in metadata management in…

  20. The metadata manual a practical workbook

    CERN Document Server

    Lubas, Rebecca; Schneider, Ingrid

    2013-01-01

    Cultural heritage professionals have high levels of training in metadata. However, the institutions in which they practice often depend on support staff, volunteers, and students in order to function. With limited time and funding for training in metadata creation for digital collections, there are often many questions about metadata without a reliable, direct source for answers. The Metadata Manual provides such a resource, answering basic metadata questions that may appear, and exploring metadata from a beginner's perspective. This title covers metadata basics, XML basics, Dublin Core, VRA C

  1. User perspectives on query difficulty

    DEFF Research Database (Denmark)

    Lioma, Christina; Larsen, Birger; Schütze, Hinrich

    2011-01-01

    The difficulty of a user query can affect the performance of Information Retrieval (IR) systems. What makes a query difficult and how one may predict this is an active research area, focusing mainly on factors relating to the retrieval algorithm, to the properties of the retrieval data...

  2. Querying Sentiment Development over Time

    DEFF Research Database (Denmark)

    Andreasen, Troels; Christiansen, Henning; Have, Christian Theil

    2013-01-01

    that measures how well a hypothesis characterizes a given time interval; the semantics is parameterized so it can be adjusted to different views of the data. EmoEpisodes is extended to a query language with variables standing for unknown topics and emotions, and the query-answering mechanism will return...

  3. FSA 2002 Digital Orthophoto Metadata

    Data.gov (United States)

    Minnesota Department of Natural Resources — Metadata for the 2002 FSA Color Orthophotos Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the quarter-quad...

  4. phosphorus retention data and metadata

    Data.gov (United States)

    U.S. Environmental Protection Agency — phosphorus retention in wetlands data and metadata. This dataset is associated with the following publication: Lane , C., and B. Autrey. Phosphorus retention of...

  5. jQuery Pocket Reference

    CERN Document Server

    Flanagan, David

    2010-01-01

    "As someone who uses jQuery on a regular basis, it was surprising to discover how much of the library I'm not using. This book is indispensable for anyone who is serious about using jQuery for non-trivial applications."-- Raffaele Cecco, longtime developer of video games, including Cybernoid, Exolon, and Stormlord jQuery is the "write less, do more" JavaScript library. Its powerful features and ease of use have made it the most popular client-side JavaScript framework for the Web. This book is jQuery's trusty companion: the definitive "read less, learn more" guide to the library. jQuery P

  6. Instant jQuery selectors

    CERN Document Server

    De Rosa, Aurelio

    2013-01-01

    Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. Instant jQuery Selectors follows a simple how-to format with recipes aimed at making you well versed with the wide range of selectors that jQuery has to offer through a myriad of examples. Instant jQuery Selectors is for web developers who want to delve into jQuery from its very starting point: selectors. Even if you're already familiar with the framework and its selectors, you could find several tips and tricks that you aren't aware of, especially about performance and how jQuery ac

  7. jQuery UI cookbook

    CERN Document Server

    Boduch, Adam

    2013-01-01

    Filled with a practical collection of recipes, jQuery UI Cookbook is full of clear, step-by-step instructions that will help you harness the powerful UI framework in jQuery. Depending on your needs, you can dip in and out of the Cookbook and its recipes, or follow the book from start to finish. If you are a jQuery UI developer looking to improve your existing applications, extract ideas for your new application, or to better understand the overall widget architecture, then jQuery UI Cookbook is a must-have for you. The reader should at least have a rudimentary understanding of what jQuery UI is

  8. Authenticated hash tables

    DEFF Research Database (Denmark)

    Triandopoulos, Nikolaos; Papamanthou, Charalampos; Tamassia, Roberto

    2008-01-01

    Hash tables are fundamental data structures that optimally answer membership queries. Suppose a client stores n elements in a hash table that is outsourced at a remote server so that the client can save space or achieve load balancing. Authenticating the hash table functionality, i.e., verifying the correctness of queries answered by the server and ensuring the integrity of the stored data, is crucial because the server, lying outside the administrative control of the client, can be malicious. We design efficient and secure protocols for optimally authenticating membership queries on hash tables: for any fixed constants 0 < ε < 1 and κ > 1/ε, the server can provide a proof of integrity of the answer to a (non-)membership query in constant time, requiring O(n^ε / log^(κε-1) n) time to treat updates, yet keeping the communication and verification costs constant. This is the first construction...

  9. How libraries use publisher metadata

    Directory of Open Access Journals (Sweden)

    Steve Shadle

    2013-11-01

    Full Text Available With the proliferation of electronic publishing, libraries are increasingly relying on publisher-supplied metadata to meet user needs for discovery in library systems. However, many publisher/content provider staff creating metadata are unaware of the end-user environment and how libraries use their metadata. This article provides an overview of the three primary discovery systems that are used by academic libraries, with examples illustrating how publisher-supplied metadata directly feeds into these systems and is used to support end-user discovery and access. Commonly seen metadata problems are discussed, with recommendations suggested. Based on a series of presentations given in Autumn 2012 to the staff of a large publisher, this article uses the University of Washington Libraries systems and services as illustrative examples. Judging by the feedback received from these presentations, publishers (specifically staff not familiar with the big picture of metadata standards work) would benefit from a better understanding of the systems and services libraries provide using the data that is created and managed by publishers.

  10. Visualizing and Validating Metadata Traceability within the CDISC Standards

    Science.gov (United States)

    Hume, Sam; Sarnikar, Surendra; Becnel, Lauren; Bennett, Dorine

    2017-01-01

    The Food & Drug Administration has begun requiring that electronic submissions of regulated clinical studies utilize the Clinical Data Information Standards Consortium data standards. Within regulated clinical research, traceability is a requirement and indicates that the analysis results can be traced back to the original source data. Current solutions for clinical research data traceability are limited in terms of querying, validation and visualization capabilities. This paper describes (1) the development of metadata models to support computable traceability and traceability visualizations that are compatible with industry data standards for the regulated clinical research domain, (2) adaptation of graph traversal algorithms to make them capable of identifying traceability gaps and validating traceability across the clinical research data lifecycle, and (3) development of a traceability query capability for retrieval and visualization of traceability information. PMID:28815125

  11. Geo-Enrichment and Semantic Enhancement of Metadata Sets to Augment Discovery in Geoportals

    Directory of Open Access Journals (Sweden)

    Bernhard Vockner

    2014-03-01

    Full Text Available Geoportals are established to function as main gateways to find, evaluate, and start “using” geographic information. Still, current geoportal implementations face problems in optimizing the discovery process due to semantic heterogeneity issues, which lead to low recall and low precision in text-based searches. Therefore, we propose an enhanced semantic discovery approach that supports multilingualism and information domain context. We present a workflow that enriches existing structured metadata with synonyms, toponyms, and translated terms derived from user-defined keywords based on multilingual thesauri and ontologies. To make the results easier to understand, we also provide automated translation capabilities for the resource metadata to support the user in conceiving the thematic content of the descriptive metadata, even if it has been documented in a language the user is not familiar with. In addition, to text-enable spatial filtering capabilities, we add location name keywords to metadata sets. These are based on the existing bounding box and are intended to adjust discovery scores when performing single text line queries. In order to improve the user’s search experience, we tailor faceted search strategies, presenting an enhanced query interface for geo-metadata discovery that transparently leverages the underlying thesauri and ontologies.
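
    The keyword-enrichment step described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name enrich_keywords and the small in-memory thesaurus dictionary are hypothetical stand-ins for the multilingual thesauri and ontologies the paper relies on, and the sketch only shows the general idea of expanding user-defined keywords with synonyms and translated terms while avoiding duplicates.

```python
def enrich_keywords(keywords, thesaurus):
    """Expand keywords with related terms from a thesaurus.

    `thesaurus` is a hypothetical dict mapping a lower-cased keyword to a
    list of synonyms or translations. Order is preserved and duplicates
    are dropped case-insensitively.
    """
    enriched = []
    seen = set()
    for keyword in keywords:
        # The original keyword comes first, followed by its related terms.
        candidates = [keyword, *thesaurus.get(keyword.lower(), [])]
        for term in candidates:
            key = term.lower()
            if key not in seen:
                seen.add(key)
                enriched.append(term)
    return enriched


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper.
    demo_thesaurus = {
        "river": ["stream", "Fluss"],
        "flood": ["inundation", "River"],
    }
    print(enrich_keywords(["River", "flood"], demo_thesaurus))
```

    In a real geoportal the enriched list would presumably be written back into the keyword section of the metadata record before indexing; that step depends on the catalogue software and is left out here.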

  12. Semantic Metadata for Heterogeneous Spatial Planning Documents

    Science.gov (United States)

    Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.

    2016-09-01

    Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

  13. Pathogen metadata platform: software for accessing and analyzing pathogen strain information.

    Science.gov (United States)

    Chang, Wenling E; Peterson, Matthew W; Garay, Christopher D; Korves, Tonia

    2016-09-15

    Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide summaries and visualizations needed to readily interpret results. We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers. This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution .

  14. The role of economics in the QUERI program: QUERI Series

    Directory of Open Access Journals (Sweden)

    Smith Mark W

    2008-04-01

    Full Text Available Abstract. Background: The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods: We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results: Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion: Economics appears to play an important role in QUERI implementation studies only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  15. Recommendation Sets and Choice Queries

    DEFF Research Database (Denmark)

    Viappiani, Paolo Renato; Boutilier, Craig

    2011-01-01

    Utility elicitation is an important component of many applications, such as decision support systems and recommender systems. Such systems query users about their preferences and offer recommendations based on the system's belief about the user's utility function. We analyze the connection between the problem of generating optimal recommendation sets and the problem of generating optimal choice queries, considering both Bayesian and regret-based elicitation. Our results show that, somewhat surprisingly, under very general circumstances, the optimal recommendation set coincides with the optimal query.

  16. An integrated overview of metadata in ATLAS

    International Nuclear Information System (INIS)

    Gallas, E J; Malon, D; Hawkings, R J; Albrand, S; Torrence, E

    2010-01-01

    Metadata (data about data) arise in many contexts, from many diverse sources, and at many levels in ATLAS. Familiar examples include run-level, luminosity-block-level, and event-level metadata, and, related to processing and organization, dataset-level and file-level metadata, but these categories are neither exhaustive nor orthogonal. Some metadata are known a priori, in advance of data taking or simulation; other metadata are known only after processing, and occasionally, quite late (e.g., detector status or quality updates that may appear after initial reconstruction is complete). Metadata that may seem relevant only internally to the distributed computing infrastructure under ordinary conditions may become relevant to physics analysis under error conditions ('What can I discover about data I failed to process?'). This talk provides an overview of metadata and metadata handling in ATLAS, and describes ongoing work to deliver integrated metadata services in support of physics analysis.

  17. jQuery For Dummies

    CERN Document Server

    Beighley, Lynn

    2010-01-01

    Learn how jQuery can make your Web page or blog stand out from the crowd! jQuery is free, open source software that allows you to extend and customize Joomla!, Drupal, AJAX, and WordPress via plug-ins. Assuming no previous programming experience, Lynn Beighley takes you through the basics of jQuery from the very start. You'll discover how the jQuery library separates itself from other JavaScript libraries through its ease of use, compactness, and friendliness if you're a beginner programmer. Written in the easy-to-understand style of the For Dummies brand, this book demonstrates how you can a

  18. Schedule Sales Query Raw Data

    Data.gov (United States)

    General Services Administration — Schedule Sales Query presents sales volume figures as reported to GSA by contractors. The reports are generated as quarterly reports for the current year and the...

  19. On the Origin of Metadata

    Directory of Open Access Journals (Sweden)

    Sam Coppens

    2012-12-01

    Full Text Available Metadata has been around and has evolved for centuries, albeit not recognized as such. Medieval manuscripts typically had illuminations at the start of each chapter, being both a kind of signature for the author writing the script and a pictorial chapter anchor for the illiterates at the time. Nowadays, there is so much fragmented information on the Internet that users sometimes fail to distinguish the real facts from some bent truth, let alone being able to interconnect different facts. Here, metadata can act both as a noise reducer for detailed recommendations to end-users and as the catalyst to interconnect related information. Over time, metadata thus not only has had different modes of information, but furthermore, metadata’s relation of information to meaning, i.e., “semantics”, evolved. Darwin’s evolutionary propositions, from “species have an unlimited reproductive capacity”, over “natural selection”, to “the cooperation of mutations leads to adaptation to the environment” show remarkable parallels to both metadata’s different modes of information and to its relation of information to meaning over time. In this paper, we will show that the evolution of the use of (meta)data can be mapped to Darwin’s nine evolutionary propositions. As mankind and its behavior are products of an evolutionary process, the evolutionary process of metadata with its different modes of information is on the verge of a new, semantic, era.

  20. THE NEW ONLINE METADATA EDITOR FOR GENERATING STRUCTURED METADATA

    Energy Technology Data Exchange (ETDEWEB)

    Devarakonda, Ranjeet [ORNL; Shrestha, Biva [ORNL; Palanisamy, Giri [ORNL; Hook, Leslie A [ORNL; Killeffer, Terri S [ORNL; Boden, Thomas A [ORNL; Cook, Robert B [ORNL; Zolly, Lisa [United States Geological Service (USGS); Hutchison, Viv [United States Geological Service (USGS); Frame, Mike [United States Geological Service (USGS); Cialella, Alice [Brookhaven National Laboratory (BNL); Lazer, Kathy [Brookhaven National Laboratory (BNL)

    2014-01-01

    Nobody is better suited to describe data than the scientist who created it. This description of the data is called metadata. In general terms, metadata represents the who, what, when, where, why and how of the dataset [1]. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes the metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about the research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [2][4]. OME is part of ORNL's Mercury software fleet [2][3]. It was jointly developed to support projects funded by the United States Geological Survey (USGS), U.S. Department of Energy (DOE), National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). OME's architecture provides a customizable interface to support project-specific requirements. Using this new architecture, the ORNL team developed OME instances for USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L), DOE's Next Generation Ecosystem Experiments (NGEE) and Atmospheric Radiation Measurement (ARM) Program, and the international Surface Ocean Carbon Dioxide ATlas (SOCAT). Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. From the information on the form, the Metadata Editor can create an XML file on the server where the editor is installed or on the user's personal computer. Researchers can also use the ORNL Metadata Editor to modify existing XML metadata files. As an example, an NGEE Arctic scientist use OME to register

  1. ncISO Facilitating Metadata and Scientific Data Discovery

    Science.gov (United States)

    Neufeld, D.; Habermann, T.

    2011-12-01

    Increasing the usability and availability of climate and oceanographic datasets for environmental research requires improved metadata and tools to rapidly locate and access relevant information for an area of interest. Because of the distributed nature of most environmental geospatial data, a common approach is to use catalog services that support queries on metadata harvested from remote map and data services. A key component to effectively using these catalog services is the availability of high quality metadata associated with the underlying data sets. In this presentation, we examine the use of ncISO and Geoportal as open source tools that can be used to document and facilitate access to ocean and climate data available from Thematic Realtime Environmental Distributed Data Services (THREDDS) data services. Many atmospheric and oceanographic spatial data sets are stored in the Network Common Data Format (netCDF) and served through the Unidata THREDDS Data Server (TDS). NetCDF and THREDDS are becoming increasingly accepted in both the scientific and geographic research communities, as demonstrated by the recent adoption of netCDF as an Open Geospatial Consortium (OGC) standard. One important source for ocean and atmospheric based data sets is NOAA's Unified Access Framework (UAF), which serves over 3000 gridded data sets from across NOAA and NOAA-affiliated partners. Due to the large number of datasets, browsing the data holdings to locate data is impractical. Working with Unidata, we have created a new service for the TDS called "ncISO", which allows automatic generation of ISO 19115-2 metadata from attributes and variables in TDS datasets. The ncISO metadata records can be harvested by catalog services such as ESSI-labs GI-Cat catalog service, and ESRI's Geoportal, which supports query through a number of services, including OpenSearch and Catalog Services for the Web (CSW). ESRI's Geoportal Server provides a number of user friendly search capabilities for end users

  2. The Machinic Temporality of Metadata

    Directory of Open Access Journals (Sweden)

    Claudio Celis

    2015-03-01

    Full Text Available In 1990 Deleuze introduced the hypothesis that disciplinary societies are gradually being replaced by a new logic of power: control. Accordingly, Matteo Pasquinelli has recently argued that we are moving towards societies of metadata, which correspond to a new stage of what Deleuze called control societies. Societies of metadata are characterised by the central role that meta-information acquires both as a source of surplus value and as an apparatus of social control. The aim of this article is to develop Pasquinelli’s thesis by examining the temporal scope of these emerging societies of metadata. In particular, this article employs Guattari’s distinction between human and machinic times. Through these two concepts, this article attempts to show how societies of metadata combine the two poles of capitalist power formations as identified by Deleuze and Guattari, i.e. social subjection and machinic enslavement. It begins by presenting the notion of metadata in order to identify some of the defining traits of contemporary capitalism. It then examines Berardi’s account of the temporality of the attention economy from the perspective of the asymmetric relation between cyber-time and human time. The third section challenges Berardi’s definition of the temporality of the attention economy by using Guattari’s notions of human and machinic times. Parts four and five fall back upon Deleuze and Guattari’s notions of machinic surplus labour and machinic enslavement, respectively. The concluding section tries to show that machinic and human times constitute two poles of contemporary power formations that articulate the temporal dimension of societies of metadata.

  3. Enriching The Metadata On CDS

    CERN Document Server

    Chhibber, Nalin

    2014-01-01

    The project report revolves around the open source software package called Invenio. It provides the tools for management of digital assets in a repository and drives the CERN Document Server (CDS). The primary objective is to enhance the existing metadata in CDS with data from other libraries. An implicit part of this task is to manage disambiguation (within incoming data), remove multiple entries and handle replications between new and existing records. All such elements and their corresponding changes are integrated within Invenio to make the upgraded metadata available on the CDS. The latter part of the report discusses some changes related to the Invenio code base itself.

  4. U.S. EPA Metadata Editor (EME)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Metadata Editor (EME) allows users to create geospatial metadata that meets EPA's requirements. The tool has been developed as a desktop application that...

  5. Observation Data Model Core Components, its Implementation in the Table Access Protocol Version 1.1

    Science.gov (United States)

    Louys, Mireille; Tody, Doug; Dowler, Patrick; Durand, Daniel; Michel, Laurent; Bonnarel, Francos; Micol, Alberto; IVOA DataModel Working Group

    2017-05-01

    This document defines the core components of the Observation data model that are necessary to perform data discovery when querying data centers for astronomical observations of interest. It exposes use-cases to be carried out, explains the model and provides guidelines for its implementation as a data access service based on the Table Access Protocol (TAP). It aims at providing a simple model that is easy to understand and to implement by data providers that wish to publish their data into the Virtual Observatory. This interface integrates data modeling and data access aspects in a single service and is named ObsTAP. It will be referenced as such in the IVOA registries. In this document, the Observation Data Model Core Components (ObsCoreDM) defines the core components of queryable metadata required for global discovery of observational data. It is meant to allow a single query to be posed to TAP services at multiple sites to perform global data discovery without having to understand the details of the services present at each site. It defines a minimal set of basic metadata and thus allows for a reasonable cost of implementation by data providers. The combination of the ObsCoreDM with TAP is referred to as an ObsTAP service. As with most of the VO Data Models, ObsCoreDM makes use of STC, Utypes, Units and UCDs. The ObsCoreDM can be serialized as a VOTable. ObsCoreDM can make reference to more complete data models such as Characterisation DM, Spectrum DM or Simple Spectral Line Data Model (SSLDM). ObsCore shares a large set of common concepts with the DataSet Metadata Data Model (Cresitello-Dittmar et al. 2016), which binds together most of the data model concepts from the above models in a comprehensive and more general framework. The current specification, by contrast, provides guidelines for implementing these concepts using the TAP protocol and answering ADQL queries. It is dedicated to global discovery.
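
    As an illustration of the kind of single query the abstract above has in mind, the following Python sketch builds a synchronous TAP request carrying an ADQL cone search against the ObsCore table. It is a sketch under stated assumptions, not part of the specification text quoted here: the helper name build_obstap_request and the base URL https://example.org/tap are hypothetical, and the table name ivoa.obscore and the columns obs_id, dataproduct_type, s_ra, s_dec and access_url are used as we understand them from the ObsCore standard. No network request is made.

```python
from urllib.parse import parse_qs, urlencode


def build_obstap_request(base_url, ra_deg, dec_deg, radius_deg,
                         product_type="image", limit=10):
    """Return a synchronous TAP request URL carrying an ADQL cone search."""
    # Double any single quote so the value stays inside the ADQL string literal.
    safe_type = product_type.replace("'", "''")
    adql = (
        f"SELECT TOP {int(limit)} obs_id, dataproduct_type, s_ra, s_dec, access_url "
        "FROM ivoa.obscore "
        f"WHERE dataproduct_type = '{safe_type}' "
        "AND CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {float(ra_deg)}, {float(dec_deg)}, {float(radius_deg)})) = 1"
    )
    # Standard TAP query parameters; the service base URL is an assumption.
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


if __name__ == "__main__":
    url = build_obstap_request("https://example.org/tap", 83.63, 22.01, 0.1)
    # Decode the URL again to show the ADQL text that would be sent.
    print(parse_qs(url.split("?", 1)[1])["QUERY"][0])
```

    Because every ObsTAP service is expected to expose the same table and column names, the same query string could in principle be sent unchanged to several services, which is the global-discovery use case the abstract describes.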

  6. Fingerprinting Keywords in Search Queries over Tor

    Directory of Open Access Journals (Sweden)

    Oh Se Eun

    2017-10-01

    Full Text Available Search engine queries contain a great deal of private and potentially compromising information about users. One technique to prevent search engines from identifying the source of a query, and Internet service providers (ISPs) from identifying the contents of queries, is to query the search engine over an anonymous network such as Tor.

  7. Mining Longitudinal Web Queries: Trends and Patterns.

    Science.gov (United States)

    Wang, Peiling; Berry, Michael W.; Yang, Yiheng

    2003-01-01

    Analyzed user queries submitted to an academic Web site during a four-year period, using a relational database, to examine users' query behavior, to identify problems they encounter, and to develop techniques for optimizing query analysis and mining. Linguistic analyses focus on query structures, lexicon, and word associations using statistical…

  8. MyBestQuery - A serious game to collect manual query reformulation

    OpenAIRE

    Chifu, Adrian-Gabriel; Molina, Serge; Mothe, Josiane

    2016-01-01

    This paper presents MyBestQuery, a serious game designed to collect query reformulations from players. Query reformulation is a hot topic in information retrieval and covers many aspects. One of them is query reformulation analysis, which is based on users' sessions. It can be used to understand a user's intent or to measure the user's satisfaction with regard to the results obtained when querying the search engine. Automatic query reformulation is another aspect of query reformulation. It automatic...

  9. Scaling the walls of discovery: using semantic metadata for integrative problem solving.

    Science.gov (United States)

    Manning, Maurice; Aggarwal, Amit; Gao, Kevin; Tucker-Kellogg, Greg

    2009-03-01

    Current data integration approaches by bioinformaticians frequently involve extracting data from a wide variety of public and private data repositories, each with a unique vocabulary and schema, via scripts. These separate data sets must then be normalized through the tedious and lengthy process of resolving naming differences and collecting information into a single view. Attempts to consolidate such diverse data using data warehouses or federated queries add significant complexity and have shown limitations in flexibility. The alternative of complete semantic integration of data requires a massive, sustained effort in mapping data types and maintaining ontologies. We focused instead on creating a data architecture that leverages semantic mapping of experimental metadata, to support the rapid prototyping of scientific discovery applications with the twin goals of reducing architectural complexity while still leveraging semantic technologies to provide flexibility, efficiency and more fully characterized data relationships. A metadata ontology was developed to describe our discovery process. A metadata repository was then created by mapping metadata from existing data sources into this ontology, generating RDF triples to describe the entities. Finally an interface to the repository was designed which provided not only search and browse capabilities but complex query templates that aggregate data from both RDF and RDBMS sources. We describe how this approach (i) allows scientists to discover and link relevant data across diverse data sources and (ii) provides a platform for development of integrative informatics applications.
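
    The step of mapping source metadata into an ontology and emitting RDF triples, as described in the abstract above, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function name to_ntriples, the example.org IRIs and the field names are hypothetical and do not come from the paper, and the sketch uses plain string formatting in N-Triples syntax rather than any particular RDF library.

```python
def to_ntriples(subject_iri, record, vocab_base):
    """Serialize one metadata record (a dict) as N-Triples lines.

    Each key becomes a predicate IRI under `vocab_base` (a hypothetical
    vocabulary namespace). Values that look like http(s) IRIs become
    resources; everything else becomes a plain string literal.
    """
    lines = []
    for key, value in sorted(record.items()):
        predicate = f"<{vocab_base}{key}>"
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            # Escape backslashes, quotes and newlines for an N-Triples literal.
            escaped = (
                str(value)
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{subject_iri}> {predicate} {obj} .")
    return lines


if __name__ == "__main__":
    # Illustrative record only; field names are not taken from the paper.
    record = {"assayType": "microarray", "source": "http://example.org/db/geo"}
    for line in to_ntriples("http://example.org/exp/1", record,
                            "http://example.org/vocab#"):
        print(line)
```

    A production mapping would more likely use an RDF toolkit and a curated ontology, but the sketch shows why this approach is lighter than full semantic integration: only the descriptive metadata is converted, while the underlying data can stay in its original RDBMS.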

  10. Condorcet query engine: A query engine for coordinated index terms

    NARCIS (Netherlands)

    van der Vet, P.E.; Mars, Nicolaas

    1999-01-01

    On-line information retrieval systems often offer their users some means to tune the query to match the level of granularity of the information request. Users can be offered a far greater range of possibilities, however, if documents are indexed with coordinated index concepts. Coordinated index

  11. Head First jQuery

    CERN Document Server

    Benedetti, Ryan

    2011-01-01

    Want to add more interactivity and polish to your websites? Discover how jQuery can help you build complex scripting functionality in just a few lines of code. With Head First jQuery, you'll quickly get up to speed on this amazing JavaScript library by learning how to navigate HTML documents while handling events, effects, callbacks, and animations. By the time you've completed the book, you'll be incorporating Ajax apps, working seamlessly with HTML and CSS, and handling data with PHP, MySQL and JSON. If you want to learn-and understand-how to create interactive web pages, unobtrusive scrip

  12. Multi-Dimensional Path Queries

    DEFF Research Database (Denmark)

    Bækgaard, Lars

    1998-01-01

    that connects a pair of paths. A path expression is a function that maps a set of path sets into a path set. Path sets can be joined, filtering conditions can restrict the set of qualifying paths, and aggregation functions can be applied to path elements. In particular, the aggregation function SET can be used...... to create nested path structures. We present an SQL-like query language that is based on path expressions and we show how to use it to express multi-dimensional path queries that are suited for advanced data analysis in decision support environments like data warehousing environments...

  13. Spot table - RPD | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available ended gel image of the spot. Mass Spectrometry Accession No. Accession No. of homologous protein by Mascot Analysis. Mass Spectrometr...y Homologous Protein Definition of homologous protein by Mascot Analysis. Mass Spectrometry

  14. The essential guide to metadata for books

    CERN Document Server

    Register, Renee

    2013-01-01

    In The Essential Guide to Metadata for Books, you will learn exactly what you need to know to effectively generate, handle and disseminate metadata for books and ebooks. This comprehensive but digestible document will explain the life-cycle of book metadata, industry standards, XML, ONIX and the essential elements of metadata. It will also show you how effective, well-organized metadata can improve your efforts to sell a book, especially when it comes to marketing, discoverability and converting at the point of sale. This information-packed document also includes a glossary of terms

  15. A Method for Automating Geospatial Dataset Metadata

    Directory of Open Access Journals (Sweden)

    Robert I. Dunfey

    2009-11-01

    Full Text Available Metadata have long been recognised as crucial to geospatial asset management and discovery, and yet undertaking their creation remains an unenviable task often to be avoided. This paper proposes a practical approach designed to address such concerns, decomposing various data creation, management, update and documentation process steps that are subsequently leveraged to contribute towards metadata record completion. Using a customised utility embedded within a common GIS application, metadata elements are computationally derived from an imposed feature metadata standard, dataset geometry, an integrated storage protocol and pre-prepared content, and instantiated within a common geospatial discovery convention. Yielding 27 out of a total of 32 metadata elements (or 15 out of 17 mandatory elements), the approach demonstrably lessens the burden of metadata authorship. It also encourages improved geospatial asset management whilst outlining core requisites for developing a more open metadata strategy not bound to any particular application domain.

  16. Federated query services provided by the Seamless SAR Archive project

    Science.gov (United States)

    Baker, S.; Bryson, G.; Buechler, B.; Meertens, C. M.; Crosby, C. J.; Fielding, E. J.; Nicoll, J.; Youn, C.; Baru, C.

    2013-12-01

    The NASA Advancing Collaborative Connections for Earth System Science (ACCESS) seamless synthetic aperture radar (SAR) archive (SSARA) project is a 2-year collaboration between UNAVCO, the Alaska Satellite Facility (ASF), the Jet Propulsion Laboratory (JPL), and OpenTopography at the San Diego Supercomputer Center (SDSC) to design and implement a seamless distributed access system for SAR data and derived data products (i.e. interferograms). A major milestone for the first year of the SSARA project was a unified application programming interface (API) for SAR data search and results at ASF and UNAVCO (WInSAR and EarthScope data archives) through the use of simple web services. A federated query service was developed using the unified APIs, providing users a single search interface for both archives (http://www.unavco.org/ws/brokered/ssara/sar/search). A command line client that utilizes this new service is provided as an open source utility for the community on GitHub (https://github.com/bakerunavco/SSARA). Further API development and enhancements added more InSAR-specific keywords and quality control parameters (Doppler centroid, Faraday rotation, InSAR stack size, and perpendicular baselines). To facilitate InSAR processing, the federated query service incorporated URLs for DEM (from OpenTopography) and tropospheric corrections (from the JPL OSCAR service) in addition to the URLs for SAR data. This federated query service will provide relevant QC metadata for selecting pairs of SAR data for InSAR processing and all the URLs necessary for interferogram generation. Interest from the international community has prompted an effort to incorporate other SAR data archives (the ESA Virtual Archive 4 and the DLR TerraSAR-X_SSC Geohazard Supersites and Natural Laboratories collections) into the federated query service, which provide data for researchers outside the US and North America.
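
    The idea of a single search interface over several archives can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SSARA API: the in-memory records, the field names, and the functions search_asf, search_unavco, and federated_search are all hypothetical, and no network call is made.

```python
# Hypothetical records; the two archives use different field names.
ASF = [{"scene": "A1", "sat": "ERS-1", "orbit": 100},
       {"scene": "A2", "sat": "ALOS", "orbit": 200}]
UNAVCO = [{"id": "U1", "platform": "ERS-1", "track": 100}]

def search_asf(platform):
    # Adapter: map archive-specific fields onto a unified record.
    return [{"archive": "ASF", "id": r["scene"], "platform": r["sat"],
             "track": r["orbit"]} for r in ASF if r["sat"] == platform]

def search_unavco(platform):
    # Adapter for the second archive, producing the same unified shape.
    return [{"archive": "UNAVCO", "id": r["id"], "platform": r["platform"],
             "track": r["track"]} for r in UNAVCO if r["platform"] == platform]

def federated_search(platform, backends=(search_asf, search_unavco)):
    # One query fans out to every archive; results come back merged.
    results = []
    for backend in backends:
        results.extend(backend(platform))
    return results

hits = federated_search("ERS-1")
```

    Each adapter hides one archive's schema, so adding a further archive would, in this sketch, mean adding one more adapter function to the backends tuple.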

  17. Parameter Curation for Benchmark Queries

    NARCIS (Netherlands)

    Gubichev, Andrey; Boncz, Peter

    2014-01-01

    In this paper we consider the problem of generating parameters for benchmark queries so these have stable behavior despite being executed on datasets (real-world or synthetic) with skewed data distributions and value correlations. We show that uniform random sampling of the substitution parameters

  18. Automatically Preparing Safe SQL Queries

    Science.gov (United States)

    Bisht, Prithvi; Sistla, A. Prasad; Venkatakrishnan, V. N.

    We present the first sound program source transformation approach for automatically transforming the code of a legacy web application to employ PREPARE statements in place of unsafe SQL queries. Our approach therefore opens the way for eradicating the SQL injection threat vector from legacy web applications.
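
    To illustrate the kind of rewrite this abstract describes, the sketch below contrasts a query built by string concatenation with a parameterized one. It uses Python's standard sqlite3 module rather than the authors' transformation tool, and the users table and the helper functions are hypothetical.

```python
import sqlite3

def make_db():
    # In-memory database with a hypothetical users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "admin"), ("bob", "user")])
    return conn

def find_unsafe(conn, name):
    # Unsafe: user input is spliced into the SQL text itself.
    sql = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(sql)]

def find_safe(conn, name):
    # Safe: the query structure is fixed; the input is bound as data.
    sql = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(sql, (name,))]

conn = make_db()
attack = "x' OR '1'='1"
leaked = find_unsafe(conn, attack)   # the injected OR clause matches every row
blocked = find_safe(conn, attack)    # no user has this literal name
```

    With the bound parameter, the attack string is compared as a plain value and cannot change the shape of the query.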

  19. Fuzzy Querying: Issues and Perspectives.

    Czech Academy of Sciences Publication Activity Database

    Kacprzyk, J.; Pasi, G.; Vojtáš, Peter; Zadrozny, S.

    2000-01-01

    Vol. 36, No. 6 (2000), pp. 605-616 ISSN 0023-5954 Institutional research plan: AV0Z1030915 Keywords: flexible querying * information retrieval * fuzzy databases Subject RIV: BA - General Mathematics http://dml.cz/handle/10338.dmlcz/135376

  20. Enhancing Recall in Semantic Querying

    DEFF Research Database (Denmark)

    Rouces, Jacobo

    2013-01-01

    RDF and SPARQL are currently state-of-the-art W3C standards to respectively represent and query structured information, especially when information from different sources must be federated. However, there are various reasons for which the same knowledge can be modeled in RDF graphs that are both ...

  1. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  2. A Solution to Metadata: Using XML Transformations to Automate Metadata

    Science.gov (United States)

    2010-06-01

    database development that requires little or no programming knowledge and skill; however, a working knowledge of schemas and metadata standards is...against the FGDC CSDGM schema using NCDDC’s MERMAid tool [15]. All randomly sampled records passed validation. Record content was also visually inspected...Beverley, MA: Altova GmbH & Altova, Inc., 2003. [15] NOAA National Coastal Data Development Center, MERMAid (computer software), version 1.2, Stennis Space Center, MS.

  3. Log-Less Metadata Management on Metadata Server for Parallel File Systems

    Directory of Open Access Journals (Sweden)

    Jianwei Liao

    2014-01-01

    Full Text Available This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the sent metadata requests that have been handled by the metadata server, so that the MDS does not need to log metadata changes to nonvolatile storage to achieve a highly available metadata service, and metadata processing performance improves as well. Because the client file system backs up certain sent metadata requests in its memory, the overhead of handling these backup requests is much smaller than the overhead the metadata server incurs when it adopts logging or journaling to yield a highly available metadata service. The experimental results show that the proposed mechanism can significantly improve the speed of metadata processing and deliver better I/O data throughput than conventional metadata management schemes, that is, logging or journaling on the MDS. In addition, a complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients when the metadata server has crashed or unexpectedly entered a nonoperational state.
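
    The client-side backup and replay idea in this abstract can be sketched as a toy model. This is a minimal illustration under assumptions that the abstract does not state: the class and function names are hypothetical, and the server-assigned sequence number used to order the replay is an invented detail, not the paper's mechanism.

```python
class MetadataServer:
    """Toy MDS that keeps metadata in memory and writes no log."""
    def __init__(self):
        self.table = {}      # path -> attributes
        self.next_seq = 0    # global order of handled requests (assumed)

    def handle(self, op, path, attrs=None):
        # Apply the request and return its sequence number as the ack.
        self.apply(op, path, attrs)
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def apply(self, op, path, attrs):
        if op == "set":
            self.table[path] = attrs
        elif op == "delete":
            self.table.pop(path, None)

class Client:
    """Client that backs up every request the MDS has acknowledged."""
    def __init__(self, mds):
        self.mds = mds
        self.backup = []     # (seq, op, path, attrs)

    def send(self, op, path, attrs=None):
        seq = self.mds.handle(op, path, attrs)
        self.backup.append((seq, op, path, attrs))

def recover(clients):
    # Rebuild a crashed MDS by replaying all client backups in order.
    mds = MetadataServer()
    entries = sorted((e for c in clients for e in c.backup),
                     key=lambda e: e[0])
    for seq, op, path, attrs in entries:
        mds.apply(op, path, attrs)
    mds.next_seq = len(entries)
    for c in clients:
        c.mds = mds
    return mds

mds = MetadataServer()
a, b = Client(mds), Client(mds)
a.send("set", "/data/f1", {"size": 10})
b.send("set", "/data/f2", {"size": 20})
a.send("delete", "/data/f1")
before_crash = dict(mds.table)
restored = recover([a, b])   # simulate a crash followed by recovery
```

    In this sketch the server never touches nonvolatile storage; durability rests entirely on the clients still holding their backups when recovery runs.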

  4. Provenance Description of Metadata Vocabularies for the Long-term Maintenance of Metadata

    Directory of Open Access Journals (Sweden)

    Chunqiu Li

    2017-03-01

    Full Text Available Purpose: The purpose of this paper is to discuss provenance description of metadata terms and metadata vocabularies as a set of metadata terms. Provenance is crucial information to keep track of changes of metadata terms and metadata vocabularies for their consistent maintenance. Design/methodology/approach: The W3C PROV standard for general provenance description and Resource Description Framework (RDF) are adopted as the base models to formally define provenance description for metadata vocabularies. Findings: This paper defines a few primitive change types of metadata terms, and a provenance description model of the metadata terms based on the primitive change types. We also provide examples of provenance description in RDF graphs to show the proposed model. Research limitations: The model proposed in this paper is defined based on a few primitive relationships (e.g. addition, deletion, and replacement) between pre-version and post-version of a metadata term. The model is simplified and the practical changes of metadata terms can be more complicated than the primitive relationships discussed in the model. Practical implications: Formal provenance description of metadata vocabularies can improve maintainability of metadata vocabularies over time. Conventional maintenance of metadata terms is the maintenance of documents of terms. The proposed model enables effective and automated tracking of change history of metadata vocabularies using a simple formal description scheme defined based on widely-used standards. Originality/value: Changes in metadata vocabularies may cause inconsistencies in the long-term use of metadata. This paper proposes a simple and formal scheme of provenance description of metadata vocabularies. The proposed model works as the basis of automated maintenance of metadata terms and their vocabularies and is applicable to various types of changes.
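
    The three primitive change types named in this abstract (addition, deletion, replacement) can be illustrated with a small sketch that compares two versions of a vocabulary and emits triples. It is not the paper's model: the function names, the ex: identifiers, and the ex:changeType predicate are hypothetical, triples are plain Python tuples rather than a real RDF graph, and the only W3C PROV term used is prov:wasDerivedFrom.

```python
def classify_changes(old_terms, new_terms, replaced_by=None):
    """Classify term changes between two vocabulary versions.

    replaced_by maps an old term to the new term that supersedes it.
    """
    replaced_by = replaced_by or {}
    changes = []
    for old, new in sorted(replaced_by.items()):
        changes.append(("replacement", old, new))
    # Terms that vanished without a successor are deletions.
    gone = set(old_terms) - set(new_terms) - set(replaced_by)
    # Terms that appeared without a predecessor are additions.
    added = set(new_terms) - set(old_terms) - set(replaced_by.values())
    changes += [("deletion", t, None) for t in sorted(gone)]
    changes += [("addition", None, t) for t in sorted(added)]
    return changes

def to_triples(changes):
    # Emit (subject, predicate, object) triples. prov:wasDerivedFrom is a
    # W3C PROV term; the ex: names are hypothetical.
    triples = []
    for kind, old, new in changes:
        subject = new if new is not None else old
        triples.append((f"ex:{subject}", "ex:changeType", f"ex:{kind}"))
        if kind == "replacement":
            triples.append((f"ex:{new}", "prov:wasDerivedFrom", f"ex:{old}"))
    return triples

v1 = ["creator", "date", "format"]
v2 = ["author", "date", "license"]
changes = classify_changes(v1, v2, replaced_by={"creator": "author"})
triples = to_triples(changes)
```

    The replacement mapping has to be supplied by hand here, because a name-level diff alone cannot tell a rename from an unrelated deletion plus addition.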

  5. A Vision for Next Generation Query Processors and an Associated Research Agenda

    Science.gov (United States)

    Gounaris, Anastasios

    Query processing is one of the most important mechanisms for data management, and there exist mature techniques for effective query optimization and efficient query execution. The vast majority of these techniques assume workloads of rather small transactional tasks with strong requirements for ACID properties. However, the emergence of new computing paradigms, such as grid and cloud computing, the increasingly large volumes of data commonly processed, the need to support data driven research, intensive data analysis and new scenarios, such as processing data streams on the fly or querying web services, the fact that the metadata fed to optimizers are often missing at compile time, and the growing interest in novel optimization criteria, such as monetary cost or energy consumption, create a unique set of new requirements for query processing systems. These requirements cannot be met by modern techniques in their entirety, although interesting solutions and efficient tools have already been developed for some of them in isolation. Next generation query processors are expected to combine features addressing all of these issues, and, consequently, lie at the confluence of several research initiatives. This paper aims to present a vision for such processors, to explain their functionality requirements, and to discuss the open issues, along with their challenges.

  6. Evolution in Metadata Quality: Common Metadata Repository's Role in NASA Curation Efforts

    Science.gov (United States)

    Gilman, Jason; Shum, Dana; Baynes, Katie

    2016-01-01

    Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This poster covers how we use humanizers, a technique for dealing with the symptoms of metadata issues, as well as our plans for future metadata validation enhancements. The CMR currently indexes 35K collections and 300M granules.

  7. Improvements to the Ontology-based Metadata Portal for Unified Semantics (OlyMPUS)

    Science.gov (United States)

    Linsinbigler, M. A.; Gleason, J. L.; Huffer, E.

    2016-12-01

    The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support Earth Science data consumers and data providers, enabling the latter to register data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS complements the ODISEES data discovery system with an intelligent tool to enable data producers to auto-generate semantically enhanced metadata and upload it to the metadata repository that drives ODISEES. Like ODISEES, the OlyMPUS metadata provisioning tool leverages robust semantics, a NoSQL database and query engine, an automated reasoning engine that performs first- and second-order deductive inferencing, and uses a controlled vocabulary to support data interoperability and automated analytics. The ODISEES data discovery portal leverages this metadata to provide a seamless data discovery and access experience for data consumers who are interested in comparing and contrasting the multiple Earth science data products available across NASA data centers. OlyMPUS will support scientists' services and tools for performing complex analyses and identifying correlations and non-obvious relationships across all types of Earth System phenomena using the full spectrum of NASA Earth Science data available. By providing an intelligent discovery portal that supplies users - both human users and machines - with detailed information about data products, their contents and their structure, ODISEES will reduce the level of effort required to identify and prepare large volumes of data for analysis. This poster will explain how OlyMPUS leverages deductive reasoning and other technologies to create an integrated environment for generating and exploiting semantically rich metadata.

  8. phosphorus retention data and metadata

    Science.gov (United States)

    phosphorus retention in wetlands data and metadata. This dataset is associated with the following publication: Lane, C., and B. Autrey. Phosphorus retention of forested and emergent marsh depressional wetlands in differing land uses in Florida, USA. Wetlands Ecology and Management. Springer Science and Business Media B.V; Formerly Kluwer Academic Publishers B.V., GERMANY, 24(1): 45-60, (2016).

  9. Optimizing Temporal Queries: Efficient Handling of Duplicates

    DEFF Research Database (Denmark)

    Toman, David; Bowman, Ivan Thomas

    2001-01-01

    , these query languages are implemented by translating temporal queries into standard relational queries. However, the compiled queries are often quite cumbersome and expensive to execute even using state-of-the-art relational products. This paper presents an optimization technique that produces more efficient...... translated SQL queries by taking into account the properties of the encoding used for temporal attributes. For concreteness, this translation technique is presented in the context of SQL/TP; however, these techniques are also applicable to other temporal query languages....

  10. A Comparison of Query-by-Example Methods for Spoken Term Detection

    Science.gov (United States)

    2009-09-01

    consistent “errors” between the index and the query. Few query terms have more than one pronunciation (avg. 1.1 prons. per term), as a result, there is... pron lex. one dict entry (llr) 73.01 47.66 21.11 all dict entries (avg+llr) 73.99 48.16 20.92 all dict entries (max+llr) 74.27 48.26 20.93 Table 1

  11. Querying Natural Logic Knowledge Bases

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik; Jensen, Per Anker

    2017-01-01

    This paper describes the principles of a system applying natural logic as a knowledge base language. Natural logics are regimented fragments of natural language employing high level inference rules. We advocate the use of natural logic for knowledge bases dealing with querying of classes in ontologies and class-relationships such as are common in life-science descriptions. The paper adopts a version of natural logic with recursive restrictive clauses such as relative clauses and adnominal prepositional phrases. It includes passive as well as active voice sentences. We outline a prototype...

  12. Flexible Query Answering Systems 2006

    DEFF Research Database (Denmark)

    This volume constitutes the proceedings of the Seventh International Conference on Flexible Query Answering Systems, FQAS 2006, held in Milan, Italy, on June 7-10, 2006. FQAS is the premier conference for researchers and practitioners concerned with the vital task of providing easy, flexible...... and intuitive access to information for every type of need. This multidisciplinary conference draws on several research areas, including information retrieval, database management, information filtering, knowledge representation, soft computing, management of multimedia information, and human...... submissions, relating to the topic of users posing queries and systems producing answers. The papers cover the fields: Database Management, Information Retrieval, Domain Modeling, Knowledge Representation and Ontologies, Knowledge Discovery and Data Mining, Artificial Intelligence, Classical and Non...

  13. Querying Sentiment Development over Time

    DEFF Research Database (Denmark)

    Andreasen, Troels; Christiansen, Henning; Have, Christian Theil

    2013-01-01

    A new language is introduced for describing hypotheses about fluctuations of measurable properties in streams of timestamped data, and as a prime example, we consider trends of emotions in the constantly flowing stream of Twitter messages. The language, called EmoEpisodes, has a precise semantics...... that measures how well a hypothesis characterizes a given time interval; the semantics is parameterized so it can be adjusted to different views of the data. EmoEpisodes is extended to a query language with variables standing for unknown topics and emotions, and the query-answering mechanism will return...... instantiations for topics and emotions as well as time intervals that provide the largest deflections in this measurement. Experiments are performed on a selection of Twitter data to demonstrate the usefulness of the approach....

  14. XML Multidimensional Modelling and Querying

    OpenAIRE

    Boucher, Serge; Verhaegen, Boris; Zimányi, Esteban

    2009-01-01

    As XML becomes ubiquitous and XML storage and processing becomes more efficient, the range of use cases for these technologies widens daily. One promising area is the integration of XML and data warehouses, where an XML-native database stores multidimensional data and processes OLAP queries written in the XQuery interrogation language. This paper explores issues arising in the implementation of such a data warehouse. We first compare approaches for multidimensional data modelling in XML, then...

  15. Security in a Replicated Metadata Catalogue

    CERN Document Server

    Koblitz, B

    2007-01-01

    The gLite-AMGA metadata catalogue has been developed by NA4 to provide simple relational metadata access for the EGEE user community. As advanced features, which will be the focus of this presentation, AMGA provides very fine-grained security also in connection with the built-in support for replication and federation of metadata. AMGA is extensively used by the biomedical community to store medical image metadata, digital libraries, in HEP for logging and bookkeeping data and in the climate community. The biomedical community intends to deploy a distributed metadata system for medical images consisting of various sites, which range from hospitals to computing centres. Only safe sharing of the highly sensitive metadata as provided in AMGA makes such a scenario possible. Other scenarios are digital libraries, which federate copyright protected (meta-) data into a common catalogue. The biomedical and digital libraries have been deployed using a centralized structure already for some time. They now intend to decentralize ...

  16. XML for catalogers and metadata librarians

    CERN Document Server

    Cole, Timothy W

    2013-01-01

    How are today's librarians to manage and describe the ever-expanding volumes of resources, in both digital and print formats? The use of XML in cataloging and metadata workflows can improve metadata quality, the consistency of cataloging workflows, and adherence to standards. This book is intended to enable current and future catalogers and metadata librarians to progress beyond a bare surface-level acquaintance with XML, thereby enabling them to integrate XML technologies more fully into their cataloging workflows. Building on the wealth of work on library descriptive practices, cataloging, and metadata, XML for Catalogers and Metadata Librarians explores the use of XML to serialize, process, share, and manage library catalog and metadata records. The authors' expert treatment of the topic is written to be accessible to those with little or no prior practical knowledge of or experience with how XML is used. Readers will gain an educated appreciation of the nuances of XML and grasp the benefit of more advanced ...

  17. DESIGN OF A METADATA SYSTEM FOR A DATA WAREHOUSE WITH A REVENUE TRACKING CASE STUDY AT PT. TELKOM DIVRE V JAWA TIMUR

    Directory of Open Access Journals (Sweden)

    Yudhi Purwananto

    2004-07-01

    Full Text Available A data warehouse is a data storage medium within a company, drawn from various systems, that can be used for purposes such as analysis and reporting. At PT Telkom Divre V Jawa Timur, a data warehouse called the Regional Database has been built. The Regional Database requires an important data warehouse component, namely metadata. Put simply, metadata is "data about data". In this research, a metadata system is designed with Revenue Tracking as the case study, serving as the analysis and reporting component of the Regional Database. Metadata is essential for managing the data warehouse and for providing information about it. The processes within the data warehouse and the components related to it must be integrated with one another to realize the data warehouse characteristics of being subject-oriented, integrated, time-variant, and non-volatile. Metadata must therefore also be able to exchange information between the components of the data warehouse. A web service is used as this exchange mechanism. Web services use XML technology and the HTTP protocol to communicate. With web services, each component

  18. PROGRAM SYSTEM AND INFORMATION METADATA BANK OF TERTIARY PROTEIN STRUCTURES

    Directory of Open Access Journals (Sweden)

    T. A. Nikitin

    2013-01-01

    Full Text Available The article deals with the architecture of a metadata storage model for the check results of three-dimensional protein structures. A conceptual database model was built. The service and procedure for database updates, as well as data transformation algorithms for protein structures and their quality, are presented. The most important information about entries and their submission forms for storage, access, and delivery to users is highlighted. A software suite was developed for the implementation of functional tasks using the Java programming language in the NetBeans v.7.0 environment and JQL to query and interact with the JavaDB database. The service was tested, and the results showed the system's effectiveness in filtering protein structures.

  19. A Distributed Infrastructure for Metadata about Metadata: The HDMM Architectural Style and PORTAL-DOORS System

    Directory of Open Access Journals (Sweden)

    Carl Taswell

    2010-06-01

    Full Text Available Both the IRIS-DNS System and the PORTAL-DOORS System share a common architectural style for pervasive metadata networks that operate as distributed metadata management systems with hierarchical authorities for entity registering and attribute publishing. Hierarchical control of metadata redistribution throughout the registry-directory networks constitutes an essential characteristic of this architectural style called Hierarchically Distributed Mobile Metadata (HDMM with its focus on moving the metadata for who what where as fast as possible from servers in response to requests from clients. The novel concept of multilevel metadata about metadata has also been defined for the PORTAL-DOORS System with the use of entity, record, infoset, representation and message metadata. Other new features implemented include the use of aliases, priorities and metaresources.

  20. Automated Atmospheric Composition Dataset Level Metadata Discovery. Difficulties and Surprises

    Science.gov (United States)

    Strub, R. F.; Falke, S. R.; Kempler, S.; Fialkowski, E.; Goussev, O.; Lynnes, C.

    2015-12-01

    The Atmospheric Composition Portal (ACP) is an aggregator and curator of information related to remotely sensed atmospheric composition data and analysis. It uses existing tools and technologies and, where needed, enhances those capabilities to provide interoperable access, tools, and contextual guidance for scientists and value-adding organizations using remotely sensed atmospheric composition data. The initial focus is on Essential Climate Variables identified by the Global Climate Observing System - CH4, CO, CO2, NO2, O3, SO2 and aerosols. This poster addresses our efforts in building the ACP Data Table, an interface to help discover and understand remotely sensed data that are related to atmospheric composition science and applications. We harvested GCMD, CWIC, GEOSS metadata catalogs using machine to machine technologies - OpenSearch, Web Services. We also manually investigated the plethora of CEOS data providers portals and other catalogs where that data might be aggregated. This poster is our experience of the excellence, variety, and challenges we encountered. Conclusions: 1. The significant benefits that the major catalogs provide are their machine to machine tools like OpenSearch and Web Services rather than any GUI usability improvements due to the large amount of data in their catalog. 2. There is a trend at the large catalogs towards simulating small data provider portals through advanced services. 3. Populating metadata catalogs using ISO19115 is too complex for users to do in a consistent way, difficult to parse visually or with XML libraries, and too complex for Java XML binders like CASTOR. 4. The ability to search for Ids first and then for data (GCMD and ECHO) is better for machine to machine operations rather than the timeouts experienced when returning the entire metadata entry at once. 5. Metadata harvest and export activities between the major catalogs have led to a significant amount of duplication. (This is currently being addressed) 6. Most (if not

  1. International Metadata Standards and Enterprise Data Quality Metadata Systems

    Science.gov (United States)

    Habermann, Ted

    2016-01-01

    Well-documented data quality is critical in situations where scientists and decision-makers need to combine multiple datasets from different disciplines and collection systems to address scientific questions or difficult decisions. Standardized data quality metadata could be very helpful in these situations. Many efforts at developing data quality standards falter because of the diversity of approaches to measuring and reporting data quality. The one-size-fits-all paradigm does not generally work well in this situation. I will describe these and other capabilities of ISO 19157 with examples of how they are being used to describe data quality across the NASA EOS Enterprise and also compare these approaches with other standards.

  2. Identifying Aspects for Web-Search Queries

    OpenAIRE

    Wu, Fei; Madhavan, Jayant; Halevy, Alon

    2014-01-01

    Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search...

  3. Capturing Sensor Metadata for Cross-Domain Interoperability

    Science.gov (United States)

    Fredericks, J.

    2015-12-01

    Envision a world where a field operator turns on an instrument, and is queried for information needed to create standardized encoded descriptions that, together with the sensor manufacturer knowledge, fully describe the capabilities, limitations and provenance of observational data. The Cross-Domain Observational Metadata Environmental Sensing Network (X-DOMES) pilot project (with support from the NSF/EarthCube IA) is taking the first steps needed in realizing this vision. The knowledge of how an observable physical property becomes a measured observation must be captured at each stage of its creation. Each sensor-based observation is made through the use of applied technologies, each with specific limitations and capabilities. Environmental sensors typically provide a variety of options that can be configured differently for each unique deployment, affecting the observational results. By capturing the information (metadata) at each stage of its generation, a more complete and accurate description of data provenance can be communicated. By documenting the information in machine-harvestable, standards-based encodings, metadata can be shared across disciplinary and geopolitical boundaries. Using standards-based frameworks enables automated harvesting and translation to other community-adopted standards, which facilitates the use of shared tools and workflows. The establishment of a cross-domain network of stakeholders (sensor manufacturers, data providers, domain experts, data centers), called the X-DOMES Network, provides a unifying voice for the specification of content and implementation of standards, as well as a central repository for sensor profiles, vocabularies, guidance and product vetting. The ability to easily share fully described observational data provides a better understanding of data provenance and enables the use of common data processing and assessment workflows, fostering a greater trust in our shared global resources. The X-DOMES Network

  4. How Good Are Query Optimizers, Really?

    NARCIS (Netherlands)

    Leis, Viktor; Gubichev, Andrey; Mirchev, Atanas; Boncz, Peter; Kemper, Alfons; Neumann, Thomas

    2016-01-01

    Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the

  5. Heuristics-based query optimisation for SPARQL

    NARCIS (Netherlands)

    P. Tsialiamanis (Petros); E. Sidirourgos (Eleftherios); I. Fundulaki; V. Christophides; P.A. Boncz (Peter)

    2012-01-01

    Query optimization in RDF Stores is a challenging problem as SPARQL queries typically contain many more joins than equivalent relational plans, and hence lead to a large join order search space. In such cases, cost-based query optimization often is not possible. One practical reason for

  6. How Good Are Query Optimizers, Really?

    NARCIS (Netherlands)

    V. Leis (Viktor); A. Gubichev (Andrey); A. Mirchev (Atanas); P.A. Boncz (Peter); T. Neumann (Thomas); A. Kemper (Alfons)

    2015-01-01

    Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We

  7. Predecessor queries in dynamic integer sets

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting

    1997-01-01

    We consider the problem of maintaining a set of n integers in the range 0..2^w−1 under the operations of insertion, deletion, predecessor queries, minimum queries and maximum queries on a unit cost RAM with word size w bits. Let f(n) be an arbitrary nondecreasing smooth function satisfying n...
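    To make the operations named in this abstract concrete, the following is a minimal Python sketch of a sorted-list baseline. It is not the data structure from the paper: the class name `SortedIntSet` is hypothetical, the predecessor of x is assumed here to mean the largest stored element less than or equal to x, and insertion and deletion take linear time in the worst case rather than the bounds the paper studies.

```python
import bisect


class SortedIntSet:
    """Hypothetical baseline: a dynamic integer set kept as a sorted list."""

    def __init__(self):
        self._items = []  # always sorted, no duplicates

    def insert(self, x):
        # Find the insertion point; skip the insert if x is already present.
        i = bisect.bisect_left(self._items, x)
        if i == len(self._items) or self._items[i] != x:
            self._items.insert(i, x)

    def delete(self, x):
        # Remove x if present; do nothing otherwise.
        i = bisect.bisect_left(self._items, x)
        if i < len(self._items) and self._items[i] == x:
            self._items.pop(i)

    def predecessor(self, x):
        # Assumed definition: largest stored element <= x, or None.
        i = bisect.bisect_right(self._items, x)
        return self._items[i - 1] if i > 0 else None

    def minimum(self):
        return self._items[0] if self._items else None

    def maximum(self):
        return self._items[-1] if self._items else None
```

    The binary search makes each query logarithmic in this sketch, while updates pay for shifting list elements; it is meant only to pin down the interface, not to reflect the word-RAM techniques the abstract refers to.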

  8. The I4 Online Query Tool for Earth Observations Data

    Science.gov (United States)

    Stefanov, William L.; Vanderbloemen, Lisa A.; Lawrence, Samuel J.

    2015-01-01

    The NASA Earth Observation System Data and Information System (EOSDIS) delivers an average of 22 terabytes per day of data collected by orbital and airborne sensor systems to end users through an integrated online search environment (the Reverb/ECHO system). Earth observations data collected by sensors on the International Space Station (ISS) are not currently included in the EOSDIS system, and are only accessible through various individual online locations. This increases the effort required by end users to query multiple datasets, and limits the opportunity for data discovery and innovations in analysis. The Earth Science and Remote Sensing Unit of the Exploration Integration and Science Directorate at NASA Johnson Space Center has collaborated with the School of Earth and Space Exploration at Arizona State University (ASU) to develop the ISS Instrument Integration Implementation (I4) data query tool to provide end users a clean, simple online interface for querying both current and historical ISS Earth Observations data. The I4 interface is based on the Lunaserv and Lunaserv Global Explorer (LGE) open-source software packages developed at ASU for query of lunar datasets. In order to avoid mirroring existing databases - and the need to continually sync/update those mirrors - our design philosophy is for the I4 tool to be a pure query engine only. Once an end user identifies a specific scene or scenes of interest, I4 transparently takes the user to the appropriate online location to download the data. The tool consists of two public-facing web interfaces. The Map Tool provides a graphic geobrowser environment where the end user can navigate to an area of interest and select single or multiple datasets to query. The Map Tool displays active image footprints for the selected datasets (Figure 1). Selecting a footprint will open a pop-up window that includes a browse image and a link to available image metadata, along with a link to the online location to order or

  9. NetCDF4/HDF5 and Linked Data in the Real World - Enriching Geoscientific Metadata without Bloat

    Science.gov (United States)

    Ip, Alex; Car, Nicholas; Druken, Kelsey; Poudjom-Djomani, Yvette; Butcher, Stirling; Evans, Ben; Wyborn, Lesley

    2017-04-01

    NetCDF4 has become the dominant generic format for many forms of geoscientific data, leveraging (and constraining) the versatile HDF5 container format, while providing metadata conventions for interoperability. However, the encapsulation of detailed metadata within each file can lead to metadata "bloat", and difficulty in maintaining consistency where metadata is replicated to multiple locations. Complex conceptual relationships are also difficult to represent in simple key-value netCDF metadata. Linked Data provides a practical mechanism to address these issues by associating the netCDF files and their internal variables with complex metadata stored in Semantic Web vocabularies and ontologies, while complying with and complementing existing metadata conventions. One of the stated objectives of the netCDF4/HDF5 formats is that they should be self-describing: containing metadata sufficient for cataloguing and using the data. However, this objective can be regarded as only partially-met where details of conventions and definitions are maintained externally to the data files. For example, one of the most widely used netCDF community standards, the Climate and Forecasting (CF) Metadata Convention, maintains standard vocabularies for a broad range of disciplines across the geosciences, but this metadata is currently neither readily discoverable nor machine-readable. We have previously implemented useful Linked Data and netCDF tooling (ncskos) that associates netCDF files, and individual variables within those files, with concepts in vocabularies formulated using the Simple Knowledge Organization System (SKOS) ontology. NetCDF files contain Uniform Resource Identifier (URI) links to terms represented as SKOS Concepts, rather than plain-text representations of those terms, so we can use simple, standardised web queries to collect and use rich metadata for the terms from any Linked Data-presented SKOS vocabulary. 
    Geoscience Australia (GA) manages a large volume of diverse

  10. Design and evaluation of a NoSQL database for storing and querying RDF data

    Directory of Open Access Journals (Sweden)

    Kanda Runapongsa Saikaew

    2014-12-01

    Currently the amount of web data has increased excessively. Its metadata is widely used in order to fully exploit web information resources. This causes the need for Semantic Web technology to quickly analyze such big data. Resource Description Framework (RDF) is a standard for describing web resources. In this paper, we propose a method to exploit a NoSQL database, specifically MongoDB, to store and query RDF data. We choose MongoDB to represent a NoSQL database because it is one of the most popular high-performance NoSQL databases. We evaluate the proposed design and implementation by using the Berlin SPARQL Benchmark, which is one of the most widely accepted benchmarks for comparing the performance of RDF storage systems. We compare three database systems, which are Apache Jena TDB (native RDF store), MySQL (relational database), and our proposed system with MongoDB (NoSQL database). Based on the experimental results analysis, our proposed system outperforms other database systems for most queries when the data set size is small. However, for a larger data set, MongoDB performs well for queries with simple operators while MySQL offers an efficient solution for complex queries. The result of this work can provide some guidelines for choosing an appropriate RDF database system and applying a NoSQL database in storing and querying RDF data.

  11. Metadata for Content-Based Image Retrieval

    Directory of Open Access Journals (Sweden)

    Adrian Sterca

    2010-12-01

    This paper presents an image retrieval technique that combines content based image retrieval with pre-computed metadata-based image retrieval. The resulting system will have the advantages of both approaches: the speed/efficiency of metadata-based image retrieval and the accuracy/power of content-based image retrieval.

  12. Leveraging Metadata to Create Better Web Services

    Science.gov (United States)

    Mitchell, Erik

    2012-01-01

    Libraries have been increasingly concerned with data creation, management, and publication. This increase is partly driven by shifting metadata standards in libraries and partly by the growth of data and metadata repositories being managed by libraries. In order to manage these data sets, libraries are looking for new preservation and discovery…

  13. GlamMap : Visualizing library metadata

    NARCIS (Netherlands)

    Betti, Arianna; Gerrits, Dirk; Speckmann, Bettina; van den Berg, Hein

    2014-01-01

    Libraries provide access to large amounts of library metadata. Unfortunately, many libraries only offer textual interfaces for searching and browsing their holdings. Visualisations provide simpler, faster, and more efficient ways to navigate, search and study large quantities of metadata. This paper

  14. A Dynamic Metadata Community Profile for CUAHSI

    Science.gov (United States)

    Bermudez, L.; Piasecki, M.

    2004-12-01

    Common metadata standards typically lack domain-specific elements, have limited extensibility and do not always resolve semantic heterogeneities that could occur in the annotations. To facilitate the use and extension of metadata specifications, a methodology called Dynamic Community Profiles (DCP) is presented. The methodology allows element definitions to be overwritten and core elements to be specified as metadata tree paths. DCP uses the Web Ontology Language (OWL), the Resource Description Framework (RDF) and XML syntax to formalize specifications and to create controlled vocabularies in ontologies, which enhances interoperability. This methodology was employed to create a metadata profile for the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI). The profile was created by extending the ISO-19115:2003 geographic metadata standard and restricting the permissible values of some elements. The values used as controlled vocabularies were inferred from hydrologic keywords found in the Global Change Master Directory (GCMD) and from measurement units found in the Hydrologic Handbook. Also, a core metadata set for CUAHSI was formally expressed as tree paths, containing the ISO core set plus additional elements. Finally, a tool was developed to test the extension and to allow creation of metadata instances in RDF/XML that conform to the profile. This tool is also able to export the core elements to other schema formats such as Metadata Template Files (MTF).

  15. Contingency Tables.

    Science.gov (United States)

    1980-02-01

    in Table 2, from Waite [1915], give the cross-classification of right-hand fingerprints according to the number of whorls and small loops. The total...number of whorls and small loops is at most 5, and the resulting table is triangular: Table 2: Fingerprints of the right hand classified by the number...Table 4. Estimated Expected Values for Fingerprint Data Under Quasi-Independence Small loops Whorls 0 1 2 3 4 5 Total 0 200.6 167.4 166.6 150.3

  16. Object-Extended OLAP Querying

    DEFF Research Database (Denmark)

    Pedersen, Torben Bach; Gu, Junmin; Shoshani, Arie

    2009-01-01

    On-line analytical processing (OLAP) systems based on a dimensional view of data have found widespread use in business applications and are being used increasingly in non-standard applications. These systems provide good performance and ease-of-use. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, "multi-model" federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. This allows data...

  17. Metadata Standards and Workflow Systems

    Science.gov (United States)

    Habermann, T.

    2012-12-01

    All modern workflow systems include mechanisms for recording inputs, outputs and processes. These descriptions can include details required to reproduce the workflows exactly and, in some cases, can include virtual images of the hardware and operating system. There are several on-going and emerging standards for representing these detailed workflows including the Open Provenance Model (OPM) and the W3C PROV. At the same time, ISO metadata standards include a simple provenance or lineage model that includes many important elements of workflows. The ISO model could play a critical role in sharing and discovering workflow information for collections and perhaps in recording some details in granules. In order for this goal to be reached, connections between the detailed standards and ISO must be understood and conventions for using them must be developed.

  18. Compressed Representations of Conjunctive Query Results

    OpenAIRE

    Deep, Shaleen; Koutris, Paraschos

    2017-01-01

    Relational queries, and in particular join queries, often generate large output results when executed over a huge dataset. In such cases, it is often infeasible to store the whole materialized output if we plan to reuse it further down a data processing pipeline. Motivated by this problem, we study the construction of space-efficient compressed representations of the output of conjunctive queries, with the goal of supporting the efficient access of the intermediate compressed result for a giv...

  19. Hierarchical Fuzzy Sets To Query Possibilistic Databases

    OpenAIRE

    Thomopoulos, Rallou; Buche, Patrice; Haemmerlé, Ollivier

    2008-01-01

    Within the framework of flexible querying of possibilistic databases, based on the fuzzy set theory, this chapter focuses on the case where the vocabulary used both in the querying language and in the data is hierarchically organized, which occurs in systems that use ontologies. We give an overview of previous works concerning two issues: firstly, flexible querying of imprecise data in the relational model; secondly, the introduction of fuzziness in hierarchies. Concerning the latter point, w...

  20. jQuery Tools UI Library

    CERN Document Server

    Libby, Alex

    2012-01-01

    A practical tutorial with powerful yet simple projects that are quick to implement. This book is aimed at developers who have prior jQuery knowledge, but may not have any prior experience with jQuery Tools. It is possible that they may have started with the basics of jQuery Tools, but want to learn more about how it can be used, as well as get ideas for future projects.

  1. A structural query system for Han characters

    DEFF Research Database (Denmark)

    Skala, Matthew

    2016-01-01

    The IDSgrep structural query system for Han character dictionaries is presented. This dictionary search system represents the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes), a data model and syntax based on the Unicode IDS concept. It includes a query language for EIDS databases, with a freely available implementation and format translation from popular third-party IDS and XML character databases. The system is designed to suit the needs of font developers and foreign language learners. The search algorithm includes a bit vector index inspired by Bloom filters to support faster query operations. Experimental results are presented, evaluating the effect of the indexing on query performance.

  2. Secure Skyline Queries on Cloud Platform.

    Science.gov (United States)

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to a cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can also be used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.
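    For readers unfamiliar with skyline queries and the dominance test this abstract refers to, the following is a minimal plaintext sketch in Python. It involves no encryption and is not the secure protocol proposed in the paper; the function names `dominates` and `skyline` are hypothetical, and it assumes the common convention that smaller values are better in every dimension.

```python
def dominates(a, b):
    """Return True if point a dominates point b.

    Assumed convention: a dominates b when a is no worse than b in every
    dimension and strictly better in at least one (smaller is better).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Return the points that are not dominated by any other point.

    This is the quadratic pairwise comparison, kept simple on purpose.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Illustrative data only, e.g. (price, distance) pairs.
    candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
    print(skyline(candidates))
```

    The pairwise comparison in `skyline` shows why the dominance check is the natural building block: every skyline decision reduces to repeated calls to it, which is the step a secure variant would have to perform without revealing the underlying values.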

  3. CanCore: Metadata for Learning Objects

    Directory of Open Access Journals (Sweden)

    Norm Friesen

    2002-10-01

    The vision of reusable digital learning resources or objects, made accessible through coordinated repository architectures and metadata technologies, has gained considerable attention within distance education and training communities. However, the pivotal role of metadata in this vision raises important and longstanding issues about classification, description and meaning. The purpose of this paper is to provide an overview of this vision, focusing specifically on issues of semantics. It will describe the CanCore Learning Object Metadata Application Profile as an important first step in addressing these issues in the context of the discovery, reuse and management of learning resources or objects.

  4. Handbook of metadata, semantics and ontologies

    CERN Document Server

    Sicilia, Miguel-Angel

    2013-01-01

    Metadata research has emerged as a discipline cross-cutting many domains, focused on the provision of distributed descriptions (often called annotations) to Web resources or applications. Such associated descriptions are supposed to serve as a foundation for advanced services in many application areas, including search and location, personalization, federation of repositories and automated delivery of information. Indeed, the Semantic Web is in itself a concrete technological framework for ontology-based metadata. For example, Web-based social networking requires metadata describing people and

  5. Metadata in Chaos: how researchers tag radio broadcasts

    DEFF Research Database (Denmark)

    Lykke, Marianne; Lund, Haakon; Skov, Mette

    2015-01-01

    To optimally support the researchers, a user-centred approach was taken to develop the platform and related metadata scheme. Based on the requirements, a three-level metadata scheme was developed: (1) core archival metadata, (2) LARM metadata, and (3) project-specific metadata. The paper analyses how researchers apply the metadata scheme in their research work. The study consists of two studies, a) a qualitative study of subjects and vocabulary of the applied metadata and annotations, and 5 semi-structured interviews about goals for tagging. The findings clearly show that the primary role of LARM...

  6. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.

    Science.gov (United States)

    Hitz, Benjamin C; Rowe, Laurence D; Podduturi, Nikhil R; Glick, David I; Baymuradov, Ulugbek K; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Gabdank, Idan; Narayana, Aditi K; Onate, Kathrina C; Hilton, Jason; Ho, Marcus C; Lee, Brian T; Miyasato, Stuart R; Dreszer, Timothy R; Sloan, Cricket A; Strattan, J Seth; Tanaka, Forrest Y; Hong, Eurie L; Cherry, J Michael

    2017-01-01

    The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.

  7. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.

    Directory of Open Access Journals (Sweden)

    Benjamin C Hitz

    The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package.

  8. Structural Metadata Research in the Ears Program

    National Research Council Canada - National Science Library

    Liu, Yang; Shriberg, Elizabeth; Stolcke, Andreas; Peskin, Barbara; Ang, Jeremy; Hillard, Dustin; Ostendorf, Mari; Tomalin, Marcus; Woodland, Phil; Harper, Mary

    2005-01-01

    Both human and automatic processing of speech require recognition of more than just words. In this paper we provide a brief overview of research on structural metadata extraction in the DARPA EARS rich transcription program...

  9. USGS Digital Orthophoto Quad (DOQ) Metadata

    Data.gov (United States)

    Minnesota Department of Natural Resources — Metadata for the USGS DOQ Orthophoto Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the quarter-quad tile...

  10. WDCC Metadata Generation with GeoNetwork

    Science.gov (United States)

    Ramthun, Hans; Lautenschlager, Michael; Winter, Hans-Hermann

    2010-05-01

    Earth system science data such as modeling output are described by metadata. At the WDCC (World Data Center for Climate) the data and metadata are stored inside the CERA (Climate and Environmental Retrieval and Archive) relational database. To fill in the describing metadata, several types of XML documents are used to upload data into the database. GeoNetwork is an Ajax-based web framework that offers a wide range of XML data handling for search and update and is especially designed to meet the ISO19115/19139 standard. This framework was extended with schemas that allow CERA upload XML records to be created and updated. An upload function is also included, as well as a connection to the local LDAP (Lightweight Directory Access Protocol) for authentication. Keywords: metadata, WDCC, CERA, Ajax

  11. FSA 2003-2004 Digital Orthophoto Metadata

    Data.gov (United States)

    Minnesota Department of Natural Resources — Metadata for the 2003-2004 FSA Color Orthophotos Layer. Each orthophoto is represented by a Quarter 24k Quad tile polygon. The polygon attributes contain the...

  12. Optimising metadata workflows in a distributed information environment

    OpenAIRE

    Robertson, R. John; Barton, Jane

    2005-01-01

    The different purposes present within a distributed information environment create the potential for repositories to enhance their metadata by capitalising on the diversity of metadata available for any given object. This paper presents three conceptual reference models required to achieve this optimisation of metadata workflow: the ecology of repositories, the object lifecycle model, and the metadata lifecycle model. It suggests a methodology for developing the metadata lifecycle model, and ...

  13. A general approach to query flattening

    NARCIS (Netherlands)

    van Ruth, J.

    The translation of queries from complex data models to simpler data models is a recurring theme in the construction of efficient data management systems. In this paper we propose a general framework to guide the translation from data models with nested types to a flat relational model (query

  14. The Data Cyclotron query processing scheme

    NARCIS (Netherlands)

    Goncalves, R.; Kersten, M.

    2011-01-01

    A grand challenge of distributed query processing is to devise a self-organizing architecture which exploits all hardware resources optimally to manage the database hot set, minimize query response time, and maximize throughput without single point global coordination. The Data Cyclotron

  15. The Data Cyclotron query processing scheme.

    NARCIS (Netherlands)

    R.A. Goncalves (Romulo); M.L. Kersten (Martin)

    2011-01-01

    A grand challenge of distributed query processing is to devise a self-organizing architecture which exploits all hardware resources optimally to manage the database hot set, minimize query response time, and maximize throughput without single point global coordination. The Data Cyclotron

  16. Querying and Mining Strings Made Easy

    KAUST Repository

    Sahli, Majed

    2017-10-13

    With the advent of large string datasets in several scientific and business applications, there is a growing need to perform ad-hoc analysis on strings. Currently, strings are stored, managed, and queried using procedural codes. This limits users to certain operations supported by existing procedural applications and requires manual query planning with limited tuning opportunities. This paper presents StarQL, a generic and declarative query language for strings. StarQL is based on a native string data model that allows StarQL to support a large variety of string operations and provide semantic-based query optimization. String analytic queries are too intricate to be solved on one machine. Therefore, we propose a scalable and efficient data structure that allows StarQL implementations to handle large sets of strings and utilize large computing infrastructures. Our evaluation shows that StarQL is able to express workloads of application-specific tools, such as BLAST and KAT in bioinformatics, and to mine Wikipedia text for interesting patterns using declarative queries. Furthermore, the StarQL query optimizer shows an order of magnitude reduction in query execution time.

  17. Fuzzy Query Processing Using Clustering Techniques.

    Science.gov (United States)

    Kamel, M.; And Others

    1990-01-01

    Discusses the problem of processing fuzzy queries in databases and information retrieval systems and presents a prototype of a fuzzy query processing system for databases that is based on data clustering and uses the Pascal programming language. Clustering schemes are explained, and the system architecture that uses natural language is described. (14…

  18. Automated Test Methods for XML Metadata

    Science.gov (United States)

    2017-12-28

    definition (XSD) format and other standards and conventions. This method should be of interest primarily to parties having tools or applications that...consume RCC metadata standard documents, and may be of interest to developers of tools or applications that produce RCC metadata standard documents...instance document and encodings to verify that the rules engines and other tools work together. 1. Initialize the programming environment. 2. Write test

  19. Querying metabolism under different physiological constraints.

    Science.gov (United States)

    Cakmak, Ali; Ozsoyoglu, Gultekin; Hanson, Richard W

    2010-04-01

    Metabolism is a representation of the biochemical principles that govern the production, consumption, degradation, and biosynthesis of metabolites in living cells. Organisms respond to changes in their physiological conditions or environmental perturbations (i.e. constraints) via cooperative implementation of such principles. Querying inner working principles of metabolism under different constraints provides invaluable insights for both researchers and educators. In this paper, we propose a metabolism query language (MQL) and discuss its query processing. MQL enables researchers to explore the behavior of the metabolism with a wide-range of predicates including dietary and physiological condition specifications. The query results of MQL are enriched with both textual and visual representations, and its query processing is completely tailored based on the underlying metabolic principles.

  20. Science friction: data, metadata, and collaboration.

    Science.gov (United States)

    Edwards, Paul N; Mayernik, Matthew S; Batcheller, Archer L; Bowker, Geoffrey C; Borgman, Christine L

    2011-10-01

    When scientists from two or more disciplines work together on related problems, they often face what we call 'science friction'. As science becomes more data-driven, collaborative, and interdisciplinary, demand increases for interoperability among data, tools, and services. Metadata--usually viewed simply as 'data about data', describing objects such as books, journal articles, or datasets--serve key roles in interoperability. Yet we find that metadata may be a source of friction between scientific collaborators, impeding data sharing. We propose an alternative view of metadata, focusing on its role in an ephemeral process of scientific communication, rather than as an enduring outcome or product. We report examples of highly useful, yet ad hoc, incomplete, loosely structured, and mutable, descriptions of data found in our ethnographic studies of several large projects in the environmental sciences. Based on this evidence, we argue that while metadata products can be powerful resources, usually they must be supplemented with metadata processes. Metadata-as-process suggests the very large role of the ad hoc, the incomplete, and the unfinished in everyday scientific work.

  1. Distributed metadata servers for cluster file systems using shared low latency persistent key-value metadata store

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Pedone, Jr., James M.; Tzelnic, Percy; Ting, Dennis P. J.; Ionkov, Latchesar A.; Grider, Gary

    2017-12-26

    A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.

  2. The ANSS Station Information System: A Centralized Station Metadata Repository for Populating, Managing and Distributing Seismic Station Metadata

    Science.gov (United States)

    Thomas, V. I.; Yu, E.; Acharya, P.; Jaramillo, J.; Chowdhury, F.

    2015-12-01

    Maintaining and archiving accurate site metadata is critical for seismic network operations. The Advanced National Seismic System (ANSS) Station Information System (SIS) is a repository of seismic network field equipment, equipment response, and other site information. Currently, there are 187 different sensor models and 114 data-logger models in SIS. SIS has a web-based user interface that allows network operators to enter information about seismic equipment and assign response parameters to it. It allows users to log entries for sites, equipment, and data streams. Users can also track when equipment is installed, updated, and/or removed from sites. When seismic equipment configurations change for a site, SIS computes the overall gain of a data channel by combining the response parameters of the underlying hardware components. Users can then distribute this metadata in standardized formats such as FDSN StationXML or dataless SEED. One powerful advantage of SIS is that existing data in the repository can be leveraged: e.g., new instruments can be assigned response parameters from the Incorporated Research Institutions for Seismology (IRIS) Nominal Response Library (NRL), or from a similar instrument already in the inventory, thereby reducing the amount of time needed to determine parameters when new equipment (or models) are introduced into a network. SIS is also useful for managing field equipment that does not produce seismic data (e.g., power systems, telemetry devices or GPS receivers) and gives the network operator a comprehensive view of site field work. SIS allows users to generate field logs to document activities and inventory at sites. Thus, operators can also use SIS reporting capabilities to improve planning and maintenance of the network. Queries such as how many sensors of a certain model are installed or what pieces of equipment have active problem reports are just a few examples of the type of information that is available to SIS users.
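
    The abstract does not say how SIS combines response parameters. A minimal sketch, assuming the common convention that a channel's overall gain is the product of its per-stage gains; the stage names and values below are hypothetical and are not taken from SIS:

```python
from math import prod


def overall_gain(stage_gains):
    """Multiply per-stage gains to get an end-to-end channel gain.

    Assumption: every stage is described by one scalar gain, so the
    combined gain is simply the product of the stages.
    """
    gains = list(stage_gains)
    if not gains:
        raise ValueError("a channel needs at least one stage")
    return prod(gains)


# Hypothetical channel: sensor (V per m/s), preamp (unitless),
# data logger (counts per V). Values are invented for illustration.
stages = [("sensor", 1500.0), ("preamp", 2.0), ("datalogger", 400000.0)]
channel_gain = overall_gain(gain for _, gain in stages)
print(channel_gain)  # counts per m/s under the assumptions above
```

    Swapping one hardware component then only means replacing its entry in `stages` and recomputing the product.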

  3. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Science.gov (United States)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the NERIES EC project main objectives is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer to the users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable users' activities such as discovering, aggregating, describing and sharing datasets to obtain a decrease in the replication of similar data queries towards the network, relieving the data centers of having to guess and create useful pre-packed products. We've started to transfer this task more and more towards the users community, where the users' composed data products could be extensively re-used. The main link to the data is represented by a centralized webservice (SeismoLink) acting like a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs that identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (e.g., EMSC-QuakeML earthquakes catalog service), is used by a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map based portlet which allows the dynamic composition of a data product, binding seismic event's parameters with a set of seismic stations. The requested data is collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on-demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  4. A Semantically Enabled Metadata Repository for Solar Irradiance Data Products

    Science.gov (United States)

    Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.

    2014-12-01

    The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in Atmospheric and Space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric, and climate. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of instances of subject-predicate-object data entities identifiable with a URI. This capability coupled with SPARQL over HTTP read access enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source semantic web application, to manage and create new ontologies and populate repository content. A variety of ontologies were used in

  5. Exposing and Harvesting Metadata Using the OAI Metadata Harvesting Protocol A Tutorial

    CERN Document Server

    Warner, Simeon

    2001-01-01

    In this article I outline the ideas behind the Open Archives Initiative metadata harvesting protocol (OAIMH), and attempt to clarify some common misconceptions. I then consider how the OAIMH protocol can be used to expose and harvest metadata. Perl code examples are given as practical illustration.
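
    The article's own examples are in Perl and are not reproduced in this record. As a rough illustration of what a harvesting request looks like, the sketch below builds a `ListRecords` URL using the parameter names of OAI-PMH 2.0, which postdates this tutorial; the base URL is a placeholder and `list_records_url` is a hypothetical helper, not part of any library:

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository.

    In OAI-PMH 2.0 a resumptionToken is an exclusive argument: when
    continuing a partial list, no other argument accompanies the verb.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder endpoint; substitute a real repository's base URL.
first_page = list_records_url("https://example.org/oai")
next_page = list_records_url("https://example.org/oai",
                             resumption_token="abc123")
print(first_page)
print(next_page)
```

    A harvester would fetch `first_page`, parse the XML response, and keep requesting with the returned token until the repository stops supplying one.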

  6. Mapping metadata for SWHi : Aligning schemas with library metadata for a historical ontology

    NARCIS (Netherlands)

    Zhang, Junte; Fahmi, I.; Ellermann, Henk; Bouma, G.; Weske, M; Hacid, MS; Godart, C

    2007-01-01

    What are the possibilities of Semantic Web technologies for organizations which traditionally have lots of structured data, such as metadata, available? A library is such a particular organization. We mapped a digital library's descriptive (bibliographic) metadata for a large historical document

  7. Query Optimizations over Decentralized RDF Graphs

    KAUST Repository

    Abdelaziz, Ibrahim

    2017-05-18

    Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query processing over a small number of heterogeneous data sources by utilizing schema information. In the case of schema similarity and interlinks among sources, these approaches cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a system for scalable and efficient SPARQL query processing over decentralized graphs. Lusail achieves scalability and low query response time through various optimizations at compile and run times. At compile time, we use a novel locality-aware query decomposition technique that maximizes the number of query triple patterns sent together to a source based on the actual location of the instances satisfying these triple patterns. At run time, we use selectivity-awareness and parallel query execution to reduce network latency and to increase parallelism by delaying the execution of subqueries expected to return large results. We evaluate Lusail using real and synthetic benchmarks, with data sizes up to billions of triples on an in-house cluster and a public cloud. We show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time.

  8. Advanced Data Analysis: From Excel PivotTables to Microsoft Access

    OpenAIRE

    Brown, Christopher C; Pan, Denise; Wiersma, Gabrielle

    2015-01-01

    Most librarians run for the hills when they hear about Microsoft Excel PivotTables and relational databases such as Microsoft Access. PivotTables can be a powerful analysis tool. However, Microsoft Access can move beyond PivotTables by exploring more complex relationships between datasets. Building from the morning session, participants learned additional Excel functions including PivotTables and PivotCharts, as well as Access tables, queries, forms, and reports. The session was held in a cla...

  9. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  10. Responsive web design with jQuery

    CERN Document Server

    Carlos, Gilberto

    2013-01-01

    Responsive Web Design with jQuery follows a standard tutorial-based approach, covering various aspects of responsive web design by building a comprehensive website. "Responsive Web Design with jQuery" is aimed at web designers who are interested in building device-agnostic websites. You should have a grasp of standard HTML, CSS, and JavaScript development, and have a familiarity with graphic design. Some exposure to jQuery and HTML5 will be beneficial but isn't essential.

  11. Experimental quantum private queries with linear optics

    International Nuclear Information System (INIS)

    De Martini, Francesco; Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo; Nagali, Eleonora; Sansoni, Linda; Sciarrino, Fabio

    2009-01-01

    The quantum private query is a quantum cryptographic protocol to recover information from a database, preserving both user and data privacy: the user can test whether someone has retained information on which query was asked and the database provider can test the amount of information released. Here we discuss a variant of the quantum private query algorithm that admits a simple linear optical implementation: it employs the photon's momentum (or time slot) as address qubits and its polarization as bus qubit. A proof-of-principle experimental realization is implemented.

  12. Instant MDX queries for SQL Server 2012

    CERN Document Server

    Emond, Nicholas

    2013-01-01

    Get to grips with a new technology, understand what it is and what it can do for you, and then get to work with the most important features and tasks. This short, focused guide is a great way to get started with writing MDX queries. New developers can use this book as a reference for how to use functions and the syntax of a query as well as how to use Calculated Members and Named Sets. This book is great for new developers who want to learn the MDX query language from scratch and install SQL Server 2012 with Analysis Services

  13. Federated query processing for the semantic web

    CERN Document Server

    Buil-Aranda, C

    2014-01-01

    In recent years, the amount of RDF data on the Web, exposed via SPARQL endpoints, has increased exponentially. These SPARQL endpoints allow users to direct SPARQL queries to the RDF data. Federated SPARQL query processing allows several of these RDF databases to be queried as if they were a single one, integrating the results from all of them. This is a key concept in the Web of Data and it is also a hot topic in the community. Besides that, the W3C SPARQL-WG has standardized it in the new Recommendation SPARQL 1.1. This book provides a formalisation of the W3C proposed recommendation. Thi

  14. Cooperative Scalable Moving Continuous Query Processing

    DEFF Research Database (Denmark)

    Li, Xiaohui; Karras, Panagiotis; Jensen, Christian S.

    2012-01-01

    A range of applications call for a mobile client to continuously monitor others in close proximity. Past research on such problems has covered two extremes: It has offered totally centralized solutions, where a server takes care of all queries, and totally distributed solutions, in which...... there is no central authority at all. Unfortunately, neither of these two solutions scales to intensive moving object tracking applications, where each client poses a query. In this paper, we formulate the moving continuous query (MCQ) problem and propose a balanced model where servers cooperatively take care...... and computation cost for both servers and clients. An experimental study demonstrates that our approaches offer better scalability than competitors...

  15. Python, Google Sheets, and the Thesaurus for Graphic Materials for Efficient Metadata Project Workflows

    Directory of Open Access Journals (Sweden)

    Jeremy Bartczak

    2017-01-01

    In 2017, the University of Virginia (U.Va.) will launch a two-year initiative to celebrate the bicentennial anniversary of the University’s founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated with generating digital content and accompanying description within the context of limited resources. This paper describes how technology and new approaches to metadata design have enabled the University of Virginia’s Metadata Analysis and Design Department to rapidly and successfully generate accurate description for these digital objects. Python’s pandas module improves efficiency by cleaning and repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata creation and data quality control.
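
    The paper's actual scripts are not part of this record. A minimal sketch of the CSV-to-MODS step, using only the Python standard library (`csv` and `xml.etree.ElementTree`) in place of the pandas and lxml modules the paper names; the column names, the `|` separator, and the sample row are invented for illustration:

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace
ET.register_namespace("mods", MODS_NS)


def row_to_mods(row):
    """Turn one CSV row into a small MODS record (title plus topics)."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    # Light cleaning stands in for the pandas step: trim whitespace.
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = row["title"].strip()
    # Assumed convention: several subject terms share a cell, "|"-separated.
    for term in row["subjects"].split("|"):
        if term.strip():
            subject = ET.SubElement(mods, f"{{{MODS_NS}}}subject")
            ET.SubElement(subject, f"{{{MODS_NS}}}topic").text = term.strip()
    return mods


# Invented sample data; a real workflow would read an exported sheet.
sample_csv = "title,subjects\n Students on the Lawn ,College students|Lawns\n"
records = [row_to_mods(r) for r in csv.DictReader(io.StringIO(sample_csv))]
xml_text = ET.tostring(records[0], encoding="unicode")
print(xml_text)
```

    Keeping one row per object means catalogers can edit the table collaboratively while the XML is regenerated on demand.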

  16. OntoQuery: easy-to-use web-based OWL querying

    Science.gov (United States)

    Tudose, Ilinca; Hastings, Janna; Muthukrishnan, Venkatesh; Owen, Gareth; Turner, Steve; Dekker, Adriano; Kale, Namrata; Ennis, Marcus; Steinbeck, Christoph

    2013-01-01

    Summary: The Web Ontology Language (OWL) provides a sophisticated language for building complex domain ontologies and is widely used in bio-ontologies such as the Gene Ontology. The Protégé-OWL ontology editing tool provides a query facility that allows composition and execution of queries with the human-readable Manchester OWL syntax, with syntax checking and entity label lookup. No equivalent query facility such as the Protégé Description Logics (DL) query yet exists in web form. However, many users interact with bio-ontologies such as chemical entities of biological interest and the Gene Ontology using their online Web sites, within which DL-based querying functionality is not available. To address this gap, we introduce the OntoQuery web-based query utility. Availability and implementation: The source code for this implementation together with instructions for installation is available at http://github.com/IlincaTudose/OntoQuery. OntoQuery software is fully compatible with all OWL-based ontologies and is available for download (CC-0 license). The ChEBI installation, ChEBI OntoQuery, is available at http://www.ebi.ac.uk/chebi/tools/ontoquery. Contact: hastings@ebi.ac.uk PMID:24008420

  17. Schedule Sales Query Report Generation System

    Data.gov (United States)

    General Services Administration — Schedule Sales Query presents sales volume figures as reported to GSA by contractors. The reports are generated as quarterly reports for the current year and the...

  18. Querying temporal databases via OWL 2 QL

    CSIR Research Space (South Africa)

    Klarman, S

    2014-06-01

    SQL:2011, the most recently adopted version of the SQL query language, has unprecedentedly standardized the representation of temporal data in relational databases. Following the successful paradigm of ontology-based data access, we develop a...

  19. Pro PHP and jQuery

    CERN Document Server

    Lengstorf, Jason

    2010-01-01

    This book is for intermediate programmers interested in building AJAX web applications using jQuery and PHP. Along with teaching some advanced PHP techniques, it will teach you how to take your dynamic applications to the next level by adding a JavaScript layer with jQuery. * Learn to utilize built-in PHP functions to build calendar tools. * Learn how jQuery can be used for AJAX, animation, client-side validation, and more. What you'll learn: * Use PHP to build a calendar application that allows users to post, view, edit, and delete events. * Use jQuery to allow the calendar app to be viewed and ed

  20. Clean Air Markets - Compliance Query Wizard

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Compliance Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://ampd.epa.gov/ampd/. The Compliance module provides...

  1. Clean Air Markets - Allowances Query Wizard

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Allowances Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Allowances...

  2. ANSWERING GEOSPARQL QUERIES OVER RELATIONAL DATA

    Directory of Open Access Journals (Sweden)

    K. Bereta

    2017-07-01

    In this paper we present the system Ontop-spatial that is able to answer GeoSPARQL queries on top of geospatial relational databases, performing on-the-fly GeoSPARQL-to-SQL translation using ontologies and mappings. GeoSPARQL is a geospatial extension of the query language SPARQL standardized by OGC for querying geospatial RDF data. Our approach goes beyond relational databases and covers all data that can have a relational structure even at the logical level. Our purpose is to enable GeoSPARQL querying on-the-fly integrating multiple geospatial sources, without converting and materializing original data as RDF and then storing them in a triple store. This approach is more suitable in the cases where original datasets are stored in large relational databases (or generally in files with relational structure) and/or get frequently updated.

  3. Path-based Queries on Trajectory Data

    DEFF Research Database (Denmark)

    Krogh, Benjamin Bjerre; Pelekis, Nikos; Theodoridis, Yannis

    2014-01-01

    In traffic research, management, and planning a number of path-based analyses are heavily used, e.g., for computing turn-times, evaluating green waves, or studying traffic flow. These analyses require retrieving the trajectories that follow the full path being analyzed. Existing path queries cannot...... sufficiently support such path-based analyses because they retrieve all trajectories that touch any edge in the path. In this paper, we define and formalize the strict path query. This is a novel query type tailored to support path-based analysis, where trajectories must follow all edges in the path...... a specific path by only retrieving data from the first and last edge in the path. To correctly answer strict path queries existing network-constrained trajectory indexes must retrieve data from all edges in the path. An extensive performance study of NETTRA using a very large real-world trajectory data set...
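
    To make the difference between the two query types concrete, here is a naive sketch, assuming a trajectory is stored as an ordered list of edge ids and that "following" a path means traversing its edges consecutively and in order. It scans every trajectory and is not the NETTRA index from the paper; the edge ids and function names are hypothetical:

```python
def follows_path(trajectory, path):
    """True if `path` occurs in `trajectory` as a contiguous run of edges."""
    n = len(path)
    return any(trajectory[i:i + n] == path
               for i in range(len(trajectory) - n + 1))


def strict_path_query(trajectories, path):
    """Ids of trajectories that follow every edge of the path, in order."""
    return [tid for tid, edges in trajectories.items()
            if follows_path(edges, path)]


def touching_query(trajectories, path):
    """Ids of trajectories that touch at least one edge of the path."""
    wanted = set(path)
    return [tid for tid, edges in trajectories.items()
            if wanted & set(edges)]


# Invented sample data: trajectory id -> ordered edge ids.
trajectories = {
    "t1": ["e1", "e2", "e3", "e4"],
    "t2": ["e2", "e5"],
    "t3": ["e0", "e2", "e3"],
    "t4": ["e3", "e2"],
}
path = ["e2", "e3"]
strict_hits = strict_path_query(trajectories, path)
loose_hits = touching_query(trajectories, path)
print(strict_hits, loose_hits)
```

    Here `t2` and `t4` touch the path but do not follow it, so only the looser query returns them.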

  4. Efficient Approximate OLAP Querying Over Time Series

    DEFF Research Database (Denmark)

    Perera, Kasun Baruhupolage Don Kasun Sanjeewa; Hahmann, Martin; Lehner, Wolfgang

    2016-01-01

    The ongoing trend for data gathering not only produces larger volumes of data, but also increases the variety of recorded data types. Out of these, especially time series, e.g. various sensor readings, have attracted attention in the domains of business intelligence and decision making. As OLAP...... queries play a major role in these domains, it is desirable to also execute them on time series data. While this is not a problem on the conceptual level, it can become a bottleneck with regards to query run-time. In general, processing OLAP queries gets more computationally intensive as the volume...... are either costly or require continuous maintenance. In this paper we propose an approach for approximate OLAP querying of time series that offers constant latency and is maintenance-free. To achieve this, we identify similarities between aggregation cuboids and propose algorithms that eliminate...

  5. Superfund Chemical Data Matrix (SCDM) Query

    Science.gov (United States)

    This site allows you to easily query the Superfund Chemical Data Matrix (SCDM) and generate a list of the corresponding Hazard Ranking System (HRS) factor values, benchmarks, and data elements that you need.

  6. A semantically rich and standardised approach enhancing discovery of sensor data and metadata

    Science.gov (United States)

    Kokkinaki, Alexandra; Buck, Justin; Darroch, Louise

    2016-04-01

    The marine environment plays an essential role in the earth's climate. To enhance the ability to monitor the health of this important system, innovative sensors are being produced and combined with state of the art sensor technology. As the number of sensors deployed is continually increasing, it is a challenge for data users to find the data that meet their specific needs. Furthermore, users need to integrate diverse ocean datasets originating from the same or even different systems. Standards provide a solution to the above mentioned challenges. The Open Geospatial Consortium (OGC) has created Sensor Web Enablement (SWE) standards that enable different sensor networks to establish syntactic interoperability. When combined with widely accepted controlled vocabularies, they become semantically rich and semantic interoperability is achievable. In addition, Linked Data is the recommended best practice for exposing, sharing and connecting information on the Semantic Web using Uniform Resource Identifiers (URIs), Resource Description Framework (RDF) and RDF Query Language (SPARQL). As part of the EU-funded SenseOCEAN project, the British Oceanographic Data Centre (BODC) is working on the standardisation of sensor metadata enabling 'plug and play' sensor integration. Our approach combines standards, controlled vocabularies and persistent URIs to publish sensor descriptions, their data and associated metadata as 5 star Linked Data and OGC SWE (SensorML, Observations & Measurements) standard. Thus sensors become readily discoverable, accessible and useable via the web. Content and context based searching is also enabled since sensor descriptions are understood by machines. Additionally, sensor data can be combined with other sensor or Linked Data datasets to form knowledge. This presentation will describe the work done in BODC to achieve syntactic and semantic interoperability in the sensor domain. It will illustrate the reuse and extension of the Semantic Sensor

  7. Menangkal Serangan SQL Injection Dengan Parameterized Query

    Directory of Open Access Journals (Sweden)

    Yulianingsih Yulianingsih

    2016-06-01

    As information services grow, the security vulnerability of information sources grows with them. This paper presents experimental research on database attacks carried out through SQL injection. The attack is made through the authentication page, because this page is the first point of access and should therefore be adequately defended. An experiment is then conducted on the Parameterized Query method to find a solution to this problem. Keywords: information services, attack, experiment, SQL injection, parameterized query.
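
    The record does not include the paper's own code or name its database. A minimal sketch of the idea, using Python's standard `sqlite3` module as a stand-in: the same login check is written once with string concatenation and once with `?` placeholders. The table, the credentials, and the function names are invented, and the plaintext password is for illustration only:

```python
import sqlite3


def make_db():
    """Create an in-memory database with one invented user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    # Plaintext password only to keep the example short.
    conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "s3cret"))
    return conn


def login_unsafe(conn, name, password):
    # Vulnerable: user input becomes part of the SQL text itself.
    sql = ("SELECT COUNT(*) FROM users WHERE name = '%s' AND password = '%s'"
           % (name, password))
    return conn.execute(sql).fetchone()[0] > 0


def login_safe(conn, name, password):
    # Parameterized: input is bound as data and never parsed as SQL.
    sql = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(sql, (name, password)).fetchone()[0] > 0


conn = make_db()
payload = "' OR '1'='1"  # classic injection string typed into the password box
print(login_unsafe(conn, "alice", payload))  # attacker is let in
print(login_safe(conn, "alice", payload))    # attacker is rejected
```

    With the placeholder version the payload is compared literally against the stored password, so the `OR` clause never reaches the SQL parser.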

  8. Queryll: Java Database Queries through Bytecode Rewriting

    OpenAIRE

    Iu, Christopher Ming-Yee; Zwaenepoel, Willy

    2006-01-01

    When interfacing Java with other systems such as databases, programmers must often program in special interface languages like SQL. Code written in these languages often needs to be embedded in strings where they cannot be error-checked at compile-time, or the Java compiler needs to be altered to directly recognize code written in these languages. We have taken a different approach to adding database query facilities to Java. Bytecode rewriting allows us to add query facilities to Java whose ...

  9. Nearest Neighbor Queries in Road Networks

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Kolar, Jan; Pedersen, Torben Bach

    2003-01-01

    With wireless communications and geo-positioning being widely available, it becomes possible to offer new e-services that provide mobile users with information about other mobile objects. This paper concerns active, ordered k-nearest neighbor queries for query and data objects that are moving in ...... for the nearest neighbor search in the prototype is presented in detail. In addition, the paper reports on results from experiments with the prototype system....

  10. Evaluating SPARQL queries on massive RDF datasets

    KAUST Repository

    Al-Harbi, Razen

    2015-08-01

    Distributed RDF systems partition data across multiple computer nodes. Partitioning is typically based on heuristics that minimize inter-node communication and it is performed in an initial, data pre-processing phase. Therefore, the resulting partitions are static and do not adapt to changes in the query workload; as a result, existing systems are unable to consistently avoid communication for queries that are not favored by the initial data partitioning. Furthermore, for very large RDF knowledge bases, the partitioning phase becomes prohibitively expensive, leading to high startup costs. In this paper, we propose AdHash, a distributed RDF system which addresses the shortcomings of previous work. First, AdHash initially applies lightweight hash partitioning, which drastically minimizes the startup cost, while favoring the parallel processing of join patterns on subjects, without any data communication. Using a locality-aware planner, queries that cannot be processed in parallel are evaluated with minimal communication. Second, AdHash monitors the data access patterns and adapts dynamically to the query load by incrementally redistributing and replicating frequently accessed data. As a result, the communication cost for future queries is drastically reduced or even eliminated. Our experiments with synthetic and real data verify that AdHash (i) starts faster than all existing systems, (ii) processes thousands of queries before other systems become online, and (iii) gracefully adapts to the query load, being able to evaluate queries on billion-scale RDF data in sub-seconds. In this demonstration, the audience can use a graphical interface of AdHash to verify its performance superiority compared to state-of-the-art distributed RDF systems.
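The idea of hash partitioning on subjects can be sketched as follows. This is not AdHash's implementation: the function name, the CRC32 hash and the sample triples are assumptions chosen for illustration. The point is only that every triple with the same subject lands on the same worker, so joins on a shared subject need no communication.

```python
import zlib
from collections import defaultdict

def partition_by_subject(triples, num_workers):
    """Assign each (subject, predicate, object) triple to a worker by hashing its subject."""
    parts = defaultdict(list)
    for s, p, o in triples:
        # CRC32 is used because Python's built-in hash() of str varies between runs.
        worker = zlib.crc32(s.encode("utf-8")) % num_workers
        parts[worker].append((s, p, o))
    return parts

def local_subject_join(part, pred_a, pred_b):
    """Join two triple patterns on a shared subject using only one worker's data."""
    by_subject = defaultdict(dict)
    for s, p, o in part:
        by_subject[s][p] = o
    return [(s, props[pred_a], props[pred_b])
            for s, props in by_subject.items()
            if pred_a in props and pred_b in props]

if __name__ == "__main__":
    data = [
        ("ex:alice", "ex:worksAt", "ex:kaust"),
        ("ex:alice", "ex:name", "Alice"),
        ("ex:bob", "ex:worksAt", "ex:mit"),
        ("ex:bob", "ex:name", "Bob"),
    ]
    for worker, part in partition_by_subject(data, 3).items():
        print(worker, local_subject_join(part, "ex:name", "ex:worksAt"))
```

Queries that join on anything other than the subject would still need data exchange between workers, which is the case the abstract's locality-aware planner and adaptive redistribution address.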

  11. Omics Metadata Management Software v. 1 (OMMS)

    Energy Technology Data Exchange (ETDEWEB)

    2013-09-09

    Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks, to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible, extensible and easily installed and run by operators with general system administration and scripting language literacy.

  12. Scanning table

    CERN Multimedia

    1960-01-01

    Before the invention of wire chambers, particle tracks were analysed on scanning tables like this one. Today, the process is electronic and much faster. Bubble chamber film - currently available - (links can be found below) was used for this analysis of the particle tracks.

  13. Linked Metadata - lightweight semantics for data integration (Invited)

    Science.gov (United States)

    Hendler, J. A.

    2013-12-01

    The "Linked Open Data" cloud (http://linkeddata.org) is currently used to show how the linking of datasets, supported by SPARQL endpoints, is creating a growing set of linked data assets. This linked data space has been growing rapidly, and the last version collected is estimated to have had over 35 billion 'triples.' As impressive as this may sound, there is an inherent flaw in the way the linked data story is conceived. The idea is that all of the data is represented in a linked format (generally RDF) and applications will essentially query this cloud and provide mashup capabilities between the various kinds of data that are found. The view of linking in the cloud is fairly simple -links are provided by either shared URIs or by URIs that are asserted to be owl:sameAs. This view of the linking, which primarily focuses on shared objects and subjects in RDF's subject-predicate-object representation, misses a critical aspect of Semantic Web technology. Given triples such as * A:person1 foaf:knows A:person2 * B:person3 foaf:knows B:person4 * C:person5 foaf:name 'John Doe' this view would not consider them linked (barring other assertions) even though they share a common vocabulary. In fact, we get significant clues that there are commonalities in these data items from the shared namespaces and predicates, even if the traditional 'graph' view of RDF doesn't appear to join on these. Thus, it is the linking of the data descriptions, whether as metadata or other vocabularies, that provides the linking in these cases. This observation is crucial to scientific data integration where the size of the datasets, or even the individual relationships within them, can be quite large. (Note that this is not restricted to scientific data - search engines, social networks, and massive multiuser games also create huge amounts of data.) To convert all the triples into RDF and provide individual links is often unnecessary, and is both time and space intensive. 
Those looking to do on the
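The three example triples in the abstract above can be used to sketch the distinction between node-level links and vocabulary-level links. The helper names are hypothetical, and treating the prefix before the colon as the dataset (for subjects) or the vocabulary (for predicates) is a simplifying assumption for illustration.

```python
from collections import defaultdict

# The three triples quoted in the abstract, as (subject, predicate, object).
TRIPLES = [
    ("A:person1", "foaf:knows", "A:person2"),
    ("B:person3", "foaf:knows", "B:person4"),
    ("C:person5", "foaf:name", "'John Doe'"),
]

def prefix(term):
    # Namespace prefix of a prefixed name, e.g. "foaf" for "foaf:knows".
    return term.split(":", 1)[0]

def node_links(triples):
    """Subjects or objects that occur in more than one dataset (the 'graph' view)."""
    seen = defaultdict(set)
    for s, p, o in triples:
        dataset = prefix(s)
        seen[s].add(dataset)
        seen[o].add(dataset)
    return {node for node, datasets in seen.items() if len(datasets) > 1}

def vocabulary_links(triples):
    """Predicate namespaces that are used by more than one dataset."""
    users = defaultdict(set)
    for s, p, o in triples:
        users[prefix(p)].add(prefix(s))
    return {ns: datasets for ns, datasets in users.items() if len(datasets) > 1}

if __name__ == "__main__":
    print(node_links(TRIPLES))        # no shared subjects or objects
    print(vocabulary_links(TRIPLES))  # the foaf vocabulary links A, B and C
```

No subject or object is shared across the three datasets, so a join on nodes finds nothing, yet all three use the `foaf` vocabulary, which is the commonality the abstract points to.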

  14. Pembuatan Aplikasi Metadata Generator untuk Koleksi Peninggalan Warisan Budaya

    Directory of Open Access Journals (Sweden)

    Wimba Agra Wicesa

    2017-03-01

    Full Text Available Cultural heritage is an important asset that serves as a source of information for the study of history. Managing cultural heritage data deserves attention in order to preserve the integrity of that data in the future. Creating cultural heritage metadata is one step that can be taken to preserve the value of an artifact. With the metadata concept, information about each cultural heritage object becomes easy to read, manage, and retrieve, even after it has been stored for a long time. In addition, with the metadata concept, information about cultural heritage can be used by many systems. Cultural heritage metadata is fairly large, so building it takes a considerable amount of time. Human error can also hamper the process of building cultural heritage metadata. Generating cultural heritage metadata with the Metadata Generator application is faster and easier because it is done automatically by the system. The application can also reduce human error, so that the generation process becomes more efficient.

  15. U.S. EPAs Public Geospatial Metadata Service

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPAs public geospatial metadata service provides external parties (Data.gov, GeoPlatform.gov, and the general public) with access to EPA's geospatial metadata...

  16. Defining an Open Metadata Framework for Proteomics: The PROMIS Project

    OpenAIRE

    MacMullen, W. John; Parmelee, Mary C.; Fenstermacher, David A.; Hemminger, Bradley M.

    2002-01-01

    This presentation describes the PROMIS project under development at UNC Chapel Hill. PROMIS (Proteomics Metadata Interchange Schema) is a proof-of-concept prototype of an open metadata standard for compositional proteomics.

  17. A Query Cache Tool for Optimizing Repeatable and Parallel OLAP Queries

    Science.gov (United States)

    Santos, Ricardo Jorge; Bernardino, Jorge

    On-line analytical processing against data warehouse databases is a common way of obtaining decision-making information in almost every business field. Decision support information often concerns periodic values based on regular attributes, such as sales amounts, percentages, most-transacted items, etc. This means that many similar OLAP instructions are repeated periodically, and simultaneously among the several decision makers. Our Query Cache Tool takes advantage of previously executed queries, storing their results and the current state of the data which was accessed. Future queries only need to execute against the new data, inserted since the queries were last executed, and join these results with the previous ones. This makes query execution much faster, because we only need to process the most recent data. Our tool also minimizes the execution time and resource consumption for similar queries simultaneously executed by different users, putting the most recent ones on hold until the first finishes and returns the results for all of them. The stored query results are held until they are considered outdated, then automatically erased. We present an experimental evaluation of our tool using a data warehouse based on a real-world business dataset and use a set of typical decision support queries to discuss the results, showing a very high gain in query execution time.
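The incremental part of this idea, where a repeated query only processes rows inserted since it last ran and merges them with the stored result, can be sketched as below. This is not the authors' tool: the class name, the append-only list standing in for a fact table, and the restriction to a grouped SUM are assumptions for illustration. The handling of simultaneous queries and the expiry of outdated results are not shown.

```python
class IncrementalSumCache:
    """Caches grouped sums over an append-only list of rows (a stand-in for a fact table)."""

    def __init__(self):
        self._cache = {}       # (key_col, value_col) -> (rows already seen, totals)
        self.rows_scanned = 0  # counts rows actually processed, to show the saving

    def total_by(self, rows, key_col, value_col):
        cache_key = (key_col, value_col)
        seen, totals = self._cache.get(cache_key, (0, {}))
        # Only rows appended since the previous execution are scanned.
        for row in rows[seen:]:
            totals[row[key_col]] = totals.get(row[key_col], 0) + row[value_col]
            self.rows_scanned += 1
        self._cache[cache_key] = (len(rows), totals)
        return dict(totals)

if __name__ == "__main__":
    sales = [{"item": "a", "amount": 10}, {"item": "b", "amount": 5}]
    cache = IncrementalSumCache()
    print(cache.total_by(sales, "item", "amount"))
    sales.append({"item": "a", "amount": 7})
    print(cache.total_by(sales, "item", "amount"), cache.rows_scanned)
```

The sketch only works for aggregates that can be merged (sums, counts, minima, maxima) and for data that is appended rather than updated in place.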

  18. Modeling and Querying Moving Objects with Social Relationships

    Directory of Open Access Journals (Sweden)

    Hengcai Zhang

    2016-07-01

    Full Text Available Current moving-object database (MOD) systems focus on management of movement data, but pay less attention to modelling social relationships between moving objects and spatial-temporal trajectories in an integrated manner. This paper combines moving-object database and social network systems and presents a novel data model called Geo-Social-Moving (GSM) that enables the unified management of trajectories, underlying geographical space and social relationships for mass moving objects. A set of user-defined data types and corresponding operators is also proposed to facilitate geo-social queries on moving objects. An implementation framework for the GSM model is proposed, and a prototype system based on native Neo4J is then developed with two real-world data sets from location-based social network systems. Compared with solutions based on traditional extended relational database management systems, which are characterized by time-consuming table join operations, the proposed GSM model, characterized by graph traversal, is argued to be more powerful in representing mass moving objects with social relationships, and more efficient and stable for geo-social querying.

  19. Creating metadata that work for digital libraries and Google

    OpenAIRE

    Dawson, Alan

    2004-01-01

    For many years metadata has been recognised as a significant component of the digital information environment. Substantial work has gone into creating complex metadata schemes for describing digital content. Yet increasingly Web search engines, and Google in particular, are the primary means of discovering and selecting digital resources, although they make little use of metadata. This article considers how digital libraries can gain more value from their metadata by adapting it for Google us...

  20. Multimedia Learning Systems Based on IEEE Learning Object Metadata (LOM).

    Science.gov (United States)

    Holzinger, Andreas; Kleinberger, Thomas; Muller, Paul

    One of the "hottest" topics in recent information systems and computer science is metadata. Learning Object Metadata (LOM) appears to be a very powerful mechanism for representing metadata, because of the great variety of LOM Objects. This is one of the reasons why the LOM standard is repeatedly cited in projects in the field of eLearning…

  1. Handling multiple metadata streams regarding digital learning material

    NARCIS (Netherlands)

    Roes, J.B.M.; Vuuren, J. van; Verbeij, N.; Nijstad, H.

    2010-01-01

    This paper presents the outcome of a study performed in the Netherlands on handling multiple metadata streams regarding digital learning material. The paper describes the present metadata architecture in the Netherlands, the present suppliers and users of metadata and digital learning materials. It

  2. A quick scan on possibilities for automatic metadata generation

    NARCIS (Netherlands)

    Benneker, Frank

    2006-01-01

    The Quick Scan is a report on research into useable solutions for automatic generation of metadata or parts of metadata. The aim of this study is to explore possibilities for facilitating the process of attaching metadata to learning objects. This document is aimed at developers of digital learning

  3. Enabling Semantic Queries Against the Spatial Database

    Directory of Open Access Journals (Sweden)

    PENG, X.

    2012-02-01

    Full Text Available The spatial database based upon the object-relational database management system (ORDBMS) has the merits of a clear data model, good operability and high query efficiency. That is why it has been widely used in spatial data organization and management. However, it cannot express the semantic relationships among geospatial objects, so the query results often fail to meet the user's requirements. Therefore, this paper represents an attempt to combine Semantic Web technology with the spatial database so as to make up for the traditional database's disadvantages. In this way, on the one hand, users can take advantage of the ORDBMS to store and manage spatial data; on the other hand, if the spatial database is released in the form of the Semantic Web, users can describe a query more concisely, using a cognitive pattern similar to that of daily life. As a consequence, this methodology makes the benefits of both the Semantic Web and the object-relational database (ORDB) available. The paper systematically discusses the semantically enriched spatial database's architecture, key technologies and implementation. Subsequently, we demonstrate the function of spatial semantic queries via a practical prototype system. The query results indicate that the method used in this study is feasible.

  4. Index and query methods in road networks

    CERN Document Server

    Feng, Jun

    2015-01-01

    This book presents index and query techniques for road networks and for moving objects that are confined to a road network. Here, the road network of non-Euclidean space has its unique characteristics such that two moving objects may be very close in a straight line distance. The index used in two-dimensional Euclidean space is not always appropriate for moving objects on a road network. Therefore, the index structure needs to be improved in order to obtain suitable indexing methods, explore the shortest path and acquire nearest neighbor query and aggregation query methods under the new index structures. Chapter 1 of this book introduces the present situation of intelligent traffic and indexing in road networks; Chapter 2 introduces the relevant existing spatial indexing methods. Chapters 3-5 focus on several issues of road networks and queries: traffic road network models (see Chapter 3), index structures (see Chapter 4) and aggregate query methods (see Chapter 5). Finally, in Chapter 6, the book briefly de...

  5. Metadata Guidelines for Digital Moving Images

    National Research Council Canada - National Science Library

    Flynn, Marcy

    2000-01-01

    ...." Examples for each data element and sample records are presented. Technical metadata essential to the preservation and management of digital materials is also addressed in the Guidelines. This manual is also available at the Defense Virtual Library Web site, http://dvl.dtic.mil:8100/notes.html.

  6. Metadata-catalogue of European spatial datasets

    NARCIS (Netherlands)

    Willemen, J.P.M.; Kooistra, L.

    2004-01-01

    In order to facilitate a more effective accessibility of European spatial datasets, an assessment was carried out by the GeoDesk of the WUR to identify and describe key datasets that will be relevant for research carried out within WUR and MNP. The outline of the Metadata catalogue European spatial

  7. Metadata Effectiveness in Internet Discovery: An Analysis of Digital Collection Metadata Elements and Internet Search Engine Keywords

    Science.gov (United States)

    Yang, Le

    2016-01-01

    This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…

  8. ADF/ADC Web Tools for Browsing and Visualizing Astronomical Catalogs and NASA Astrophysics Mission Metadata

    Science.gov (United States)

    Shaya, E.; Kargatis, V.; Blackwell, J.; Borne, K.; White, R. A.; Cheung, C.

    1998-05-01

    Several new web based services have been introduced this year by the Astrophysics Data Facility (ADF) at the NASA Goddard Space Flight Center. IMPReSS is a graphical interface to astrophysics databases that presents the user with the footprints of observations of space-based missions. It also aids astronomers in retrieving these data by sending requests to distributed data archives. The VIEWER is a reader of ADC astronomical catalogs and journal tables that allows subsetting of catalogs by column choices and range selection and provides database-like search capability within each table. With it, the user can easily find the table data most appropriate for their purposes and then download either the subset table or the original table. CATSEYE is a tool that plots output tables from the VIEWER (and soon AMASE), making exploring the datasets fast and easy. Having completed the basic functionality of these systems, we are enhancing the site to provide advanced functionality. These will include: market basket storage of tables and records of VIEWER output for IMPReSS and AstroBrowse queries, non-HTML table responses to AstroBrowse type queries, general column arithmetic, modularity to allow entrance into the sequence of web pages at any point, histogram plots, navigable maps, and overplotting of catalog objects on mission footprint maps. When completed, the ADF/ADC web facilities will provide astronomical tabled data and mission retrieval information in several hyperlinked environments geared for users at any level, from the school student to the typical astronomer to the expert datamining tools at state-of-the-art data centers.

  9. Metadata Authoring with Versatility and Extensibility

    Science.gov (United States)

    Pollack, Janine; Olsen, Lola

    2004-01-01

    NASA's Global Change Master Directory (GCMD) assists the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 13,800 data set descriptions in Directory Interchange Format (DIF) and 700 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information and direct links to the data, thus allowing researchers to discover data pertaining to a geographic location of interest, then quickly acquire those data. The GCMD strives to be the preferred data locator for world-wide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are attracting widespread usage; however, a need for tools that are portable, customizable and versatile still exists. With tool usage directly influencing metadata population, it has become apparent that new tools are needed to fill these voids. As a result, the GCMD has released a new authoring tool allowing for both web-based and stand-alone authoring of descriptions. Furthermore, this tool incorporates the ability to plug-and-play the metadata format of choice, offering users options of DIF, SERF, FGDC, ISO or any other defined standard. Allowing data holders to work with their preferred format, as well as an option of a stand-alone application or web-based environment, docBUILDER will assist the scientific community in efficiently creating quality data and services metadata.

  10. jQuery Mobile Up and Running

    CERN Document Server

    Firtman, Maximiliano

    2012-01-01

    Would you like to build one mobile web application that works on iPad and Kindle Fire as well as iPhone and Android smartphones? This introductory guide to jQuery Mobile shows you how. Through a series of hands-on exercises, you'll learn the best ways to use this framework's many interface components to build customizable, multiplatform apps. You don't need any programming skills or previous experience with jQuery to get started. By the time you finish this book, you'll know how to create responsive, Ajax-based interfaces that work on a variety of smartphones and tablets, using jQuery Mobile

  11. A Query System for Texts with Macros

    Science.gov (United States)

    Kwon, Keehang; Kang, Dae-Seong; Kim, Jinsoo

    We propose a query language based on extended regular expressions. This language extends texts with text-generating macros. These macros make it possible to define languages in a compressed, elegant way. This paper also extends queries with linear implications and additive (classical) conjunctions. To be precise, it allows goals of the form D ⊸ G and G1 & G2 where D is a text or a macro and G is a query. The first goal is solved by adding D to the current text and then solving G. This goal is flexible in controlling the current text dynamically. The second goal is solved by solving both G1 and G2 from the current text. This goal is particularly useful for internet search.

  12. Optimal Planar Orthogonal Skyline Counting Queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Larsen, Kasper Green

    2014-01-01

    The skyline of a set of points in the plane is the subset of maximal points, where a point (x,y) is maximal if no other point (x',y') satisfies x' ≥ x and y' ≥ y. We consider the problem of preprocessing a set P of n points into a space efficient static data structure supporting orthogonal skyline...... counting queries, i.e. given a query rectangle R to report the size of the skyline of P ∩ R. We present a data structure for storing n points with integer coordinates having query time O(lg n/lglg n) and space usage O(n). The model of computation is a unit cost RAM with logarithmic word size. We prove...
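To make the query concrete, here is a naive baseline for orthogonal skyline counting. It is not the data structure from the paper: it filters the points and sweeps them on every query, so each query costs O(n log n) rather than O(lg n/lglg n). The function name and the rectangle convention (x1, y1, x2, y2), inclusive on all sides, are assumptions. A point counts as maximal when no other point is at least as large in both coordinates.

```python
def skyline_count(points, rect=None):
    """Count the maximal points of `points`, optionally restricted to a closed rectangle."""
    if rect is not None:
        x1, y1, x2, y2 = rect
        points = [(x, y) for x, y in points if x1 <= x <= x2 and y1 <= y <= y2]
    count = 0
    best_y = None
    # Sweep from the largest x downwards; ties in x are visited with the largest y first.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        # A point is maximal exactly when it is strictly higher than
        # every point already visited (all of which have x' >= x).
        if best_y is None or y > best_y:
            count += 1
            best_y = y
    return count

if __name__ == "__main__":
    pts = [(1, 5), (2, 4), (3, 3), (2, 2), (4, 1), (1, 1)]
    print(skyline_count(pts))                # skyline of the whole set
    print(skyline_count(pts, (1, 1, 3, 4)))  # skyline of the points inside the rectangle
```

Duplicate points are collapsed with `set` before the sweep, which is one possible reading of "no other point" when the input contains repeats.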

  13. jQuery for designers beginner's guide

    CERN Document Server

    MacLees, Natalie

    2014-01-01

    A step-by-step guide that spices up your web pages and designs them in the way you want using the most widely used JavaScript library, jQuery. The beginner-friendly and easy-to-understand approach of the book will help you get to grips with jQuery in no time. If you know the fundamentals of HTML and CSS, and want to extend your knowledge by learning to use JavaScript, then this is just the book for you. jQuery makes JavaScript straightforward and approachable - you'll be surprised at how easy it can be to add animations and special effects to your beautifully designed pages.

  14. A journey to Semantic Web query federation in the life sciences.

    Science.gov (United States)

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-10-01

    As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query

  15. Using Bitmap Indexing Technology for Combined Numerical and Text Queries

    Energy Technology Data Exchange (ETDEWEB)

    Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng; Rotem, Doron; Shoshani, Arie

    2006-10-16

    In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism of result records.
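How a bitmap index answers Boolean queries over terms can be sketched in a few lines. This is not FastBit's API or its compression scheme: the class and method names are hypothetical, and each bitmap is an uncompressed Python integer in which bit i is set when document i contains the term.

```python
from collections import defaultdict

class BitmapIndex:
    """One uncompressed bitmap per term; bit i is set if document i contains the term."""

    def __init__(self):
        self.bitmaps = defaultdict(int)
        self.num_rows = 0

    def add_row(self, terms):
        row_id = self.num_rows
        for term in set(terms):
            self.bitmaps[term] |= 1 << row_id
        self.num_rows += 1
        return row_id

    def _rows(self, bitmap):
        # Decode a bitmap back into a sorted list of row ids.
        return [i for i in range(self.num_rows) if (bitmap >> i) & 1]

    def query_and(self, *terms):
        result = (1 << self.num_rows) - 1  # start with every row selected
        for term in terms:
            result &= self.bitmaps.get(term, 0)
        return self._rows(result)

    def query_or(self, *terms):
        result = 0
        for term in terms:
            result |= self.bitmaps.get(term, 0)
        return self._rows(result)

if __name__ == "__main__":
    idx = BitmapIndex()
    idx.add_row(["bitmap", "index"])
    idx.add_row(["inverted", "index"])
    idx.add_row(["bitmap", "compression", "index"])
    print(idx.query_and("bitmap", "index"))
    print(idx.query_or("inverted", "compression"))
```

A multi-term query becomes a chain of bitwise AND/OR operations, one per term, regardless of how many rows match; the compression that keeps such bitmaps small for millions of terms is the part of the paper this sketch leaves out.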

  16. Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time

    Science.gov (United States)

    Ulbricht, Damian; Klump, Jens; Bertelmann, Roland

    2013-04-01

    Inspired by the wish to re-use research data, a lot of work is being done to bring the data systems of the earth sciences together. Discovery metadata is disseminated to data portals to allow building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists that were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications - as DataCite [1] does - requires again a specific metadata schema, and every new use context of the measured data may require yet another metadata schema sharing only a subset of information with the meta information already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam, we established a solution to store file-based research data and describe these with an arbitrary number of metadata schemas. The core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc [2] "items". The eSciDoc content model allows assigning files to "items" and adding any number of metadata records to these "items". The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available through the internet worldwide. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes by providing temporary web links for external reviewers that do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP based web application that allows data to be described with any XML-based schema. It uses the eSciDoc infrastructures

  17. Evaluating Trajectory Queries over Imprecise Location Data

    DEFF Research Database (Denmark)

    Xie, Scott, Xike; Cheng, Reynold; Yiu, Man Lung

    2012-01-01

    Trajectory queries, which retrieve nearby objects for every point of a given route, can be used to identify alerts of potential threats along a vessel route, or monitor the adjacent rescuers to a travel path. However, the locations of these objects (e.g., threats, succours) may not be precisely...... obtained due to hardware limitations of measuring devices, as well as the constantly-changing nature of the external environment. Ignoring data uncertainty can render low query quality, and cause undesirable consequences such as missing alerts of threats and poor response time in rescue operations. Also...

  18. Query Optimization Techniques in Microsoft SQL Server

    Directory of Open Access Journals (Sweden)

    Costel Gabriel CORLATAN

    2014-09-01

    Full Text Available Microsoft SQL Server is a relational database management system, having MS-SQL and Transact-SQL as primary structured programming languages. They rely on relational algebra which is mainly used for data insertion, modification, deletion and retrieval, as well as for data access control. The problem of getting the expected results is handled by the management system, which has the purpose of finding the best execution plan, this process being called optimization. The most frequently used queries are those of data retrieval through the SELECT command. We have to take into consideration that not only SELECT queries need optimization, but also other objects, such as indexes, views or statistics.

  19. Fundamentals of Physical Design and Query Compilation

    CERN Document Server

    Toman, David

    2011-01-01

    Query compilation is the problem of translating user requests formulated over purely conceptual and domain specific ways of understanding data, commonly called logical designs, to efficient executable programs called query plans. Such plans access various concrete data sources through their low-level often iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and how such capabilities relate to logical design is commonly called a physical design. This book is an introduction to the fundamental methods underlying database technology that solves the problem of

  20. Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance

    International Nuclear Information System (INIS)

    Wang Chuan; Hao Liang; Zhao Lian-Jie

    2011-01-01

    We present a modified protocol for the realization of a quantum private query process on a classical database. Using one-qubit query and CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrices of the memory registers are constructed. (general)

  1. Combined use of semantics and metadata to manage Research Data Life Cycle in Environmental Sciences

    Science.gov (United States)

    Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Pertinez, Esther; Palacio, Aida

    2017-04-01

    The use of metadata to contextualize datasets is widespread in Earth System Sciences. There are initiatives and tools that help data managers choose the metadata standard that best fits their use cases, such as the DCC Metadata Directory (http://www.dcc.ac.uk/resources/metadata-standards). In our use case, we have been gathering physical, chemical and biological data from a water reservoir since 2010. A good metadata definition is crucial not only to contextualize our own data but also to integrate datasets from other sources such as satellites or meteorological agencies. That is why we have chosen EML (Ecological Metadata Language), which integrates many different elements to define a dataset, including the project context, instrumentation and parameter definitions, and the software used to process the data, provide quality controls and include the publication details. These metadata elements can help both humans and machines understand and process the dataset. However, metadata alone is not enough to fully support the data life cycle, from the Data Management Plan (DMP) definition to Publication and Re-use. To do so, we need to define not only metadata and attributes but also the relationships between them, so semantics are needed. Ontologies, being a knowledge representation, can help define the elements of a research data life cycle, including DMP, datasets, software, etc. They can also define how the different elements relate to each other and how they interact. The first advantage of developing an ontology of a knowledge domain is that it provides a common vocabulary hierarchy (i.e. a conceptual schema) that can be used and standardized by all the agents interested in the domain (either humans or machines). This use of ontologies is one of the bases of the Semantic Web, where ontologies are set to play a key role in establishing a common terminology between agents.
To develop an ontology we are using a graphical tool

  2. Evolutionary Algorithms for Boolean Queries Optimization

    Czech Academy of Sciences Publication Activity Database

    Húsek, Dušan; Snášel, Václav; Neruda, Roman; Owais, S.S.J.; Krömer, P.

    2006-01-01

    Vol. 3, No. 1 (2006), pp. 15-20 ISSN 1790-0832 R&D Projects: GA AV ČR 1ET100300414 Institutional research plan: CEZ:AV0Z10300504 Keywords: evolutionary algorithms * genetic algorithms * information retrieval * Boolean query Subject RIV: BA - General Mathematics

  3. Boolean Queries Optimization by Genetic Algorithms

    Czech Academy of Sciences Publication Activity Database

    Húsek, Dušan; Owais, S.S.J.; Krömer, P.; Snášel, Václav

    2005-01-01

    Vol. 15 (2005), pp. 395-409 ISSN 1210-0552 R&D Projects: GA AV ČR 1ET100300414 Institutional research plan: CEZ:AV0Z10300504 Keywords: evolutionary algorithms * genetic algorithms * genetic programming * information retrieval * Boolean query Subject RIV: BB - Applied Statistics, Operational Research

  4. Flattening Queries over Nested Data Types

    NARCIS (Netherlands)

    van Ruth, J.

    2006-01-01

    The theory developed in this thesis provides a method to improve the efficiency of querying nested data. The roots of this research lie in the tension between data model expressiveness and performance. Obviously, more expressive data models are more convenient for application programmers. For many

  5. Web-Based Distributed XML Query Processing

    NARCIS (Netherlands)

    Smiljanic, M.; Feng, L.; Jonker, Willem; Blanken, Henk; Grabs, T.; Schek, H-J.; Schenkel, R.; Weikum, G.

    2003-01-01

    Web-based distributed XML query processing has gained in importance in recent years due to the widespread popularity of XML on the Web. Unlike centralized and tightly coupled distributed systems, Web-based distributed database systems are highly unpredictable and uncontrollable, with a rather

  6. Path Minima Queries in Dynamic Weighted Trees

    DEFF Research Database (Denmark)

    Davoodi, Pooya; Brodal, Gerth Stølting; Satti, Srinivasa Rao

    2011-01-01

    update time?} in the comparison and the RAM models. These structures also support inserting a node on an edge, inserting a leaf, and contracting edges. When only insertion and deletion of leaves are desired, we give data structures in the comparison and the RAM models, with optimal query time...

  7. Sonata: Query-Driven Network Telemetry

    KAUST Repository

    Gupta, Arpit

    2017-05-02

    Operating networks depends on collecting and analyzing measurement data. Current technologies do not make it easy to do so, typically because they separate data collection (e.g., packet capture or flow monitoring) from analysis, producing either too much data to answer a general question or too little data to answer a detailed question. In this paper, we present Sonata, a network telemetry system that uses a uniform query interface to drive the joint collection and analysis of network traffic. Sonata takes advantage of two emerging technologies, streaming analytics platforms and programmable network devices, to facilitate joint collection and analysis. Sonata allows operators to express network traffic analysis tasks more directly in terms of a high-level language. The underlying runtime partitions each query into a portion that runs on the switch and another that runs on the streaming analytics platform, iteratively refines the query to efficiently capture only the traffic that pertains to the operator's query, and exploits sketches to reduce state in switches in exchange for more approximate results. Through an evaluation of a prototype implementation, we demonstrate that Sonata can support a wide range of network telemetry tasks with less state in the network, and lower data rates to streaming analytics systems, than current approaches can achieve.

  8. Enabling Incremental Query Re-Optimization.

    Science.gov (United States)

    Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

    2016-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.

  9. Query-by-Emoji Video Search

    NARCIS (Netherlands)

    Cappallo, S.; Mensink, T.; Snoek, C.G.M.

    2015-01-01

    This technical demo presents Emoji2Video, a query-by-emoji interface for exploring video collections. Ideogram-based video search and representation presents an opportunity for an intuitive, visual interface and concise non-textual summary of video contents, in a form factor that is ideal for small

  10. Beginning SQL queries from novice to professional

    CERN Document Server

    Churcher, Clare

    2016-01-01

    Anyone who does any work at all with databases needs to know something of SQL. This is a friendly and easy-to-read guide to writing queries with the all-important - in the database world - SQL language. The author writes with exceptional clarity.

  11. Approximate Nearest Neighbor Queries among Parallel Segments

    DEFF Research Database (Denmark)

    Emiris, Ioannis Z.; Malamatos, Theocharis; Tsigaridas, Elias

    2010-01-01

    We develop a data structure for answering efficiently approximate nearest neighbor queries over a set of parallel segments in three dimensions. We connect this problem to approximate nearest neighbor searching under weight constraints and approximate nearest neighbor searching on historical data...

  12. Query and document models for enterprise search

    NARCIS (Netherlands)

    Balog, K.; Hofmann, K.; Weerkamp, W.; de Rijke, M.; Voorhees, E.M.; Buckland, L.P.

    2008-01-01

    We describe our participation in the TREC 2007 Enterprise track and detail our language modeling-based approaches. For document search, our focus was on estimating a mixture model using a standard web collection, and on constructing query models by employing blind relevance feedback and using the

  13. Exploiting cost distributions for query optimization

    NARCIS (Netherlands)

    F. Waas; A.J. Pellenkoft (Jan)

    1998-01-01

    Large-scale query optimization is, besides its practical relevance, a hard test case for optimization techniques. Since exact methods cannot be applied due to the combinatorial explosion of the search space, heuristics and probabilistic strategies have been deployed for more than a

  14. CUFID-query: accurate network querying through random walk based network flow estimation.

    Science.gov (United States)

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive

  15. Metadata Analysis at the Command-Line

    Directory of Open Access Journals (Sweden)

    Mark Phillips

    2013-01-01

    Over the past few years the University of North Texas Libraries' Digital Projects Unit (DPU) has developed a set of metadata analysis tools, processes, and methodologies aimed at helping to focus limited quality control resources on the areas of the collection where they might have the most benefit. The key to this work lies in its simplicity: records harvested from OAI-PMH-enabled digital repositories are transformed into a format that makes them easily parsable using traditional Unix/Linux-based command-line tools. This article describes the overall methodology, introduces two simple open-source tools developed to help with the aforementioned harvesting and breaking, and provides example commands to demonstrate some common metadata analysis requests. All software tools described in the article are available with an open-source license via the author's GitHub account.
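
    The flattening step described in this abstract can be illustrated with a minimal Python sketch. It is not one of the article's tools: the sample record, its `id` element, and the tab-separated output layout are assumptions made purely for illustration. The idea is only that each Dublin Core element becomes one line that line-oriented tools can count or filter.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of unqualified Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record; real records would be harvested over OAI-PMH.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <id>rec-001</id>
  <dc:title>Example title</dc:title>
  <dc:subject>Maps</dc:subject>
  <dc:subject>Texas</dc:subject>
</record>"""


def flatten(record_xml):
    """Return one (identifier, field, value) row per Dublin Core element."""
    root = ET.fromstring(record_xml)
    identifier = root.findtext("id", default="unknown")
    rows = []
    for elem in root.iter():
        # Keep only elements in the Dublin Core namespace.
        if elem.tag.startswith("{" + DC_NS + "}"):
            field = elem.tag.split("}", 1)[1]
            rows.append((identifier, field, (elem.text or "").strip()))
    return rows


def field_counts(rows):
    """Count how often each field name occurs across the rows."""
    return Counter(field for _, field, _ in rows)


rows = flatten(SAMPLE)
for row in rows:
    print("\t".join(row))  # one tab-separated line per metadata value
print(field_counts(rows))
```

    Once records are in a one-value-per-line form like this, questions such as "which records have no subject?" reduce to simple counting and filtering.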

  16. Testing Metadata Existence of Web Map Services

    Directory of Open Access Journals (Sweden)

    Jan Růžička

    2011-05-01

    It is quite common for a general user to use data sources available on the WWW. Almost all GIS software allows the use of data sources available via Web Map Service (an ISO/OGC standard interface). The opportunity to use and combine different sources brings many problems that have been discussed repeatedly at conferences and in journal papers. One of these problems is the absence of metadata for published sources. The question was: were the discussions effective? The article is partly based on a comparison of the metadata situation between 2007 and 2010. The second part of the article focuses only on the situation in 2010. The paper was written in the context of research on intelligent map systems that can be used for automatic or semi-automatic map creation or map evaluation.

  17. The Usefulness of Multilevel Hash Tables with Multiple Hash Functions in Large Databases

    Directory of Open Access Journals (Sweden)

    A.T. Akinwale

    2009-05-01

    In this work, an attempt is made to select three good hash functions that uniformly distribute hash values, permute their internal states and allow the input bits to generate different output bits. These functions are used at different levels of hash tables coded in the Java programming language, and quite a number of data records serve as primary data for testing performance. The results show that two-level hash tables with three different hash functions outperform a one-level hash table with two hash functions or a zero-level hash table with one function in terms of reducing key conflicts and speeding up lookup of a particular element. The results help reduce the complexity of the join operation in a query language from O(n^2) to O(1) by placing the larger query result, if any, in multilevel hash tables with multiple hash functions and generating a shorter query result.
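
    One plausible reading of a multilevel hash table with a different hash function per level is sketched below in Python. This is an illustration only, not the paper's Java implementation: the choice of djb2, FNV-1a and sdbm as the three functions, the table size, and the overflow list are all assumptions. A key that collides at one level is retried at the next level with the next hash function.

```python
def djb2(key):
    h = 5381
    for c in key.encode("utf-8"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


def fnv1a(key):
    h = 2166136261
    for c in key.encode("utf-8"):
        h ^= c
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def sdbm(key):
    h = 0
    for c in key.encode("utf-8"):
        h = (c + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


class MultiLevelHashTable:
    """Three levels (0, 1, 2), each addressed by its own hash function."""

    def __init__(self, size=8):
        self.size = size
        self.hashes = [djb2, fnv1a, sdbm]
        self.levels = [[None] * size for _ in self.hashes]
        self.overflow = []  # keys that collide at every level

    def put(self, key, value):
        # Try each level in turn; use the first free or matching slot.
        # Deletion is not supported, so an occupied slot never becomes free.
        for h, table in zip(self.hashes, self.levels):
            slot = h(key) % self.size
            if table[slot] is None or table[slot][0] == key:
                table[slot] = (key, value)
                return
        for i, (k, _) in enumerate(self.overflow):
            if k == key:
                self.overflow[i] = (key, value)
                return
        self.overflow.append((key, value))

    def get(self, key):
        # At most one probe per level, then a scan of the overflow list.
        for h, table in zip(self.hashes, self.levels):
            entry = table[h(key) % self.size]
            if entry is not None and entry[0] == key:
                return entry[1]
        for k, v in self.overflow:
            if k == key:
                return v
        raise KeyError(key)


table = MultiLevelHashTable(size=8)
for i in range(20):
    table.put("key%d" % i, i)
print(table.get("key7"))
```

    As long as the overflow list stays short, a lookup costs a bounded number of probes regardless of how many records are stored, which is the property a hash-based join relies on.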

  18. A Highly Available Grid Metadata Catalog

    DEFF Research Database (Denmark)

    Jensen, Henrik Thostrup; Kleist, Joshva

    2009-01-01

    This article presents a metadata catalog, intended for use in grids. The catalog provides high availability, by replication across several hosts. The replicas are kept consistent using a replication protocol based on the Paxos algorithm. A majority of the replicas must be available in order...... HTTP with proxy certificates, and uses GACL for flexible access control. The performance of the catalog is tested in several ways, including a distributed setup between geographically separated sites.

  19. GraphMeta: Managing HPC Rich Metadata in Graphs

    Energy Technology Data Exchange (ETDEWEB)

    Dai, Dong; Chen, Yong; Carns, Philip; Jenkins, John; Zhang, Wei; Ross, Robert

    2016-01-01

    High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This ‘rich’ metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution to uniformly manage HPC rich metadata due to its flexibility and generality. However, at the same time, graph-based HPC rich metadata management also introduces significant challenges to the underlying infrastructure. In this study, we first identify the challenges on the underlying infrastructure to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.

  20. Transforming Dermatologic Imaging for the Digital Era: Metadata and Standards.

    Science.gov (United States)

    Caffery, Liam J; Clunie, David; Curiel-Lewandrowski, Clara; Malvehy, Josep; Soyer, H Peter; Halpern, Allan C

    2018-01-17

    Imaging is increasingly being used in dermatology for documentation, diagnosis, and management of cutaneous disease. The lack of standards for dermatologic imaging is an impediment to clinical uptake. Standardization can occur in image acquisition, terminology, interoperability, and metadata. This paper presents the International Skin Imaging Collaboration position on standardization of metadata for dermatologic imaging. Metadata is essential to ensure that dermatologic images are properly managed and interpreted. There are two standards-based approaches to recording and storing metadata in dermatologic imaging. The first uses standard consumer image file formats, and the second is the file format and metadata model developed for the Digital Imaging and Communication in Medicine (DICOM) standard. DICOM would appear to provide an advantage over using consumer image file formats for metadata as it includes all the patient, study, and technical metadata necessary to use images clinically. Consumer image file formats, by contrast, only include technical metadata and need to be used in conjunction with another actor (for example, an electronic medical record) to supply the patient and study metadata. The use of DICOM may have some ancillary benefits in dermatologic imaging including leveraging DICOM network and workflow services, interoperability of images and metadata, leveraging existing enterprise imaging infrastructure, greater patient safety, and better compliance with legislative requirements for image retention.

  1. Mdmap: A Tool for Metadata Collection and Matching

    Directory of Open Access Journals (Sweden)

    Rico Simke

    2014-10-01

    This paper describes a front-end for the semi-automatic collection, matching, and generation of bibliographic metadata obtained from different sources for use within a digitization architecture. The Library of a Billion Words project is building an infrastructure for digitizing text that requires high-quality bibliographic metadata, but currently only sparse metadata from digitized editions is available. The project’s approach is to collect metadata for each digitized item from as many sources as possible. An expert user can then use an intuitive front-end tool to choose matching metadata. The collected metadata are centrally displayed in an interactive grid view. The user can choose which metadata they want to assign to a certain edition, and export these data as MARCXML. This paper presents a new approach to bibliographic work and metadata correction. We try to achieve a high quality of the metadata by generating a large amount of metadata to choose from, as well as by giving librarians an intuitive tool to manage their data.

  2. Leveraging Metadata to Create Interactive Images... Today!

    Science.gov (United States)

    Hurt, Robert L.; Squires, G. K.; Llamas, J.; Rosenthal, C.; Brinkworth, C.; Fay, J.

    2011-01-01

    The image gallery for NASA's Spitzer Space Telescope has been newly rebuilt to fully support the Astronomy Visualization Metadata (AVM) standard to create a new user experience both on the website and in other applications. We encapsulate all the key descriptive information for a public image, including color representations and astronomical and sky coordinates and make it accessible in a user-friendly form on the website, but also embed the same metadata within the image files themselves. Thus, images downloaded from the site will carry with them all their descriptive information. Real-world benefits include display of general metadata when such images are imported into image editing software (e.g. Photoshop) or image catalog software (e.g. iPhoto). More advanced support in Microsoft's WorldWide Telescope can open a tagged image after it has been downloaded and display it in its correct sky position, allowing comparison with observations from other observatories. An increasing number of software developers are implementing AVM support in applications and an online image archive for tagged images is under development at the Spitzer Science Center. Tagging images following the AVM offers ever-increasing benefits to public-friendly imagery in all its standard forms (JPEG, TIFF, PNG). The AVM standard is one part of the Virtual Astronomy Multimedia Project (VAMP); http://www.communicatingastronomy.org

  3. The European Database of Seismogenic Faults (EDSF) for EPOS: implementation of OGC services and metadata publication

    Science.gov (United States)

    Vallone, Roberto; Basili, Roberto; Tarabusi, Gabriele; Burrato, Pierfrancesco; Valensise, Gianluca

    2017-04-01

    The European Database of Seismogenic Faults (EDSF; http://diss.rm.ingv.it/share-edsf/; doi: 10.6092/INGV.IT-SHARE-EDSF) is part of the Hazard & Risk pillar of EPOS-Implementation Phase (WP8, Seismology). Its tables contain faults that are deemed capable of generating earthquakes of magnitude 5.5 and larger, and it aims to make available a homogeneous input dataset for use in the assessment of ground-shaking hazard in the extended Euro-Mediterranean area or for developing regional tectonic and geodynamic models. In keeping with the goals set forth by EPOS, EDSF data are currently distributed through the Open Geospatial Consortium (OGC) service standards known as WFS (Web Feature Service) and WMS (Web Map Service), both complying with ISO standards. We present the software infrastructure implemented for the publication of the EDSF-OGC services and of the related metadata. The infrastructure was entirely built using free and open source software. The metadata were published as web services following the recommendations of the EPOS metadata model reference.

  4. jQuery UI 1.10 the user interface library for jQuery

    CERN Document Server

    Libby, Alex

    2013-01-01

    This book consists of an easy-to-follow, example-based approach that leads you step-by-step through the implementation and customization of each library component. This book is for frontend designers and developers who need to learn how to use jQuery UI quickly. To get the most out of this book, you should have a good working knowledge of HTML, CSS, and JavaScript, and should ideally be comfortable using jQuery.

  5. Graphical modeling and query language for hospitals.

    Science.gov (United States)

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of health information technologies (HIT) is leading to health care cost savings. One reason for this lack of impact likely lies in the complexity of business process ownership in hospitals. The goal of our research is to develop a business model-based method for hospital use that would allow doctors to retrieve ad-hoc information directly from various hospital databases. We have developed a special domain-specific process modelling language called MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL), which is based on the MedMod process definition language. The purpose of PQL is to allow a doctor to query (filter) runtime data of hospital processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages by allowing a loosely defined sequence of steps to be specified for the clinical process. The main advantages of PQL lie in two areas, usability and efficiency. They are: 1) the view on data through the "glasses" of a familiar process, 2) simple and easy-to-perceive means of setting filtering conditions that require no more expertise than using spreadsheet applications, 3) the dynamic response to each step in the construction of the complete query, which greatly shortens the learning curve and reduces the error rate, and 4) the selected means of filtering and data retrieval allow queries to be executed in O(n) time with respect to the size of the dataset. We plan to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. 
The second step is to evaluate the usability of the proposed language and tool

  6. Multiple k Nearest Neighbor Query Processing in Spatial Network Databases

    DEFF Research Database (Denmark)

    Xuegang, Huang; Jensen, Christian Søndergaard; Saltenis, Simonas

    2006-01-01

    This paper concerns the efficient processing of multiple k nearest neighbor queries in a road-network setting. The assumed setting covers a range of scenarios such as the one where a large population of mobile service users that are constrained to a road network issue nearest-neighbor queries...... for points of interest that are accessible via the road network. Given multiple k nearest neighbor queries, the paper proposes progressive techniques that selectively cache query results in main memory and subsequently reuse these for query processing. The paper initially proposes techniques for the case...... neighbor query processing....

  7. The survey of large-scale query classification

    Science.gov (United States)

    Zhou, Sanduo; Cheng, Kefei; Men, Lijun

    2017-04-01

    In recent years, much research has been done on query classification. The paper introduces recent research on query classification in detail, mainly covering the sources of query logs, the category systems, the feature extraction methods, the classification methods and the evaluation methodology. It then discusses the issues of large-scale query classification and the solutions that combine it with big data analysis systems. The results show that several problems and challenges remain, such as the lack of an authoritative classification system and evaluation methodology, the efficiency of the feature extraction methods, uncertainty about performance on large-scale query logs, and further query classification on big data platforms.

  8. GEO Label Web Services for Dynamic and Effective Communication of Geospatial Metadata Quality

    Science.gov (United States)

    Lush, Victoria; Nüst, Daniel; Bastin, Lucy; Masó, Joan; Lumsden, Jo

    2014-05-01

    -like label, which are coloured according to metadata availability and are clickable to allow a user to engage with the original metadata and explore specific aspects in more detail. To support this graphical representation and allow for wider deployment architectures we have implemented two Web services, a PHP and a Java implementation, that generate GEO label representations by combining producer metadata (from standard catalogues or other published locations) with structured user feedback. Both services accept encoded URLs of publicly available metadata documents or metadata XML files as HTTP POST and GET requests and apply XPath and XSLT mappings to transform producer and feedback XML documents into clickable SVG GEO label representations. The label and services are underpinned by two XML-based quality models. The first is a producer model that extends ISO 19115 and 19157 to allow fuller citation of reference data, presentation of pixel- and dataset- level statistical quality information, and encoding of 'traceability' information on the lineage of an actual quality assessment. The second is a user quality model (realised as a feedback server and client) which allows reporting and query of ratings, usage reports, citations, comments and other domain knowledge. Both services are Open Source and are available on GitHub at https://github.com/lushv/geolabel-service and https://github.com/52North/GEO-label-java. The functionality of these services can be tested using our GEO label generation demos, available online at http://www.geolabel.net/demo.html and http://geoviqua.dev.52north.org/glbservice/index.jsf.

  9. Study on high-level waste geological disposal metadata model

    International Nuclear Information System (INIS)

    Ding Xiaobin; Wang Changhong; Zhu Hehua; Li Xiaojun

    2008-01-01

    This paper explains the concept of metadata and reviews related research in China and abroad, then explains why a study of a metadata model for the high-level nuclear waste deep geological disposal project was started. With reference to GML, the authors first set up DML under the framework of digital underground space engineering. Based on DML, a standardized metadata scheme for the high-level nuclear waste deep geological disposal project is presented. Then, an Internet-based metadata model is put forward. With standardized data and CSW services, this model may solve the problem of sharing and exchanging data in different forms. A metadata editor is built to search and maintain metadata based on this model. (authors)

  10. An emergent theory of digital library metadata enrich then filter

    CERN Document Server

    Stevens, Brett

    2015-01-01

    An Emergent Theory of Digital Library Metadata is a reaction to the current digital library landscape that is being challenged with growing online collections and changing user expectations. The theory provides the conceptual underpinnings for a new approach which moves away from expert defined standardised metadata to a user driven approach with users as metadata co-creators. Moving away from definitive, authoritative, metadata to a system that reflects the diversity of users’ terminologies, it changes the current focus on metadata simplicity and efficiency to one of metadata enriching, which is a continuous and evolving process of data linking. From predefined description to information conceptualised, contextualised and filtered at the point of delivery. By presenting this shift, this book provides a coherent structure in which future technological developments can be considered.

  11. Cumulative query method for influenza surveillance using search engine data.

    Science.gov (United States)

    Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-12-16

    Internet search queries have become an important data source in syndromic surveillance systems. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development set 2 and 2011/12 for validation set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but 6 of 15 combined queries had an r value of ≥.7. The cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation sets.
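
    The selection and accumulation steps described in this abstract can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the weekly counts, the query names, and the function names are invented, and only the .7 threshold and the descending ordering by Pearson's r come from the abstract. Cumulative method n here is the element-wise sum of the top n selected query series.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient (assumes neither series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cumulative_methods(query_counts, ili, threshold=0.7):
    """Rank queries by r against ILI, keep r >= threshold, then accumulate."""
    ranked = sorted(
        ((pearson(series, ili), name) for name, series in query_counts.items()),
        reverse=True,
    )
    selected = [(r, name) for r, name in ranked if r >= threshold]
    results = []
    running = [0.0] * len(ili)
    for n, (_, name) in enumerate(selected, start=1):
        # Cumulative method n adds the n-th best query to the running sum.
        running = [a + b for a, b in zip(running, query_counts[name])]
        results.append((n, pearson(running, ili)))
    return selected, results


# Hypothetical weekly data, for illustration only.
ili = [1, 2, 4, 8, 5, 2]
query_counts = {
    "flu": [10, 21, 39, 82, 49, 20],
    "fever": [5, 6, 9, 12, 10, 7],
    "weather": [3, 9, 2, 8, 1, 7],
}

selected, results = cumulative_methods(query_counts, ili)
for n, r in results:
    print("cumulative method %d: r = %.3f" % (n, r))
```

    In the study the ranking is fixed on a development set and the resulting cumulative methods are then evaluated on a separate validation set; the sketch uses a single data set for brevity.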

  12. The role of metadata in managing large environmental science datasets. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Melton, R.B.; DeVaney, D.M. [eds.] [Pacific Northwest Lab., Richland, WA (United States)]; French, J. C. [Univ. of Virginia (United States)]

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  13. PCDDB: the Protein Circular Dichroism Data Bank, a repository for circular dichroism spectral and metadata.

    Science.gov (United States)

    Whitmore, Lee; Woollett, Benjamin; Miles, Andrew John; Klose, D P; Janes, Robert W; Wallace, B A

    2011-01-01

    The Protein Circular Dichroism Data Bank (PCDDB) is a public repository that archives and freely distributes circular dichroism (CD) and synchrotron radiation CD (SRCD) spectral data and their associated experimental metadata. All entries undergo validation and curation procedures to ensure completeness, consistency and quality of the data included. A web-based interface enables users to browse and query sample types, sample conditions, experimental parameters and provides spectra in both graphical display format and as downloadable text files. The entries are linked, when appropriate, to primary sequence (UniProt) and structural (PDB) databases, as well as to secondary databases such as the Enzyme Commission functional classification database and the CATH fold classification database, as well as to literature citations. The PCDDB is available at: http://pcddb.cryst.bbk.ac.uk.

  14. Date restricted queries in web search engines

    OpenAIRE

    Lewandowski, Dirk

    2004-01-01

    Search engines usually offer a date restricted search on their advanced search pages. But determining the actual update of a web page is not without problems. We conduct a study testing date restricted queries on the search engines Google, Teoma and Yahoo!. We find that these searches fail to work properly in the examined engines. We discuss implications of this for further research and search engine development.

  15. Advanced SPARQL querying in small molecule databases

    Czech Academy of Sciences Publication Activity Database

    Galgonek, Jakub; Hurt, T.; Michlíková, V.; Onderka, P.; Schwarz, J.; Vondrášek, Jiří

    2016-01-01

    Vol. 8, Jun 6 (2016), article no. 31. ISSN 1758-2946 R&D Projects: GA MŠk(CZ) LM2015047 Institutional support: RVO:61388963 Keywords: Resource Description Framework * SPARQL query language * Database of small molecules Subject RIV: CF - Physical; Theoretical Chemistry Impact factor: 4.220, year: 2016 http://jcheminf.springeropen.com/articles/10.1186/s13321-016-0144-4

  16. Query-Structure Based Web Page Indexing

    Science.gov (United States)

    2012-11-01

    task. ...finding, Entity finding, and Web pages classification. The design of highly-scalable indexing algorithms is needed, especially with an estimate of one...content, e.g., "Fibromyalgia" or "Lipoma". • Combining: this type of query is processed using primitive keywords from urls and/or titles that imply

  17. TEMPORAL QUERY PROCESSING USING SQL SERVER

    OpenAIRE

    Vali Shaik, Mastan; Sujatha, P

    2017-01-01

    Most data sources in real life are not static but change their information over time. This evolution of data over time can give valuable insights to business analysts. Temporal data refers to data in which changes over time or temporal aspects play a central role. Temporal data denotes the evolution of object characteristics over time. One of the main unresolved problems that arise during the data mining process is treating data that contains temporal information. Temporal queries on time evolving...

  18. A Practical Python API for Querying AFLOWLIB

    OpenAIRE

    Rosenbrock, Conred W.

    2017-01-01

    Large databases such as aflowlib.org provide valuable data sources for discovering material trends through machine learning. Although a REST API and query language are available, there is a learning curve associated with the AFLUX language that acts as a barrier for new users. Additionally, the data is stored using non-standard serialization formats. Here we present a high-level API that allows immediate access to the aflowlib data using standard python operators and language features. It pro...

  19. STOQS: The Spatial Temporal Oceanographic Query System

    Science.gov (United States)

    McCann, M. P.; Schramm, R.

    2010-12-01

    The Spatial-Temporal Oceanographic Query System (STOQS) has been developed at the Monterey Bay Aquarium Research Institute to improve access to and visualization of a multi-decadal archive of upper water column observations. STOQS consists of a set of applications, operational procedures, and a geospatial relational database. Borrowing a database schema from the Geographic Information System community, we've implemented a database that is tuned for efficient queries across several dimensions of the data model. An Object Relational Mapping (ORM) tool was used to hide the complexity of SQL that results from our highly normalized data model. The Python scripting language is used to write the Extract Transform Load (ETL) programs for populating the database with data from our long-term operational archives. These archives include collections of Climate Forecast convention netCDF files of mooring and autonomous underwater vehicle data and other special purpose relational databases. This poster describes the specific tools and techniques used to implement STOQS. Though still in development, the system already provides benefits to users through a Google Earth interface and an ability to conduct fast queries across multiple previously non-interoperable data sets.

  20. Assessing Metadata Quality of a Federally Sponsored Health Data Repository

    OpenAIRE

    Marc, David T.; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui

    2017-01-01

    The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata is provided for each dataset and is the sole source of information to find and retrieve data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency of applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and con...

  1. CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises

    Science.gov (United States)

    Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.

    2011-12-01

    JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep sea submersibles. Generally, metadata are essential to identify how data and samples were obtained. In JAMSTEC, cruise metadata include cruise information such as cruise ID, name of vessel, research theme, and diving information such as dive number, name of submersible and position of diving point. They are submitted by chief scientists of research cruises in the Microsoft Excel® spreadsheet format, and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via "JAMSTEC Data Site for Research Cruises" within two months after the end of a cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is duplicated effort and asynchronous metadata across multiple distribution websites, caused by administrators manually entering metadata into individual websites. The other is that the data types and representations of metadata differ between websites. To solve those problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be connected from the data management database to several distribution websites. CMO comprises three components: an Extensible Markup Language (XML) database, Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility for any change of metadata. Daily differential uptake of metadata from the data management database to the XML database is automatically processed via the EAI software. Some metadata are entered into the XML database using the web

  2. CHIME: A Metadata-Based Distributed Software Development Environment

    National Research Council Canada - National Science Library

    Dossick, Stephen E; Kaiser, Gail E

    2005-01-01

    We introduce CHIME, the Columbia Hypermedia IMmersion Environment, a metadata-based information environment, and describe its potential applications for internet and intranet-based distributed software development...

  3. Chemical machine vision: automated extraction of chemical metadata from raster images.

    Science.gov (United States)

    Gkoutos, Georgios V; Rzepa, Henry; Clark, Richard M; Adjei, Osei; Johal, Harpal

    2003-01-01

    We present a novel application of machine vision methods for the identification of chemical composition diagrams from two-dimensional digital raster images. The method is based on the use of Gabor wavelets and an energy function to derive feature vectors from digital images. These are used for training and classification purposes using a Kohonen network for classification with the Euclidean distance norm. We compare this method with previous approaches to transforming such images to a molecular connection table, which are designed to achieve complete atom connection table fidelity but at the expense of requiring human interaction. The present texture-based approach is complementary in attempting to recognize higher order features such as the presence of a chemical representation in the original raster image. This information can be used for providing chemical metadata descriptors of the original image as part of a robot-based Internet resource discovery tool.

  4. Approximate furthest neighbor with application to annulus query

    DEFF Research Database (Denmark)

    Pagh, Rasmus; Silvestri, Francesco; Sivertsen, Johan von Tangen

    2016-01-01

    Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries and present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclid... a variation based on a query-independent ordering of the database points; while this does not have the provable approximation factor of the query-dependent data structure, it offers significant improvement in time and space complexity. We give a theoretical analysis and experimental results. As an application...
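
    The abstract is truncated, so the following is only a loose sketch of what a query-independent candidate ordering for furthest-neighbor search might look like, not the authors' data structure. It assumes that candidates are chosen as the extreme points along a few random directions and that a query simply scans that fixed candidate set; the function names and parameters are hypothetical, and the sketch carries no approximation guarantee.

```python
import math
import random

def build_candidates(points, num_projections=8, per_projection=4, seed=0):
    """Choose a query-independent candidate set.

    For each random Gaussian direction, keep the indices of the points with
    the largest projections onto that direction.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = set()
    for _ in range(num_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        order = sorted(
            range(len(points)),
            key=lambda i: sum(a * p for a, p in zip(direction, points[i])),
            reverse=True,
        )
        candidates.update(order[:per_projection])
    return sorted(candidates)

def approx_furthest(points, candidates, query):
    """Index of the candidate point furthest from the query."""
    return max(candidates, key=lambda i: math.dist(points[i], query))

# Hypothetical toy data set in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (-4.0, -4.0), (2.0, -3.0)]
candidates = build_candidates(points, num_projections=4, per_projection=2)
print(approx_furthest(points, candidates, (4.0, 4.0)))
```

    Because the candidate set is built once and does not depend on the query, each query costs only one distance computation per candidate, which is the kind of time and space saving the abstract attributes to the query-independent variant.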

  5. An Observation Capability Metadata Model for EO Sensor Discovery in Sensor Web Enablement Environments

    Directory of Open Access Journals (Sweden)

    Chuli Hu

    2014-10-01

    Accurate and fine-grained discovery by diverse Earth observation (EO) sensors ensures a comprehensive response to collaborative observation-required emergency tasks. This discovery remains a challenge in an EO sensor web environment. In this study, we propose an EO sensor observation capability metadata model that reuses and extends the existing sensor observation-related metadata standards to enable the accurate and fine-grained discovery of EO sensors. The proposed model is composed of five sub-modules, namely, ObservationBreadth, ObservationDepth, ObservationFrequency, ObservationQuality and ObservationData. The model is applied to different types of EO sensors and is formalized by the Open Geospatial Consortium Sensor Model Language 1.0. The GeosensorQuery prototype retrieves the qualified EO sensors based on the provided geo-event. An actual application to flood emergency observation in the Yangtze River Basin in China is conducted, and the results indicate that sensor inquiry can accurately achieve fine-grained discovery of qualified EO sensors and obtain enriched observation capability information. In summary, the proposed model enables an efficient encoding system that ensures minimum unification to represent the observation capabilities of EO sensors. The model functions as a foundation for the efficient discovery of EO sensors. In addition, the definition and development of this proposed EO sensor observation capability metadata model is a helpful step in extending the Sensor Model Language (SensorML) 2.0 Profile for the description of the observation capabilities of EO sensors.

  6. A federated semantic metadata registry framework for enabling interoperability across clinical research and care domains.

    Science.gov (United States)

    Sinaci, A Anil; Laleci Erturkmen, Gokce B

    2013-10-01

    In order to enable secondary use of Electronic Health Records (EHRs) by bridging the interoperability gap between clinical care and research domains, in this paper, a unified methodology and the supporting framework are introduced which bring together the power of metadata registries (MDR) and semantic web technologies. We introduce a federated semantic metadata registry framework by extending the ISO/IEC 11179 standard, and enable integration of data element registries through Linked Open Data (LOD) principles where each Common Data Element (CDE) can be uniquely referenced, queried and processed to enable syntactic and semantic interoperability. Each CDE and its components are maintained as LOD resources enabling semantic links with other CDEs, terminology systems and implementation dependent content models, hence facilitating semantic search, more effective reuse and semantic interoperability across different application domains. There are several important efforts addressing semantic interoperability in the healthcare domain such as the IHE DEX profile proposal, CDISC SHARE and CDISC2RDF. Our architecture complements these by providing a framework to interlink existing data element registries and repositories, multiplying their potential for semantic interoperability. An open source implementation of the federated semantic MDR framework presented in this paper is the core of the semantic interoperability layer of the SALUS project, which enables the execution of post marketing safety analysis studies on top of existing EHR systems. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    Science.gov (United States)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patients’ information included in a DICOM file as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and by assigning a specific value in every ROI. The result is stored in DICOM format, for data and trend analysis. The developed GUI is easy to use and fast, and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.

  8. Keeping Dublin Core Simple: Cross-Domain Discovery or Resource Description?; First Steps in an Information Commerce Economy: Digital Rights Management in the Emerging E-Book Environment; Interoperability: Digital Rights Management and the Emerging EBook Environment; Searching the Deep Web: Direct Query Engine Applications at the Department of Energy.

    Science.gov (United States)

    Lagoze, Carl; Neylon, Eamonn; Mooney, Stephen; Warnick, Walter L.; Scott, R. L.; Spence, Karen J.; Johnson, Lorrie A.; Allen, Valerie S.; Lederman, Abe

    2001-01-01

    Includes four articles that discuss Dublin Core metadata, digital rights management and electronic books, including interoperability; and directed query engines, a type of search engine designed to access resources on the deep Web that is being used at the Department of Energy. (LRW)

  9. Semantic querying of data guided by Formal Concept Analysis

    OpenAIRE

    Codocedo, Victor; Lykourentzou, Ioanna; Napoli, Amedeo

    2012-01-01

    In this paper we present a novel approach to handle querying over a concept lattice of documents and annotations. We focus on the problem of "non-matching documents", which are those that, despite being semantically relevant to the user query, do not contain the query's elements and hence cannot be retrieved by typical string matching approaches. In order to find these documents, we modify the initial user query using the concept lattice as a guide. We achieve this by ...

  10. Parallelizing Federated SPARQL Queries in Presence of Replicated Data

    DEFF Research Database (Denmark)

    Minier, Thomas; Montoya, Gabriela; Skaf-Molli, Hala

    2017-01-01

    Federated query engines have been enhanced to exploit new data localities created by replicated data, e.g., Fedra. However, existing replication aware federated query engines mainly focus on pruning sources during the source selection and query decomposition in order to reduce intermediate results...

  11. Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.

    Science.gov (United States)

    Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand

    2003-01-01

    Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…

  12. User Simulations for Interactive Search : Evaluating Personalized Query Suggestion

    NARCIS (Netherlands)

    Verberne, S.; Sappelli, M.; Järvelin, K.; Kraaij, W.

    2015-01-01

    In this paper, we address the question “what is the influence of user search behaviour on the effectiveness of personalized query suggestion?”. We implemented a method for query suggestion that generates candidate follow-up queries from the documents clicked by the user. This is a potentially

  13. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suzuki Motoyuki

    2009-01-01

    We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that the misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the "query relevance." Combining these two ideas, we can alleviate the bad effect of misrecognized keywords by decreasing the number of downloaded Web documents from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29%) was improved to 60.38%, which was 1.13 points better than the result of the conventional unsupervised adaptation method (59.25%).

  14. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Akinori Ito

    2009-01-01

    We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that the misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the “query relevance.” Combining these two ideas, we can alleviate the bad effect of misrecognized keywords by decreasing the number of downloaded Web documents from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29%) was improved to 60.38%, which was 1.13 points better than the result of the conventional unsupervised adaptation method (59.25%).

  15. Metadata For Identity Management of Population Registers

    Directory of Open Access Journals (Sweden)

    Olivier Glassey

    2011-04-01

    A population register is an inventory of residents within a country, with their characteristics (date of birth, sex, marital status, etc.) and other socio-economic data, such as occupation or education. However, data on population are also stored in numerous other public registers such as tax, land, building and housing, military, foreigners, vehicles, etc. Altogether they contain vast amounts of personal and sensitive information. Access to public information is granted by law in many countries, but this transparency is generally subject to tensions with data protection laws. This paper proposes a framework to analyze data access (or protection) requirements, as well as a model of metadata for data exchange.

  16. Information resource description: creating and managing metadata

    CERN Document Server

    Hider, Philip

    2012-01-01

    An overview of the field of information organization that examines resource description as both a product and process of the contemporary digital environment. This timely book employs the unifying mechanism of the semantic web and the resource description framework to integrate the various traditions and practices of information and knowledge organization. Uniquely, it covers both the domain-specific traditions and practices and the practices of the 'metadata movement' through a single lens: that of resource description in the broadest, semantic web sense. This approach more readily accommodate

  17. Secure Count Query on Encrypted Genomic Data.

    Science.gov (United States)

    Hasan, Mohammad Zahidul; Rahman Mahdi, Md Safiur; Sadat, Md Nazmus; Mohammed, Noman

    2018-03-14

    Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also develop more effective prevention strategies [1]. To reduce the sampling error and to increase the statistical accuracy of this type of research project, data from different sources need to be brought together since a single organization does not necessarily possess the required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third party cloud service provider, equipped with enormous storage and computation resources. However, outsourcing data to a third party is associated with a potential risk of privacy violation of the participants, whose genomic sequence or clinical profile is used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. In particular, there are two main contributions. Firstly, the proposed method can handle biomedical data containing both genotype and phenotype. Secondly, our proposed index tree scheme significantly reduces the computational overhead of executing the secure count query operation. In our proposed method, the confidentiality of shared data is ensured through encryption, while making the entire computation process efficient and scalable for cutting-edge biomedical applications. We evaluated our proposed method in terms of efficiency on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50000 records is approximately 5 seconds, where each

  18. Spatio-temporal databases complex motion pattern queries

    CERN Document Server

    Vieira, Marcos R

    2013-01-01

    This brief presents several new query processing techniques, called complex motion pattern queries, specifically designed for very large spatio-temporal databases of moving objects. The brief begins with the definition of flexible pattern queries, which are powerful because of the integration of variables and motion patterns. This is followed by a summary of the expressive power of patterns and flexibility of pattern queries. The brief then presents the Spatio-Temporal Pattern System (STPS) and density-based pattern queries. STPS databases contain millions of records with information about mobi

  19. Deep web query interface understanding and integration

    CERN Document Server

    Dragut, Eduard C; Yu, Clement T

    2012-01-01

    There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art tech

  20. Downloading Multiple Records Using Query Strings

    Directory of Open Access Journals (Sweden)

    Adam Crymble

    2012-11-01

    Downloading a single record from a website is easy, but downloading many records at a time, an increasingly frequent need for a historian, is much more efficient using a programming language such as Python. In this lesson, we will write a program that will download a series of records from the Old Bailey Online using custom search criteria, and save them to a directory on our computer. This process involves interpreting and manipulating URL Query Strings. In this case, the tutorial will seek to download sources that contain references to people of African descent that were published in the Old Bailey Proceedings between 1700 and 1750.
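
    As a rough illustration of what manipulating URL query strings involves, the sketch below builds one URL per page of results. The endpoint, the parameter names (keyword, from_year, to_year, start, count) and the paging scheme are hypothetical placeholders, not the Old Bailey Online's actual interface; only Python's standard urllib.parse module is used.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical search endpoint; replace with the real one when adapting this.
BASE_URL = "https://example.org/search"

def build_search_urls(params, total, page_size):
    """Return one URL per page of results.

    params    -- fixed search criteria as a dict
    total     -- number of records the search is expected to return
    page_size -- number of records requested per page
    """
    urls = []
    for start in range(0, total, page_size):
        # Copy the fixed criteria and add the paging parameters.
        query = dict(params, start=start, count=page_size)
        urls.append(f"{BASE_URL}?{urlencode(query)}")
    return urls

criteria = {"keyword": "example", "from_year": 1700, "to_year": 1750}
urls = build_search_urls(criteria, total=25, page_size=10)
for url in urls:
    print(url)

# Reading a query string back into its parts.
print(parse_qs(urlsplit(urls[1]).query))
```

    Each generated URL could then be fetched with urllib.request.urlopen and the response written to a file; that step is left out so the sketch runs without network access.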

  1. Optimizing queries in SQL Server 2008

    Directory of Open Access Journals (Sweden)

    Ion LUNGU

    2010-05-01

    Starting from the need to develop efficient IT systems, we intend to review the optimization methods and tools that can be used by SQL Server database administrators and developers of applications based on Microsoft technology, focusing on the latest version of the proprietary DBMS, SQL Server 2008. We’ll reflect on the objectives to be considered in improving the performance of SQL Server instances, we will tackle the most commonly used techniques for analyzing and optimizing queries and we will describe the new “Optimize for ad hoc workloads”, “Plan Freezing” and “Optimize for unknown” options, accompanied by relevant code examples.

  2. Mobile Information Access with Spoken Query Answering

    DEFF Research Database (Denmark)

    Brøndsted, Tom; Larsen, Henrik Legind; Larsen, Lars Bo

    2006-01-01

    This paper addresses the problem of information and service accessibility in mobile devices with limited resources. A solution is developed and tested through a prototype that applies state-of-the-art Distributed Speech Recognition (DSR) and knowledge-based Information Retrieval (IR) processing...... for spoken query answering. For the DSR part, a configurable DSR system is implemented on the basis of the ETSI-DSR advanced front-end and the SPHINX IV recognizer. For the knowledge-based IR part, a distributed system solution is developed for fast retrieval of the most relevant documents, with a text...

  3. Metadata Laws, Journalism and Resistance in Australia

    Directory of Open Access Journals (Sweden)

    Benedetta Brevini

    2017-03-01

    The intelligence leaks from Edward Snowden in 2013 unveiled the sophistication and extent of data collection by the United States’ National Security Agency and major global digital firms, prompting domestic and international debates about the balance between security and privacy, openness and enclosure, accountability and secrecy. It is difficult not to see a clear connection with the Snowden leaks in the sharp acceleration of new national security legislation in Australia, a long term member of the Five Eyes Alliance. In October 2015, the Australian federal government passed controversial laws that require telecommunications companies to retain the metadata of their customers for a period of two years. The new acts pose serious threats for the profession of journalism as they enable government agencies to easily identify and pursue journalists’ sources. Bulk data collections of this type of information deter future whistleblowers from approaching journalists, making the performance of the latter’s democratic role a challenge. After situating this debate within the scholarly literature at the intersection between surveillance studies and communication studies, this article discusses the political context in which journalists are operating and working in Australia; assesses how metadata laws have affected journalism practices and addresses the possibility for resistance.

  4. Metadata Access Tool for Climate and Health

    Science.gov (United States)

    Trtanji, J.

    2012-12-01

    The need for health information resources to support climate change adaptation and mitigation decisions is growing, both in the United States and around the world, as the manifestations of climate change become more evident and widespread. In many instances, these information resources are not specific to a changing climate, but have either been developed or are highly relevant for addressing health issues related to existing climate variability and weather extremes. To help address the need for more integrated data, the Interagency Cross-Cutting Group on Climate Change and Human Health, a working group of the U.S. Global Change Research Program, has developed the Metadata Access Tool for Climate and Health (MATCH). MATCH is a gateway to relevant information that can be used to solve problems at the nexus of climate science and public health by facilitating research, enabling scientific collaborations in a One Health approach, and promoting data stewardship that will enhance the quality and application of climate and health research. MATCH is a searchable clearinghouse of publicly available Federal metadata including monitoring and surveillance data sets, early warning systems, and tools for characterizing the health impacts of global climate change. Examples of relevant databases include the Centers for Disease Control and Prevention's Environmental Public Health Tracking System and NOAA's National Climatic Data Center's national and state temperature and precipitation data. This presentation will introduce the audience to this new web-based geoportal and demonstrate its features and potential applications.

  5. Visual exploration of the attribute space of DANS EASY metadata

    NARCIS (Netherlands)

    ten Bosch, Olav; Scharnhorst, A.M.; Doorn, P.K.; Koning, Henk

    2012-01-01

    Study of the metadata of the Electronic Archiving System (EASY) of Data Archiving and Networked Services (DANS) for the purpose of getting insight in the internal structure of the collection. The visualization contains a dump of the EASY metadata set and all important data files that were generated

  6. Forensic devices for activism: Metadata tracking and public proof

    NARCIS (Netherlands)

    van der Velden, L.

    2015-01-01

    The central topic of this paper is a mobile phone application, ‘InformaCam’, which turns metadata from a surveillance risk into a method for the production of public proof. InformaCam allows one to manage and delete metadata from images and videos in order to diminish surveillance risks related to

  7. Metadata as a means for correspondence on digital media

    NARCIS (Netherlands)

    Stouffs, R.; Kooistra, J.; Tuncer, B.

    2004-01-01

    Metadata derive their action from their association to data and from the relationship they maintain with this data. An interpretation of this action is that the metadata lays claim to the data collection to which it is associated, where the claim is successful if the data collection gains quality as

  8. Shared Geospatial Metadata Repository for Ontario University Libraries: Collaborative Approaches

    Science.gov (United States)

    Forward, Erin; Leahey, Amber; Trimble, Leanne

    2015-01-01

    Successfully providing access to special collections of digital geospatial data in academic libraries relies upon complete and accurate metadata. Creating and maintaining metadata using specialized standards is a formidable challenge for libraries. The Ontario Council of University Libraries' Scholars GeoPortal project, which created a shared…

  9. Learning Object Metadata in a Web-Based Learning Environment

    NARCIS (Netherlands)

    Avgeriou, Paris; Koutoumanos, Anastasios; Retalis, Symeon; Papaspyrou, Nikolaos

    2000-01-01

    The plethora and variance of learning resources embedded in modern web-based learning environments require a mechanism to enable their structured administration. This goal can be achieved by defining metadata on them and constructing a system that manages the metadata in the context of the learning

  10. Developing Cyberinfrastructure Tools and Services for Metadata Quality Evaluation

    Science.gov (United States)

    Mecum, B.; Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Powers, L. A.; Slaughter, P.

    2016-12-01

    Metadata and data quality are at the core of reusable and reproducible science. While great progress has been made over the years, much of the metadata collected only addresses data discovery, covering concepts such as titles and keywords. Improving metadata beyond the discoverability plateau means documenting detailed concepts within the data such as sampling protocols, instrumentation used, and variables measured. Given that metadata commonly do not describe their data at this level, how might we improve the state of things? Giving scientists and data managers easy-to-use tools to evaluate metadata quality that utilize community-driven recommendations is the key to producing high-quality metadata. To achieve this goal, we created a set of cyberinfrastructure tools and services that integrate with existing metadata and data curation workflows and can be used to improve metadata and data quality across the sciences. These tools work across metadata dialects (e.g., ISO 19115, FGDC, EML) and can be used to assess aspects of quality beyond what is internal to the metadata, such as the congruence between the metadata and the data it describes. The system makes use of a user-friendly mechanism for expressing a suite of checks as code in popular data science programming languages such as Python and R. This reduces the burden on scientists and data managers to learn yet another language. We demonstrated these services and tools in three ways. First, we evaluated a large corpus of datasets in the DataONE federation of data repositories against a metadata recommendation modeled after existing recommendations such as the LTER best practices and the Attribute Convention for Dataset Discovery (ACDD). Second, we showed how this service can be used to display metadata and data quality information to data producers during the data submission and metadata creation process, and to data consumers through data catalog search and access tools. Third, we showed how the centrally...
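
    The idea of expressing a suite of checks as code can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the record layout, the check names, and the `run_suite` helper are hypothetical and are not the API of the services described in the abstract.

```python
# Hypothetical sketch of a metadata quality suite written as plain functions.
# The record fields and check names are assumptions made for illustration.

def check_title_is_descriptive(record):
    """Pass when the title has at least five words."""
    return len(record.get("title", "").split()) >= 5


def check_variables_documented(record):
    """Pass when the metadata lists at least one measured variable."""
    return bool(record.get("variables"))


def check_metadata_matches_data(record):
    """Congruence check: documented variables equal the data columns."""
    return set(record.get("variables", [])) == set(record.get("data_columns", []))


SUITE = {
    "title_is_descriptive": check_title_is_descriptive,
    "variables_documented": check_variables_documented,
    "metadata_matches_data": check_metadata_matches_data,
}


def run_suite(record, suite):
    """Run every check and return a name -> bool report."""
    return {name: check(record) for name, check in suite.items()}


record = {
    "title": "Soil temperature at five depths, 2010-2015",
    "variables": ["depth", "temp_c"],
    "data_columns": ["depth", "temp_c", "site"],
}
report = run_suite(record, SUITE)
passed = sum(report.values())
print(report, f"{passed}/{len(report)} checks passed")
```

    In this sketch the third check fails because the data file carries a `site` column that the metadata does not document, which is the kind of metadata-to-data mismatch the abstract calls congruence.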

  11. Making the Case for Embedded Metadata in Digital Images

    DEFF Research Database (Denmark)

    Smith, Kari R.; Saunders, Sarah; Kejser, U.B.

    2014-01-01

    This paper discusses the standards, methods, use cases, and opportunities for using embedded metadata in digital images. In this paper we explain the past and current work engaged with developing specifications, standards for embedding metadata of different types, and the practicalities of data exchange in heritage institutions and the culture sector. Our examples and findings support the case for embedded metadata in digital images and the opportunities for such use more broadly in non-heritage sectors as well. We encourage the adoption of embedded metadata by digital image content creators and curators as well as those developing software and hardware that support the creation or re-use of digital images. We conclude that the usability of born digital images as well as physical objects that are digitized can be extended and the files preserved more readily with embedded metadata.

  12. Interpreting the ASTM 'content standard for digital geospatial metadata'

    Science.gov (United States)

    Nebert, Douglas D.

    1996-01-01

    ASTM and the Federal Geographic Data Committee have developed a content standard for spatial metadata to facilitate documentation, discovery, and retrieval of digital spatial data using vendor-independent terminology. Spatial metadata elements are identifiable quality and content characteristics of a data set that can be tied to a geographic location or area. Several Office of Management and Budget Circulars and initiatives have been issued that specify improved cataloguing of and accessibility to federal data holdings. An Executive Order further requires the use of the metadata content standard to document digital spatial data sets. Collection and reporting of spatial metadata for field investigations performed for the federal government is an anticipated requirement. This paper provides an overview of the draft spatial metadata content standard and a description of how the standard could be applied to investigations collecting spatially-referenced field data.

  13. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal

    Directory of Open Access Journals (Sweden)

    Ed Baker

    2013-09-01

    Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they have wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated field in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads and it is therefore possible to upload hundreds of images easily with automatic metadata (EXIF, XMP and IPTC) extraction and mapping.
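
    The mapping step can be pictured with a short sketch. It is written in Python purely for illustration and is not the module's code (the module itself targets Drupal); the tag names, form-field names, and the `apply_mappings` helper are hypothetical.

```python
# Hypothetical sketch: map embedded image metadata tags to form fields.
# Tag names and field names are illustrative assumptions.

def apply_mappings(embedded, mappings):
    """Build form-field values from embedded metadata.

    embedded: dict of tag name -> value, as already read from an image.
    mappings: list of dicts, each mapping a tag name to a form-field name.
              A field filled by an earlier mapping is never overwritten.
    """
    form = {}
    for mapping in mappings:
        for tag, field in mapping.items():
            value = embedded.get(tag)
            # Skip tags that are absent or empty in this image.
            if value not in (None, "") and field not in form:
                form[field] = value
    return form


embedded = {
    "EXIF:Artist": "A. Photographer",
    "IPTC:Caption-Abstract": "Specimen, dorsal view",
    "XMP:Rights": "",
}
mappings = [
    {"EXIF:Artist": "field_creator", "XMP:Rights": "field_licence"},
    {"IPTC:Caption-Abstract": "field_description", "IPTC:By-line": "field_creator"},
]
form_values = apply_mappings(embedded, mappings)
print(form_values)
```

    Applying the mappings in order, and never overwriting a field that is already filled, is one simple way to let several customised mappings coexist; whether the module resolves conflicts this way is not stated in the abstract.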

  14. Managing ebook metadata in academic libraries: taming the tiger

    CERN Document Server

    Frederick, Donna E

    2016-01-01

    Managing ebook Metadata in Academic Libraries: Taming the Tiger tackles the topic of ebooks in academic libraries, a trend that has been welcomed by students, faculty, researchers, and library staff. However, at the same time, the reality of acquiring ebooks, making them discoverable, and managing them presents library staff with many new challenges. Traditional methods of cataloging and managing library resources are no longer relevant where the purchasing of ebooks in packages and demand driven acquisitions are the predominant models for acquiring new content. Most academic libraries have a complex metadata environment wherein multiple systems draw upon the same metadata for different purposes. This complexity makes the need for standards-based interoperable metadata more important than ever. In addition to complexity, the nature of the metadata environment itself typically varies slightly from library to library making it difficult to recommend a single set of practices and procedures which would be releva...

  15. Metafier - a Tool for Annotating and Structuring Building Metadata

    DEFF Research Database (Denmark)

    Holmegaard, Emil; Johansen, Aslak; Kjærgaard, Mikkel Baun

    2017-01-01

    ..., describing the instrumentation of the building. We have created Metafier, a tool for annotating and structuring metadata for buildings. Metafier optimizes the workflow of establishing metadata for buildings by enabling a human-in-the-loop to validate, search and group points. We have evaluated Metafier for two buildings, with different sizes, locations, ages and purposes. The evaluation was performed as a user test with three subjects with different backgrounds. The evaluation results indicate that the tool enabled the users to validate, search and group points while annotating metadata. One challenge is to get users to understand the concept of metadata for the tool to be useable. Based on our evaluation, we have listed guidelines for creating a tool for annotating building metadata.

  16. Department of the Interior metadata implementation guide—Framework for developing the metadata component for data resource management

    Science.gov (United States)

    Obuch, Raymond C.; Carlino, Jennifer; Zhang, Lin; Blythe, Jonathan; Dietrich, Christopher; Hawkinson, Christine

    2018-04-12

    The Department of the Interior (DOI) is a Federal agency with over 90,000 employees across 10 bureaus and 8 agency offices. Its primary mission is to protect and manage the Nation’s natural resources and cultural heritage; provide scientific and other information about those resources; and honor its trust responsibilities or special commitments to American Indians, Alaska Natives, and affiliated island communities. Data and information are critical in day-to-day operational decision making and scientific research. DOI is committed to creating, documenting, managing, and sharing high-quality data and metadata in and across its various programs that support its mission. Documenting data through metadata is essential in realizing the value of data as an enterprise asset. The completeness, consistency, and timeliness of metadata affect users’ ability to search for and discover the most relevant data for the intended purpose, and facilitate the interoperability and usability of these data among DOI bureaus and offices. Fully documented metadata describe data usability, quality, accuracy, provenance, and meaning. Across DOI, there are different maturity levels and phases of information and metadata management implementations. The Department has organized a committee consisting of bureau-level points of contact to collaborate on the development of more consistent, standardized, and more effective metadata management practices and guidance to support this shared mission and the information needs of the Department. DOI’s metadata implementation plans establish key roles and responsibilities associated with metadata management processes, procedures, and a series of actions defined in three major metadata implementation phases including: (1) Getting started—Planning Phase, (2) Implementing and Maintaining Operational Metadata Management Phase, and (3) the Next Steps towards Improving Metadata Management Phase. DOI’s phased approach for metadata management addresses...

  17. Semantic technologies improving the recall and precision of the Mercury metadata search engine

    Science.gov (United States)

    Pouchard, L. C.; Cook, R. B.; Green, J.; Palanisamy, G.; Noy, N.

    2011-12-01

    The Mercury federated metadata system [1] was developed at the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-sponsored effort holding datasets about biogeochemical dynamics, ecological data, and environmental processes. Mercury currently indexes over 100,000 records from several data providers conforming to community standards, e.g. EML, FGDC, FGDC Biological Profile, ISO 19115 and DIF. With the breadth of sciences represented in Mercury, the potential exists to address some key interdisciplinary scientific challenges related to climate change, its environmental and ecological impacts, and mitigation of these impacts. However, this wealth of metadata also hinders pinpointing datasets relevant to a particular inquiry. We implemented a semantic solution after concluding that traditional search approaches cannot improve the accuracy of the search results in this domain because: a) unlike everyday queries, scientific queries seek to return specific datasets with numerous parameters that may or may not be exposed to search (Deep Web queries); b) the relevance of a dataset cannot be judged by its popularity, as each scientific inquiry tends to be unique; and c) each domain science has its own terminology, more or less curated, consensual, and standardized depending on the domain. The same terms may refer to different concepts across domains (homonyms), while different terms may mean the same thing (synonyms). Interdisciplinary research is arduous because an expert in a domain must become fluent in the language of another, just to find relevant datasets. Thus, we decided to use scientific ontologies because they can provide a context for a free-text search, in a way that string-based keywords never will. With added context, relevant datasets are more easily discoverable. To enable search and programmatic access to ontology entities in Mercury, we are using an instance of the BioPortal ontology repository. Mercury accesses ontology entities...

  18. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  19. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    Science.gov (United States)

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11 472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  20. Query-Time Optimization Techniques for Structured Queries in Information Retrieval

    Science.gov (United States)

    Cartright, Marc-Allen

    2013-01-01

    The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…

  1. Computer systems and methods for the query and visualization of multidimensional database

    Science.gov (United States)

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2010-05-11

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.
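
    The last two steps, querying for tuples and associating subsets of them with panes, can be pictured with a minimal Python sketch. The specification language of the patent is not reproduced here; the `build_visual_table` helper and the field names are hypothetical, and the "specification" is reduced to a choice of one row field and one column field.

```python
from collections import defaultdict

# Hypothetical sketch: assign query result tuples to the panes of a
# visual table, where each pane is identified by (row value, column value).

def build_visual_table(tuples, row_field, col_field):
    """Group records into panes keyed by their row and column field values."""
    panes = defaultdict(list)
    for record in tuples:
        panes[(record[row_field], record[col_field])].append(record)
    return dict(panes)


# Stand-in for the set of tuples returned by the database query.
rows = [
    {"year": 2009, "region": "East", "sales": 10},
    {"year": 2009, "region": "West", "sales": 7},
    {"year": 2010, "region": "East", "sales": 12},
    {"year": 2010, "region": "East", "sales": 3},
]
panes = build_visual_table(rows, row_field="year", col_field="region")
for key, subset in sorted(panes.items()):
    print(key, [r["sales"] for r in subset])
```

    Each pane then holds exactly the subset of tuples that a per-pane graphic would draw.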

  2. mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data.

    Science.gov (United States)

    Larralde, Martin; Lawson, Thomas N; Weber, Ralf J M; Moreno, Pablo; Haug, Kenneth; Rocca-Serra, Philippe; Viant, Mark R; Steinbeck, Christoph; Salek, Reza M

    2017-08-15

    Submission to the MetaboLights repository for metabolomics data currently places the burden of reporting instrument and acquisition parameters in ISA-Tab format on users, who have to do it manually, a process that is time consuming and prone to user input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that can automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA & nmrML2ISA reduces the time needed to capture metadata substantially (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines and facilitates more finely grained data exploration and querying of datasets. mzML2ISA & nmrML2ISA are available under version 3 of the GNU General Public Licence at https://github.com/ISA-tools. Documentation is available from http://2isa.readthedocs.io/en/latest/. Contact: reza.salek@ebi.ac.uk or isatools@googlegroups.com. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
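
    The general approach, reading parameters that are already embedded in the XML and writing them into a tab-delimited stub, can be sketched with the Python standard library. This is not the mzML2ISA API: the simplified, namespace-free sample document, the placeholder accession values, and the helper names `collect_cv_params` and `to_tab_row` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an instrument file. Real mzML files use an XML
# namespace and controlled-vocabulary accessions; the accessions below are
# placeholders, not real vocabulary terms.
SAMPLE = """
<mzML>
  <instrumentConfiguration id="IC1">
    <cvParam accession="MS:0000001" name="instrument model" value="Example Instrument"/>
  </instrumentConfiguration>
  <run>
    <spectrum id="scan=1">
      <cvParam accession="MS:0000002" name="ms level" value="1"/>
      <cvParam accession="MS:0000003" name="positive scan" value=""/>
    </spectrum>
  </run>
</mzML>
"""


def collect_cv_params(xml_text):
    """Return the first value seen for each cvParam name in the document."""
    root = ET.fromstring(xml_text.strip())
    params = {}
    for cv in root.iter("cvParam"):
        # Terms without a value act as flags and are recorded as True.
        value = cv.get("value") or True
        params.setdefault(cv.get("name"), value)
    return params


def to_tab_row(params, columns):
    """Render selected parameters as one tab-delimited row."""
    return "\t".join(str(params.get(column, "")) for column in columns)


params = collect_cv_params(SAMPLE)
columns = ["instrument model", "ms level"]
print("\t".join(columns))
print(to_tab_row(params, columns))
```

    A stub produced this way still needs the fields that are not embedded in the raw file to be completed by hand.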

  3. Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

    Science.gov (United States)

    Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

    2017-07-03

    MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p < 0.01). It took significantly more time for the members of the English group than the members of the French group to build each query, respectively 194 sec vs. 128 sec; p < 0.01. This multi-lingual query builder is an effective tool to improve the quality of PubMed queries in particular for researchers whose first language is not English.

  4. CrossQuery: a web tool for easy associative querying of transcriptome data.

    Directory of Open Access Journals (Sweden)

    Toni U Wagner

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straightforward, simple-syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.
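
    The kind of cross-association, filtering and sorting described here can be illustrated with a small Python sketch. It does not use CrossQuery's query syntax, which the abstract does not show; the `cross_associate` helper, the field names, and the identifiers are hypothetical placeholders.

```python
# Hypothetical sketch: associate two datasets through a shared identifier,
# then filter and sort the combined records.

def cross_associate(left, right, key):
    """Inner-join two lists of records on a shared key."""
    index = {}
    for record in right:
        index.setdefault(record[key], []).append(record)
    joined = []
    for record in left:
        for match in index.get(record[key], []):
            joined.append({**match, **record})
    return joined


expression = [
    {"transcript_id": "T1", "expression": 12.5},
    {"transcript_id": "T2", "expression": 0.4},
    {"transcript_id": "T3", "expression": 7.1},
]
annotation = [
    {"transcript_id": "T1", "go_term": "GO:0000001"},  # placeholder IDs
    {"transcript_id": "T3", "go_term": "GO:0000002"},
]

joined = cross_associate(expression, annotation, "transcript_id")
# Keep well-expressed transcripts, highest expression first.
high = sorted(
    (r for r in joined if r["expression"] > 5),
    key=lambda r: r["expression"],
    reverse=True,
)
for r in high:
    print(r["transcript_id"], r["expression"], r["go_term"])
```

    The transcript without an annotation (`T2`) drops out of the join, which mirrors the requirement that datasets share at least one common point of correlated information.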

  5. The XML Metadata Editor of GFZ Data Services

    Science.gov (United States)

    Ulbricht, Damian; Elger, Kirsten; Tesei, Telemaco; Trippanera, Daniele

    2017-04-01

    Following the FAIR data principles, research data should be Findable, Accessible, Interoperable and Reusable. Publishing data under these principles requires assigning persistent identifiers to the data and generating rich machine-actionable metadata. To increase interoperability, metadata should include shared vocabularies and crosslink the newly published (meta)data and related material. However, structured metadata formats tend to be complex and are not intended to be generated by individual scientists. Software solutions are needed that support scientists in providing metadata describing their data. To facilitate data publication activities of 'GFZ Data Services', we programmed an XML metadata editor that helps scientists create metadata in different schemata popular in the earth sciences (ISO19115, DIF, DataCite), while being at the same time usable by and understandable for scientists. Emphasis is placed on removing barriers: in particular, the editor is publicly available on the internet without registration [1] and the scientists are not requested to provide information that may be generated automatically (e.g. the URL of a specific licence or the contact information of the metadata distributor). Metadata are stored in browser cookies and a copy can be saved to the local hard disk. To improve usability, form fields are translated into the scientific language, e.g. 'creators' of the DataCite schema are called 'authors'. To assist filling in the form, we make use of drop-down menus for small vocabulary lists and offer a search facility for large thesauri. Explanations of form fields and definitions of vocabulary terms are provided in pop-up windows and full documentation is available for download via the help menu. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems with different conventions for providing latitudes and longitudes. Currently, we are extending the metadata editor...

  6. TOPCAT: Tool for OPerations on Catalogues And Tables

    Science.gov (United States)

    Taylor, Mark

    2011-01-01

    TOPCAT is an interactive graphical viewer and editor for tabular data. Its aim is to provide most of the facilities that astronomers need for analysis and manipulation of source catalogues and other tables, though it can be used for non-astronomical data as well. It understands a number of different astronomically important formats (including FITS and VOTable) and more formats can be added. It offers a variety of ways to view and analyse tables, including a browser for the cell data themselves, viewers for information about table and column metadata, and facilities for 1-, 2-, 3- and higher-dimensional visualisation, calculating statistics and joining tables using flexible matching algorithms. Using a powerful and extensible Java-based expression language new columns can be defined and row subsets selected for separate analysis. Table data and metadata can be edited and the resulting modified table can be written out in a wide range of output formats. It is a stand-alone application which works quite happily with no network connection. However, because it uses Virtual Observatory (VO) standards, it can cooperate smoothly with other tools in the VO world and beyond, such as VODesktop, Aladin and ds9. Between 2006 and 2009 TOPCAT was developed within the AstroGrid project, and is offered as part of a standard suite of applications on the AstroGrid web site, where you can find information on several other VO tools. The program is written in pure Java and available under the GNU General Public Licence. It has been developed in the UK within the Starlink and AstroGrid projects, and under PPARC and STFC grants. Its underlying table processing facilities are provided by STIL.

  7. Taxonomic names, metadata, and the Semantic Web

    Directory of Open Access Journals (Sweden)

    Roderic D. M. Page

    2006-01-01

    Life Science Identifiers (LSIDs) offer an attractive solution to the problem of globally unique identifiers for digital objects in biology. However, I suggest that in the context of taxonomic names, the most compelling benefit of adopting these identifiers comes from the metadata associated with each LSID. By using existing vocabularies wherever possible, and using a simple vocabulary for taxonomy-specific concepts we can quickly capture the essential information about a taxonomic name in the Resource Description Framework (RDF) format. This opens up the prospect of using technologies developed for the Semantic Web to add "taxonomic intelligence" to biodiversity databases. This essay explores some of these ideas in the context of providing a taxonomic framework for the phylogenetic database TreeBASE.

  8. Evolution of the ATLAS Metadata Interface (AMI)

    CERN Document Server

    Odier, Jerome; The ATLAS collaboration; Fulachier, Jerome; Lambert, Fabian

    2015-01-01

    The ATLAS Metadata Interface (AMI) can be considered to be a mature application because it has existed for at least 10 years. Over the years, the number of users and the number of functions provided for these users have increased. It has been necessary to adapt the hardware infrastructure in a seamless way so that the Quality of Service remains high. We will describe the evolution of the application from the initial one, using a single server with a MySQL backend database, to the current state, where we use a cluster of Virtual Machines on the French Tier 1 Cloud at Lyon, an ORACLE database backend also at Lyon, with replication to CERN using ORACLE streams behind a back-up server.

  9. Extending OLAP Querying to External Object

    DEFF Research Database (Denmark)

    Pedersen, Torben Bach; Shoshani, Arie; Gu, Junmin

    On-Line Analytical Processing (OLAP) systems based on a dimensional view of data have found widespread use in business applications and are being used increasingly in non-standard applications. These systems provide good performance and ease-of-use. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, multi-model federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. Additionally, physical data...

  10. Achieving interoperability for metadata registries using comparative object modeling.

    Science.gov (United States)

    Park, Yu Rang; Kim, Ju Han

    2010-01-01

    Achieving data interoperability between organizations relies upon agreed meaning and representation (metadata) of data. For managing and registering metadata, many organizations have built metadata registries (MDRs) in various domains based on the international standard for the MDR framework, ISO/IEC 11179. Following this trend, two public MDRs in the biomedical domain have been created, the United States Health Information Knowledgebase (USHIK) and the cancer Data Standards Registry and Repository (caDSR), from the U.S. Department of Health & Human Services and the National Cancer Institute (NCI), respectively. Most MDRs are implemented with indiscriminate extensions to satisfy organization-specific needs and to work around the semantic and structural limitations of ISO/IEC 11179. As a result, it is difficult to address interoperability among multiple MDRs. In this paper, we propose an integrated metadata object model for achieving interoperability among multiple MDRs. To evaluate this model, we developed an XML Schema Definition (XSD)-based metadata exchange format. We created an XSD-based metadata exporter, supporting both the integrated metadata object model and organization-specific MDR formats.

  11. Definition of a CDI metadata profile and its ISO 19139 based encoding

    Science.gov (United States)

    Boldrini, Enrico; de Korte, Arjen; Santoro, Mattia; Schaap, Dick M. A.; Nativi, Stefano; Manzella, Giuseppe

    2010-05-01

    The Common Data Index (CDI) is the middleware service adopted by SeaDataNet for discovery and query. The primary goal of the EU-funded project SeaDataNet is to develop a system which provides transparent access to marine data sets and data products from 36 countries in and around Europe. The European context of SeaDataNet requires that the developed system comply with the European INSPIRE Directive. To assure the required conformity, a GI-cat based solution is proposed. GI-cat is a broker service able to mediate among different metadata sources and publish them through a consistent and unified interface. In this case GI-cat is used as a front end to the SeaDataNet portal, publishing the original data, based on the CDI v.1 XML schema, through an ISO 19139 application profile catalog interface (OGC CSW AP ISO). The choice of ISO 19139 is supported and driven by the INSPIRE Implementing Rules, which have been used as a reference throughout the development process. A mapping from the CDI data model to ISO 19139 therefore had to be implemented in GI-cat, and a first draft was quickly developed, as both CDI v.1 and ISO 19139 happen to be XML implementations based on the same abstract data model (standard ISO 19115 - metadata about geographic information). This first draft mapping pointed out the differences between the CDI metadata model and ISO 19115, as it was not possible to accommodate all the information contained in CDI v.1 in ISO 19139. Moreover, some modifications were needed in order to reach INSPIRE compliance. The subsequent work consisted of defining the CDI metadata model as a profile of ISO 19115. This included checking all the metadata elements present in CDI and their cardinality. A comparison was made with respect to ISO 19115 and possible extensions were identified. ISO 19139 was then chosen as a natural XML implementation of this new CDI metadata profile. The mapping and the profile definition processes were iteratively refined leading up to a ...

  12. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    Science.gov (United States)

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-01-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system—a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset. PMID:27563900

  13. A novel adaptive Cuckoo search for optimal query plan generation.

    Science.gov (United States)

    Gomathi, Ramalingam; Sharmila, Dhandapani

    2014-01-01

    The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To improve the execution time of queries over large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating an optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach provides significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.

  14. A Novel Adaptive Cuckoo Search for Optimal Query Plan Generation

    Directory of Open Access Journals (Sweden)

    Ramalingam Gomathi

    2014-01-01

    The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To improve the execution time of queries over large RDF graphs, evolving metaheuristic algorithms have become an alternative to traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating an optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying numbers of predicates. The experimental results show that the proposed approach provides significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.

  15. Structured Query Translation in Peer to Peer Database Sharing Systems

    Directory of Open Access Journals (Sweden)

    Mehedi Masud

    2009-10-01

    This paper presents a query translation mechanism between heterogeneous peers in Peer to Peer Database Sharing Systems (PDSSs). A PDSS combines a database management system with P2P functionalities. The local databases on peers are called peer databases. In a PDSS, each peer chooses its own data model and schema and maintains data independently without any global coordinator. One of the problems in such a system is translating queries between peers, taking into account both the schema and data heterogeneity. Query translation is the problem of rewriting a query posed in terms of one peer schema to a query in terms of another peer schema. This paper proposes a query translation mechanism between peers where peers are acquainted in data sharing systems through data-level mappings for sharing data.

  16. RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

    Science.gov (United States)

    Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.

  17. Evaluation of Sub Query Performance in SQL Server

    Directory of Open Access Journals (Sweden)

    Oktavia Tanty

    2014-03-01

    The paper explores several subquery methods used in a query and their impact on query performance. The study uses an experimental approach to evaluate the performance of each subquery method combined with an indexing strategy. The subquery methods consist of IN, EXISTS, a relational operator, and a relational operator combined with the TOP operator. The experiments show that using a relational operator combined with an indexing strategy in a subquery performs better than the same method without an indexing strategy and better than the other methods. In summary, for applications that emphasize the performance of retrieving data from a database, it is better to use a relational operator combined with an indexing strategy. The study was done on Microsoft SQL Server 2012.
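
    To make the four subquery forms concrete, the following is a minimal sketch rather than the study's own test setup. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the row-limiting form is written with LIMIT 1 where SQL Server would use TOP 1. The tables, columns, and index name are hypothetical, and the sketch only checks that the forms return the same rows; it makes no claim about their relative speed.

```python
import sqlite3

# Hypothetical two-table schema; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 3, 75.0);
""")

# Four ways to ask "which customers have at least one order?"
QUERIES = {
    "in": """
        SELECT name FROM customers c
        WHERE c.id IN (SELECT customer_id FROM orders)
        ORDER BY name""",
    "exists": """
        SELECT name FROM customers c
        WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
        ORDER BY name""",
    # Relational operator (=) against a correlated single-row subquery.
    "relational": """
        SELECT name FROM customers c
        WHERE c.id = (SELECT DISTINCT o.customer_id FROM orders o
                      WHERE o.customer_id = c.id)
        ORDER BY name""",
    # Relational operator with row limiting (TOP 1 in SQL Server, LIMIT 1 here).
    "relational_limit": """
        SELECT name FROM customers c
        WHERE c.id = (SELECT o.customer_id FROM orders o
                      WHERE o.customer_id = c.id LIMIT 1)
        ORDER BY name""",
}

results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in QUERIES.items()}
```

    Timing each entry of QUERIES with and without the index would be the natural next step for reproducing the kind of comparison the abstract describes.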

  18. Multi-Dimensional Top-k Dominating Queries

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Mamoulis, Nikos

    2009-01-01

    The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top......-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate...... of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness...
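
    The query definition above can be illustrated with a brute-force sketch. This is not the authors' algorithm (the abstract does not give it); it is a quadratic baseline that assumes smaller values are better in every dimension, and the function names are made up for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse everywhere and strictly better somewhere.

    Assumption for this sketch: smaller values are better in every dimension.
    """
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def top_k_dominating(points: List[Point], k: int) -> List[Tuple[int, Point]]:
    """Return the k points that dominate the most other points, with scores."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    # Highest score first; ties broken by the point's coordinates.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


points = [(1, 1), (2, 2), (3, 1), (1, 3), (3, 3)]
best_two = top_k_dominating(points, 2)
```

    Note that the score is a plain count, so no ranking function is needed and rescaling a dimension does not change the result, which matches properties (ii) and (iii) listed in the abstract.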

  19. A Query Language for Handling Big Observation Data Sets in the Sensor Web

    Science.gov (United States)

    Autermann, Christian; Stasch, Christoph; Jirka, Simon; Koppe, Roland

    2017-04-01

    The Sensor Web provides a framework for the standardized Web-based sharing of environmental observations and sensor metadata. While the issue of varying data formats and protocols is addressed by these standards, the fast growing size of observational data is imposing new challenges for the application of these standards. Most solutions for handling big observational datasets currently focus on remote sensing applications, while big in-situ datasets relying on vector features still lack a solid approach. Conventional Sensor Web technologies may not be adequate, as the sheer size of the data transmitted and the amount of metadata accumulated may render traditional OGC Sensor Observation Services (SOS) unusable. Besides novel approaches to store and process observation data in place, e.g. by harnessing big data technologies from mainstream IT, the access layer has to be amended to utilize and integrate these large observational data archives into applications and to enable analysis. For this, an extension to the SOS will be discussed that establishes a query language to dynamically process and filter observations at storage level, similar to the OGC Web Coverage Service (WCS) and its Web Coverage Processing Service (WCPS) extension. This will enable an application to request, e.g., spatially or temporally aggregated data sets in a resolution it is able to display or it requires. The approach will be developed and implemented in cooperation with the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, whose catalogue of data comprises marine observations of physical, chemical and biological phenomena from a wide variety of sensors, including mobile (like research vessels, aircraft or underwater vehicles) and stationary (like buoys or research stations). Observations are made with a high temporal resolution and the resulting time series may span multiple decades.

  20. VMQL: A Visual Language for Ad-Hoc Model Querying

    DEFF Research Database (Denmark)

    Störrle, Harald

    2011-01-01

    facilities are inadequate. The Visual Model Query Language (VMQL) is a novel approach that uses the respective modeling language of the source model as the query language, too. The semantics of VMQL is defined formally based on graphs, so that query execution can be defined as graph matching. VMQL has been...... applied to several visual modeling languages, implemented, and validated in small case studies, and several controlled experiments....

  1. Adaptive and Optimized RDF Query Interface for Distributed WFS Data

    Directory of Open Access Journals (Sweden)

    Tian Zhao

    2017-04-01

    Web Feature Service (WFS) is a protocol for accessing geospatial data stores such as databases and Shapefiles over the Web. However, WFS does not provide direct access to data distributed in multiple servers. In addition, WFS features extracted from their original sources are not convenient for user access due to the lack of connection to high-level concepts. Users are facing the choices of either querying each WFS server first and then integrating the results, or converting the data from all WFS servers to a more expressive format such as RDF (Resource Description Framework) and then querying the integrated data. The first choice requires additional programming while the second choice is not practical for large or frequently updated datasets. The new contribution of this paper is that we propose a novel adaptive and optimized RDF query interface to overcome the aforementioned limitation. Specifically, in this paper, we propose a novel algorithm to query and synthesize distributed WFS data through an RDF query interface, where users can specify data requests to multiple WFS servers using a single RDF query. Users can also define a simple configuration to associate WFS feature types, attributes, and values with RDF classes, properties, and values so that user queries can be written using a more uniform and informative vocabulary. The algorithm translates each RDF query written in SPARQL-like syntax to multiple WFS GetFeature requests, and then converts and integrates the multiple WFS results to get the answers to the original query. The generated GetFeature requests are sent asynchronously and simultaneously to WFS servers to take advantage of the server parallelism. The results of each GetFeature request are cached to improve query response time for subsequent queries that involve one or more of the cached requests. A JavaScript-based prototype is implemented and experimental results show that the query response time can be greatly reduced through ...

  2. Group-by Skyline Query Processing in Relational Engines

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Luk, Ming-Hay; Lo, Eric

    2009-01-01

    The skyline operator was first proposed in 2001 for retrieving interesting tuples from a dataset. Since then, 100+ skyline-related papers have been published; however, we discovered that one of the most intuitive and practical types of skyline queries, namely, group-by skyline queries remains...... the missing cost model for the BBS algorithm. Experimental results show that our techniques are able to devise the best query plans for a variety of group-by skyline queries. Our focus is on algorithms that can be directly implemented in today's commercial database systems without the addition of new access...
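
    As a rough illustration of what a group-by skyline query computes, the sketch below partitions rows by a grouping key and keeps, within each group, only the rows that no other row in the group dominates. It is a naive baseline under the assumption that smaller values are preferred; it is neither the BBS algorithm nor the query plans studied in the paper, and the hotel-style data is invented.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse in all dimensions, strictly better in one.

    Assumption for this sketch: smaller values are preferred.
    """
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def group_by_skyline(rows: List[Tuple[str, float, float]]
                     ) -> Dict[str, List[Tuple[float, ...]]]:
    """Compute one skyline per group; the first field is the grouping key."""
    groups: Dict[str, List[Tuple[float, ...]]] = defaultdict(list)
    for key, *values in rows:
        groups[key].append(tuple(values))
    return {key: skyline(points) for key, points in groups.items()}


# Hypothetical rows: (city, price, distance to centre).
rows = [
    ("Oslo", 100, 5), ("Oslo", 120, 3), ("Oslo", 130, 6),
    ("Rome", 80, 2), ("Rome", 90, 4),
]
per_city = group_by_skyline(rows)
```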

  3. The Query Complexity of Finding a Hidden Permutation

    DEFF Research Database (Denmark)

    Afshani, Peyman; Agrawal, Manindra; Doerr, Benjamin

    2012-01-01

    We study the query complexity of determining a hidden permutation. More specifically, we study the problem of learning a secret $(z, \pi)$ consisting of a binary string $z$ of length $n$ and a permutation $\pi$ of $[n]$. The secret must be unveiled by asking queries $x \in \{0,1\}^n$, and for each query asked, we are returned the score $f_{z,\pi}(x)$ defined as $f_{z,\pi}(x) := \max\{ i \in [0..n] \mid \forall j \le i : z_{\pi(j)} = x_{\pi(j)} \}$; i.e., the length of the longest common prefix of $x$ and $z$ with respect to $\pi$. The goal is to minimize the number of queries asked. Our main results are matching upper and lower bounds for this problem, both for deterministic and randomized query schemes...... applications in many other query complexity problems.
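
    The score returned for each query is the length of the longest common prefix of the query and the secret string when positions are visited in the order given by the hidden permutation. A small sketch of that oracle follows; it uses 0-based positions, which is a convention chosen for the example rather than something stated in the abstract, and the concrete secret is made up.

```python
from typing import Sequence


def score(z: Sequence[int], pi: Sequence[int], x: Sequence[int]) -> int:
    """Count how many positions, visited in the order pi, agree between x and z
    before the first disagreement."""
    matched = 0
    while matched < len(z) and z[pi[matched]] == x[pi[matched]]:
        matched += 1
    return matched


# Hypothetical secret: bit string z and visiting order pi (0-based).
z = [1, 0, 1, 1]
pi = [2, 0, 3, 1]

# Positions 2 and 0 agree, position 3 differs, so the score is 2.
partial = score(z, pi, [1, 1, 1, 0])
# Querying the secret string itself gives the maximum score n.
full = score(z, pi, z)
```

    A learner only ever sees these scores, so the problem is to choose queries whose scores reveal both the bits of the string and the visiting order with as few calls as possible.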

  4. The effect of query complexity on Web searching results

    Directory of Open Access Journals (Sweden)

    B.J. Jansen

    2000-01-01

    This paper presents findings from a study of the effects of query structure on retrieval by Web search services. Fifteen queries were selected from the transaction log of a major Web search service in simple query form with no advanced operators (e.g., Boolean operators, phrase operators, etc.) and submitted to 5 major search engines - Alta Vista, Excite, FAST Search, Infoseek, and Northern Light. The results from these queries became the baseline data. The original 15 queries were then modified using the various search operators supported by each of the 5 search engines for a total of 210 queries. Each of these 210 queries was also submitted to the applicable search service. The results obtained were then compared to the baseline results. A total of 2,768 search results were returned by the set of all queries. In general, increasing the complexity of the queries had little effect on the results with a greater than 70% overlap in results, on average. Implications for the design of Web search services and directions for future research are discussed.

  5. PAQ: Persistent Adaptive Query Middleware for Dynamic Environments

    Science.gov (United States)

    Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin

    Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.

  6. Efficient Processing of Multiple DTW Queries in Time Series Databases

    DEFF Research Database (Denmark)

    Kremer, Hardy; Günnemann, Stephan; Ivanescu, Anca-Maria

    2011-01-01

    Dynamic Time Warping (DTW) is a widely used distance measure for time series that has been successfully used in science and many other application domains. As DTW is computationally expensive, there is a strong need for efficient query processing algorithms. Such algorithms exist for single queries....... In many of today’s applications, however, large numbers of queries arise at any given time. Existing DTW techniques do not process multiple DTW queries simultaneously, a serious limitation which slows down overall processing. In this paper, we propose an efficient processing approach for multiple DTW...
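
    For readers unfamiliar with the distance measure itself, the following is the textbook dynamic-programming formulation of DTW with absolute difference as the local cost. It is a sketch of the single-query baseline only; the multi-query processing approach proposed in the paper is not reproduced here, and the final loop simply evaluates each query independently.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest warping of a[:i] onto b[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


series = [1.0, 2.0, 2.0, 3.0]
queries = [[1.0, 2.0, 3.0], [0.0, 0.0]]
# Naive multi-query processing: one full DTW computation per query.
distances = [dtw_distance(q, series) for q in queries]
```

    The quadratic cost per query is what makes shared or pruned processing attractive when many queries arrive at once.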

  7. An Approach to Assist Designers With Their Queries and Designs

    DEFF Research Database (Denmark)

    Ahmed, Saeema

    2006-01-01

    Recent research investigating how engineers search for information has concluded that engineering designers require assistance when formulating queries. An approach to assist designers with their queries is presented. This approach forms part of a knowledge management system, where indexed...... documents are entered into a knowledge-based system and is generated dynamically. The network can be used to assist a designer in searching for information, reformulating a query, and prompting design tasks. This paper presents an approach to prompt designers with their design queries, along with some...

  8. Web Database Schema Identification through Simple Query Interface

    Science.gov (United States)

    Lin, Ling; Zhou, Lizhu

    Web databases provide different types of query interfaces to access the data records stored in the backend databases. While most existing works exploit a complex query interface with multiple input fields to perform schema identification of Web databases, little attention has been paid to how to identify the schema of Web databases through a simple query interface (SQI), which has only a single query text input field. This paper proposes a new method of instance-based query probing to identify the interface and result schema of Web databases (WDBs) for SQI. The interface schema identification problem is defined as generating the full-condition query of SQI, and a novel query probing strategy is proposed. The result schema is also identified based on the result webpages of SQI's full-condition query, and an extended identification of the non-query attributes is proposed to improve the attribute recall rate. Experimental results on Web databases of online shopping for books, movies and mobile phones show that our method is effective and efficient.

  9. Toward element-level interoperability in bibliographic metadata

    Directory of Open Access Journals (Sweden)

    Eric Childress

    2008-03-01

    This paper discusses an approach and set of tools for translating bibliographic metadata from one format to another. A computational model is proposed to formalize the notion of a 'crosswalk'. The translation process separates semantics from syntax, and specifies a crosswalk as machine executable translation files which are focused on assertions of element equivalence and are closely associated with the underlying intellectual analysis of metadata translation. A data model developed by the authors called Morfrom serves as an internal generic metadata format. Translation logic is written in an XML scripting language designed by the authors called the Semantic Equivalence Expression Language (Seel). These techniques have been built into an OCLC software toolkit to manage large and diverse collections of metadata records, called the Crosswalk Web Service.
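
    The idea of a crosswalk as a set of element-equivalence assertions, kept separate from any record syntax, can be sketched in a few lines. The example below is plain Python with invented element names; it does not use Seel or Morfrom, whose actual syntax and structure are not given in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical equivalence assertions: source element -> target element.
CROSSWALK: Dict[str, str] = {
    "title": "dc:title",
    "author": "dc:creator",
    "pub_date": "dc:date",
}


def translate(record: Dict[str, str],
              crosswalk: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename each element per the crosswalk; report elements with no mapping."""
    translated: Dict[str, str] = {}
    unmapped: List[str] = []
    for element, value in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped.append(element)  # no equivalence asserted for this element
        else:
            translated[target] = value
    return translated, unmapped


source_record = {"title": "Moby-Dick", "author": "Melville, Herman", "isbn": "n/a"}
target_record, leftovers = translate(source_record, CROSSWALK)
```

    Because the mapping is data rather than code, the same translate function can serve any pair of formats for which equivalences have been written down.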

  10. Requirements for multimedia metadata schemes in surveillance applications for security

    NARCIS (Netherlands)

    Rest, J.H.C. van; Grootjen, F.A.; Grootjen, M.; Wijn, R.; Aarts, O.A.J.; Roelofs, M.L.; Burghouts, G.J.; Bouma, H.; Alic, L.; Kraaij, W.

    2013-01-01

    Surveillance for security requires communication between systems and humans, involves behavioural and multimedia research, and demands an objective benchmarking for the performance of system components. Metadata representation schemes are extremely important to facilitate (system) interoperability

  11. Ontology-based Metadata Portal for Unified Semantics

    Data.gov (United States)

    National Aeronautics and Space Administration — The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS) will extend the prototype Ontology-Driven Interactive Search Environment for Earth Sciences...

  12. Large geospatial images discovery: metadata model and technological framework

    Directory of Open Access Journals (Sweden)

    Lukáš Brůha

    2015-12-01

    The advancements in geospatial web technology triggered efforts for disclosure of valuable resources of historical collections. This paper focuses on the role of spatial data infrastructures (SDI) in such efforts. The work describes the interplay between SDI technologies and potential use cases in libraries such as cartographic heritage. The metadata model is introduced to link up the sources from these two distinct fields. To enhance the data search capabilities, the work focuses on the representation of the content-based metadata of raster images, which is the crucial prerequisite to target the search in a more effective way. The architecture of the prototype system for automatic raster data processing, storage, analysis and distribution is introduced. The architecture responds to the characteristics of input datasets, namely to the continuous flow of very large raster data and related metadata. Proposed solutions are illustrated on the case study of cartometric analysis of digitised early maps and related metadata encoding.

  13. Distributed metadata in a high performance computing environment

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Zhang, Zhenhua; Liu, Xuezhao; Tang, Haiying

    2017-07-11

    A computer-executable method, system, and computer program product for managing meta-data in a distributed storage system, wherein the distributed storage system includes one or more burst buffers enabled to operate with a distributed key-value store, the computer-executable method, system, and computer program product comprising receiving a request for meta-data associated with a block of data stored in a first burst buffer of the one or more burst buffers in the distributed storage system, wherein the meta-data is associated with a key-value, determining which of the one or more burst buffers stores the requested meta-data, and upon determination that a first burst buffer of the one or more burst buffers stores the requested meta-data, locating the key-value in a portion of the distributed key-value store accessible from the first burst buffer.

  14. Metadata and Metacognition: How can we stimulate reflection for learning?

    NARCIS (Netherlands)

    Specht, Marcus

    2012-01-01

    Specht, M. (2012, 12 September). Metadata and Metacognition: How can we stimulate reflection for learning? Invited presentation given at the seminar on awareness and reflection in learning at the University of Leuven, Leuven, Belgium.

  15. USGS 24k Digital Raster Graphic (DRG) Metadata

    Data.gov (United States)

    Minnesota Department of Natural Resources — Metadata for the scanned USGS 24k Topographic Map Series (also known as 24k Digital Raster Graphic). Each scanned map is represented by a polygon in the layer and the...

  16. NNDSS - Table IV. Tuberculosis

    Data.gov (United States)

    U.S. Department of Health & Human Services — NNDSS - Table IV. Tuberculosis - 2016. This Table includes total number of cases reported in the United States, by region and by states, in accordance with the...

  17. NNDSS - Table II. Vibriosis

    Data.gov (United States)

    U.S. Department of Health & Human Services — NNDSS - Table II. Vibriosis - 2017. In this Table, provisional cases of selected notifiable diseases (≥1,000 cases reported during the preceding year), and selected...

  18. NNDSS - Table IV. Tuberculosis

    Data.gov (United States)

    U.S. Department of Health & Human Services — NNDSS - Table IV. Tuberculosis - 2014. This Table includes total number of cases reported in the United States, by region and by states, in accordance with the...

  19. NNDSS - Table III. Tuberculosis

    Data.gov (United States)

    U.S. Department of Health & Human Services — NNDSS - Table III. Tuberculosis - 2017. This Table includes total number of cases reported in the United States, by region and by states, in accordance with the...

  20. NNDSS - Table IV. Tuberculosis

    Data.gov (United States)

    U.S. Department of Health & Human Services — NNDSS - Table IV. Tuberculosis - 2015. This Table includes total number of cases reported in the United States, by region and by states, in accordance with the...

  1. Tabled Execution in Scheme

    Energy Technology Data Exchange (ETDEWEB)

    Willcock, J J; Lumsdaine, A; Quinlan, D J

    2008-08-19

    Tabled execution is a generalization of memoization developed by the logic programming community. It not only saves results from tabled predicates, but also stores the set of currently active calls to them; tabled execution can thus provide meaningful semantics for programs that seemingly contain infinite recursions with the same arguments. In logic programming, tabled execution is used for many purposes, both for improving the efficiency of programs, and making tasks simpler and more direct to express than with normal logic programs. However, tabled execution is only infrequently applied in mainstream functional languages such as Scheme. We demonstrate an elegant implementation of tabled execution in Scheme, using a mix of continuation-passing style and mutable data. We also show the use of tabled execution in Scheme for a problem in formal language and automata theory, demonstrating that tabled execution can be a valuable tool for Scheme users.
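
    To show why remembering active calls matters, here is a deliberately simple Python sketch, not the continuation-passing Scheme implementation the abstract describes. It assumes the tabled function returns a set of answers and grows monotonically, and it handles a re-entrant call (the same argument requested while it is still being evaluated) by returning the answers found so far, then repeating the evaluation until no answer set changes. The decorator name, the graph, and the calling convention (the recursive entry point is passed in as the first argument) are all choices made for this example.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable


def tabled(fn: Callable[[Callable, Hashable], Iterable]) -> Callable:
    """Naive fixpoint tabling for set-valued, monotone recursive functions."""
    table: Dict[Hashable, FrozenSet] = {}  # answers found so far, per argument

    def solve(arg: Hashable) -> FrozenSet:
        in_round = set()  # calls already started in the current pass
        changed = True

        def call(a: Hashable) -> FrozenSet:
            nonlocal changed
            if a in in_round:
                # Re-entrant or repeated call: reuse the answers found so far
                # instead of recursing forever.
                return table[a]
            in_round.add(a)
            table.setdefault(a, frozenset())
            answers = table[a] | frozenset(fn(call, a))
            if answers != table[a]:
                table[a] = answers
                changed = True
            return table[a]

        # Re-evaluate until a full pass adds no new answers.
        while changed:
            changed = False
            in_round.clear()
            call(arg)
        return table[arg]

    return solve


# Hypothetical cyclic graph: a -> b -> {c, a}. Plain recursion would not stop.
EDGES = {"a": ["b"], "b": ["c", "a"], "c": []}


@tabled
def reach(recurse: Callable, node: str) -> set:
    """Nodes reachable from `node` by following one or more edges."""
    found = set()
    for successor in EDGES[node]:
        found.add(successor)
        found |= recurse(successor)
    return found


from_a = reach("a")
```

    Without the table, reach("a") would call itself through "b" and never terminate; with it, the cycle is cut at the re-entrant call and the repeated passes fill in the answers that the first pass missed.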

  2. NNDSS - Table II. Vibriosis

    Data.gov (United States)

    U.S. Department of Health & Human Services — NNDSS - Table II. Vibriosis - 2018. In this Table, provisional cases of selected notifiable diseases (≥1,000 cases reported during the preceding year), and selected...

  3. Pension Insurance Data Tables

    Data.gov (United States)

    Pension Benefit Guaranty Corporation — Find out about retirement trends in PBGC's data tables. The tables include statistics on the people and pensions that PBGC protects, including how many Americans are...

  4. NNDSS - Table II. Vibriosis

    Data.gov (United States)

    U.S. Department of Health & Human Services — NNDSS - Table II. Vibriosis - 2018. In this Table, provisional cases of selected notifiable diseases (≥1,000 cases reported during the preceding year), and...

  5. NNDSS - Table II. Vibriosis

    Data.gov (United States)

    U.S. Department of Health & Human Services — NNDSS - Table II. Vibriosis - 2017. In this Table, provisional cases of selected notifiable diseases (≥1,000 cases reported during the preceding year), and...

  6. Tilt Table Test

    Science.gov (United States)

    ... test may also be appropriate to investigate the cause of fainting if you've fainted only once, but another ... recommend a tilt table test to evaluate the cause of syncope. A tilt table test may also be recommended ...

  7. The ATLAS EventIndex: data flow and inclusion of other metadata

    CERN Document Server

    Prokoshin, Fedor; The ATLAS collaboration; Cardenas Zarate, Simon Ernesto; Favareto, Andrea; Fernandez Casani, Alvaro; Gallas, Elizabeth; Garcia Montoro, Carlos; Gonzalez de la Hoz, Santiago; Hrivnac, Julius; Malon, David; Salt, Jose; Sanchez, Javier; Toebbicke, Rainer; Yuan, Ruijun

    2016-01-01

    The ATLAS EventIndex is the catalogue of the event-related metadata for the information obtained from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event as well as trigger decision information. The main use cases for the EventIndex are event picking, providing information for the Event Service, and data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the GRID, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalog AMI and the Rucio data man...

  8. The ATLAS EventIndex: data flow and inclusion of other metadata

    CERN Document Server

    Cardenas Zarate, Simon Ernesto; Favareto, Andrea; Fernandez Casani, Alvaro; Gallas, Elizabeth; Garcia Montoro, Carlos; Gonzalez de la Hoz, Santiago; Hrivnac, Julius; Malon, David; Prokoshin, Fedor; Salt, Jose; Sanchez, Javier; Toebbicke, Rainer; Yuan, Ruijun

    2016-01-01

    The ATLAS EventIndex is the catalogue of the event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event as well as trigger decision information. The main use case for the EventIndex is event picking, as well as data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalogue AMI and the Rucio data management system and information on p...

  9. Assigning creative commons licenses to research metadata: issues and cases

    OpenAIRE

    Poblet, Marta

    2016-01-01

    This paper discusses the problem of the lack of clear licensing and transparency of usage terms and conditions for research metadata. Making research data connected, discoverable and reusable is the key enabler of the new data revolution in research. We discuss how the lack of transparency hinders discovery of research data and disconnects it from the publication and other trusted research outcomes. In addition, we discuss the application of Creative Commons licenses for research metadata...

  10. Massive Meta-Data: A New Data Mining Resource

    Science.gov (United States)

    Hugo, W.

    2012-04-01

    Worldwide standardisation, and interoperability initiatives such as GBIF, Open Access and GEOSS (to name but three of many) have led to the emergence of interlinked and overlapping meta-data repositories containing, potentially, tens of millions of entries collectively. This forms the backbone of an emerging global scientific data infrastructure that is both driven by changes in the way we work, and opens up new possibilities in management, research, and collaboration. Several initiatives are concentrated on building a generalised, shared, easily available, scalable, and indefinitely preserved scientific data infrastructure to aid future scientific work. This paper deals with the parallel aspect of the meta-data that will be used to support the global scientific data infrastructure. There are obvious practical issues (semantic interoperability and speed of discovery being the most important), but we are here more concerned with some of the less obvious conceptual questions and opportunities: 1. Can we use meta-data to assess, pinpoint, and reduce duplication of meta-data? 2. Can we use it to reduce overlaps of mandates in data portals, research collaborations, and research networks? 3. What possibilities exist for mining the relationships that exist implicitly in very large meta-data collections? 4. Is it possible to define an explicit 'scientific data infrastructure' as a complex, multi-relational network database, that can become self-maintaining and self-organising in true Web 2.0 and 'social networking' fashion? The paper provides a blueprint for a new approach to massive meta-data collections, and how this can be processed using established analysis techniques to answer the questions posed. It assesses the practical implications of working with standard meta-data definitions (such as ISO 19115, Dublin Core, and EML) in a meta-data mining context, and makes recommendations in respect of extension to support self-organising, semantically oriented 'networks of

  11. Table Tennis Club

    CERN Multimedia

    Table Tennis Club

    2013-01-01

    Apparently table tennis plays an important role in physics, not so much because physicists are interested in the theory of table tennis ball scattering, but probably because it provides useful breaks from their deep intellectual occupation. It seems that many of the greatest physicists took table tennis very seriously. For instance, Heisenberg could not even bear to lose a game of table tennis, Otto Frisch played a lot of table tennis, and had a table set up in his library, and Niels Bohr apparently beat everybody at table tennis. Therefore, as the CERN Table Tennis Club advertises on a poster for the next CERN Table Tennis Tournament: “if you want to be a great physicist, perhaps you should play table tennis”. For this reason, and also as part of the campaign launched by the CERN medical service “Move! & Eat better”, to encourage everyone at CERN to take regular exercise, the CERN Table Tennis Club, with the supp...

  12. Periodic Table of Students.

    Science.gov (United States)

    Johnson, Mike

    1998-01-01

    Presents an exercise in which an eighth-grade science teacher decorated the classroom with a periodic table of students. Student photographs were arranged according to similarities into vertical columns. Students were each assigned an atomic number according to their placement in the table. The table is then used to teach students about…

  13. AcuTable

    DEFF Research Database (Denmark)

    Dibbern, Simon; Rasmussen, Kasper Vestergaard; Ortiz-Arroyo, Daniel

    2017-01-01

    In this paper we describe AcuTable, a new tangible user interface. AcuTable is a shapeable surface that employs capacitive touch sensors. The goal of AcuTable was to enable the exploration of the capabilities of such a haptic interface and its applications. We describe its design and implementation...

  14. Query by image example: The CANDID approach

    Energy Technology Data Exchange (ETDEWEB)

    Kelly, P.M.; Cannon, M. [Los Alamos National Lab., NM (United States). Computer Research and Applications Group; Hush, D.R. [Univ. of New Mexico, Albuquerque, NM (United States). Dept. of Electrical and Computer Engineering

    1995-02-01

    CANDID (Comparison Algorithm for Navigating Digital Image Databases) was developed to enable content-based retrieval of digital imagery from large databases using a query-by-example methodology. A user provides an example image to the system, and images in the database that are similar to that example are retrieved. The development of CANDID was inspired by the N-gram approach to document fingerprinting, where a "global signature" is computed for every document in a database and these signatures are compared to one another to determine the similarity between any two documents. CANDID computes a global signature for every image in a database, where the signature is derived from various image features such as localized texture, shape, or color information. A distance between probability density functions of feature vectors is then used to compare signatures. In this paper, the authors present CANDID and highlight two results from their current research: subtracting a "background" signature from every signature in a database in an attempt to improve system performance when using inner-product similarity measures, and visualizing the contribution of individual pixels in the matching process. These ideas are applicable to any histogram-based comparison technique.
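
    The signature comparison and background subtraction mentioned above can be sketched roughly as follows. This is not the CANDID implementation: it assumes each signature is a normalized histogram over quantized feature indices, uses a normalized inner product (cosine) as the similarity, and takes the "background" to be the mean signature of the database. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def signature(features: Sequence[int], bins: int) -> List[float]:
    """Normalized histogram of quantized feature indices (assumed signature form)."""
    hist = [0.0] * bins
    for f in features:
        hist[f] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized inner-product similarity between two signatures."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def subtract_background(sigs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the mean signature (assumed 'background') from every signature."""
    n = len(sigs)
    background = [sum(col) / n for col in zip(*sigs)]
    return [[x - b for x, b in zip(s, background)] for s in sigs]


# Three toy "images", each described by four quantized feature indices.
sigs = [
    signature([0, 0, 1, 3], bins=4),
    signature([0, 0, 1, 2], bins=4),
    signature([0, 1, 1, 3], bins=4),
]
centered = subtract_background(sigs)
print(cosine(sigs[0], sigs[1]), cosine(centered[0], centered[1]))
```

    Subtracting the shared component removes mass that every signature has in common, so the inner product reflects what distinguishes one image from the others rather than what they all share.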

  15. Query-Driven Visualization and Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Ruebel, Oliver; Bethel, E. Wes; Prabhat, Mr.; Wu, Kesheng

    2012-11-01

    This report focuses on an approach to high performance visualization and analysis, termed query-driven visualization and analysis (QDV). QDV aims to reduce the amount of data that needs to be processed by the visualization, analysis, and rendering pipelines. The goal of the data reduction process is to separate out data that is "scientifically interesting" and to focus visualization, analysis, and rendering on that interesting subset. The premise is that for any given visualization or analysis task, the data subset of interest is much smaller than the larger, complete data set. This strategy (extracting smaller data subsets of interest and focusing the visualization processing on these subsets) is complementary to the approach of increasing the capacity of the visualization, analysis, and rendering pipelines through parallelism. This report discusses the fundamental concepts in QDV, their relationship to different stages in the visualization and analysis pipelines, and presents QDV's application to problems in diverse areas, ranging from forensic cybersecurity to high energy physics.

  16. A Metadata Schema for Geospatial Resource Discovery Use Cases

    Directory of Open Access Journals (Sweden)

    Darren Hardy

    2014-07-01

    We introduce a metadata schema that focuses on GIS discovery use cases for patrons in a research library setting. Text search, faceted refinement, and spatial search and relevancy are among GeoBlacklight's primary use cases for federated geospatial holdings. The schema supports a variety of GIS data types and enables contextual, collection-oriented discovery applications as well as traditional portal applications. One key limitation of GIS resource discovery is the general lack of normative metadata practices, which has led to a proliferation of metadata schemas and duplicate records. The ISO 19115/19139 and FGDC standards specify metadata formats, but are intricate, lengthy, and not focused on discovery. Moreover, they require sophisticated authoring environments and cataloging expertise. Geographic metadata standards target preservation and quality measure use cases, but they do not provide for simple inter-institutional sharing of metadata for discovery use cases. To this end, our schema reuses elements from Dublin Core and GeoRSS to leverage their normative semantics, community best practices, open-source software implementations, and extensive examples already deployed in discovery contexts such as web search and mapping. Finally, we discuss a Solr implementation of the schema using a "geo" extension to MODS.

  17. Using Metadata to Build Geographic Information Sharing Environment on Internet

    Directory of Open Access Journals (Sweden)

    Chih-hong Sun

    1999-12-01

    The Internet provides a convenient environment to share geographic information. Web GIS (Geographic Information System) even provides users a direct access environment to geographic databases through the Internet. However, the complexity of geographic data makes it difficult for users to understand the real content and the limitations of geographic information. In some cases, users may misuse the geographic data and make wrong decisions. Meanwhile, geographic data are distributed across various government agencies, academic institutes, and private organizations, which makes it even more difficult for users to fully understand the content of these complex data. To overcome these difficulties, this research uses metadata as a guiding mechanism for users to fully understand the content and the limitations of geographic data. We introduce three metadata standards commonly used for geographic data and metadata authoring tools available in the US. We also review the current development of geographic metadata standards in Taiwan. Two metadata authoring tools are developed in this research, which will enable users to build their own geographic metadata easily. [Article content in Chinese]

  18. Forensic devices for activism: Metadata tracking and public proof

    Directory of Open Access Journals (Sweden)

    Lonneke van der Velden

    2015-10-01

    The central topic of this paper is a mobile phone application, ‘InformaCam’, which turns metadata from a surveillance risk into a method for the production of public proof. InformaCam allows one to manage and delete metadata from images and videos in order to diminish surveillance risks related to online tracking. Furthermore, it structures and stores the metadata in such a way that the documentary material becomes better accommodated to evidentiary settings, if needed. In this paper I propose InformaCam should be interpreted as a ‘forensic device’. By using the conceptualization of forensics and work on socio-technical devices the paper discusses how InformaCam, through a range of interventions, rearranges metadata into a technology of evidence. InformaCam explicitly recognizes mobile phones as context aware, uses their sensors, and structures metadata in order to facilitate data analysis after images are captured. Through these modifications it invents a form of ‘sensory data forensics’. By treating data in this particular way, surveillance resistance does more than seek awareness. It becomes engaged with investigatory practices. Considering the extent to which states conduct metadata surveillance, the project can be seen as a timely response to the unequal distribution of power over data.

  19. On (dynamic) range minimum queries in external memory

    DEFF Research Database (Denmark)

    Arge, L.; Fischer, Johannes; Sanders, Peter

    2013-01-01

    We study the one-dimensional range minimum query (RMQ) problem in the external memory model. We provide the first space-optimal solution to the batched static version of the problem. On an instance with N elements and Q queries, our solution takes $\Theta(\mathrm{sort}(N+Q)) = \Theta\left(\frac{N+Q}{B}\log_{M/B}\frac{N+Q}{B}\right)$ I...

  20. Modeling Large Time Series for Efficient Approximate Query Processing

    DEFF Research Database (Denmark)

    Perera, Kasun S; Hahmann, Martin; Lehner, Wolfgang

    2015-01-01

    -wise aggregation to derive the models. These models are initially created from the original data and are kept in the database along with it. Subsequent queries are answered using the stored models rather than scanning and processing the original datasets. In order to support model query processing, we maintain...

  1. Video Stream Retrieval of Unseen Queries using Semantic Memory

    NARCIS (Netherlands)

    Cappallo, S.; Mensink, T.; Snoek, C.G.M.; Wilson, R.C.; Hancock, E.R.; Smith, W.A.P.

    2016-01-01

    Retrieval of live, user-broadcast video streams is an under-addressed and increasingly relevant challenge. The on-line nature of the problem requires temporal evaluation and the unforeseeable scope of potential queries motivates an approach which can accommodate arbitrary search queries. To account

  2. Real SQL queries 50 challenges : practice for reporting and analysis

    CERN Document Server

    Cohen, Brian; Mishra, Neerja

    2015-01-01

    Queries improve when challenges are authentic. This book sets your learning on the fast track with realistic problems to solve. Topics span sales, marketing, human resources, purchasing, and production. Real SQL Queries: 50 Challenges is perfect for analysts, report writers, or anyone searching for a hands-on approach to learning SQL Server.

  3. The Odyssey Approach for Optimizing Federated SPARQL Queries

    DEFF Research Database (Denmark)

    Montoya, Gabriela; Skaf-Molli, Hala; Hose, Katja

    2017-01-01

    because (ii) there is only limited access to statistics about schema and instance data of remote sources. To overcome these challenges, most federated query engines rely on heuristics to reduce the space of possible query execution plans or on dynamic programming strategies to produce optimal plans...

  4. Formal specification of a query expression generator using RSL ...

    African Journals Online (AJOL)

    Formal methods are used for the specification of the query generator which is not the usual practice in the specification of query generators. We use RSL, the RAISE Specification Language, to formally specify our generator. From the specification, an implementation of our generator is generated in C++ using a command ...

  5. Query Classification and Study of University Students' Search Trends

    Science.gov (United States)

    Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

    2012-01-01

    Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…

  6. Investigating queries and search failures in academic search

    NARCIS (Netherlands)

    Li, X.; Schijvenaars, B.J.A.; de Rijke, M.

    Academic search concerns the retrieval and profiling of information objects in the domain of academic research. In this paper we reveal important observations of academic search queries, and provide an algorithmic solution to address a type of failure during search sessions: null queries. We start

  7. Efficient external memory structures for range-aggregate queries

    DEFF Research Database (Denmark)

    Agarwal, P.K.; Yang, J.; Arge, L.

    2013-01-01

    We present external memory data structures for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in R^d, compute the aggregate of the weights of the points that lie inside a d-dimensional orthogonal query rectangle. The...
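
    As a reference for the problem definition above, the following minimal sketch answers a range-aggregate query by a plain linear scan. It is not one of the external memory structures the abstract refers to; it only pins down what such a structure must compute. The function name, the choice of sum as the aggregate, and the closed (inclusive) rectangle bounds are assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[Tuple[float, ...], float]  # (coordinates in R^d, weight)


def range_aggregate(points: Sequence[Point],
                    lo: Sequence[float],
                    hi: Sequence[float]) -> float:
    """Sum the weights of points inside the orthogonal rectangle [lo, hi]."""
    return sum(
        weight
        for coords, weight in points
        # A point is inside if every coordinate lies within its interval.
        if all(l <= c <= h for l, c, h in zip(lo, coords, hi))
    )


# Three weighted points in the plane (d = 2).
points = [((1.0, 1.0), 2.0), ((3.0, 4.0), 5.0), ((6.0, 2.0), 1.5)]
print(range_aggregate(points, lo=(0.0, 0.0), hi=(4.0, 5.0)))
```

    The scan touches every point on every query; the purpose of an index is to return the same value while reading far fewer points.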

  8. Top-k Spatial Preference Queries in Directed Road Networks

    Directory of Open Access Journals (Sweden)

    Muhammad Attique

    2016-09-01

    Top-k spatial preference queries rank objects based on the score of feature objects in their spatial neighborhood. Top-k preference queries are crucial for a wide range of location based services such as hotel browsing and apartment searching. In recent years, a lot of research has been conducted on processing of top-k spatial preference queries in Euclidean space. While a few algorithms study top-k preference queries in road networks, they all focus on undirected road networks. In this paper, we investigate the problem of processing top-k spatial preference queries in directed road networks where each road segment has a particular orientation. Computation of data object scores requires examining the scores of each feature object in its spatial neighborhood. This may cause computational delay, thus resulting in a high query processing time. In this paper, we address this problem by proposing a pruning and grouping of feature objects to reduce the number of feature objects. Furthermore, we present an efficient algorithm called TOPS that can process top-k spatial preference queries in directed road networks. Experimental results indicate that our algorithm significantly reduces the query processing time compared to previous solutions for a wide range of problem settings.

  9. Mining Web Query Logs to Analyze Political Issues

    NARCIS (Netherlands)

    Weber, I.; Garimella, V.R.K.; Borra, E.; Contractor, N.; Uzzi, B.

    2012-01-01

    We present a novel approach to using anonymized web search query logs to analyze and visualize political issues. Our starting point is a list of politically annotated blogs (left vs. right). We use this list to assign a numerical political leaning to queries leading to clicks on these blogs.

  10. On the Suitability of Skyline Queries for Data Exploration

    DEFF Research Database (Denmark)

    Chester, Sean; Mortensen, Michael Lind; Assent, Ira

    2014-01-01

    The skyline operator has been studied in database research for multi-criteria decision making. Until now the focus has been on the efficiency or accuracy of single queries. In practice, however, users are increasingly confronted with unknown data collections, where precise query formulation proves...

  11. A framework for query optimization to support data mining

    NARCIS (Netherlands)

    S.R. Choenni (Sunil); A.P.J.M. Siebes (Arno)

    1996-01-01

    In order to extract knowledge from databases, data mining algorithms heavily query the databases. Inefficient processing of these queries will inevitably have its impact on the performance of these algorithms, making them less valuable. In this paper, we describe an optimization

  12. Evolving Metadata in NASA Earth Science Data Systems

    Science.gov (United States)

    Mitchell, A.; Cechini, M. F.; Walter, J.

    2011-12-01

    NASA's Earth Observing System (EOS) is a coordinated series of satellites for long term global observations. NASA's Earth Observing System Data and Information System (EOSDIS) is a petabyte-scale archive of environmental data that supports global climate change research by providing end-to-end services from EOS instrument data collection to science data processing to full access to EOS and other earth science data. On a daily basis, the EOSDIS ingests, processes, archives and distributes over 3 terabytes of data from NASA's Earth Science missions representing over 3500 data products from a range of science disciplines. EOSDIS is currently comprised of 12 discipline specific data centers that are collocated with centers of science discipline expertise. Metadata is used in all aspects of NASA's Earth Science data lifecycle from the initial measurement gathering to the accessing of data products. Missions use metadata in their science data products when describing information such as the instrument/sensor, operational plan, and geographic region. Acting as the curator of the data products, data centers employ metadata for preservation, access and manipulation of data. EOSDIS provides a centralized metadata repository called the Earth Observing System (EOS) ClearingHouse (ECHO) for data discovery and access via a service-oriented-architecture (SOA) between data centers and science data users. ECHO receives inventory metadata from data centers, which generate metadata files that comply with the ECHO Metadata Model. NASA's Earth Science Data and Information System (ESDIS) Project established a Tiger Team to study and make recommendations regarding the adoption of the international metadata standard ISO 19115 in EOSDIS. The result was a technical report recommending an evolution of NASA data systems towards a consistent application of ISO 19115 and related standards including the creation of a NASA-specific convention for core ISO 19115 elements.

  13. Automated metadata--final project report

    Energy Technology Data Exchange (ETDEWEB)

    Schissel, David [General Atomics, San Diego, CA (United States)

    2016-04-01

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research illustrating the general applicability of the project’s toolkit. Feedback was very positive on the project’s toolkit and the value of such automatic workflow documentation to the scientific endeavor.

  14. Automated metadata--final project report

    International Nuclear Information System (INIS)

    Schissel, David

    2016-01-01

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project) funded by the United States Department of Energy (DOE), Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, it was extended for 6 months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project was able to successfully create a suite of software tools that can be used by a scientific community to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research illustrating the general applicability of the project's toolkit. Feedback was very positive on the project's toolkit and the value of such automatic workflow documentation to the scientific endeavor.

  15. Better Living Through Metadata: Examining Archive Usage

    Science.gov (United States)

    Becker, G.; Winkelman, S.; Rots, A.

    2013-10-01

    The primary purpose of an observatory's archive is to provide access to the data through various interfaces. User interactions with the archive are recorded in server logs, which can be used to answer basic questions like: Who has downloaded dataset X? When did she do this? Which tools did she use? The answers to questions like these fill in patterns of data access (e.g., how many times dataset X has been downloaded in the past three years). Analysis of server logs provides metrics of archive usage and provides feedback on interface use which can be used to guide future interface development. The Chandra X-ray Observatory is fortunate in that a database to track data access and downloads has been continuously recording such transactions for years; however, it is overdue for an update. We will detail changes we hope to effect and the differences the changes may make to our usage metadata picture. We plan to gather more information about the geographic location of users without compromising privacy; create improved archive statistics; and track and assess the impact of web “crawlers” and other scripted access methods on the archive. With the improvements to our download tracking we hope to gain a better understanding of the dissemination of Chandra's data; how effectively it is being done; and perhaps discover ideas for new services.

  16. Educational Rationale Metadata for Learning Objects

    Directory of Open Access Journals (Sweden)

    Tom Carey

    2002-10-01

    Instructors searching for learning objects in online repositories will be guided in their choices by the content of the object, the characteristics of the learners addressed, and the learning process embodied in the object. We report here on a feasibility study for metadata to record process-oriented information about instructional approaches for learning objects, through a set of Educational Rationale [ER] tags which would allow authors to describe the critical elements in their design intent. The prototype ER tags describe activities which have been demonstrated to be of value in learning, and authors select the activities whose support was critical in their design decisions. The prototype ER tag set consists of descriptors of the instructional approach used in the design, plus optional sub-elements for Comments, Importance and Features which implement the design intent. The tag set was tested by creators of four learning object modules, three intended for post-secondary learners and one for K-12 students and their families. In each case the creators reported that the ER tag set allowed them to express succinctly the key instructional approaches embedded in their designs. These results confirmed the overall feasibility of the ER tag approach as a means of capturing design intent from creators of learning objects. Much work remains to be done before a usable ER tag set could be specified, including evaluating the impact of ER tags during design to improve instructional quality of learning objects.

  17. Database Description - Nikkaji-InChI Mapping Table | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    chemical substances identifier), InChI, and InChIKey in Nikkaji ( http://doi.org/10.15079/NIKKAJI ) which is..., Chikako MAEDA Journal: Journal of Information Processing and Management Vol. 48 (2005) No. 4 p. 220-225. ( http://doi.org...se maintenance site Japan Science and Technology Agency (JST) URL of the original website http://doi.org...ahiro KIMURA, Tatsuya KUSHIDA Journal: Journal of Information Processing and Management Vol. 58 (2015) No. 3 p. 204-212. ( http://doi....org/10.1241/johokanri.58.204 ) External Links: Original website information Databa

  18. ORF Table - KAIKOcDNA | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)


  19. QTL Information Table - Q-TARO | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available stage.jst.go.jp/browse/-char/ja ) - the Rice Genetics Newsletter ( http://www.gramene.org/newsletters/rice_g...Med, HighWire) - RGN (Rice Genetics Newsletter) Reference no. Number assigned to

  20. Pareto-depth for multiple-query image retrieval.

    Science.gov (United States)

    Hsiao, Ko-Jen; Calder, Jeff; Hero, Alfred O

    2015-02-01

    Most content-based image retrieval systems consider either one single query, or multiple queries that include the same object or represent the same semantic information. In this paper, we consider the content-based image retrieval problem for multiple query images corresponding to different image semantics. We propose a novel multiple-query information retrieval algorithm that combines the Pareto front method with efficient manifold ranking. We show that our proposed algorithm outperforms state of the art multiple-query retrieval algorithms on real-world image databases. We attribute this performance improvement to concavity properties of the Pareto fronts, and prove a theoretical result that characterizes the asymptotic concavity of the fronts.

  1. Query Log Analysis of an Electronic Health Record Search Engine

    Science.gov (United States)

    Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

    2011-01-01

    We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in the medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision a significant challenge, along with significant opportunities, in providing intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150

  2. Macromolecular query language (MMQL): prototype data model and implementation.

    Science.gov (United States)

    Shindyalov, I N; Chang, W; Pu, C; Bourne, P E

    1994-11-01

    Macromolecular query language (MMQL) is an extensible interpretive language in which to pose questions concerning the experimental or derived features of the 3-D structure of biological macromolecules. MMQL portends to be intuitive with a simple syntax, so that from a user's perspective complex queries are easily written. A number of basic queries and a more complex query--determination of structures containing a five-strand Greek key motif--are presented to illustrate the strengths and weaknesses of the language. The predominant features of MMQL are a filter and pattern grammar which are combined to express a wide range of interesting biological queries. Filters permit the selection of object attributes, for example, compound name and resolution, whereas the patterns currently implemented query primary sequence, close contacts, hydrogen bonding, secondary structure, conformation and amino acid properties (volume, polarity, isoelectric point, hydrophobicity and different forms of exposure). MMQL queries are processed by MMQLlib, a C++ class library to which new query methods and pattern types are easily added. The prototype implementation described uses PDBlib, another C++-based class library for representing the features of biological macromolecules at the level of detail parsable from a PDB file. Since PDBlib can represent data stored in relational and object-oriented databases, as well as PDB files, once these data are loaded they too can be queried by MMQL. Performance metrics are given for queries of PDB files for which all derived data are calculated at run time and compared to a preliminary version of OOPDB, a prototype object-oriented database with a schema based on a persistent version of PDBlib which offers more efficient data access and the potential to maintain derived information. MMQLlib, PDBlib and associated software are available via anonymous ftp from cuhhca.hhmi.columbia.edu.

  3. Processing SPARQL queries with regular expressions in RDF databases.

    Science.gov (United States)

    Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

    2011-03-29

    As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL, a W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' requests for extracting information from RDF data are diverse and users often do not know the exact value of each fact in the RDF databases, it is desirable to use SPARQL queries with regular expression patterns for querying RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework to existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
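
    To make the kind of query discussed in this abstract concrete, the following is a minimal Python sketch of a regular-expression filter over RDF-style triples, analogous in spirit to a SPARQL `FILTER regex(?label, ...)` clause. The sample triples, the `match_objects` helper, and the naive full scan are illustrative assumptions; they are not the framework or the cost model proposed in the paper.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample data, invented for this sketch.
TRIPLES: List[Triple] = [
    ("protein:P1", "rdfs:label", "Hemoglobin subunit alpha"),
    ("protein:P2", "rdfs:label", "Hemoglobin subunit beta"),
    ("protein:P3", "rdfs:label", "Insulin"),
]


def match_objects(triples: List[Triple], predicate: str, pattern: str) -> List[Triple]:
    """Return triples with the given predicate whose object matches the regex.

    This scans every triple, which is exactly the cost an optimized
    engine would try to avoid.
    """
    rx = re.compile(pattern)
    return [t for t in triples if t[1] == predicate and rx.search(t[2])]


hits = match_objects(TRIPLES, "rdfs:label", r"^Hemoglobin .* (alpha|beta)$")
for subject, _, label in hits:
    print(subject, label)
```

    A full scan like this touches every triple for every query, which is why an index-aware evaluation strategy matters once the data grows.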

  4. CERN Table Tennis Club

    CERN Multimedia

    CERN Table Tennis Club

    2014-01-01

    CERN Table Tennis Club Announcing CERN 60th Anniversary Table Tennis Tournament to take place at CERN, from July 1 to July 15, 2014   The CERN Table Tennis Club, reborn in 2008, is encouraging people at CERN to take more regular exercise. This is why the Club, thanks to the strong support of the CERN Staff Association, installed last season a first outdoor table on the terrace of restaurant # 1, and will install another one this season on the terrace of Restaurant # 2. Table tennis provides both physical exercise and friendly social interactions. The CERN Table Tennis club is happy to use the unique opportunity of the 60th CERN anniversary to promote table tennis at CERN, as it is a game that everybody can easily play, regardless of level. Table tennis is particularly well suited for CERN, as many great physicists play table tennis, as you might already know: “Heisenberg could not even bear to lose a game of table tennis”; “Otto Frisch played a lot of table tennis;...

  5. jQuery UI 1.7 the user interface library for jQuery

    CERN Document Server

    Wellman, Dan

    2009-01-01

    An example-based approach leads you step-by-step through the implementation and customization of each library component and its associated resources in turn. To emphasize the way that jQuery UI takes the difficulty out of user interface design and implementation, each chapter ends with a 'fun with' section that puts together what you've learned throughout the chapter to make a usable and fun page. In these sections you'll often get to experiment with the latest associated technologies like AJAX and JSON. This book is for front-end designers and developers who need to quickly learn how to use t

  6. CellMiner: a relational database and query tool for the NCI-60 cancer cell lines

    Directory of Open Access Journals (Sweden)

    Reinhold William C

    2009-06-01

    Full Text Available Background: Advances in the high-throughput omic technologies have made it possible to profile cells in a large number of ways at the DNA, RNA, protein, chromosomal, functional, and pharmacological levels. A persistent problem is that some classes of molecular data are labeled with gene identifiers, others with transcript or protein identifiers, and still others with chromosomal locations. What has lagged behind is the ability to integrate the resulting data to uncover complex relationships and patterns. Those issues are reflected in full form by molecular profile data on the panel of 60 diverse human cancer cell lines (the NCI-60) used since 1990 by the U.S. National Cancer Institute to screen compounds for anticancer activity. To our knowledge, CellMiner is the first online database resource for integration of the diverse molecular types of NCI-60 and related meta data. Description: CellMiner enables scientists to perform advanced querying of molecular information on NCI-60 (and additional types) through a single web interface. CellMiner is a freely available tool that organizes and stores raw and normalized data that represent multiple types of molecular characterizations at the DNA, RNA, protein, and pharmacological levels. Annotations for each project, along with associated metadata on the samples and datasets, are stored in a MySQL database and linked to the molecular profile data. Data can be queried and downloaded along with comprehensive information on experimental and analytic methods for each data set. A Data Intersection tool allows selection of a list of genes (proteins) in common between two or more data sets and outputs the data for those genes (proteins) in the respective sets. In addition to its role as an integrative resource for the NCI-60, the CellMiner package also serves as a shell for incorporation of molecular profile data on other cell or tissue sample types. Conclusion: CellMiner is a relational database tool for
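
    The Data Intersection idea described in this abstract can be illustrated with a small Python sketch: find the genes present in every data set and return each set restricted to them. The data set names, gene symbols, values, and the `intersect_genes` function are hypothetical and are not taken from CellMiner itself.

```python
from typing import Dict

# Hypothetical data sets: gene identifier -> measured value.
datasets: Dict[str, Dict[str, float]] = {
    "rna_expression": {"TP53": 2.1, "EGFR": 0.7, "MYC": 1.4},
    "protein_level": {"TP53": 0.9, "MYC": 1.1, "KRAS": 0.3},
}


def intersect_genes(data: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Keep only the genes that occur in every data set."""
    gene_sets = [set(values) for values in data.values()]
    common = set.intersection(*gene_sets) if gene_sets else set()
    return {
        name: {gene: values[gene] for gene in sorted(common)}
        for name, values in data.items()
    }


result = intersect_genes(datasets)
for name, values in result.items():
    print(name, values)
```

    In a relational setting the same operation would more likely be an inner join on a shared gene identifier; the set-based version is shown only because it is short.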

  7. Using XML to encode TMA DES metadata.

    Science.gov (United States)

    Lyttleton, Oliver; Wright, Alexander; Treanor, Darren; Lewis, Paul

    2011-01-01

    The Tissue Microarray Data Exchange Specification (TMA DES) is an XML specification for encoding TMA experiment data. While TMA DES data is encoded in XML, the files that describe its syntax, structure, and semantics are not. The DTD format is used to describe the syntax and structure of TMA DES, and the ISO 11179 format is used to define the semantics of TMA DES. However, XML Schema can be used in place of DTDs, and another XML encoded format, RDF, can be used in place of ISO 11179. Encoding all TMA DES data and metadata in XML would simplify the development and usage of programs which validate and parse TMA DES data. XML Schema has advantages over DTDs such as support for data types, and a more powerful means of specifying constraints on data values. An advantage of RDF encoded in XML over ISO 11179 is that XML defines rules for encoding data, whereas ISO 11179 does not. We created an XML Schema version of the TMA DES DTD. We wrote a program that converted ISO 11179 definitions to RDF encoded in XML, and used it to convert the TMA DES ISO 11179 definitions to RDF. We validated a sample TMA DES XML file that was supplied with the publication that originally specified TMA DES using our XML Schema. We successfully validated the RDF produced by our ISO 11179 converter with the W3C RDF validation service. All TMA DES data could be encoded using XML, which simplifies its processing. XML Schema allows datatypes and valid value ranges to be specified for CDEs, which enables a wider range of error checking to be performed using XML Schemas than could be performed using DTDs.

  8. Using XML to encode TMA DES metadata

    Directory of Open Access Journals (Sweden)

    Oliver Lyttleton

    2011-01-01

    Full Text Available Background: The Tissue Microarray Data Exchange Specification (TMA DES) is an XML specification for encoding TMA experiment data. While TMA DES data is encoded in XML, the files that describe its syntax, structure, and semantics are not. The DTD format is used to describe the syntax and structure of TMA DES, and the ISO 11179 format is used to define the semantics of TMA DES. However, XML Schema can be used in place of DTDs, and another XML encoded format, RDF, can be used in place of ISO 11179. Encoding all TMA DES data and metadata in XML would simplify the development and usage of programs which validate and parse TMA DES data. XML Schema has advantages over DTDs such as support for data types, and a more powerful means of specifying constraints on data values. An advantage of RDF encoded in XML over ISO 11179 is that XML defines rules for encoding data, whereas ISO 11179 does not. Materials and Methods: We created an XML Schema version of the TMA DES DTD. We wrote a program that converted ISO 11179 definitions to RDF encoded in XML, and used it to convert the TMA DES ISO 11179 definitions to RDF. Results: We validated a sample TMA DES XML file that was supplied with the publication that originally specified TMA DES using our XML Schema. We successfully validated the RDF produced by our ISO 11179 converter with the W3C RDF validation service. Conclusions: All TMA DES data could be encoded using XML, which simplifies its processing. XML Schema allows datatypes and valid value ranges to be specified for CDEs, which enables a wider range of error checking to be performed using XML Schemas than could be performed using DTDs.

  9. TABLE TENNIS CLUB

    CERN Multimedia

    TABLE TENNIS CLUB

    2010-01-01

    2010 CERN Table Tennis Tournament The CERN Table Tennis Club organizes its traditional CERN Table Tennis Tournament, at the Meyrin club, 2 rue de livron, in Meyrin, Saturday August 21st, in the afternoon. The tournament is open to all CERN staff, users, visitors and families, including of course summer students. See below for details. In order to register, simply send an E-mail to Jean-Pierre Revol (jean-pierre.revol@cern.ch). You can also download the registration form from the Club Web page (http://www.cern.ch/tabletennis), and send it via internal mail. Photo taken on August 22, 2009 showing some of the participants in the 2nd CERN Table Tennis tournament. INFORMATION ON CERN TABLE TENNIS CLUB CERN used to have a tradition of table tennis activities at CERN. For some reason, at the beginning of the 1980’s, the CERN Table Tennis club merged with the Meyrin Table Tennis club, a member of the Association Genevoise de Tennis de Table (AGTT). Therefore, if you want to practice table tennis, you...

  10. Research in Mobile Database Query Optimization and Processing

    Directory of Open Access Journals (Sweden)

    Agustinus Borgy Waluyo

    2005-01-01

    Full Text Available The emergence of mobile computing provides the ability to access information at any time and place. However, as mobile computing environments have inherent factors like power, storage, asymmetric communication cost, and bandwidth limitations, efficient query processing and minimum query response time are definitely of great interest. This survey groups a variety of query optimization and processing mechanisms in mobile databases into two main categories, namely: (i) query processing strategy, and (ii) caching management strategy. Query processing includes both pull and push operations (broadcast mechanisms). We further classify push operation into on-demand broadcast and periodic broadcast. Push operation (on-demand broadcast) relates to designing techniques that enable the server to accommodate multiple requests so that the request can be processed efficiently. Push operation (periodic broadcast) corresponds to data dissemination strategies. In this scheme, several techniques to improve the query performance by broadcasting data to a population of mobile users are described. A caching management strategy defines a number of methods for maintaining cached data items in clients' local storage. This strategy considers critical caching issues such as caching granularity, caching coherence strategy and caching replacement policy. Finally, this survey concludes with several open issues relating to mobile query optimization and processing strategy.
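
    As an illustration of what a caching replacement policy on a mobile client might look like, here is a minimal Python sketch of a least-recently-used (LRU) cache. LRU is a common general-purpose policy chosen here as an assumption; the survey does not say which policies it covers, and the `LRUCache` class and its keys are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used item."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        """Insert or update an item, evicting the oldest one if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes the most recently used item
cache.put("c", 3)   # capacity exceeded, so "b" is evicted
```

    A client with limited storage would size `capacity` to its available memory; a policy tuned for mobile use might also weigh the cost of re-fetching an item over a slow or expensive link.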

  11. Summarization of Text Document Using Query Dependent Parsing Techniques

    Science.gov (United States)

    Rokade, P. P.; Mrunal, Bewoor; Patil, S. H.

    2010-11-01

    The World Wide Web is the largest source of information, and a huge amount of data is present on it. There has been a great amount of work on query-independent summarization of documents. However, due to the success of Web search engines, query-specific document summarization (query result snippets) has become an important problem. This paper discusses a method for creating query-specific summaries by identifying the most query-relevant fragments and combining them using the semantic associations within the document. In particular, a structure is first added to the documents in the preprocessing stage, converting them to document graphs. The present work is an analytical study of different document clustering and summarization techniques; most current research focuses on query-independent summarization. The main aim of this work is to combine the two approaches of document clustering and query-dependent summarization. This mainly involves applying different clustering algorithms to a text document, creating a weighted document graph from the result based on the keywords, and using the document graph to obtain the summary of the document. The performance of the summaries produced with different clustering techniques will be analyzed and the optimal approach will be suggested.

  12. Efficient continuously moving top-k spatial keyword query processing

    DEFF Research Database (Denmark)

    Wu, Dingming; Yiu, Man Lung; Jensen, Christian S.

    2011-01-01

    Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top-k spatial keyword (MkSK) queries over spatial...... keyword data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within a zone. However, existing safe zone methods focus solely on spatial locations and ignore text relevancy. We propose two algorithms for computing...
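
    The notion of ranking by both location and text can be sketched in a few lines of Python. The scoring function below, a weighted sum of spatial proximity and keyword overlap, is an assumption made for illustration; the objects, `alpha`, and `max_dist` are hypothetical, and the sketch does not implement the safe-zone algorithms the abstract refers to.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def score(query_loc: Point, query_terms: Set[str], obj: Dict,
          alpha: float = 0.5, max_dist: float = 10.0) -> float:
    """Combine spatial proximity and text relevance into one score in [0, 1]."""
    dist = math.dist(query_loc, obj["loc"])
    spatial = max(0.0, 1.0 - dist / max_dist)
    terms = set(obj["text"].lower().split())
    textual = len(terms & query_terms) / len(query_terms) if query_terms else 0.0
    return alpha * spatial + (1.0 - alpha) * textual


def top_k(query_loc: Point, query_terms: Set[str], objects: List[Dict], k: int) -> List[Dict]:
    """Return the k highest-scoring objects for the current query location."""
    return heapq.nlargest(k, objects, key=lambda o: score(query_loc, query_terms, o))


# Hypothetical geo-tagged objects.
objects = [
    {"name": "cafe_a", "loc": (1.0, 0.0), "text": "coffee cake"},
    {"name": "cafe_b", "loc": (5.0, 0.0), "text": "coffee wifi"},
    {"name": "shop_c", "loc": (0.5, 0.0), "text": "books"},
]

best = top_k((0.0, 0.0), {"coffee", "wifi"}, objects, k=2)
print([o["name"] for o in best])
```

    Recomputing `top_k` at every position update is the naive baseline for a moving query; a safe zone is a way to avoid that recomputation while the user stays inside a region where the result cannot change.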

  13. Intelligent query processing for semantic mediation of information systems

    Directory of Open Access Journals (Sweden)

    Saber Benharzallah

    2011-11-01

    Full Text Available We propose an intelligent and efficient query processing approach for semantic mediation of information systems. We also propose a generic multi-agent architecture that supports our approach. Our approach focuses on the exploitation of intelligent agents for query reformulation and the use of a new technology for the semantic representation. The algorithm adapts itself to changes in the environment, offers a wide aptitude and solves the various data conflicts in a dynamic way; it also reformulates the query using the schema mediation method for the discovered systems and the context mediation for the other systems.

  14. A new weighted fuzzy grammar on object oriented database queries

    Directory of Open Access Journals (Sweden)

    Ali Haroonabadi

    2012-08-01

    Full Text Available The fuzzy object-oriented database model is often used to handle imprecise and complicated objects in many real-world applications. This paper focuses on fuzzy queries and analyzes complicated, complex queries in order to obtain more meaningful and closer responses. The method lets the user assign weights to various parts of the query, which makes it easier to pursue specific goals and return the target objects.
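
    One way to picture weights on the parts of a fuzzy query is the Python sketch below, which scores an object by a weighted average of fuzzy membership degrees. The membership functions, attribute names, weights, and the choice of a weighted average as the aggregation operator are all assumptions for illustration; the paper's grammar and its actual operator are not given in this abstract.

```python
from typing import Callable, Dict, List, Tuple

Condition = Tuple[str, Callable[[float], float], float]  # (attribute, membership, weight)


def young(age: float) -> float:
    """Membership in 'young': 1 up to age 25, falling linearly to 0 at 45."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20


def high_salary(salary: float) -> float:
    """Membership in 'high salary': 0 up to 3000, rising linearly to 1 at 6000."""
    if salary <= 3000:
        return 0.0
    if salary >= 6000:
        return 1.0
    return (salary - 3000) / 3000


def weighted_match(obj: Dict[str, float], conditions: List[Condition]) -> float:
    """Weighted average of the membership degrees of all query conditions."""
    total_weight = sum(weight for _, _, weight in conditions)
    return sum(weight * fn(obj[attr]) for attr, fn, weight in conditions) / total_weight


# Hypothetical query: "young AND high salary", with salary weighted three times as much.
query = [("age", young, 1.0), ("salary", high_salary, 3.0)]
employee = {"age": 35, "salary": 6000}
degree = weighted_match(employee, query)
print(degree)
```

    Raising the weight of a condition pulls the overall degree toward that condition's membership value, which is how the user steers the result toward the part of the query that matters most.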

  15. jQuery 2.0 animation techniques beginner's guide

    CERN Document Server

    Culpepper, Adam

    2013-01-01

    This book is a guide to help you create attractive web page animations using jQuery. Written with a friendly and engaging approach, this book is designed to be placed alongside your computer as a mentor. If you are a web designer or a frontend developer, or if you want to learn how to animate the user interface of your web applications with jQuery, this book is for you. Experience with jQuery or JavaScript would be helpful, but a solid knowledge base of HTML and CSS is assumed.

  16. Mercury- Distributed Metadata Management, Data Discovery and Access System

    Science.gov (United States)

    Palanisamy, Giri; Wilson, Bruce E.; Devarakonda, Ranjeet; Green, James M.

    2007-12-01

    Mercury is a federated metadata harvesting, search and retrieval tool based on both open source and ORNL-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports various metadata standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115 (under development). Mercury provides a single portal to information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury supports various projects including: ORNL DAAC, NBII, DADDI, LBA, NARSTO, CDIAC, OCEAN, I3N, IAI, ESIP and ARM. The new Mercury system is based on a Service Oriented Architecture and supports various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. This system also provides various search services including: RSS, Geo-RSS, OpenSearch, Web Services and Portlets. Other features include: filtering and dynamic sorting of search results, book-markable search results, and the ability to save, retrieve, and modify search criteria.

  17. Interoperable Solar Data and Metadata via LISIRD 3

    Science.gov (United States)

    Wilson, A.; Lindholm, D. M.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.

    2015-12-01

    LISIRD 3 is a major upgrade of the LASP Interactive Solar Irradiance Data Center (LISIRD), which serves several dozen space based solar irradiance and related data products to the public. Through interactive plots, LISIRD 3 provides data browsing supported by data subsetting and aggregation. Incorporating a semantically enabled metadata repository, LISIRD 3 users see current, vetted, consistent information about the datasets offered. Users can now also search for datasets based on metadata fields such as dataset type and/or spectral or temporal range. This semantic database enables metadata browsing, so users can discover the relationships between datasets, instruments, spacecraft, mission and PI. The database also enables creation and publication of metadata records in a variety of formats, such as SPASE or ISO, making these datasets more discoverable. The database also enables the possibility of a public SPARQL endpoint, making the metadata browsable in an automated fashion. LISIRD 3's data access middleware, LaTiS, provides dynamic, on demand reformatting of data and timestamps, subsetting and aggregation, and other server side functionality via a RESTful OPeNDAP compliant API, enabling interoperability between LASP datasets and many common tools. LISIRD 3's templated front end design, coupled with the uniform data interface offered by LaTiS, allows easy integration of new datasets. Consequently the number and variety of datasets offered by LISIRD has grown to encompass several dozen, with many more to come. This poster will discuss design and implementation of LISIRD 3, including tools used, capabilities enabled, and issues encountered.

  18. Standard Reference Tables -

    Data.gov (United States)

    Department of Transportation — The Standard Reference Tables (SRT) provide consistent reference data for the various applications that support Flight Standards Service (AFS) business processes and...

  19. ALGORITMA RC4 DALAM PROTEKSI TRANSMISI DAN HASIL QUERY UNTUK ORDBMS POSTGRESQL

    Directory of Open Access Journals (Sweden)

    Yuri Ariyanto

    2009-01-01

    Full Text Available This research discusses how to implement the RC4 cryptographic algorithm to protect queries and query results. Protection is achieved by encrypting and decrypting both while they are in transit on the network. The implementation consists of software placed on the client side that accesses a database placed on the server side. The software can encrypt and decrypt the queries and query results sent from the client to the server and vice versa, so that their transmission is secured. Transmission security is considered successful if the software encrypts the transmitted queries and query results such that an eavesdropper who intercepts them cannot understand the data content. The research concludes that the software built successfully encrypts the queries and query results transmitted between the client application and the database server.
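
    For readers unfamiliar with RC4, the following Python sketch shows the textbook algorithm (key scheduling followed by keystream generation) applied to a query string. It is not the software described in the abstract; the key and the SQL text are made-up examples, and key exchange between client and server is left out entirely.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is its own inverse)."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling algorithm: permute the state array using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation: XOR each input byte with a keystream byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


# Hypothetical shared key and query, for illustration only.
shared_key = b"example-session-key"
query = b"SELECT name FROM customers WHERE id = 7"

ciphertext = rc4(shared_key, query)       # what would travel over the network
recovered = rc4(shared_key, ciphertext)   # what the server would decrypt
```

    Because RC4 is a stream cipher, applying the same function with the same key both encrypts and decrypts, which is why a single `rc4` function serves the client and the server side in this sketch.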

  20. Development and Validation of Queries Using Structured Query Language (SQL) to Determine the Utilization of Comparison Imaging in Radiology Reports Stored on PACS

    OpenAIRE

    Lakhani, Paras; Menschik, Elliot D.; Goldszal, Alberto F.; Murray, Joseph P.; Weiner, Mark G.; Langlotz, Curtis P.

    2006-01-01

    The purpose of this research was to develop queries that quantify the utilization of comparison imaging in free-text radiology reports. The queries searched for common phrases that indicate whether comparison imaging was utilized, not available, or not mentioned. The queries were iteratively refined and tested on random samples of 100 reports with human review as a reference standard until the precision and recall of the queries did not improve significantly between iterations. Then, query ac...
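
    The workflow described in this abstract, a phrase-matching SQL query checked against human review, can be sketched with Python's built-in `sqlite3` module. The table layout, the report texts, and the search phrases below are invented for illustration and are not the queries developed in the study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (id INTEGER PRIMARY KEY, body TEXT, human_label INTEGER)"
)
# Hypothetical reports; human_label = 1 means a reviewer judged that
# comparison imaging was utilized.
rows = [
    (1, "Comparison: chest radiograph from prior study.", 1),
    (2, "No prior studies available for comparison.", 0),
    (3, "Compared with the previous CT, no change.", 1),
    (4, "Lungs are clear.", 0),
    (5, "Unchanged from prior exam.", 1),
]
conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", rows)

# Candidate query: phrases that suggest a comparison was made.
QUERY = """
SELECT id FROM reports
WHERE (body LIKE '%comparison:%' OR body LIKE '%compared with%')
  AND body NOT LIKE '%no prior%'
"""

predicted = {row[0] for row in conn.execute(QUERY)}
actual = {row[0] for row in conn.execute("SELECT id FROM reports WHERE human_label = 1")}

true_positives = len(predicted & actual)
precision = true_positives / len(predicted)
recall = true_positives / len(actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

    In this toy data the query misses report 5, so recall falls short of precision; adding a phrase such as "unchanged from prior" and re-measuring is the kind of iterative refinement the abstract describes.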

  1. A Revisit of Query Expansion with Different Semantic Levels

    DEFF Research Database (Denmark)

    Zhang, Ce; Cui, Bin; Cong, Gao

    2009-01-01

    Query expansion has received extensive attention in information retrieval community. Although semantic based query expansion appears to be promising in improving retrieval performance, previous research has shown that it cannot consistently improve retrieval performance. It is a tricky problem...... to automatically determine whether to do query expansion for a given query. In this paper, we introduce Compact Concept Ontology (CCO) and provide users the option of exploring different semantic levels by using different CCOs. Experimental results show our approach is superior to previous work in many cases....... Additionally, we integrate the proposed methods into a text-based video search system (iVSearcher), to improve the user’s experience and retrieval performance significantly. To the best of our knowledge, this is the first system that integrates semantic information into video search and explores different...

  2. An introduction to XML query processing and keyword search

    CERN Document Server

    Lu, Jiaheng

    2013-01-01

    This book systematically and comprehensively covers the latest advances in XML data searching. It presents an extensive overview of the current query processing and keyword search techniques on XML data.

  3. Determinacy in Static Analysis of jQuery

    DEFF Research Database (Denmark)

    Andreasen, Esben; Møller, Anders

    2014-01-01

    Static analysis for JavaScript can potentially help programmers find errors early during development. Although much progress has been made on analysis techniques, a major obstacle is the prevalence of libraries, in particular jQuery, which apply programming patterns that have detrimental conseque...... present a static dataflow analysis for JavaScript that infers and exploits determinacy information on-the-fly, to enable analysis of some of the most complex parts of jQuery. The techniques are implemented in the TAJS analysis tool and evaluated on a collection of small programs that use jQuery. Our...

  4. Parasol: An Architecture for Cross-Cloud Federated Graph Querying

    Energy Technology Data Exchange (ETDEWEB)

    Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa; Patrone, Dennis; Hider, Sandy; Piatko, Christine; Chapman, Matthew; Marple, JP; Silberberg, David

    2014-06-22

    Large scale data fusion of multiple datasets can often provide insights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arises that hampers querying and data fusion. To address these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol’s design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol’s lightweight architecture. Experiments on a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.

  5. Human Cell and Tissue Establishment Registration Public Query

    Data.gov (United States)

    U.S. Department of Health & Human Services — This application provides Human Cell and Tissue registration information for registered, inactive, and pre-registered firms. Query options are by Establishment Name,...

  6. Nearest private query based on quantum oblivious key distribution

    Science.gov (United States)

    Xu, Min; Shi, Run-hua; Luo, Zhen-yu; Peng, Zhen-wan

    2017-12-01

    Nearest private query is a special private query which involves two parties, a user and a data owner, where the user has a private input (e.g., an integer) and the data owner has a private data set, and the user wants to query which element in the owner's private data set is the nearest to his input without either party revealing its private information. In this paper, we first present a quantum protocol for nearest private query, which is based on quantum oblivious key distribution (QOKD). Compared with related classical protocols, our protocol offers higher security and better feasibility, so it has better prospects for application.

  7. External Data Structures for Shortest Path Queries on Planar Digraphs

    DEFF Research Database (Denmark)

    Arge, Lars; Toma, Laura

    2005-01-01

    In this paper we present space-query trade-offs for external memory data structures that answer shortest path queries on planar directed graphs. For any S = Ω(N^(1+ε)) and S = O(N^2/B), our main result is a family of structures that use S space and answer queries in O(N^2/(SB)) I/Os, thus obtaining...... optimal space-query product O(N^2/B). An S space structure can be constructed in O(√S · sort(N)) I/Os, where sort(N) is the number of I/Os needed to sort N elements, B is the disk block size, and N is the size of the graph....
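
The optimality claim in this abstract can be checked by multiplying the two bounds it states. The short derivation below is only a reader's sanity check of the quoted figures, not a reproduction of the paper's proof:

```latex
\[
  \underbrace{S}_{\text{space}} \;\cdot\;
  \underbrace{O\!\left(\frac{N^{2}}{S B}\right)}_{\text{query I/Os}}
  \;=\; O\!\left(\frac{N^{2}}{B}\right),
  \qquad\text{for } S = \Omega\!\left(N^{1+\varepsilon}\right),\; S = O\!\left(\frac{N^{2}}{B}\right).
\]
% At the largest admissible space, S = N^2/B, the query bound becomes
% O(N^2 / ((N^2/B) * B)) = O(1) I/Os; at S = N^{1+eps} it is O(N^{1-eps}/B).
```

The product is independent of S, which is why every point on the trade-off curve attains the same space-query product.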

  8. Meta-Data Objects as the Basis for System Evolution

    CERN Document Server

    Estrella, Florida; Tóth, N; Kovács, Z; Le Goff, J M; Clatchey, Richard Mc; Toth, Norbert; Kovacs, Zsolt; Goff, Jean-Marie Le

    2001-01-01

    One of the main factors driving object-oriented software development in the Web-age is the need for systems to evolve as user requirements change. A crucial factor in the creation of adaptable systems dealing with changing requirements is the suitability of the underlying technology in allowing the evolution of the system. A reflective system utilizes an open architecture where implicit system aspects are reified to become explicit first-class (meta-data) objects. These implicit system aspects are often fundamental structures which are inaccessible and immutable, and their reification as meta-data objects can serve as the basis for changes and extensions to the system, making it self-describing. To address the evolvability issue, this paper proposes a reflective architecture based on two orthogonal abstractions - model abstraction and information abstraction. In this architecture the modeling abstractions allow for the separation of the description meta-data from the system aspects they represent so that th...

  9. Metadata for fine-grained processing at ATLAS

    CERN Document Server

    Cranshaw, Jack; The ATLAS collaboration

    2016-01-01

    High energy physics experiments are implementing highly parallel solutions for event processing on resources that support concurrency at multiple levels. These range from the inherent large-scale parallelism of HPC resources to the multiprocessing and multithreading needed for effective use of multi-core and GPU-augmented nodes. Such modes of processing, and the efficient opportunistic use of transiently-available resources, lead to finer-grained processing of event data. Previously metadata systems were tailored to jobs that were atomic and processed large, well-defined units of data. The new environment requires a more fine-grained approach to metadata handling, especially with regard to bookkeeping. For opportunistic resources metadata propagation needs to work even if individual jobs are not finalized. This contribution describes ATLAS solutions to this problem in the context of the multiprocessing framework currently in use for LHC Run 2, development underway for the ATLAS multithreaded framework (Athena...

  10. Statistical Data Processing with R – Metadata Driven Approach

    Directory of Open Access Journals (Sweden)

    Rudi SELJAK

    2016-06-01

    In recent years the Statistical Office of the Republic of Slovenia has put a lot of effort into re-designing its statistical process. We replaced the classical stove-pipe oriented production system with general software solutions, based on the metadata driven approach. This means that one general program code, which is parametrized with process metadata, is used for data processing for a particular survey. Currently, the general program code is entirely based on SAS macros, but in the future we would like to explore how successfully the statistical software R can be used for this approach. The paper describes the metadata driven principle for data validation, the generic software solution and the main issues connected with the use of the statistical software R for this approach.

  11. Multiresolution Cube Estimators for Sensor Network Aggregate Queries

    OpenAIRE

    Meliou, Alexandra; Guestrin, Carlos; Hellerstein, Joseph M.

    2010-01-01

    In this work we present in-network techniques to improve the efficiency of spatial aggregate queries. Such queries are very common in a sensornet setting, demanding more targeted techniques for their handling. Our approach constructs and maintains multi-resolution cube hierarchies inside the network, which can be constructed in a distributed fashion. In case of failures, recovery can also be performed with in-network decisions. In this paper we demonstrate how in-network cube hierarchies can ...

  12. The retrieval effectiveness of search engines on navigational queries

    OpenAIRE

    Lewandowski, Dirk

    2011-01-01

    Purpose - To test major Web search engines on their performance on navigational queries, i.e. searches for homepages. Design/methodology/approach - 100 real user queries are posed to six search engines (Google, Yahoo, MSN, Ask, Seekport, and Exalead). Users described the desired pages, and the results position of these is recorded. Measured success N and mean reciprocal rank are calculated. Findings - Performance of the major search engines Google, Yahoo, and MSN is best, with around 90 perce...
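
The two measures named in this abstract are simple to compute once each query has a recorded result position. The sketch below is illustrative only: it assumes "success N" means the share of queries whose desired page appears within the top N results, and the function names and sample ranks are hypothetical rather than taken from the study.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the desired page for query i,
    or None when the page was not found (it then contributes 0).
    """
    if not ranks:
        raise ValueError("need at least one query")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def success_at_n(ranks: Sequence[Optional[int]], n: int) -> float:
    """Fraction of queries whose desired page is within the top n results."""
    if not ranks:
        raise ValueError("need at least one query")
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)


# Hypothetical positions for five navigational queries (None = not found).
example_ranks = [1, 2, None, 1, 4]
mrr = mean_reciprocal_rank(example_ranks)   # (1 + 1/2 + 0 + 1 + 1/4) / 5
hit_rate_top1 = success_at_n(example_ranks, 1)
```

Treating a missing result as a reciprocal rank of 0 is a common convention, but it is an assumption here; the abstract does not say how unfound pages were scored.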

  13. A distributed query execution engine of big attributed graphs.

    Science.gov (United States)

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Graphs of this type are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.

  14. Two Dimensional Range Minimum Queries and Fibonacci Lattices

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Davoodi, Pooya; Lewenstein, Moshe

    2012-01-01

    technique—the discrepancy properties of Fibonacci lattices—we give an indexing data structure for 2D-RMQs that uses O(N/c) bits additional space with O(c log c (log log c)^2) query time, for any parameter c, 4 ≤ c ≤ N. Also, when the entries of the input matrix are from {0,1}, we show that the query time can...

  15. Inductive queries for a drug designing robot scientist

    OpenAIRE

    King, Ross D.; Schierz, Amanda C.; Clare, Amanda; Rowland, Jem; Sparkes, Andrew; Nijssen, Siegfried; Ramon, Jan

    2010-01-01

    It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We in...

  16. An Interactive, Web-Based Approach to Metadata Authoring

    Science.gov (United States)

    Pollack, Janine; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    NASA's Global Change Master Directory (GCMD) serves a growing number of users by assisting the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 8000 data set descriptions in Directory Interchange Format (DIF) and 200 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information, thus allowing researchers to discover data pertaining to a particular geographic location, as well as subject of interest. The GCMD strives to be the preeminent data locator for world-wide directory level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are not currently attracting widespread usage. With usage being the prime indicator of utility, it has become apparent that current tools must be improved. As a result, the GCMD has released a new suite of web-based authoring tools that enable a user to create new data and service entries, as well as modify existing data entries. With these tools, a more interactive approach to metadata authoring is taken, as they feature a visual "checklist" of data/service fields that automatically update when a field is completed. In this way, the user can quickly gauge which of the required and optional fields have not been populated. With the release of these tools, the Earth science community will be further assisted in efficiently creating quality data and services metadata. Keywords: metadata, Earth science, metadata authoring tools

  17. Using URIs to effectively transmit sensor data and metadata

    Science.gov (United States)

    Kokkinaki, Alexandra; Buck, Justin; Darroch, Louise; Gardner, Thomas

    2017-04-01

    Autonomous ocean observation is massively increasing the number of sensors in the ocean. Accordingly, the continuing increase in datasets produced makes selecting sensors that are fit for purpose a growing challenge. Decision making on selecting quality sensor data is based on the sensor's metadata, i.e. manufacturer specifications, history of calibrations etc. The Open Geospatial Consortium (OGC) has developed the Sensor Web Enablement (SWE) standards to facilitate integration and interoperability of sensor data and metadata. The World Wide Web Consortium (W3C) Semantic Web technologies enable machine comprehensibility, promoting sophisticated linking and processing of data published on the web. Linking the sensor's data and metadata according to the above-mentioned standards can yield practical difficulties, because of internal hardware bandwidth restrictions and a requirement to constrain data transmission costs. Our approach addresses these practical difficulties by uniquely identifying sensor and platform models and instances through URIs, which resolve via content negotiation to either OGC's sensor meta language, SensorML, or W3C's Linked Data. Data transmitted by a sensor incorporate the sensor's unique URI to refer to its metadata. Sensor and platform model URIs and descriptions are created and hosted by the British Oceanographic Data Centre (BODC) linked systems service. The sensor owner creates the sensor and platform instance URIs prior to and during sensor deployment, through an updatable web form, the Sensor Instance Form (SIF). SIF enables model and instance URI association but also platform and sensor linking. The use of URIs, which are dynamically generated through the SIF, offers both practical and economical benefits to the implementation of SWE and Linked Data standards in near real time systems. Data can be linked to metadata dynamically in-situ while saving on the costs associated with the transmission of long metadata descriptions. The transmission...
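
The idea of a sensor sending only a short URI, which a consumer later resolves by content negotiation, can be sketched as follows. Everything concrete here is an assumption for illustration: the URI is a placeholder on example.org, the media types are plausible choices rather than the ones the BODC service is documented to use, and the request is only constructed, never sent.

```python
from urllib.request import Request

# Hypothetical sensor instance URI; the real BODC URIs are not given in the abstract.
SENSOR_URI = "https://example.org/sensor/instance/1234"

# Assumed media types for the two representations named in the abstract.
ACCEPT_TYPES = {
    "sensorml": "application/xml",   # SensorML is an XML encoding
    "linked-data": "text/turtle",    # one common RDF serialization
}


def build_metadata_request(uri: str, representation: str) -> Request:
    """Build (but do not send) an HTTP request asking for one representation."""
    return Request(uri, headers={"Accept": ACCEPT_TYPES[representation]})


# A transmitted observation carries the URI instead of the full metadata text.
observation = {"sensor": SENSOR_URI, "temperature_c": 11.3}

request = build_metadata_request(observation["sensor"], "linked-data")
```

The same URI yields either representation depending only on the `Accept` header, which is what lets the transmitted payload stay small.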

  18. Representation and alignment of sung queries for music information retrieval

    Science.gov (United States)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a "smooth" pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
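
To make the quadratic alignment cost concrete, here is a minimal sketch of dynamic time warping (DTW) over two pitch sequences. DTW is used only as a familiar example of a quadratic-time alignment; the abstract does not name the alignment method the authors use, and the pitch values below are made-up MIDI note numbers.

```python
from typing import Sequence


def dtw_distance(query: Sequence[float], target: Sequence[float]) -> float:
    """Dynamic time warping cost between two pitch sequences.

    Fills an (n+1) x (m+1) table row by row, so the running time is
    O(n * m): quadratic when both sequences have similar length.
    """
    n, m = len(query), len(target)
    inf = float("inf")
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            # Extend the cheapest of: step in query, step in target, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


# A sung query that holds two notes longer than the target still aligns at zero cost.
sung = [60, 60, 62, 62, 64]
melody = [60, 62, 64]
cost = dtw_distance(sung, melody)
```

Because the table has one cell per pair of positions, doubling the length of both sequences quadruples the work, which is the pressure toward low-dimensional representations that the abstract describes.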

  19. A comparison of peer-to-peer query response modes

    CERN Document Server

    Hoschek, W

    2002-01-01

    In a large distributed system spanning many administrative domains such as a Grid (Foster et al., 2001), it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. However, in such a database system, the set of information tuples in the universe is partitioned over one or more distributed nodes, for reasons including autonomy, scalability, availability, performance and security. This suggests the use of peer-to-peer (P2P) query technology. A variety of query response modes can be used to return matching query results from P2P nodes to an originator. Although from the functional perspective all response modes are equivalent, no mode is optimal under all circumstances. Which query response modes allow suitable trade-offs to be expressed for a wide range of P2P applications? We answer this question by systematically describing and characterizing four query response modes for the unified peer-to-peer database framework (UPDF) proposed ...

  20. Facilitating the production of ISO-compliant metadata of geospatial datasets

    Science.gov (United States)

    Giuliani, Gregory; Guigoz, Yaniss; Lacroix, Pierre; Ray, Nicolas; Lehmann, Anthony

    2016-02-01

    Metadata are recognized as an essential element to enable efficient and effective discovery of geospatial data published in spatial data infrastructures (SDI). However, metadata production is still perceived as a complex, tedious and time-consuming task. This typically results in little metadata production and can seriously hinder the objective of facilitating data discovery. In response to this issue, this paper presents a proof of concept based on an interoperable workflow between a data publication server and a metadata catalog to automatically generate ISO-compliant metadata. The proposed approach facilitates metadata creation by embedding this task in daily data management workflows; ensures that data and metadata are permanently up-to-date; significantly reduces the obstacles of metadata production; and potentially facilitates contributions to initiatives like the Global Earth Observation System of Systems (GEOSS) by making geospatial resources discoverable.

  1. Linked data for libraries, archives and museums how to clean, link and publish your metadata

    CERN Document Server

    Hooland, Seth van

    2014-01-01

    This highly practical handbook teaches you how to unlock the value of your existing metadata through cleaning, reconciliation, enrichment and linking and how to streamline the process of new metadata creation. Libraries, archives and museums are facing up to the challenge of providing access to fast growing collections whilst managing cuts to budgets. Key to this is the creation, linking and publishing of good quality metadata as Linked Data that will allow their collections to be discovered, accessed and disseminated in a sustainable manner. Metadata experts Seth van Hooland and Ruben Verborgh introduce the key concepts of metadata standards and Linked Data and how they can be practically applied to existing metadata, giving readers the tools and understanding to achieve maximum results with limited re...

  2. A novel framework for assessing metadata quality in epidemiological and public health research settings.

    Science.gov (United States)

    McMahon, Christiana; Denaxas, Spiros

    2016-01-01

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor quality metadata renders data analyses problematic. In this study, we created and evaluated a novel framework to assess metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions; none of which were specifically aimed at biomedical metadata. 96 individuals completed the survey; of those who submitted data, most only assessed metadata quality sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly.

  3. The Living Periodic Table

    Science.gov (United States)

    Nahlik, Mary Schrodt

    2005-01-01

    To help make the abstract world of chemistry more concrete for eighth-grade students, the author has them create a living periodic table that can be displayed in the classroom or hallway. This display includes information about the elements arranged in the traditional periodic table format, but also includes visual real-world representations of the…

  4. Table Tennis Club

    CERN Multimedia

    Table Tennis Club

    2011-01-01

    CERN Table Tennis Tournament Saturday 20th August 2011 at 13.30 at the CERN/Meyrin TT club (underneath the Piscine de Livron, rue de Livron 2, 1217 Meyrin) Details: http://cern.ch/club-TableTennis Registration: jean-pierre.revol@cern.ch Open to all CERN staff, visitors, summer students, and families

  5. Multiple Depth DB Tables Indexing on the Sphere

    Directory of Open Access Journals (Sweden)

    Luciano Nicastro

    2010-01-01

    Any project dealing with large astronomical datasets should consider the use of a relational database server (RDBS). Queries requiring quick selections on sky regions, objects cross-matching and other high-level data investigations involving sky coordinates could be unfeasible if tables are missing an effective indexing scheme. In this paper we present the Dynamic Index Facility (DIF) software package. By using the HTM and HEALPix sky pixelization schemes, it allows very efficient indexing and management of spherical data stored in MySQL tables. Any table hosting spherical coordinates can be automatically managed by DIF using any number of sky resolutions at the same time. DIF comprises a set of facilities, including SQL-callable functions to perform queries on circular and rectangular regions. Moreover, by removing the limitations and difficulties of 2-d data indexing, DIF allows the full exploitation of the RDBS capabilities. Performance tests on Giga-entries tables are reported together with some practical usage of the package.

  6. Getting started with tables.

    Science.gov (United States)

    Inskip, Hazel; Ntani, Georgia; Westbury, Leo; Di Gravio, Chiara; D'Angelo, Stefania; Parsons, Camille; Baird, Janis

    2017-01-01

    Tables are often overlooked by many readers of papers who tend to focus on the text. Good tables tell much of the story of a paper and give a richer insight into the details of the study participants and the main research findings. Being confident in reading tables and constructing clear tables are important skills for researchers to master. Common forms of tables were considered, along with the standard statistics used in them. Papers in the Archives of Public Health published during 2015 and 2016 were hand-searched for examples to illustrate the points being made. Presentation of graphs and figures were not considered as they are outside the scope of the paper. Basic statistical concepts are outlined to aid understanding of each of the tables presented. The first table in many papers gives an overview of the study population and its characteristics, usually giving numbers and percentages of the study population in different categories (e.g. by sex, educational attainment, smoking status) and summaries of measured characteristics (continuous variables) of the participants (e.g. age, height, body mass index). Tables giving the results of the analyses follow; these often include summaries of characteristics in different groups of participants, as well as relationships between the outcome under study and the exposure of interest. For continuous outcome data, results are often expressed as differences between means, or regression or correlation coefficients. Ratio/relative measures (e.g. relative risks, odds ratios) are usually used for binary outcome measures that take one of two values for each study participants (e.g. dead versus alive, obese versus non-obese). Tables come in many forms, but various standard types are described here. Clear tables provide much of the important detail in a paper and researchers are encouraged to read and construct them with care.

  7. The Effects of Normalisation of the Satisfaction of Novice End-User Querying Databases

    Directory of Open Access Journals (Sweden)

    Conrad Benedict

    1997-05-01

    This paper reports the results of an experiment that investigated the effects different structural characteristics of relational databases have on information satisfaction of end-users querying databases. The results show that unnormalised tables adversely affect end-user satisfaction. The adverse effect on end-user satisfaction is attributable primarily to the use of non-atomic data. In this study, the effect of repeating fields on end-user satisfaction was not significant. The study contributes to the further development of theories of individual adjustment to information technology in the workplace by alerting organisations and, in particular, database designers to the ways in which the structural characteristics of relational databases may affect end-user satisfaction. More importantly, the results suggest that database designers need to clearly identify the domains for each item appearing in their databases. These issues are of increasing importance because of the growth in the amount of data available to end-users in relational databases.

  8. A Window to the World: Lessons Learned from NASA's Collaborative Metadata Curation Effort

    Science.gov (United States)

    Bugbee, K.; Dixon, V.; Baynes, K.; Shum, D.; le Roux, J.; Ramachandran, R.

    2017-12-01

    Well-written descriptive metadata adds value to data by making the data easier to discover, and increases the use of data by conveying the context or appropriateness of use. While many data centers acknowledge the importance of correct, consistent and complete metadata, allocating resources to curate existing metadata is often difficult. To lower resource costs, many data centers seek guidance on best practices for curating metadata but struggle to identify those recommendations. In order to assist data centers in curating metadata and to also develop best practices for creating and maintaining metadata, NASA has formed a collaborative effort to improve the Earth Observing System Data and Information System (EOSDIS) metadata in the Common Metadata Repository (CMR). This effort has taken significant steps in building consensus around metadata curation best practices. However, this effort has also revealed gaps in EOSDIS enterprise policies and procedures within the core metadata curation task. This presentation will explore the mechanisms used for building consensus on metadata curation, the gaps identified in policies and procedures, the lessons learned from collaborating with both the data centers and metadata curation teams, and the proposed next steps for the future.

  9. BioMart – biological queries made easy

    Directory of Open Access Journals (Sweden)

    Thorisson Gudmundur

    2009-01-01

    Background: Biologists need to perform complex queries, often across a variety of databases. Typically, each data resource provides an advanced query interface, each of which must be learnt by the biologist before they can begin to query them. Frequently, more than one data source is required, and for high-throughput analysis, cutting and pasting results between websites is very time consuming. Therefore, many groups rely on local bioinformatics support to process queries by accessing the resource's programmatic interfaces if they exist. This is not an efficient solution in terms of cost and time. Instead, it would be better if the biologist only had to learn one generic interface. BioMart provides such a solution. Results: BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations. Once these queries have been defined, they may be automated with its "scripting at the click of a button" functionality. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, Cytoscape, Taverna. In this paper, we describe all aspects of BioMart from a user's perspective and demonstrate how it can be used to solve real biological use cases such as SNP selection for candidate gene screening or annotation of microarray results. Conclusion: BioMart is an easy to use, generic and scalable system and has therefore become an integral part of large data resources including Ensembl, UniProt, HapMap, Wormbase, Gramene, Dictybase, PRIDE, MSD and Reactome. BioMart is freely accessible to use at http://www.biomart.org.

  10. Competence Based Educational Metadata for Supporting Lifelong Competence Development Programmes

    NARCIS (Netherlands)

    Sampson, Demetrios; Fytros, Demetrios

    2008-01-01

    Sampson, D., & Fytros, D. (2008). Competence Based Educational Metadata for Supporting Lifelong Competence Development Programmes. In P. Diaz, Kinshuk, I. Aedo & E. Mora (Eds.), Proceedings of the 8th IEEE International Conference on Advanced Learning Technologies (ICALT 2008), pp. 288-292. July,

  11. The evolution of chondrichthyan research through a metadata ...

    African Journals Online (AJOL)

    We compiled metadata from Sharks Down Under (1991) and the two Sharks International conferences (2010 and 2014), spanning 23 years. Analysis of the data highlighted taxonomic biases towards charismatic species, a declining number of studies in fundamental science such as those related to taxonomy and basic life ...

  12. Standardizing metadata and taxonomic identification in metabarcoding studies

    NARCIS (Netherlands)

    Tedersoo, Leho; Ramirez, Kelly; Nilsson, R; Kaljuvee, Aivi; Koljalg, Urmas; Abarenkov, Kessy

    2015-01-01

    High-throughput sequencing-based metabarcoding studies produce vast amounts of ecological data, but a lack of consensus on standardization of metadata and how to refer to the species recovered severely hampers reanalysis and comparisons among studies. Here we propose an automated workflow covering

  13. Metadata for data rescue and data at risk

    Science.gov (United States)

    Anderson, William L.; Faundeen, John L.; Greenberg, Jane; Taylor, Fraser

    2011-01-01

    Scientific data age, become stale, fall into disuse and run tremendous risks of being forgotten and lost. These problems can be addressed by archiving and managing scientific data over time, and establishing practices that facilitate data discovery and reuse. Metadata documentation is integral to this work and essential for measuring and assessing high priority data preservation cases. The International Council for Science: Committee on Data for Science and Technology (CODATA) has a newly appointed Data-at-Risk Task Group (DARTG), participating in the general arena of rescuing data. The DARTG primary objective is building an inventory of scientific data that are at risk of being lost forever. As part of this effort, the DARTG is testing an approach for documenting endangered datasets. The DARTG is developing a minimal and easy to use set of metadata properties for sufficiently describing endangered data, which will aid global data rescue missions. The DARTG metadata framework supports rapid capture, and easy documentation, across an array of scientific domains. This paper reports on the goals and principles supporting the DARTG metadata schema, and provides a description of the preliminary implementation.

  14. A metadata schema for data objects in clinical research.

    Science.gov (United States)

    Canham, Steve; Ohmann, Christian

    2016-11-24

    A large number of stakeholders have accepted the need for greater transparency in clinical research and, in the context of various initiatives and systems, have developed a diverse and expanding number of repositories for storing the data and documents created by clinical studies (collectively known as data objects). To make the best use of such resources, we assert that it is also necessary for stakeholders to agree and deploy a simple, consistent metadata scheme. The relevant data objects and their likely storage are described, and the requirements for metadata to support data sharing in clinical research are identified. Issues concerning persistent identifiers, for both studies and data objects, are explored. A scheme is proposed that is based on the DataCite standard, with extensions to cover the needs of clinical researchers, specifically to provide (a) study identification data, including links to clinical trial registries; (b) data object characteristics and identifiers; and (c) data covering location, ownership and access to the data object. The components of the metadata scheme are described. The metadata schema is proposed as a natural extension of a widely agreed standard to fill a gap not tackled by other standards related to clinical research (e.g., Clinical Data Interchange Standards Consortium, Biomedical Research Integrated Domain Group). The proposal could be integrated with, but is not dependent on, other moves to better structure data in clinical research.

  15. Training and Best Practice Guidelines: Implications for Metadata Creation

    Science.gov (United States)

    Chuttur, Mohammad Y.

    2012-01-01

    In response to the rapid development of digital libraries over the past decade, researchers have focused on the use of metadata as an effective means to support resource discovery within online repositories. With the increasing involvement of libraries in digitization projects and the growing number of institutional repositories, it is anticipated…

  16. Metadata Schema Used in OCLC Sampled Web Pages

    Directory of Open Access Journals (Sweden)

    Fei Yu

    2005-12-01

    The tremendous growth of Web resources has made information organization and retrieval more and more difficult. As one approach to this problem, metadata schemas have been developed to characterize Web resources. However, many questions have been raised about the use of metadata schemas, such as: Which metadata schemas have been used on the Web? How did they describe Web-accessible information? What is the distribution of these metadata schemas among Web pages? Do certain schemas dominate the others? To address these issues, this study analyzed 16,383 Web pages with meta tags extracted from 200,000 OCLC sampled Web pages in 2000. It found that only 8.19% of Web pages used meta tags; description tags, keyword tags, and Dublin Core tags were the only three schemas used in the Web pages. This article revealed the use of meta tags in terms of their function distribution, syntax characteristics, granularity of the Web pages, and the length distribution and word number distribution of both description and keywords tags.
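
    As a rough illustration of the kind of tally the study describes, the sketch below collects the `name` attribute of each meta tag in a page and sorts it into description, keywords, or Dublin Core. It is not the study's method: the `DC.` prefix rule, the helper names and the sample page are assumptions made for this example. The last line reproduces the reported share, 16,383 of 200,000 pages.

```python
from collections import Counter
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect the lower-cased `name` attribute of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:  # skip meta tags without a name (e.g. charset)
                self.names.append(name.strip().lower())


def classify(name):
    # Assumed rule: Dublin Core meta tags carry a "DC." prefix.
    if name.startswith("dc."):
        return "dublin_core"
    if name in ("description", "keywords"):
        return name
    return "other"


def schema_counts(html):
    """Count meta tags per schema group in one HTML document."""
    parser = MetaTagCollector()
    parser.feed(html)
    return Counter(classify(n) for n in parser.names)


# Hypothetical sample page, invented for illustration.
sample = """<html><head>
<meta name="description" content="A sample page">
<meta name="keywords" content="metadata, schema">
<meta name="DC.title" content="Sample">
<meta charset="utf-8">
</head><body></body></html>"""

counts = schema_counts(sample)

# Share of sampled pages that used meta tags, from the figures in the abstract.
share = 16383 / 200000 * 100
print(counts, f"{share:.2f}%")
```

    Running `schema_counts` over each page of a sample and summing the counters would give a per-schema distribution of the sort the article reports.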

  17. Transforming and enhancing metadata for enduser discovery: a case study

    Directory of Open Access Journals (Sweden)

    Edward M. Corrado

    2014-05-01

    The Libraries’ workflow and portions of code will be shared; issues and challenges involved will be discussed. While this case study is specific to Binghamton University Libraries, examples of strategies used at other institutions will also be introduced. This paper should be useful to anyone interested in describing large quantities of photographs or other materials with preexisting embedded metadata.

  18. Aspect oriented implementation of design patterns using metadata ...

    African Journals Online (AJOL)

    Aspect-oriented programming extends object-oriented programming by managing crosscutting concerns using aspects. Two of the most important criticisms of aspect-oriented programming are the “tyranny of the dominant signature” and the lack of visibility of the program's flow. Metadata, in the form of Java annotations, is a solution to ...

  19. Big Earth Data Initiative: Metadata Improvement: Case Studies

    Science.gov (United States)

    Kozimor, John; Habermann, Ted; Farley, John

    2016-01-01

    The Big Earth Data Initiative (BEDI) invests in standardizing and optimizing the collection, management and delivery of the U.S. Government's civil Earth observation data to improve discovery, access, use, and understanding of Earth observations by the broader user community. Complete and consistent standard metadata helps address all three goals.

  20. Elementary Statistics Tables

    CERN Document Server

    Neave, Henry R

    2012-01-01

    This book, designed for students taking a basic introductory course in statistical analysis, is far more than just a book of tables. Each table is accompanied by a careful but concise explanation and useful worked examples. Requiring little mathematical background, Elementary Statistics Tables is thus not just a reference book but a positive and user-friendly teaching and learning aid. The new edition contains a new and comprehensive "teach-yourself" section on a simple but powerful approach, now well-known in parts of industry but less so in academia, to analysing and interpreting process dat