WorldWideScience

Sample records for twenty million queries

  1. Northeastern University in TREC 2009. Million Query Track

    Science.gov (United States)

    2009-11-01

    queries in the collection had at least one click on documents in the .gov collection. Given the computational complexity of feature extraction and...ranking functions (by employing SVM) over two different collections, (a) the Million Query 2008 (MQ08) collection (GOV2 corpus and queries with at least one ... click on documents in the .gov domain), and (b) a Bing generated collection (described in Section 2.1) and employed the constructed ranking

  2. Twenty-million-year relationship between mammalian diversity and primary productivity

    Science.gov (United States)

    Fritz, Susanne A.; Eronen, Jussi T.; Schnitzler, Jan; Hof, Christian; Janis, Christine M.; Mulch, Andreas; Böhning-Gaese, Katrin; Graham, Catherine H.

    2016-09-01

    At global and regional scales, primary productivity strongly correlates with richness patterns of extant animals across space, suggesting that resource availability and climatic conditions drive patterns of diversity. However, the existence and consistency of such diversity-productivity relationships through geological history is unclear. Here we provide a comprehensive quantitative test of the diversity-productivity relationship for terrestrial large mammals through time across broad temporal and spatial scales. We combine >14,000 occurrences for 690 fossil genera through the Neogene (23-1.8 Mya) with regional estimates of primary productivity from fossil plant communities in North America and Europe. We show a significant positive diversity-productivity relationship through the 20-million-year record, providing evidence on unprecedented spatial and temporal scales that this relationship is a general pattern in the ecology and paleo-ecology of our planet. Further, we discover that genus richness today does not match the fossil relationship, suggesting that a combination of human impacts and Pleistocene climate variability has modified the 20-million-year ecological relationship by strongly reducing primary productivity and driving many mammalian species into decline or to extinction.

  3. Twenty-million-year relationship between mammalian diversity and primary productivity.

    Science.gov (United States)

    Fritz, Susanne A; Eronen, Jussi T; Schnitzler, Jan; Hof, Christian; Janis, Christine M; Mulch, Andreas; Böhning-Gaese, Katrin; Graham, Catherine H

    2016-09-27

    At global and regional scales, primary productivity strongly correlates with richness patterns of extant animals across space, suggesting that resource availability and climatic conditions drive patterns of diversity. However, the existence and consistency of such diversity-productivity relationships through geological history is unclear. Here we provide a comprehensive quantitative test of the diversity-productivity relationship for terrestrial large mammals through time across broad temporal and spatial scales. We combine >14,000 occurrences for 690 fossil genera through the Neogene (23-1.8 Mya) with regional estimates of primary productivity from fossil plant communities in North America and Europe. We show a significant positive diversity-productivity relationship through the 20-million-year record, providing evidence on unprecedented spatial and temporal scales that this relationship is a general pattern in the ecology and paleo-ecology of our planet. Further, we discover that genus richness today does not match the fossil relationship, suggesting that a combination of human impacts and Pleistocene climate variability has modified the 20-million-year ecological relationship by strongly reducing primary productivity and driving many mammalian species into decline or to extinction.

  4. Superfund Query

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Superfund Query allows users to retrieve data from the Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS) database.

  5. Query responses

    Directory of Open Access Journals (Sweden)

    Paweł Łupkowski

    2017-05-01

    In this article we consider the phenomenon of answering a query with a query. Although such answers are common, no large scale, corpus-based characterization exists, with the exception of clarification requests. After briefly reviewing different theoretical approaches on this subject, we present a corpus study of query responses in the British National Corpus and develop a taxonomy for query responses. We point at a variety of response categories that have not been formalized in previous dialogue work, particularly those relevant to adversarial interaction. We show that different response categories have significantly different rates of subsequent answer provision. We provide a formal analysis of the response categories in the framework of KoS.

  6. Evaluating Multidimensional Queries by Diamond Dicing

    CERN Document Server

    Webb, Hazel; Lemire, Daniel

    2010-01-01

    Queries that constrain multiple dimensions simultaneously are difficult to express and compute efficiently in both Structured Query Language (SQL) and multidimensional languages. We introduce the diamond cube operator to facilitate the expression of one such class of multidimensional query. We have developed, implemented and tested algorithms to compute diamonds on both real and synthetic large data sets. We show that our custom implementation is more than twenty-five times faster, on a large data set, than popular database engines.

  7. Querying and Manipulating Temporal Databases

    Directory of Open Access Journals (Sweden)

    Mohamed Mkaouar

    2011-03-01

    Many works have focused, for over twenty-five years, on the integration of the time dimension in databases (DB). However, the standard SQL3 does not yet allow easy definition, manipulation and querying of temporal DBs. In this paper, we study how we can simplify querying and manipulating temporal facts in SQL3, using a model that integrates time in a native manner. To do this, we propose new keywords and syntax to define different temporal versions for many relational operators and functions used in SQL. It then becomes possible to perform various queries and updates appropriate to temporal facts. We illustrate the use of these proposals on many examples from a real application.

  8. Approximate dictionary queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Gasieniec, Leszek

    1996-01-01

    Given a set of n binary strings of length m each. We consider the problem of answering d-queries. Given a binary query string of length m, a d-query is to report if there exists a string in the set within Hamming distance d of . We present a data structure of size O(nm) supporting 1-queries in ti...
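    To make the d-query definition in the record above concrete, here is a minimal brute-force sketch. It is not the data structure the abstract describes: it simply scans the whole set, and the function names `hamming` and `d_query` are hypothetical, chosen for illustration only.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def d_query(strings: Iterable[str], query: str, d: int) -> bool:
    """Brute-force d-query: is any stored string within Hamming distance d of query?

    This scans every string, so it costs O(n*m) time per query. It only
    illustrates the problem statement, not an efficient solution.
    """
    return any(hamming(s, query) <= d for s in strings)


if __name__ == "__main__":
    stored = ["0000", "1111"]
    print(d_query(stored, "0001", 1))  # one bit away from "0000"
    print(d_query(stored, "0011", 1))  # two bits away from both strings
```

    A linear scan like this is the baseline that a dedicated index for d-queries aims to beat at query time.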

  9. Optimizing Temporal Queries

    DEFF Research Database (Denmark)

    Toman, David; Bowman, Ivan Thomas

    2003-01-01

    Recent research in the area of temporal databases has proposed a number of query languages that vary in their expressive power and the semantics they provide to users. These query languages represent a spectrum of solutions to the tension between clean semantics and efficient evaluation. Often, these query languages are implemented by translating temporal queries into standard relational queries. However, the compiled queries are often quite cumbersome and expensive to execute even using state-of-the-art relational products. This paper presents an optimization technique that produces more efficient translated SQL queries by taking into account the properties of the encoding used for temporal attributes. For concreteness, this translation technique is presented in the context of SQL/TP; however, these techniques are also applicable to other temporal query languages.

  10. Optimizing Temporal Queries

    DEFF Research Database (Denmark)

    Toman, David; Bowman, Ivan Thomas

    2003-01-01

    Recent research in the area of temporal databases has proposed a number of query languages that vary in their expressive power and the semantics they provide to users. These query languages represent a spectrum of solutions to the tension between clean semantics and efficient evaluation. Often, these query languages are implemented by translating temporal queries into standard relational queries. However, the compiled queries are often quite cumbersome and expensive to execute even using state-of-the-art relational products. This paper presents an optimization technique that produces more efficient translated SQL queries by taking into account the properties of the encoding used for temporal attributes. For concreteness, this translation technique is presented in the context of SQL/TP; however, these techniques are also applicable to other temporal query languages.

  11. Path-based Queries on Trajectory Data

    DEFF Research Database (Denmark)

    Krogh, Benjamin Bjerre; Pelekis, Nikos; Theodoridis, Yannis

    2014-01-01

    ... To efficiently support strict path queries, we present a novel NETwork-constrained TRAjectory index (NETTRA). This index enables very efficient retrieval of trajectories that follow a specific path, i.e., strict path queries. NETTRA uses a new path encoding scheme that can determine if a trajectory follows a specific path by only retrieving data from the first and last edge in the path. To correctly answer strict path queries existing network-constrained trajectory indexes must retrieve data from all edges in the path. An extensive performance study of NETTRA using a very large real-world trajectory data set, consisting of 1.7 million trajectories (941 million GPS records) and a road network with 1.3 million edges, shows a speed-up of two orders of magnitude compared to state-of-the-art trajectory indexes.

  12. Efficient Query Rewrite for Structured Web Queries

    CERN Document Server

    Gollapudi, Sreenivas; Ntoulas, Alexandros; Paparizos, Stelios

    2011-01-01

    Web search engines and specialized online verticals are increasingly incorporating results from structured data sources to answer semantically rich user queries. For example, the query "Samsung 50 inch led tv" can be answered using information from a table of television data. However, the users are not domain experts and quite often enter values that do not match precisely the underlying data. Samsung makes 46- or 55-inch led tvs, but not 50-inch ones. So a literal execution of the above mentioned query will return zero results. For optimal user experience, a search engine would prefer to return at least a minimum number of results as close to the original query as possible. Furthermore, due to typical fast retrieval speeds in web-search, a search engine query execution is time-bound. In this paper, we address these challenges by proposing algorithms that rewrite the user query in a principled manner, surfacing at least the required number of results while satisfying the low-latency constraint. We f...

  13. Learning semantic query suggestions

    NARCIS (Netherlands)

    E. Meij; M. Bron; L. Hollink; B. Huurnink; M. de Rijke

    2009-01-01

    An important application of semantic web technology is recognizing human-defined concepts in text. Query transformation is a strategy often used in search engines to derive queries that are able to return more useful search results than the original query and most popular search engines provide faci

  14. Spatio-temporal databases complex motion pattern queries

    CERN Document Server

    Vieira, Marcos R

    2013-01-01

    This brief presents several new query processing techniques, called complex motion pattern queries, specifically designed for very large spatio-temporal databases of moving objects. The brief begins with the definition of flexible pattern queries, which are powerful because of the integration of variables and motion patterns. This is followed by a summary of the expressive power of patterns and flexibility of pattern queries. The brief then presents the Spatio-Temporal Pattern System (STPS) and density-based pattern queries. STPS databases contain millions of records with information about mobi

  15. Collective spatial keyword querying

    DEFF Research Database (Denmark)

    Cao, Xin; Cong, Gao; Jensen, Christian S.;

    2011-01-01

    With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query...

  16. Query Language for Complex Similarity Queries

    CERN Document Server

    Budikova, Petra; Zezula, Pavel

    2012-01-01

    For complex data types such as multimedia, traditional data management methods are not suitable. Instead of attribute matching approaches, access methods based on object similarity are becoming popular. Recently, this resulted in an intensive research of indexing and searching methods for the similarity-based retrieval. Nowadays, many efficient methods are already available, but using them to build an actual search system still requires specialists that tune the methods and build the system manually. Several attempts have already been made to provide a more convenient high-level interface in a form of query languages for such systems, but these are limited to support only basic similarity queries. In this paper, we propose a new language that allows to formulate content-based queries in a flexible way, taking into account the functionality offered by a particular search engine in use. To ensure this, the language is based on a general data model with an abstract set of operations. Consequently, the language s...

  17. Indexing for summary queries

    DEFF Research Database (Denmark)

    Yi, Ke; Wang, Lu; Wei, Zhewei

    2014-01-01

    ), of a particular attribute of these records. Aggregation queries are especially useful in business intelligence and data analysis applications where users are interested not in the actual records, but some statistics of them. They can also be executed much more efficiently than reporting queries, by embedding...

  18. Query recommendation for children

    NARCIS (Netherlands)

    Duarte Torres, Sergio; Hiemstra, Djoerd; Weber, Ingmar; Serdyukov, Pavel

    2012-01-01

    One of the biggest problems that children experience while searching the web occurs during the query formulation process. Children have been found to struggle formulating queries based on keywords given their limited vocabulary and their difficulty to choose the right keywords. In this work we propo

  19. WATERS Expert Query Tool

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Expert Query Tool is a web-based reporting tool using the EPA’s WATERS database. There are just three steps to using Expert Query: 1. View Selection – Choose what...

  20. Mastering jQuery

    CERN Document Server

    Libby, Alex

    2015-01-01

    If you are a developer who is already familiar with using jQuery and wants to push your skill set further, then this book is for you. The book assumes an intermediate knowledge level of jQuery, JavaScript, HTML5, and CSS.

  1. Indexing for summary queries

    DEFF Research Database (Denmark)

    Yi, Ke; Wang, Lu; Wei, Zhewei

    2014-01-01

    ), of a particular attribute of these records. Aggregation queries are especially useful in business intelligence and data analysis applications where users are interested not in the actual records, but some statistics of them. They can also be executed much more efficiently than reporting queries, by embedding...

  2. Declarative Visualization Queries

    Science.gov (United States)

    Pinheiro da Silva, P.; Del Rio, N.; Leptoukh, G. G.

    2011-12-01

    In an ideal interaction with machines, scientists may prefer to write declarative queries saying "what" they want from a machine rather than to write code stating "how" the machine is going to address the user request. For example, in relational databases, users have long relied on specifying queries using Structured Query Language (SQL), a declarative language to request data results from a database management system. In the context of visualizations, we see that users are still writing code based on complex visualization toolkit APIs. With the goal of improving the scientists' experience of using visualization technology, we have applied this query-answering pattern to a visualization setting, where scientists specify what visualizations they want generated using a declarative SQL-like notation. A knowledge-enhanced management system ingests the query and knows the following: (1) how to translate the query into visualization pipelines; and (2) how to execute the visualization pipelines to generate the requested visualization. We define visualization queries as declarative requests for visualizations specified in an SQL-like language. Visualization queries specify what category of visualization to generate (e.g., volumes, contours, surfaces) as well as associated display attributes (e.g., color and opacity), without any regard for implementation, thus allowing scientists to remain partially unaware of a wide range of visualization toolkit (e.g., Generic Mapping Tools and Visualization Toolkit) specific implementation details. Implementation details are only a concern for our knowledge-based visualization management system, which uses both the information specified in the query and knowledge about visualization toolkit functions to construct visualization pipelines. Knowledge about the use of visualization toolkits includes what data formats the toolkit operates on, what formats they output, and what views they can generate. Visualization knowledge, which is not

  3. Orthogonal Query Expansion

    CERN Document Server

    Ackerman, Margareta; Lopez-Ortiz, Alejandro

    2011-01-01

    Over the last fifteen years, web searching has seen tremendous improvements. Starting from a nearly random collection of matching pages in 1995, today, search engines tend to satisfy the user's informational need on well-formulated queries. One of the main remaining challenges is to satisfy the users' needs when they provide a poorly formulated query. When the pages matching the user's original keywords are judged to be unsatisfactory, query expansion techniques are used to alter the result set. These techniques find keywords that are similar to the keywords given by the user, which are then appended to the original query leading to a perturbation of the result set. However, when the original query is sufficiently ill-posed, the user's informational need is best met using entirely different keywords, and a small perturbation of the original result set is bound to fail. We propose a novel approach that is not based on the keywords of the original query. We intentionally seek out orthogonal queries, which are r...

  4. Moving Spatial Keyword Queries

    DEFF Research Database (Denmark)

    Wu, Dingming; Yiu, Man Lung; Jensen, Christian S.

    2013-01-01

    Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top-k spatial keyword (MkSK) queries over spatial text data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within the safe zone associated with a result. However, existing safe-zone methods focus solely on spatial locations and ignore text relevancy. We...

  5. jQuery cookbook

    CERN Document Server

    2010-01-01

    jQuery simplifies building rich, interactive web frontends. Getting started with this JavaScript library is easy, but it can take years to fully realize its breadth and depth; this cookbook shortens the learning curve considerably. With these recipes, you'll learn patterns and practices from 19 leading developers who use jQuery for everything from integrating simple components into websites and applications to developing complex, high-performance user interfaces. Ideal for newcomers and JavaScript veterans alike, jQuery Cookbook starts with the basics and then moves to practical use cases w

  6. Range-Clustering Queries

    OpenAIRE

    Abrahamsen, Mikkel; de Berg, Mark; Buchin, Kevin; Mehr, Mehran; Mehrabi, Ali D.

    2017-01-01

    In a geometric $k$-clustering problem the goal is to partition a set of points in $\mathbb{R}^d$ into $k$ subsets such that a certain cost function of the clustering is minimized. We present data structures for orthogonal range-clustering queries on a point set $S$: given a query box $Q$ and an integer $k>2$, compute an optimal $k$-clustering for $S\setminus Q$. We obtain the following results. We present a general method to compute a $(1+\epsilon)$-approximation to a range-clustering query, ...

  7. Benchmarking Query Execution Robustness

    Science.gov (United States)

    Wiener, Janet L.; Kuno, Harumi; Graefe, Goetz

    Benchmarks that focus on running queries on a well-tuned database system ignore a long-standing problem: adverse runtime conditions can cause database system performance to vary widely and unexpectedly. When the query execution engine does not exhibit resilience to these adverse conditions, addressing the resultant performance problems can contribute significantly to the total cost of ownership for a database system in over-provisioning, lost efficiency, and increased human administrative costs. For example, focused human effort may be needed to manually invoke workload management actions or fine-tune the optimization of specific queries.

  8. Selective sweeps across twenty millions years of primate evolution

    DEFF Research Database (Denmark)

    Munch, Kasper; Nam, Kiwoong; Schierup, Mikkel Heide

    2016-01-01

    The contribution from selective sweeps to variation in genetic diversity has proven notoriously difficult to assess, in part because polymorphism data only allows detection of sweeps in the most recent few hundred thousand years. Here we show how linked selection in ancestral species can be quant...

  9. Localized Geometric Query Problems

    CERN Document Server

    Augustine, John; Maheshwari, Anil; Nandy, Subhas C; Roy, Sasanka; Sarvattomananda, Swami

    2011-01-01

    A new class of geometric query problems are studied in this paper. We are required to preprocess a set of geometric objects $P$ in the plane, so that for any arbitrary query point $q$, the largest circle that contains $q$ but does not contain any member of $P$, can be reported efficiently. The geometric sets that we consider are point sets and boundaries of simple polygons.

  10. Querying JSON Streams

    OpenAIRE

    Bo, Yang

    2010-01-01

    A data stream management system (DSMS) is similar to a database management system (DBMS) but can search data directly in on-line streams. Using its mediator-wrapper approach, the extensible database system, Amos II, allows different kinds of distributed data resource to be queried. It has been extended with a stream datatype to query possibly infinite streams, which provides DSMS functionality. Nowadays, more and more web applications start to offer their services in JSON format which is a te...

  11. Inverse Queries For Multidimensional Spaces

    CERN Document Server

    Bernecker, Thomas; Kriegel, Hans-Peter; Mamoulis, Nikos; Renz, Matthias; Zhang, Shiming; Züfle, Andreas

    2011-01-01

    Traditional spatial queries return, for a given query object $q$, all database objects that satisfy a given predicate, such as epsilon range and $k$-nearest neighbors. This paper defines and studies inverse spatial queries, which, given a subset of database objects $Q$ and a query predicate, return all objects which, if used as query objects with the predicate, contain $Q$ in their result. We first show a straightforward solution for answering inverse spatial queries for any query predicate. Then, we propose a filter-and-refinement framework that can be used to improve efficiency. We show how to apply this framework on a variety of inverse queries, using appropriate space pruning strategies. In particular, we propose solutions for inverse epsilon range queries, inverse $k$-nearest neighbor queries, and inverse skyline queries. Our experiments show that our framework is significantly more efficient than naive approaches.
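    As an illustration of the definition in the record above, here is a minimal sketch of an inverse epsilon-range query evaluated naively. It is an assumed brute-force reading of the definition, not the paper's filter-and-refinement framework. The name `inverse_range_query` and the tuple-based point representation are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def inverse_range_query(database: Sequence[Point],
                        Q: Sequence[Point],
                        eps: float) -> List[Point]:
    """Naive inverse epsilon-range query.

    Returns every database object o such that an epsilon-range query
    issued from o would contain all objects of Q, i.e. every q in Q lies
    within Euclidean distance eps of o. Each candidate is checked against
    every member of Q, so the cost is O(|database| * |Q|).
    """
    return [o for o in database
            if all(math.dist(o, q) <= eps for q in Q)]


if __name__ == "__main__":
    db = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    subset = [(0.0, 0.0), (1.0, 0.0)]
    print(inverse_range_query(db, subset, 1.0))
```

    The exhaustive check is the kind of naive baseline against which pruning strategies are typically compared.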

  12. Animating the Web with jQuery

    Directory of Open Access Journals (Sweden)

    Asokan M

    2013-02-01

    World globalization and present-day technology are rapidly increasing the number of web users. Every website is trying to attract web users. Website creators and developers add different kinds of animations to their websites. There are many software packages available to create animation. jQuery can be used to create interactive and powerful web pages with animations. jQuery is a JavaScript library intended to make JavaScript programming easier and more fun. A JavaScript library is a complex JavaScript program that both simplifies difficult tasks and solves cross-browser problems. With jQuery, we can accomplish tasks in a single line of code. jQuery is used on millions of websites. This paper discusses the advantages and usage statistics of jQuery on the web. A complete procedure to create slider and banner plug-ins is also included. They are tested with different browsers.

  13. jQuery Mobile

    CERN Document Server

    Reid, Jon

    2011-01-01

    Native apps have distinct advantages, but the future belongs to mobile web apps that function on a broad range of smartphones and tablets. Get started with jQuery Mobile, the touch-optimized framework for creating apps that look and behave consistently across many devices. This concise book provides HTML5, CSS3, and JavaScript code examples, screen shots, and step-by-step guidance to help you build a complete working app with jQuery Mobile. If you're already familiar with the jQuery JavaScript library, you can use your existing skills to build cross-platform mobile web apps right now. This b

  14. XPath Whole Query Optimization

    CERN Document Server

    Maneth, Sebastian

    2010-01-01

    Previous work reports on SXSI, a fast XPath engine which executes tree automata over compressed XML indexes. Here, the reasons why SXSI is so fast are investigated. It is shown that tree automata can be used as a general framework for fine-grained XML query optimization. We define the "relevant nodes" of a query as those nodes that a minimal automaton must touch in order to answer the query. This notion allows many subtrees to be skipped during execution and, with the help of particular tree indexes, even allows internal nodes of the tree to be skipped. We efficiently approximate runs over relevant nodes by means of on-the-fly removal of alternation and non-determinism of (alternating) tree automata. We also introduce many implementation techniques which allow us to efficiently evaluate tree automata, even in the absence of special indexes. Through extensive experiments, we demonstrate the impact of the different optimization techniques.

  15. Code query by example

    Science.gov (United States)

    Vaucouleur, Sebastien

    2011-02-01

    We introduce code query by example for customisation of evolvable software products in general and of enterprise resource planning systems (ERPs) in particular. The concept is based on an initial empirical study on practices around ERP systems. We motivate our design choices based on those empirical results, and we show how the proposed solution helps with respect to the infamous upgrade problem: the conflict between the need for customisation and the need for upgrade of ERP systems. We further show how code query by example can be used as a form of lightweight static analysis, to detect automatically potential defects in large software products. Code query by example as a form of lightweight static analysis is particularly interesting in the context of ERP systems: it is often the case that programmers working in this field are not computer science specialists but more of domain experts. Hence, they require a simple language to express custom rules.

  16. Learning jQuery

    CERN Document Server

    Chaffer, Jonathan

    2013-01-01

    Step through each of the core concepts of the jQuery library, building an overall picture of its capabilities. Once you have thoroughly covered the basics, the book returns to each concept to cover more advanced examples and techniques. This book is for web designers who want to create interactive elements for their designs, and for developers who want to create the best user interface for their web applications. Basic JavaScript programming and knowledge of HTML and CSS is required. No knowledge of jQuery is assumed, nor is experience with any other JavaScript libraries.

  17. KoralQuery -- A General Corpus Query Protocol

    DEFF Research Database (Denmark)

    Bingel, Joachim; Diewald, Nils

    2015-01-01

    The task-oriented and format-driven development of corpus query systems has led to the creation of numerous corpus query languages (QLs) that vary strongly in expressiveness and syntax. This is a severe impediment for the interoperability of corpus analysis systems, which lack a common protocol. In this paper, we present KoralQuery, a JSON-LD based general corpus query protocol, aiming to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that KoralQuery is built on, we exemplify the representation of corpus queries in the serialized...

  18. User perspectives on query difficulty

    DEFF Research Database (Denmark)

    Lioma, Christina; Larsen, Birger; Schütze, Hinrich

    2011-01-01

    The difficulty of a user query can affect the performance of Information Retrieval (IR) systems. What makes a query difficult and how one may predict this is an active research area, focusing mainly on factors relating to the retrieval algorithm, to the properties of the retrieval data, or to statistical and linguistic features of the queries that may render them difficult. This work addresses query difficulty from a different angle, namely the users’ own perspectives on query difficulty. Two research questions are asked: (1) Are users aware that the query they submit to an IR system may... for synthesising the user-assessed causes of query difficulty through opinion fusion into an overall assessment of query difficulty. The resulting assessments of query difficulty are found to agree notably more to the TREC categories than the direct user assessments.

  19. Spatial Keyword Querying

    DEFF Research Database (Denmark)

    Cao, Xin; Chen, Lisi; Cong, Gao;

    2012-01-01

    The web is increasingly being used by mobile users. In addition, it is increasingly becoming possible to accurately geo-position mobile users and web content. This development gives prominence to spatial web data management. Specifically, a spatial keyword query takes a user location and user-sup... different kinds of functionality as well as the ideas underlying their definition.

  20. Spatial Keyword Query Processing

    DEFF Research Database (Denmark)

    Chen, Lisi; Jensen, Christian S.; Wu, Dingming

    2013-01-01

    an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of the spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index...

  1. Conceptual querying through ontologies

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik

    2009-01-01

    We present here an approach to conceptual querying where the aim is, given a collection of textual database objects or documents, to target an abstraction of the entire database content in terms of the concepts appearing in documents, rather than the documents in the collection. The approach is ...

  2. XIRAF: Ultimate Forensic Querying

    NARCIS (Netherlands)

    Alink, W.; Bhoedjang, R.; Vries, A.P. de; Boncz, P.A.

    2006-01-01

    This paper describes a novel, XML-based approach towards managing and querying forensic traces extracted from digital evidence. This approach has been implemented in XIRAF, a prototype system for forensic analysis. XIRAF systematically applies forensic analysis tools to evidence files (e.g., hard di

  3. Query Driven Visualization

    CERN Document Server

    Buddelmeijer, Hugo

    2011-01-01

    The request driven way of deriving data in Astro-WISE is extended to a query driven way of visualization. This allows scientists to focus on the science they want to perform, because all administration of their data is automated. This can be done over an abstraction layer that enhances control and flexibility for the scientist.

  4. Flexible Query Answering Systems

    DEFF Research Database (Denmark)

    -computer interaction. The special track covers some specific and, typically, newer fields, namely: environmental scanning for strategic early warning; generating linguistic descriptions of data; advances in fuzzy querying and fuzzy databases: theory and applications; fusion and ensemble techniques for on......-line learning on data streams; and intelligent information extraction from texts....

  5. Flexible Query Answering Systems

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 12th International Conference on Flexible Query Answering Systems, FQAS 2017, held in London, UK, in June 2017. The 21 full papers presented in this book together with 4 short papers were carefully reviewed and selected from 43 submissions...

  6. Learning via Query Synthesis

    KAUST Repository

    Alabdulmohsin, Ibrahim Mansour

    2017-05-07

    Active learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthesis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has, generally, focused on finite concept classes or toy problems, with a limited extension to real-world applications. In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The first algorithm is a maximum-determinant convex optimization method while the second algorithm is a Markovian method that relies on Khachiyan’s classical update formulas for solving linear programs. The general theme of these methods is to construct an ellipsoidal approximation of the version space and to synthesize queries, afterward, via spectral decomposition. Moreover, I also describe how these algorithms can be extended to other settings as well, such as pool-based active learning. Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can be used to render query synthesis algorithms ineffective is to implement a randomized response. In this thesis, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distribution provides reliable predictions with a high probability. This algorithm is, then, justified both theoretically and empirically. A second approach is to use a non-parametric classification method, such as similarity-based classification.
In this

  7. Google BigQuery analytics

    CERN Document Server

    Tigani, Jordan

    2014-01-01

    How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute engine, AppEngine datastore integration, and using GViz with Tableau to generate charts of query results. In addit

  8. User perspectives on query difficulty

    DEFF Research Database (Denmark)

    Lioma, Christina; Larsen, Birger; Schütze, Hinrich

    2011-01-01

    The difficulty of a user query can affect the performance of Information Retrieval (IR) systems. What makes a query difficult and how one may predict this is an active research area, focusing mainly on factors relating to the retrieval algorithm, to the properties of the retrieval data, or to statistical and linguistic features of the queries that may render them difficult. This work addresses query difficulty from a different angle, namely the users’ own perspectives on query difficulty. Two research questions are asked: (1) Are users aware that the query they submit to an IR system may...

  9. COMPLEX QUERY AND METADATA

    OpenAIRE

    Nakatoh, Tetsuya; Omori, Keisuke; Yamada, Yasuhiro; Hirokawa, Sachio

    2003-01-01

    We are developing a search system DAISEn which integrates multiple search engines and generates a metasearch engine automatically. The target search engines of DAISEn are not general search engines, but are search engines specialized in some area. Integration of such engines yields efficiency and quality. There are search engines of new type which accept complex query and return structured data. Integration of such search engines is much harder than that of simple search engines which accept ...

  10. Querying genomic databases

    Energy Technology Data Exchange (ETDEWEB)

    Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.

    1991-09-01

    A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.

  11. A Semantic Graph Query Language

    Energy Technology Data Exchange (ETDEWEB)

    Kaplan, I L

    2006-10-16

    Semantic graphs can be used to organize large amounts of information from a number of sources into one unified structure. A semantic query language provides a foundation for extracting information from the semantic graph. The graph query language described here provides a simple, powerful method for querying semantic graphs.

  12. Query optimization over crowdsourced data

    KAUST Repository

    Park, Hyunjung

    2013-08-26

    Deco is a comprehensive system for answering declarative queries posed over stored relational data together with data obtained on-demand from the crowd. In this paper we describe Deco's cost-based query optimizer, building on Deco's data model, query language, and query execution engine presented earlier. Deco's objective in query optimization is to find the best query plan to answer a query, in terms of estimated monetary cost. Deco's query semantics and plan execution strategies require several fundamental changes to traditional query optimization. Novel techniques incorporated into Deco's query optimizer include a cost model distinguishing between "free" existing data versus paid new data, a cardinality estimation algorithm coping with changes to the database state during query execution, and a plan enumeration algorithm maximizing reuse of common subplans in a setting that makes reuse challenging. We experimentally evaluate Deco's query optimizer, focusing on the accuracy of cost estimation and the efficiency of plan enumeration.

  13. Mastering jQuery mobile

    CERN Document Server

    Lambert, Chip

    2015-01-01

    You've started down the path of jQuery Mobile, now begin mastering some of jQuery Mobile's higher level topics. Go beyond jQuery Mobile's documentation and master one of the hottest mobile technologies out there. Previous JavaScript and PHP experience can help you get the most out of this book.

  14. A query index for continuous queries on RFID streaming data

    Institute of Scientific and Technical Information of China (English)

    Jaekwan PARK; Bonghee HONG; Chaehoon BAN

    2008-01-01

    RFID middleware collects and filters RFID streaming data to process applications' requests called continuous queries, because they are executed continuously during tag movement. Several approaches to building an index on queries rather than data records, called a query index, have been proposed to evaluate continuous queries over streaming data. EPCglobal proposed an Event Cycle Specification (ECSpec) model, which is a de facto standard query interface for RFID applications. Continuous queries based on ECSpec consist of a large number of segments that represent the query conditions. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index, because it is necessary to insert a large number of segments into the index. To solve this problem, we propose a transform method that converts a group of segments into compressed data. We also propose an efficient query index scheme for the transformed space. Compared with existing query indexes, the proposed index performs better on various datasets.
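    The notion of a query index mentioned in this abstract (indexing the registered queries so that each incoming record is matched against them) can be illustrated with a minimal sketch. It is not the transform method or index scheme proposed in the paper. It assumes, purely for illustration, that each query condition is a set of inclusive integer segments over a single key; the names `SegmentQueryIndex` and `match` are hypothetical.

```python
from bisect import bisect_right


class SegmentQueryIndex:
    """Toy query index: maps a record key to the queries whose segments cover it.

    Assumption (not from the abstract): keys are integers and every segment
    is an inclusive range (low, high) attached to a query identifier.
    """

    def __init__(self, segments):
        # Cut the key space at every segment start and just past every end,
        # so that each cell between two cut points has a fixed set of queries.
        starts = {low for _, low, _ in segments}
        ends = {high + 1 for _, _, high in segments}
        self._starts = sorted(starts | ends)
        self._cells = [set() for _ in self._starts]
        for query_id, low, high in segments:
            for i, start in enumerate(self._starts):
                if low <= start <= high:
                    self._cells[i].add(query_id)

    def match(self, key):
        """Return the ids of all queries with a segment containing key."""
        i = bisect_right(self._starts, key) - 1
        return set(self._cells[i]) if i >= 0 else set()


# Two continuous queries; q1 is made of two separate segments.
index = SegmentQueryIndex([("q1", 10, 20), ("q2", 15, 30), ("q1", 40, 45)])
for tag_key in (12, 15, 21, 35, 42):
    print(tag_key, sorted(index.match(tag_key)))
```

    Building this toy index inserts every segment individually, which is the construction cost the abstract identifies as the bottleneck when queries consist of many segments.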

  15. Mining the SDSS SkyServer SQL queries log

    Science.gov (United States)

    Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

    2016-05-01

    SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer's data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as a basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements to the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.

  16. Bayesian Query-Focused Summarization

    CERN Document Server

    Daumé, Hal

    2009-01-01

    We present BayeSum (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BayeSum leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BayeSum is not afflicted by the paucity of information in short queries. We show that approximate inference in BayeSum is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BayeSum can be understood as a justified query expansion technique in the language modeling for IR framework.

  17. Instant Cassandra query language

    CERN Document Server

    Singh, Amresh

    2013-01-01

    Get to grips with a new technology, understand what it is and what it can do for you, and then get to work with the most important features and tasks. It's an Instant Starter guide. Instant Cassandra Query Language is great for those who are working with Cassandra databases and who want to either learn CQL to check data from the console or build serious applications using CQL. If you're looking for something that helps you get started with CQL in record time and you hate the idea of learning a new language syntax, then this book is for you.

  18. An Efficient Algorithm for Query Transformation in Semantic Query Optimization

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Semantic query optimization (SQO) is a comparatively recent approach for transforming a given query into equivalent alternative queries using matching rules, in order to select an optimal query based on the costs of executing the alternatives. The key aspect of the algorithm proposed here is that previously proposed SQO techniques can be considered equally in a uniform cost model, so that optimization opportunities will not be missed. At the same time, the authors used the implication closure to guarantee that any matched rule will not be lost. The authors implemented their algorithm for the optimization of decomposed sub-queries in local databases in Multi-Database Integrator (MDBI), which is a multidatabase project. The experimental results verify that this algorithm is effective in the process of SQO.

  19. Optimizing Phylogenetic Queries for Performance.

    Science.gov (United States)

    Jamil, Hasan M

    2017-08-24

    The vast majority of phylogenetic databases do not support declarative querying through which their contents can be flexibly and conveniently accessed, and the template-based query interfaces they support do not allow arbitrary speculative queries. They therefore also do not support query optimization leveraging unique phylogeny properties. While a small number of graph query languages such as XQuery, Cypher and GraphQL exist for computer-savvy users, most are too general and complex to be useful for biologists, and too inefficient for large phylogeny querying. In this paper, we discuss a recently introduced visual query language, called PhyQL, that leverages phylogeny-specific properties to support essential and powerful constructs for a large class of phylogenetic queries. We develop a range of pruning aids, and propose a substantial set of query optimization strategies using these aids suitable for large phylogeny querying. A hybrid optimization technique that exploits a set of indices and "graphlet" partitioning is discussed. A "fail soonest" strategy is used to avoid hopeless processing and is shown to produce dividends. Possible novel optimization techniques yet to be explored are also discussed.

  20. Deep web query interface understanding and integration

    CERN Document Server

    Dragut, Eduard C; Yu, Clement T

    2012-01-01

    There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art tech

  1. Cooperative Answering of Fuzzy Queries

    Institute of Scientific and Technical Information of China (English)

    Narjes Hachani; Mohamed Ali Ben Hassine; Hanène Chettaoui; Habib Ounelli

    2009-01-01

    The majority of existing information systems deals with crisp data through crisp database systems. Traditional Database Management Systems (DBMS) have not taken into account imprecision so one can say there is some sort of lack of flexibility. The reason is that queries retrieve only elements which precisely match to the given Boolean query. That is, an element belongs to the result if the query is true for this element; otherwise, no answers are returned to the user. The aim of this paper is to present a cooperative approach to handling empty answers of fuzzy conjunctive queries by referring to the Formal Concept Analysis (FCA) theory and fuzzy logic. We present an architecture which combines FCA and databases. The processing of fuzzy queries allows detecting the minimal reasons of empty answers. We also use concept lattice in order to provide the user with the nearest answers in the case of a query failure.

  2. Ranking Queries on Uncertain Data

    CERN Document Server

    Hua, Ming

    2011-01-01

    Uncertain data is inherent in many important applications, such as environmental surveillance, market analysis, and quantitative economics research. Due to the importance of those applications and rapidly increasing amounts of uncertain data collected and accumulated, analyzing large collections of uncertain data has become an important task. Ranking queries (also known as top-k queries) are often natural and useful in analyzing uncertain data. Ranking Queries on Uncertain Data discusses the motivations/applications, challenging problems, the fundamental principles, and the evaluation algorith

  3. Research Issues in Mobile Querying

    DEFF Research Database (Denmark)

    Breunig, M.; Jensen, Christian Søndergaard; Klein, M.

    2004-01-01

    This document reports on key aspects of the discussions conducted within the working group. In particular, the document aims to offer a structured and somewhat digested summary of the group's discussions. The document first offers concepts that enable characterization of "mobile queries" as well...... as the types of systems that enable such queries. It explores the notion of context in mobile queries. The document ends with a few observations, mainly regarding challenges....

  4. Optimizing queries in distributed systems

    Directory of Open Access Journals (Sweden)

    Ion LUNGU

    2006-01-01

    This research presents the main elements of query optimization in distributed systems. First, data architecture in accordance with system level architecture in a distributed environment is presented. Then the architecture of a distributed database management system (DDBMS) is described at the conceptual level, followed by the presentation of the distributed query execution steps on these information systems. The research ends with a presentation of some aspects of distributed database query optimization and the strategies used for it.

  5. Smart Query Answering for Marine Sensor Data

    Directory of Open Access Journals (Sweden)

    Paulo de Souza

    2011-03-01

    We review existing query answering systems for sensor data. We then propose an extended query answering approach termed smart query, specifically for marine sensor data. The smart query answering system integrates pattern queries and continuous queries. The proposed smart query system considers both streaming data and historical data from marine sensor networks. The smart query also uses a query relaxation technique and semantics from domain knowledge as a recommender system. The proposed smart query approach is beneficial for building data and information systems for marine sensor networks.

  6. Data Caching for XML Query

    Institute of Scientific and Technical Information of China (English)

    SU Fei; CI Lin-lin; ZHU Li-ping; ZHAO Xin-xin

    2006-01-01

    In order to apply the technique of data cache to extensible markup language (XML) database system, the XML-cache system to support data cache for XQuery is presented. According to the character of XML, the queries with nesting are normalized to facilitate the following operation. Based on the idea of incomplete tree, using the document type definition (DTD) schema tree and conditions from normalized XQuery, the results of previous queries are maintained to answer new queries, at the same time, the remainder queries are sent to XML database at the back. The results of experiment show all applications supported by XML database can use this technique to cache data for future use.

  7. From Questions to Queries

    Directory of Open Access Journals (Sweden)

    M. Drlík

    2007-12-01

    The extension of (Internet) databases forces everyone to become more familiar with techniques of data storage and retrieval because users’ success often depends on their ability to pose right questions and to be able to interpret their answers. University programs pay more attention to developing database programming skills than to data exploitation skills. To educate our students to become “database users”, the authors intensively exploit supportive tools simplifying the production of database elements as tables, queries, forms, reports, web pages, and macros. Video sequences demonstrating “standard operations” for completing them have been prepared to enhance out-of-classroom learning. The use of SQL and other professional tools is reduced to the cases when the wizards are unable to generate the intended construct.

  8. A study of medical and health queries to web search engines.

    Science.gov (United States)

    Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

    2004-03-01

    This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of the general web search engines when seeking medical/health information.

  9. KoralQuery -- A General Corpus Query Protocol

    DEFF Research Database (Denmark)

    Bingel, Joachim; Diewald, Nils

    2015-01-01

    The task-oriented and format-driven development of corpus query systems has led to the creation of numerous corpus query languages (QLs) that vary strongly in expressiveness and syntax. This is a severe impediment for the interoperability of corpus analysis systems, which lack a common protocol...... format and illustrate use cases in the KorAP project....

  10. Usability of XML Query Languages

    NARCIS (Netherlands)

    Graaumans, J.P.M.

    2005-01-01

    The eXtensible Markup Language (XML) is a markup language which enables re-use of information. Specific query languages for XML are developed to facilitate this. There are large differences between history, design goal, and syntax of the XML query languages. However, in practice these languages are

  11. The Semantics of Query Modification

    NARCIS (Netherlands)

    Hollink, V.; Tsikrika, T.; Vries, A.P. de

    2010-01-01

    We present a method that exploits `linked data' to determine semantic relations between consecutive user queries. Our method maps queries onto concepts in linked data and searches the linked data graph for direct or indirect relations between the concepts. By comparing relations between large number

  12. Querying Sentiment Development over Time

    DEFF Research Database (Denmark)

    Andreasen, Troels; Christiansen, Henning; Have, Christian Theil

    2013-01-01

    that measures how well a hypothesis characterizes a given time interval; the semantics is parameterized so it can be adjusted to different views of the data. EmoEpisodes is extended to a query language with variables standing for unknown topics and emotions, and the query-answering mechanism will return...

  13. Priming the Query Specification Process.

    Science.gov (United States)

    Toms, Elaine G.; Freund, Luanne

    2003-01-01

    Tests the use of questions as a technique in the query specification process. Using a within-subjects design, 48 people interacted with a modified Google interface to solve four information problems in four domains. Half the tasks were entered as typical keyword queries, and half as questions or statements. Results suggest the typical search box…

  14. jQuery Pocket Reference

    CERN Document Server

    Flanagan, David

    2010-01-01

    "As someone who uses jQuery on a regular basis, it was surprising to discover how much of the library I'm not using. This book is indispensable for anyone who is serious about using jQuery for non-trivial applications."-- Raffaele Cecco, longtime developer of video games, including Cybernoid, Exolon, and Stormlord jQuery is the "write less, do more" JavaScript library. Its powerful features and ease of use have made it the most popular client-side JavaScript framework for the Web. This book is jQuery's trusty companion: the definitive "read less, learn more" guide to the library. jQuery P

  15. Instant jQuery selectors

    CERN Document Server

    De Rosa, Aurelio

    2013-01-01

    Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. Instant jQuery Selectors follows a simple how-to format with recipes aimed at making you well versed with the wide range of selectors that jQuery has to offer through a myriad of examples. Instant jQuery Selectors is for web developers who want to delve into jQuery from its very starting point: selectors. Even if you're already familiar with the framework and its selectors, you could find several tips and tricks that you aren't aware of, especially about performance and how jQuery ac

  16. jQuery UI cookbook

    CERN Document Server

    Boduch, Adam

    2013-01-01

    Filled with a practical collection of recipes, jQuery UI Cookbook is full of clear, step-by-step instructions that will help you harness the powerful UI framework in jQuery. Depending on your needs, you can dip in and out of the Cookbook and its recipes, or follow the book from start to finish. If you are a jQuery UI developer looking to improve your existing applications, extract ideas for your new application, or to better understand the overall widget architecture, then jQuery UI Cookbook is a must-have for you. The reader should at least have a rudimentary understanding of what jQuery UI is

  17. Query auto completion in information retrieval

    NARCIS (Netherlands)

    Cai, Fei

    2016-01-01

    Query auto completion is an important feature embedded into today's search engines. It can help users formulate queries which other people have searched for when he/she finishes typing the query prefix. Today's most sophisticated query auto completion approaches are based on the collected query logs

  18. In-context query reformulation for failing SPARQL queries

    Science.gov (United States)

    Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

    2017-05-01

    Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eye view of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance- and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.

  19. The Query-commit Problem

    CERN Document Server

    Molinaro, Marco

    2011-01-01

    In the query-commit problem we are given a graph where edges have distinct probabilities of existing. It is possible to query the edges of the graph, and if the queried edge exists then its endpoints are irrevocably matched. The goal is to find a querying strategy which maximizes the expected size of the matching obtained. This stochastic matching setup is motivated by applications in kidney exchanges and online dating. In this paper we address the query-commit problem from both theoretical and experimental perspectives. First, we show that a simple class of edges can be queried without compromising the optimality of the strategy. This property is then used to obtain in polynomial time an optimal querying strategy when the input graph is sparse. Next we turn our attentions to the kidney exchange application, focusing on instances modeled over real data from existing exchange programs. We prove that, as the number of nodes grows, almost every instance admits a strategy which matches almost all nodes. This resu...
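    The problem setup in this abstract can be made concrete with a small simulation. The sketch below is not the optimal strategy from the paper: it uses a simple greedy heuristic (query edges in decreasing order of existence probability) chosen only for illustration, and the names `greedy_query_commit` and `estimate_expected_size` are hypothetical.

```python
import random


def greedy_query_commit(edges, rng):
    """Run one simulated round of a greedy query-commit strategy.

    edges: list of (u, v, p) where p is the probability that edge (u, v) exists.
    Edges are queried in decreasing order of p. An edge is skipped when one of
    its endpoints is already matched; if a queried edge exists, its endpoints
    are matched irrevocably, as the problem requires.
    """
    matched = set()
    matching = []
    for u, v, p in sorted(edges, key=lambda e: e[2], reverse=True):
        if u in matched or v in matched:
            continue  # querying would be pointless: the edge could not be used
        if rng.random() < p:  # the query reveals that the edge exists
            matched.update((u, v))
            matching.append((u, v))
    return matching


def estimate_expected_size(edges, trials=10_000, seed=0):
    """Monte Carlo estimate of the expected matching size under the greedy strategy."""
    rng = random.Random(seed)
    total = sum(len(greedy_query_commit(edges, rng)) for _ in range(trials))
    return total / trials


# A path a-b-c-d with illustrative (made-up) edge probabilities.
path = [("a", "b", 0.9), ("b", "c", 0.5), ("c", "d", 0.8)]
print(estimate_expected_size(path))
```

    The order in which edges are queried is the whole decision in this setting, since a successful query commits both endpoints; comparing different orderings in such a simulation shows why the choice of strategy matters.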

  20. Multi-Dimensional Path Queries

    DEFF Research Database (Denmark)

    Bækgaard, Lars

    1998-01-01

    We present the path-relationship model that supports multi-dimensional data modeling and querying. A path-relationship database is composed of sets of paths and sets of relationships. A path is a sequence of related elements (atoms, paths, and sets of paths). A relationship is a binary path...... to create nested path structures. We present an SQL-like query language that is based on path expressions and we show how to use it to express multi-dimensional path queries that are suited for advanced data analysis in decision support environments like data warehousing environments...

  1. Recommendation Sets and Choice Queries

    DEFF Research Database (Denmark)

    Viappiani, Paolo Renato; Boutilier, Craig

    2011-01-01

    Utility elicitation is an important component of many applications, such as decision support systems and recommender systems. Such systems query users about their preferences and offer recommendations based on the system's belief about the user's utility function. We analyze the connection between...... the problem of generating optimal recommendation sets and the problem of generating optimal choice queries, considering both Bayesian and regret-based elicitation. Our results show that, somewhat surprisingly, under very general circumstances, the optimal recommendation set coincides with the optimal query....

  2. The role of economics in the QUERI program: QUERI Series

    Directory of Open Access Journals (Sweden)

    Smith Mark W

    2008-04-01

    Full Text Available Abstract. Background: The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods: We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results: Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Conclusion: Economics appears to play an important role in QUERI implementation studies only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  3. Million object spectrograph

    Science.gov (United States)

    Ditto, Thomas D.; Ritter, Joseph M.

    2008-07-01

    A new class of astronomical telescope with a primary objective grating (POG) has been studied as an alternative to mirrors. Nineteenth century POG telescopes suffered from low resolution and ambiguity of overlapping spectra as well as background noise. The present design uses a conventional secondary spectrograph to disambiguate all objects while enjoying a very wide instantaneous field-of-view, up to 40°. The POG competes with mirrors, in part, because diffraction gratings provide the very chromatic dispersion that mirrors defeat. The resulting telescope deals effectively with long-standing restrictions on multiple object spectrographs (MOS). The combination of a POG operating in the first-order, coupled to a spectrographic astronomical telescope, isolates spectra from all objects in the free spectral range of the primary. First disclosed as a concept in year 2002, a physical proof-of-principle is now reported. The miniature laboratory model used a 50 mm plane grating primary and was able to disambiguate between objects appearing at angular resolutions of 55 arcseconds and spectral spacings of 0.15 nm. Astronomical performance is a matter of increasing instrument size. A POG configured according to our specifications has no moving parts during observations and is extensible to any length that can be held flat to tolerances approaching float glass. The resulting telescope could record over one million spectra per night of objects in a line of right ascension. The novel MOS does not require pre-imaging to start acquisition of uncharted star fields. Problems are anticipated in calibration and integration time. We propose means to ameliorate them.

  4. jQuery For Dummies

    CERN Document Server

    Beighley, Lynn

    2010-01-01

    Learn how jQuery can make your Web page or blog stand out from the crowd! jQuery is free, open source software that allows you to extend and customize Joomla!, Drupal, AJAX, and WordPress via plug-ins. Assuming no previous programming experience, Lynn Beighley takes you through the basics of jQuery from the very start. You'll discover how the jQuery library separates itself from other JavaScript libraries through its ease of use, compactness, and friendliness if you're a beginner programmer. Written in the easy-to-understand style of the For Dummies brand, this book demonstrates how you can a

  5. XML Multidimensional Modelling and Querying

    CERN Document Server

    Boucher, Serge; Zimányi, Esteban

    2009-01-01

    As XML becomes ubiquitous and XML storage and processing becomes more efficient, the range of use cases for these technologies widens daily. One promising area is the integration of XML and data warehouses, where an XML-native database stores multidimensional data and processes OLAP queries written in the XQuery interrogation language. This paper explores issues arising in the implementation of such a data warehouse. We first compare approaches for multidimensional data modelling in XML, then describe how typical OLAP queries on these models can be expressed in XQuery. We then show how, regardless of the model, the grouping features of XQuery 1.1 improve performance and readability of these queries. Finally, we evaluate the performance of query evaluation in each modelling choice using the eXist database, which we extended with a grouping clause implementation.

  6. Schedule Sales Query Raw Data

    Data.gov (United States)

    General Services Administration — Schedule Sales Query presents sales volume figures as reported to GSA by contractors. The reports are generated as quarterly reports for the current year and the...

  7. Ontological Queries: Rewriting and Optimization (Extended Version)

    CERN Document Server

    Gottlob, Georg; Pieris, Andreas

    2011-01-01

    Ontological queries are evaluated against an ontology rather than directly on a database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this paper we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent query against the underlying relational database. The focus here is on soundness and completeness. We review previous results and present a new rewriting algorithm for rather general types of ontological constraints. In particular, we show how a conjunctive query against an ontology can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this process so as to produce possibly small and cost-effective UCQ rewritings for an input query. We review existing optimization methods, and propose an effective new method that works for linear Datalog+/-...

  8. Improved query difficulty prediction for the web

    NARCIS (Netherlands)

    Hauff, C.; Murdock, V.; Baeza-Yates, R.

    2008-01-01

    Query performance prediction aims to predict whether a query will have a high average precision given retrieval from a particular collection, or low average precision. An accurate estimator of the quality of search engine results can allow the search engine to decide to which queries to apply query

  9. Effective Density Queries of Continuously Moving Objects

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Lin, D.; Ooi, B.C.

    2006-01-01

    In this paper, we study a newly emerging type of queries on moving objects - the density query. Basically, this query locates regions in the data space where the density of the objects is high. This type of queries is especially useful in Location Based Services (LBS). For example, in a traffic...

  10. Privacy Preserving Moving KNN Queries

    CERN Document Server

    Hashem, Tanzima; Zhang, Rui

    2011-01-01

    We present a novel approach that protects trajectory privacy of users who access location-based services through a moving k nearest neighbor (MkNN) query. An MkNN query continuously returns the k nearest data objects for a moving user (query point). Simply updating a user's imprecise location such as a region instead of the exact position to a location-based service provider (LSP) cannot ensure privacy of the user for an MkNN query: continuous disclosure of regions enables the LSP to follow a user's trajectory. We identify the problem of trajectory privacy that arises from the overlap of consecutive regions while requesting an MkNN query and provide the first solution to this problem. Our approach allows a user to specify the confidence level that represents a bound of how much more the user may need to travel than the actual kth nearest data object. By hiding a user's required confidence level and the required number of nearest data objects from an LSP, we develop a technique to prevent the LSP from tracking...

  11. Dynamic Planar Range Maxima Queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Tsakalidis, Konstantinos

    2011-01-01

    We consider the dynamic two-dimensional maxima query problem. Let P be a set of n points in the plane. A point is maximal if it is not dominated by any other point in P. We describe two data structures that support the reporting of the t maximal points that dominate a given query point, and allow...... update time, using O(nlogn) space, where t is the size of the output. This improves the worst case deletion time of the dynamic rectangular visibility query problem from O(log^3 n) to O(log^2 n). We adapt the data structure to the RAM model with word size w, where the coordinates of the points...... in the worst case. The data structure also supports the more general query of reporting the maximal points among the points that lie in a given 3-sided orthogonal range unbounded from above in the same complexity. We can support 4-sided queries in O(log^2 n + t) worst case time, and O(log^2 n) worst case...
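
    The definition of a maximal point in the abstract above can be illustrated with a minimal static sketch. This is not the dynamic data structure the authors describe: it simply recomputes the maxima with a sort-and-sweep, and the function names and the weak-dominance convention (ties count as dominated) are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def maximal_points(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point.

    Assumption: p dominates q when p is >= q in both coordinates and p != q.
    """
    result: List[Point] = []
    best_y = float("-inf")
    # Sweep from right to left; a point is maximal only if its y exceeds
    # every y seen so far (all of which belong to points with x >= its x).
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


def maxima_dominating(points: List[Point], q: Point) -> List[Point]:
    """Report the maximal points that dominate the query point q."""
    return [p for p in maximal_points(points) if p[0] >= q[0] and p[1] >= q[1]]


pts = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 2)]
print(maximal_points(pts))
print(maxima_dominating(pts, (2, 2)))
```

    Each call here costs O(n log n); the point of the data structures in the abstract is to avoid this recomputation under insertions and deletions.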

  12. Bottom-up mining of XML query patterns to improve XML querying

    Institute of Scientific and Technical Information of China (English)

    Yi-jun BEI; Gang CHEN; Jin-xiang DONG; Ke CHEN

    2008-01-01

    Querying XML data is a computationally expensive process due to the complex nature of both the XML data and the XML queries. In this paper we propose an approach to expedite XML query processing by caching the results of frequent queries. We discover frequent query patterns from user-issued queries using an efficient bottom-up mining approach called VBUXMiner. VBUXMiner consists of two main steps. First, all queries are merged into a summary structure named "compressed global tree guide" (CGTG). Second, a bottom-up traversal scheme based on the CGTG is employed to generate frequent query patterns. We use the frequent query patterns in a cache mechanism to improve the XML query performance. Experimental results show that our proposed mining approach outperforms the previous mining algorithms for XML queries, such as XQPMinerTID and FastXMiner, and that by caching the results of frequent query patterns, XML query performance can be dramatically improved.

  13. Condorcet query engine: A query engine for coordinated index terms

    NARCIS (Netherlands)

    van der Vet, P.E.; Mars, Nicolaas

    1999-01-01

    On-line information retrieval systems often offer their users some means to tune the query to match the level of granularity of the information request. Users can be offered a far greater range of possibilities, however, if documents are indexed with coordinated index concepts. Coordinated index

  14. Head First jQuery

    CERN Document Server

    Benedetti, Ryan

    2011-01-01

    Want to add more interactivity and polish to your websites? Discover how jQuery can help you build complex scripting functionality in just a few lines of code. With Head First jQuery, you'll quickly get up to speed on this amazing JavaScript library by learning how to navigate HTML documents while handling events, effects, callbacks, and animations. By the time you've completed the book, you'll be incorporating Ajax apps, working seamlessly with HTML and CSS, and handling data with PHP, MySQL and JSON. If you want to learn-and understand-how to create interactive web pages, unobtrusive scrip

  15. Preference Elicitation in Prioritized Skyline Queries

    CERN Document Server

    Mindolin, Denis

    2010-01-01

    Preference queries incorporate the notion of binary preference relation into relational database querying. Instead of returning all the answers, such queries return only the best answers, according to a given preference relation. Preference queries are a fast growing area of database research. Skyline queries constitute one of the most thoroughly studied classes of preference queries. A well known limitation of skyline queries is that skyline preference relations assign the same importance to all attributes. In this work, we study p-skyline queries that generalize skyline queries by allowing varying attribute importance in preference relations. We perform an in-depth study of the properties of p-skyline preference relations. In particular, we study the problems of containment and minimal extension. We apply the obtained results to the central problem of the paper: eliciting relative importance of attributes. Relative importance is implicit in the constructed p-skyline preference relation. The elicitation is ba...

  16. Scalable Social Coordination using Enmeshed Queries

    CERN Document Server

    Chen, Jianjun; Varghese, George

    2012-01-01

    Social coordination allows users to move beyond awareness of their friends to efficiently coordinating physical activities with others. While specific forms of social coordination can be seen in tools such as Evite, Meetup and Groupon, we introduce a more general model using what we call enmeshed queries. An enmeshed query allows users to declaratively specify an intent to coordinate by specifying social attributes such as the desired group size and who/what/when, and the database returns matching queries. Enmeshed queries are continuous, but new queries (and not data) answer older queries; the variable group size also makes enmeshed queries different from entangled queries, publish-subscribe systems, and dating services. We show that even offline group coordination using enmeshed queries is NP-hard. We then introduce efficient heuristics that use selective indices such as location and time to reduce the space of possible matches; we also add refinements such as delayed evaluation and using the relative...

  17. Query Expansion Using Heterogeneous Thesauri.

    Science.gov (United States)

    Mandala, Rila; Tokunaga, Takenobu; Tanaka, Hozumi

    2000-01-01

    Proposes a method to improve the performance of information retrieval systems by expanding queries using heterogeneous thesauri. Experiments show that using heterogeneous thesauri with an appropriate weighting method results in better retrieval performance than using only one type of thesaurus. (Author/LRW)

  18. Accomplishing Deterministic XML Query Optimization

    Institute of Scientific and Technical Information of China (English)

    Dun-Ren Che

    2005-01-01

    As the popularity of XML (eXtensible Markup Language) keeps growing rapidly, the management of XML compliant structured-document databases has become a very interesting and compelling research area. Query optimization for XML structured-documents stands out as one of the most challenging research issues in this area because of the much enlarged optimization (search) space, which is a consequence of the intrinsic complexity of the underlying data model of XML data. We therefore propose to apply deterministic transformations on query expressions to prune the search space as aggressively as possible and quickly achieve a sufficiently improved alternative (if not the optimal one) for each incoming query expression. This idea is not just exciting but practically attainable. This paper first provides an overview of our optimization strategy, and then focuses on the key implementation issues of our rule-based transformation system for XML query optimization in a database environment. The performance results we obtained from experimentation show that our approach is a valid and effective one.

  20. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of data available requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  1. Explanations for Skyline Query Results

    DEFF Research Database (Denmark)

    Chester, Sean; Assent, Ira

    2015-01-01

    Skyline queries are a well-studied problem for multidimensional data, wherein points are returned to the user iff no other point is preferable across all attributes. This leaves only the points most likely to appeal to an arbitrary user. However, some dominated points may still be interesting, an...

  2. Logical Querying of Relational Databases

    Directory of Open Access Journals (Sweden)

    Luminita Pistol

    2016-12-01

    Full Text Available This paper aims to demonstrate the usefulness of formal logic and lambda calculus in database programming. After a short introduction to propositional and first-order logic, we dynamically implement a small database and translate some SQL queries into filtered Java 8 streams, enhanced with the Tuple facilities of the jOOλ library.

  3. Enhancing Recall in Semantic Querying

    DEFF Research Database (Denmark)

    Rouces, Jacobo

    2013-01-01

    RDF and SPARQL are currently state-of-the-art W3C standards to respectively represent and query structured information, especially when information from different sources must be federated. However, there are various reasons for which the same knowledge can be modeled in RDF graphs that are both ...

  4. Large Catalogue Query Performance in Relational Databases

    Science.gov (United States)

    Power, Robert A.

    2007-05-01

    The performance of the MySQL and Oracle database systems has been compared for a selection of astronomy queries using large catalogues of up to a billion objects. The queries tested are those expected from the astronomy community: general database queries, cone searches, neighbour finding and cross matching. The catalogue preparation, SQL query formulation and database performance are presented. Most of the general queries perform adequately when appropriate indexes are present in the database. Each system performs well for cone search queries when the Hierarchical Triangular Mesh spatial index is used. Neighbour finding and cross matching are not well supported in a database environment when compared to software specifically developed to solve these problems.

  5. Optimizing Temporal Queries: Efficient Handling of Duplicates

    DEFF Research Database (Denmark)

    Toman, David; Bowman, Ivan Thomas

    2001-01-01

    Recent research in the area of temporal databases has proposed a number of query languages that vary in their expressive power and the semantics they provide to users. These query languages represent a spectrum of solutions to the tension between clean semantics and efficient evaluation. Often......, these query languages are implemented by translating temporal queries into standard relational queries. However, the compiled queries are often quite cumbersome and expensive to execute even using state-of-the- art relational products. This paper presents an optimization technique that produces more efficient...... translated SQL queries by taking into account the properties of the encoding used for temporal attributes. For concreteness, this translation technique is presented in the context of SQL/TP; however, these techniques are also applicable to other temporal query languages....

  7. Format SPARQL Query Results into HTML Report

    Directory of Open Access Journals (Sweden)

    Dr Sunitha Abburu

    2013-07-01

    Full Text Available SPARQL is a powerful query language for querying semantic data. It is recognized by the W3C as a query language for RDF. As an efficient query language for RDF, it has defined several query result formats such as CSV, TSV and XML. These formats are not attractive, understandable or readable. The results need to be converted into an appropriate format so that users can easily understand them. The above formats require additional transformations or tool support to represent the query result in a user-readable format. The main aim of this paper is to propose a method to build an HTML report dynamically for SPARQL query results. This enables SPARQL query results to be displayed easily in HTML report format, in an attractive and understandable form, without the support of any additional or external tools or transformations.

  8. Identifying Aspects for Web-Search Queries

    OpenAIRE

    Wu, Fei; Madhavan, Jayant; Halevy, Alon

    2014-01-01

    Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search...

  9. Broadcast-Based Spatial Queries

    Institute of Scientific and Technical Information of China (English)

    Kwang-Jin Park; Moon-Bae Song; Chong-Sun Hwang

    2005-01-01

    Indexing techniques have been developed for wireless data broadcast environments, in order to conserve the scarce power resources of the mobile clients. However, the use of interleaved index segments in a broadcast cycle increases the average access latency for the clients. In this paper, broadcast-based spatial query processing methods (BBS) are presented for location-based services. In the BBS, broadcast data objects are sorted sequentially based on their locations, and the server broadcasts the location-dependent data along with an index segment. Then, a sequential prefetching and caching scheme is designed to reduce the query response time. The performance of this scheme is investigated in relation to various environmental variables, such as the distributions of the data objects, the average speed of the clients and the size of the service area.

  10. Querying Sentiment Development over Time

    DEFF Research Database (Denmark)

    Andreasen, Troels; Christiansen, Henning; Have, Christian Theil

    2013-01-01

    A new language is introduced for describing hypotheses about fluctuations of measurable properties in streams of timestamped data, and as a prime example, we consider trends of emotions in the constantly flowing stream of Twitter messages. The language, called EmoEpisodes, has a precise semantics...... that measures how well a hypothesis characterizes a given time interval; the semantics is parameterized so it can be adjusted to different views of the data. EmoEpisodes is extended to a query language with variables standing for unknown topics and emotions, and the query-answering mechanism will return...... instantiations for topics and emotions as well as time intervals that provide the largest deflections in this measurement. Experiments are performed on a selection of Twitter data to demonstrate the usefulness of the approach....

  11. Lightweight query authentication on streams

    OpenAIRE

    2014-01-01

    We consider a stream outsourcing setting, where a data owner delegates the management of a set of disjoint data streams to an untrusted server. The owner authenticates his streams via signatures. The server processes continuous queries on the union of the streams for clients trusted by the owner. Along with the results, the server sends proofs of result correctness derived from the owner's signatures, which are easily verifiable by the clients. We design novel constructions for a collection o...

  12. Building interactive queries with LINQPad

    CERN Document Server

    Finot, Sébastien

    2013-01-01

    A step-by-step practical guide that will introduce you to LINQPad's key features, thereby helping you to query databases interactively. This book is aimed at C#/.NET developers who wish to learn LINQ programming and leverage the easy way of using LINQPad. No prior knowledge of LINQ or LINQPad is expected. A basic knowledge of SQL and XML is required for some chapters.

  13. Flexible Query Answering Systems 2006

    DEFF Research Database (Denmark)

    submissions, relating to the topic of users posing queries and systems producing answers. The papers cover the fields: Database Management, Information Retrieval, Domain Modeling, Knowledge Representation and Ontologies, Knowledge Discovery and Data Mining, Artificial Intelligence, Classical and Non......-classical Logics, Computational Linguistics and Natural Language Processing, Multimedia Information Systems, and Human--Computer Interaction, including reports of interesting applications. We wish to thank the contributors for their excellent papers and the referees, publisher, and sponsors for their effort...

  14. SCRY: Enabling quantitative reasoning in SPARQL queries

    NARCIS (Netherlands)

    Meroño-Peñuela, A.; Stringer, Bas; Loizou, Antonis; Abeln, Sanne; Heringa, Jaap

    2015-01-01

    The inability to include quantitative reasoning in SPARQL queries slows down the application of Semantic Web technology in the life sciences. SCRY, our SPARQL compatible service layer, improves this by executing services at query time and making their outputs query-accessible, generating RDF data on

  15. Predecessor queries in dynamic integer sets

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting

    1997-01-01

    We consider the problem of maintaining a set of n integers in the range 0..2^w−1 under the operations of insertion, deletion, predecessor queries, minimum queries and maximum queries on a unit cost RAM with word size w bits. Let f (n) be an arbitrary nondecreasing smooth function satisfying n...
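
    The set of operations listed in the abstract above can be made concrete with a small baseline sketch. It keeps the keys in a sorted Python list and uses binary search; it is not the RAM-model structure the paper analyses and does not achieve its bounds. The class name and the convention that a predecessor is strictly smaller than the query are assumptions made for illustration.

```python
import bisect
from typing import List, Optional


class SortedIntSet:
    """Baseline dynamic integer set backed by a sorted list."""

    def __init__(self) -> None:
        self._keys: List[int] = []

    def insert(self, x: int) -> None:
        i = bisect.bisect_left(self._keys, x)
        # Ignore duplicates so the structure behaves as a set.
        if i == len(self._keys) or self._keys[i] != x:
            self._keys.insert(i, x)

    def delete(self, x: int) -> None:
        i = bisect.bisect_left(self._keys, x)
        if i < len(self._keys) and self._keys[i] == x:
            self._keys.pop(i)

    def predecessor(self, x: int) -> Optional[int]:
        """Largest stored key strictly smaller than x, or None."""
        i = bisect.bisect_left(self._keys, x)
        return self._keys[i - 1] if i > 0 else None

    def minimum(self) -> Optional[int]:
        return self._keys[0] if self._keys else None

    def maximum(self) -> Optional[int]:
        return self._keys[-1] if self._keys else None


s = SortedIntSet()
for key in (5, 1, 9, 5):
    s.insert(key)
print(s.predecessor(6), s.minimum(), s.maximum())
```

    Queries here take O(log n) time, but insertions and deletions shift list elements and so cost O(n) in the worst case.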

  16. Heuristics-based query optimisation for SPARQL

    NARCIS (Netherlands)

    P. Tsialiamanis (Petros); E. Sidirourgos (Eleftherios); I. Fundulaki; V. Christophides; P.A. Boncz (Peter)

    2012-01-01

    Query optimization in RDF stores is a challenging problem as SPARQL queries typically contain many more joins than equivalent relational plans, and hence lead to a large join order search space. In such cases, cost-based query optimization often is not possible. One practical reason for

  17. Query Adaptive Image Retrieval System

    Directory of Open Access Journals (Sweden)

    Amruta Dubewar

    2014-03-01

    Full Text Available Images play a crucial role in various fields such as art galleries, medicine, journalism and entertainment. The increasing use of image acquisition and data storage technologies has enabled the creation of large databases. It is therefore necessary to develop an appropriate information management system to manage these collections efficiently, together with a system to retrieve required images from them. This paper proposes a query adaptive image retrieval system (QAIRS) to retrieve, from a database, images similar to the query image specified by the user. The goal of this system is to support image retrieval based on content properties such as colour and texture, usually encoded into feature vectors. In this system, colour features are extracted by various techniques such as colour moments, colour histograms and autocorrelograms, and texture features are extracted using the Gabor wavelet. A hashing technique is used to embed high-dimensional image features into Hamming space, where search can be performed by the Hamming distance of compact hash codes. The system returns the images with the minimum Hamming distance to the query image.
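
    The final retrieval step described in the abstract above, ranking by Hamming distance between compact hash codes, can be sketched as follows. This is a minimal illustration only: the 8-bit codes and image names are made up, and the feature extraction and hashing stages of QAIRS are not reproduced.

```python
from typing import Dict, List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def nearest_by_hamming(query_code: int, database: Dict[str, int], k: int = 1) -> List[str]:
    """Return the names of the k images whose codes are closest to the query.

    Ties are broken by image name so the result is deterministic.
    """
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming_distance(query_code, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical 8-bit hash codes for three stored images.
codes = {
    "sunset.jpg": 0b10110010,
    "beach.jpg": 0b10110011,
    "forest.jpg": 0b01001101,
}
print(nearest_by_hamming(0b10110111, codes, k=2))
```

    Because the comparison is an XOR followed by a bit count, a linear scan over compact codes stays cheap even when the original feature vectors are high-dimensional.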

  18. Boolean queries for news monitoring: Suggesting new query terms to expert users

    NARCIS (Netherlands)

    Verberne, S.; Wabeke, T.; Kaptein, R.

    2016-01-01

    In this paper, we evaluate query suggestion for Boolean queries in a news monitoring system. Users of this system receive news articles that match their running query on a daily basis. Because the news for a topic continuously changes, the queries need regular updating. We first investigated the

  19. Truth Space Method for Caching Database Queries

    Directory of Open Access Journals (Sweden)

    S. V. Mosin

    2015-01-01

    Full Text Available We propose a new method of client-side data caching for relational databases with a central server and distant clients. Data are loaded into the client cache based on queries executed on the server. Every query has a corresponding DB table: the result of the query execution. These queries have a special form called "universal relational query", based on three fundamental Relational Algebra operations: selection, projection and natural join. Such a form is the closest one to natural language, and the majority of database search queries can be expressed in this way. In addition, this form allows us to analyze query correctness by checking the lossless join property. A subsequent query may be executed in a client's local cache if we can determine that the query result is entirely contained in the cache. For this we compare the truth spaces of the logical restrictions in a new user's query with those of the queries whose results are in the cache. Such a comparison can be performed analytically, without the need for additional database queries. This method may also be used to identify data lacking in the cache and to execute the query on the server only for these data. The analytical approach is used here as well, which distinguishes our paper from the existing technologies. We propose four theorems for testing the required conditions. The conditions of the first and third theorems allow us to determine whether the required data exist in the cache. The second and fourth theorems state conditions for executing queries with the cache only. The problem of cache data actualization is not discussed in this paper. However, it can be solved by cataloging queries on the server and serving them by triggers in background mode. The article is published in the author's wording.
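
    The idea of deciding analytically whether a new query can be answered from the cache can be illustrated with a deliberately simplified sketch. It is not the paper's theorems: it assumes that each selection is a conjunction of closed range conditions on numeric attributes, ignores projection and join, and uses hypothetical names throughout.

```python
import math
from typing import Dict, Tuple

# A selection is modelled as attribute -> (low, high), a closed interval.
# An attribute that is absent from the dict is unconstrained.
Selection = Dict[str, Tuple[float, float]]


def is_contained(new_query: Selection, cached_query: Selection) -> bool:
    """True if every row matching new_query also matches cached_query."""
    for attr, (c_low, c_high) in cached_query.items():
        n_low, n_high = new_query.get(attr, (-math.inf, math.inf))
        # The new interval must lie inside the cached interval.
        if n_low < c_low or n_high > c_high:
            return False
    return True


cached = {"age": (18, 65)}
print(is_contained({"age": (30, 40), "salary": (1000, 2000)}, cached))
print(is_contained({"salary": (1000, 2000)}, cached))
```

    When the check succeeds, the new query can be evaluated by filtering the cached rows locally; when it fails, at least part of the result has to come from the server.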

  20. A solution of spatial query processing and query optimization for spatial databases

    Institute of Scientific and Technical Information of China (English)

    YUAN Jie; XIE Kun-qing; MA Xiu-jun; ZHANG Min; SUN Le-bin

    2004-01-01

    Recently, attention has been focused on spatial query language which is used to query spatial databases. A design of spatial query language has been presented in this paper by extending the standard relational database query language SQL. It recognizes the significantly different requirements of spatial data handling and overcomes the inherent problems of the application of conventional database query languages. This design is based on an extended spatial data model, including the spatial data types and the spatial operators on them. The processing and optimization of spatial queries have also been discussed in this design. In the end, an implementation of this design is given in a spatial query subsystem.

  1. Cache-Based Aggregate Query Shipping: An Efficient Scheme of Distributed OLAP Query Processing

    Institute of Scientific and Technical Information of China (English)

    Hua-Ming Liao; Guo-Shun Pei

    2008-01-01

    Our study introduces a novel distributed query plan refinement phase in an enhanced architecture of a distributed query processing engine (DQPE). Query plan refinement generates a potentially efficient distributed query plan using the reusable aggregate query shipping (RAQS) approach. The approach improves response time at the cost of pre-processing time. If the overheads cannot be compensated by the reuse of query results, RAQS is no longer favorable. Therefore a global cost estimation model is employed to select the proper operators: RR_Agg, R_Agg, or R_Scan. For the purpose of reusing results of queries with aggregate functions in distributed query processing, a multi-level hybrid view caching (HVC) scheme is introduced. The scheme retains the advantages of partial match and aggregate query results caching. With our solution, evaluations with distributed TPC-H queries show significant improvement in average response time.

  2. Web development with jQuery

    CERN Document Server

    York, Richard

    2015-01-01

    Newly revised and updated resource on jQuery's many features and advantages Web Development with jQuery offers a major update to the popular Beginning JavaScript and CSS Development with jQuery from 2009. More than half of the content is new or updated, and reflects recent innovations with regard to mobile applications, jQuery mobile, and the spectrum of associated plugins. Readers can expect thorough revisions with expanded coverage of events, CSS, AJAX, animation, and drag and drop. New chapters bring developers up to date on popular features like jQuery UI, navigation, tables, interacti

  3. Structured Query Language for Virtual Observatory

    CERN Document Server

    Shirasaki, Y; Mizumoto, Y; Tanaka, M; Honda, S; Oe, M; Yasuda, N; Masunaga, Y; Shirasaki, Yuji; Ohishi, Masatoshi; Mizumoto, Yoshihiko; Tanaka, Masahiro; Honda, Satoshi; Oe, Masafumi; Yasuda, Naoki; Masunaga, Yoshifumi

    2004-01-01

    Currently two query languages are defined as standards for the Virtual Observatory (VO). Astronomical Data Query Language (ADQL) is used for catalog data queries and Simple Image Access Protocol (SIAP) is used for image data queries. As a result, when we query a data service, we need to know in advance which language is supported and then construct the query accordingly. The constructs of SIAP are simple, but they have limited capability. For example, there is no way to specify multiple regions in one query, and it is difficult to specify complex query conditions. In this paper, we propose a unified query language for any kind of astronomical database on the basis of SQL99. SQL is a query language optimized for table data, so to apply SQL to image and spectrum data sets, the data structure needs to be mapped to a table-like structure. We present the specification of this query language and an example of the architecture for the database system.

  4. Mr Cameron's Three Million Apprenticeships

    Science.gov (United States)

    Allen, Martin

    2015-01-01

    In the 2015 general election campaign David Cameron celebrated the success of apprenticeships during the Coalition and promised another 3 million. This article argues that the "reinvention" of apprenticeships has neither created real skills nor provided real alternatives for young people and that the UK schemes fall far short of those in…

  5. EquiX-A Search and Query Language for XML.

    Science.gov (United States)

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  6. Monitoring nearest neighbor queries with cache strategies

    Institute of Scientific and Technical Information of China (English)

    PAN Peng; LU Yan-sheng

    2007-01-01

    Continuously monitoring multiple K-nearest neighbor (K-NN) queries over dynamic object and query datasets is valuable for many location-based applications. A practical method is to partition the data space into grid cells, index both the object table and the query table by this grid structure, and solve the problem by periodically joining cells of objects with the queries whose influence regions intersect those cells. In the worst case, all cells of objects will be accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With the object cache strategy, queries that remain static in the current processing cycle seldom incur I/O cost and can be answered quickly. The main I/O cost comes from moving queries; the query cache strategy restricts their search regions using the current results of queries held in the main memory buffer. The queries can share not only accesses to object pages but also their influence regions. A theoretical analysis of the expected I/O cost is presented, and in the experimental results the I/O cost is about 40% of that of the SEA-CNN method.
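    The grid partitioning that this abstract builds on can be sketched as follows. This is a minimal in-memory illustration of a single K-NN lookup over a uniform grid, not the monitoring or caching scheme of the paper; the class `GridIndex` and its methods are hypothetical names.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2-D points; each cell holds the objects inside it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (i, j) -> [(obj_id, x, y), ...]
        self.count = 0

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, obj_id, x, y):
        self.cells[self._cell(x, y)].append((obj_id, x, y))
        self.count += 1

    def knn(self, qx, qy, k):
        """Return ids of the k nearest objects, scanning rings of cells outward."""
        cx, cy = self._cell(qx, qy)
        found = []   # (distance, obj_id)
        seen = 0
        r = 0
        while seen < self.count:
            for i in range(cx - r, cx + r + 1):
                for j in range(cy - r, cy + r + 1):
                    if max(abs(i - cx), abs(j - cy)) != r:
                        continue   # only visit cells on ring r
                    for obj_id, x, y in self.cells.get((i, j), ()):
                        found.append((math.hypot(x - qx, y - qy), obj_id))
                        seen += 1
            found.sort()
            # Cells beyond ring r are at least r * cell_size away from the query.
            if len(found) >= k and found[k - 1][0] <= r * self.cell_size:
                break
            r += 1
        return [obj_id for _, obj_id in found[:k]]


grid = GridIndex(1.0)
for name, x, y in [("a", 0.5, 0.5), ("b", 2.5, 0.5), ("c", 0.6, 1.4), ("d", 5.0, 5.0)]:
    grid.insert(name, x, y)
print(grid.knn(0.5, 0.5, 2))
```

    The ring radius at which the search stops plays a role loosely comparable to the influence region mentioned in the abstract: only objects in cells within that radius can change the answer.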

  7. Adding query privacy to robust DHTs

    DEFF Research Database (Denmark)

    Backes, Michael; Goldberg, Ian; Kate, Aniket

    2012-01-01

    Interest in anonymous communication over distributed hash tables (DHTs) has increased in recent years. However, almost all known solutions solely aim at achieving sender or requestor anonymity in DHT queries. In many application scenarios, it is crucial that the queried key remains secret from...... intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries. We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without...... compromising spam resistance. Although our OT-based approach can work over any DHT, we concentrate on robust DHTs that can tolerate Byzantine faults and resist spam. We choose the best-known robust DHT construction, and employ an efficient OT protocol well-suited for achieving our goal of obtaining query...

  8. jQuery for ASPNET Developers

    CERN Document Server

    Brinkman, Joe

    2009-01-01

    This Wrox Blox teaches you how to use jQuery with your ASP.NET-based websites.  jQuery greatly simplifies JavaScript development and allows you to create highly interactive and responsive websites using the latest JavaScript and AJAX techniques. The author walks you through the jQuery API using a simple ASP.NET MVC application to highlight major topics, and shows how you can apply jQuery to your own applications. After learning the basics of using jQuery, you'll discover how easy it is to use within your own ASP.NET projects.  Whether you are using WebForms or the MVC framework, jQuery will gr

  9. GQL: Extending XQuery to Query GML Documents

    Institute of Scientific and Technical Information of China (English)

    GUAN Jihong; ZHU Fubao; ZHOU Jiaogen; NIU Liping

    2006-01-01

    GML is becoming the de facto standard for electronic data exchange among the applications of Web and distributed geographic information systems. However, the conventional query languages (e.g. SQL and its extended versions) are not suitable for direct querying and updating of GML documents. Even the effective approaches working well with XML could not guarantee good results when applied to GML documents. Although XQuery is a powerful standard query language for XML, it is not proposed for querying spatial features, which constitute the most important components in GML documents. We propose GQL, a query language specification to support spatial queries over GML documents by extending XQuery. The data model, algebra, and formal semantics as well as various spatial functions and operations of GQL are presented in detail.

  10. jQuery Tools UI Library

    CERN Document Server

    Libby, Alex

    2012-01-01

    A practical tutorial with powerful yet simple projects that are quick to implement. This book is aimed at developers who have prior jQuery knowledge, but may not have any prior experience with jQuery Tools. It is possible that they may have started with the basics of jQuery Tools, but want to learn more about how it can be used, as well as get ideas for future projects.

  11. Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal.

    Science.gov (United States)

    Jadhav, Ashutosh; Andrews, Donna; Fiksdal, Alexander; Kumbamu, Ashok; McCormick, Jennifer B; Misitano, Andrew; Nelsen, Laurie; Ryu, Euijung; Sheth, Amit; Wu, Stephen; Pathak, Jyotishman

    2014-07-04

    The number of people using the Internet and mobile/smart devices for health information seeking is increasing rapidly. Although the user experience for online health information seeking varies with the device used, for example, smart devices (SDs) like smartphones/tablets versus personal computers (PCs) like desktops/laptops, very few studies have investigated how online health information seeking behavior (OHISB) may differ by device. The objective of this study is to examine differences in OHISB between PCs and SDs through a comparative analysis of large-scale health search queries submitted through Web search engines from both types of devices. Using the Web analytics tool, IBM NetInsight OnDemand, and based on the type of devices used (PCs or SDs), we obtained the most frequent health search queries between June 2011 and May 2013 that were submitted on Web search engines and directed users to the Mayo Clinic's consumer health information website. We performed analyses on "Queries with considering repetition counts (QwR)" and "Queries without considering repetition counts (QwoR)". The dataset contains (1) 2.74 million and 3.94 million QwoR, respectively for PCs and SDs, and (2) more than 100 million QwR for both PCs and SDs. We analyzed structural properties of the queries (length of the search queries, usage of query operators and special characters in health queries), types of search queries (keyword-based, wh-questions, yes/no questions), categorization of the queries based on health categories and information mentioned in the queries (gender, age-groups, temporal references), misspellings in the health queries, and the linguistic structure of the health queries. Query strings used for health information searching via PCs and SDs differ by almost 50%. The most searched health categories are "Symptoms" (1 in 3 search queries), "Causes", and "Treatments & Drugs". 
The distribution of search queries for different health categories differs with the device used for

  12. Queries with Guarded Negation (full version)

    CERN Document Server

    Barany, Vince; Otto, Martin

    2012-01-01

    A well-established and fundamental insight in database theory is that negation (also known as complementation) tends to make queries difficult to process and difficult to reason about. Many basic problems are decidable and admit practical algorithms in the case of unions of conjunctive queries, but become difficult or even undecidable when queries are allowed to contain negation. Inspired by recent results in finite model theory, we consider a restricted form of negation, guarded negation. We introduce a fragment of SQL, called GN-SQL, as well as a fragment of Datalog with stratified negation, called GN-Datalog, that allow only guarded negation, and we show that these query languages are computationally well behaved, in terms of testing query containment, query evaluation, open-world query answering, and boundedness. GN-SQL and GN-Datalog subsume a number of well known query languages and constraint languages, such as unions of conjunctive queries, monadic Datalog, and frontier-guarded tgds. In addition, an a...

  13. Oceanographic ontology-based spatial knowledge query

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    The construction of oceanographic ontologies is fundamental to the "digital ocean". Therefore, after introducing the new concept of oceanographic ontology, an oceanographic ontology-based spatial knowledge query (OOBSKQ) method was proposed and developed. Because the method uses natural language to describe query conditions and the query result is highly integrated knowledge, it can provide users with direct answers while hiding the complicated computation and reasoning processes, and it achieves intelligent, automatic oceanographic spatial information query on the level of knowledge and semantics. A case study of a resource and environmental application in a bay has shown the implementation process of the method and its feasibility and usefulness.

  14. Querying moving objects detected by sensor networks

    CERN Document Server

    Bestehorn, Markus

    2012-01-01

    Declarative query interfaces to Sensor Networks (SN) have become a commodity. These interfaces allow access to SN deployed for collecting data using relational queries. However, SN are not confined to data collection, but may track object movement, e.g., wildlife observation or traffic monitoring. While relational approaches are well suited for data collection, research on "Moving Object Databases" (MOD) has shown that relational operators are unsuitable to express information needs on object movement, i.e., spatio-temporal queries. "Querying Moving Objects Detected by Sensor Networks" studi

  15. Topic Level Disambiguation for Weak Queries

    Directory of Open Access Journals (Sweden)

    Zhang, Hui

    2013-09-01

    Despite limited success, today's information retrieval (IR) systems are not intelligent or reliable. IR systems return poor search results when users formulate their information needs into incomplete or ambiguous queries (i.e., weak queries). Therefore, one of the main challenges in modern IR research is to provide consistent results across all queries by improving the performance on weak queries. However, existing IR approaches such as query expansion are not overly effective because they make little effort to analyze and exploit the meanings of the queries. Furthermore, word sense disambiguation approaches, which rely on textual context, are ineffective against weak queries that are typically short. Motivated by the demand for a robust IR system that can consistently provide highly accurate results, the proposed study implemented a novel topic detection that leveraged both the language model and structural knowledge of Wikipedia and systematically evaluated the effect of query disambiguation and topic-based retrieval approaches on TREC collections. The results not only confirm the effectiveness of the proposed topic detection and topic-based retrieval approaches but also demonstrate that query disambiguation does not improve IR as expected.

  16. Effective Density Queries of Continuously Moving Objects

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Lin, D.; Ooi, B.C.

    2006-01-01

    In this paper, we study a newly emerging type of queries on moving objects - the density query. Basically, this query locates regions in the data space where the density of the objects is high. This type of queries is especially useful in Location Based Services (LBS). For example, in a traffic control system, we need to identify the places that are or would be affected by a traffic jam, and report this information to drivers so that they can choose a less congested route. As a naive way to solve the problem is prohibitively expensive, we first introduce a framework which makes the problem...
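    A density query in its simplest snapshot form can be sketched as below. This is an assumed illustration over static positions, not the framework introduced in the paper; `dense_cells`, `cell_size` and `min_density` are hypothetical names.

```python
import math
from collections import Counter


def dense_cells(points, cell_size, min_density):
    """Return grid cells whose objects per unit area reach `min_density`."""
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    area = cell_size * cell_size
    return sorted(cell for cell, n in counts.items() if n / area >= min_density)


positions = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (5.5, 5.5)]
print(dense_cells(positions, cell_size=1.0, min_density=2))
```

    For moving objects the counts would have to be maintained or predicted over time, which is where a naive recomputation becomes expensive.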

  17. Adding Query Privacy to Robust DHTs

    DEFF Research Database (Denmark)

    Backes, Michael; Goldberg, Ian; Kate, Aniket

    2011-01-01

    of obtaining query privacy over robust DHTs. Finally, we compare the performance of our privacy-preserving protocols with their more privacy-invasive counterparts. We observe that there is no increase in the message complexity and only a small overhead in the computational complexity....... intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries. We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without...

  18. Database queries and constraints via lifting problems

    CERN Document Server

    Spivak, David I

    2012-01-01

    Previous work has shown a tight relationship between databases and categories. In the present paper we extend that connection to show that certain queries and constraints correspond to the algebro-topological notion of lifting problems. In our formulation, each so-called SPARQL graph pattern query corresponds to a lifting problem, and each solution to the query corresponds to a lift. We interpret constraints within the same formalism and then investigate some formal properties of queries and constraints, e.g. their behavior under data migration functors.

  19. Object-Extended OLAP Querying

    DEFF Research Database (Denmark)

    Pedersen, Torben Bach; Gu, Junmin; Shoshani, Arie

    2009-01-01

    On-line analytical processing (OLAP) systems based on a dimensional view of data have found widespread use in business applications and are being used increasingly in non-standard applications. These systems provide good performance and ease-of-use. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, "multi-model" federated...... with performance measurements that show that the approach is a viable alternative to a physically integrated data warehouse.

  20. Query Optimization for Deductive Databases

    Institute of Scientific and Technical Information of China (English)

    周傲英; 施伯乐

    1995-01-01

    A systematic, efficient compilation method for query evaluation of Deductive Databases (DeDB) is proposed in this paper. In order to eliminate redundancy and to minimize the potentially relevant facts, which are two key issues for the efficiency of a DeDB, the compilation process is decomposed into two phases. The first is the pre-compilation phase, which is responsible for the minimization of the potentially relevant facts. The second, which we refer to as the general compilation phase, is responsible for the elimination of redundancy. The rule/goal graph devised by J. D. Ullman is appropriately extended and used as a uniform formalism. Two general algorithms corresponding to the two phases respectively are described intuitively and formally.

  1. Object-Extended OLAP Querying

    DEFF Research Database (Denmark)

    Pedersen, Torben Bach; Gu, Junmin; Shoshani, Arie

    2009-01-01

    inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, "multi-model" federated...... system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. This allows data...... analysis on the OLAP data to be significantly enriched by the use of additional object data. Additionally, physical integration of the OLAP and the object data can be avoided. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally...

  2. Twenty-Year Reunion

    Institute of Scientific and Technical Information of China (English)

    1996-01-01

    A massive earthquake measuring 7.8 on the Richter Scale struck the city of Tangshan at 3:42 a.m. on July 28, 1976. The city, which had a history of well over 100 years and a population of over one million people, was left in ruins. The violent earthquake killed 242,469 people and injured 164,851, and left 2,652 children under 16 years of age as orphans and 885 elderly people as widowers. Some 96 percent of the city’s buildings and houses collapsed, with a direct economic loss of 10 billion yuan. Earthquake tremors spread over a two million square kilometer area, with hundreds of millions of people placed in peril by the natural disaster. The Tangshan earthquake set a record as the world’s strongest earthquake of the century. The people of Tangshan have set a new record for the rate of reconstruction accomplished over the past 20 years. The support of the People’s Liberation Army and people across the nation combined with the unyielding efforts of survivors led to the establishment of a new Ta

  3. Querying Business Process Models with VMQL

    DEFF Research Database (Denmark)

    Störrle, Harald; Acretoaie, Vlad

    2013-01-01

    The Visual Model Query Language (VMQL) has been invented with the objectives (1) to make it easier for modelers to query models effectively, and (2) to be universally applicable to all modeling languages. In previous work, we have applied VMQL to UML, and validated the first of these two claims. ...

  4. Path Minima Queries in Dynamic Weighted Trees

    DEFF Research Database (Denmark)

    Davoodi, Pooya; Brodal, Gerth Stølting; Satti, Srinivasa Rao

    2011-01-01

    In the path minima problem on a tree, each edge is assigned a weight and a query asks for the edge with minimum weight on a path between two nodes. For the dynamic version of the problem, where the edge weights can be updated, we give data structures that achieve optimal query time...
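    The problem statement can be made concrete with a naive baseline. The sketch below answers a path minimum query by walking both endpoints up to their lowest common ancestor, which costs time proportional to the tree depth; it is not one of the data structures from the paper, and the class `WeightedTree` is a hypothetical name.

```python
class WeightedTree:
    """Rooted tree with updatable edge weights, stored on the child endpoint."""

    def __init__(self, root):
        self.parent = {root: None}
        self.depth = {root: 0}
        self.weight = {}   # child -> weight of the edge (child, parent)

    def add_edge(self, parent, child, w):
        self.parent[child] = parent
        self.depth[child] = self.depth[parent] + 1
        self.weight[child] = w

    def update(self, child, w):
        """Change the weight of the edge between `child` and its parent."""
        self.weight[child] = w

    def path_min(self, u, v):
        """Return (weight, (child, parent)) of the lightest edge on the u-v path.

        Returns None when u == v, because the path has no edges.
        """
        best = None
        while u != v:
            if self.depth[u] < self.depth[v]:
                u, v = v, u   # always climb from the deeper node
            cand = (self.weight[u], (u, self.parent[u]))
            if best is None or cand[0] < best[0]:
                best = cand
            u = self.parent[u]
        return best


tree = WeightedTree("a")
tree.add_edge("a", "b", 5)
tree.add_edge("a", "c", 3)
tree.add_edge("b", "d", 7)
tree.add_edge("b", "e", 2)
print(tree.path_min("d", "e"))
tree.update("c", 9)
print(tree.path_min("d", "c"))
```

    Updates are constant time here, but queries degrade on deep trees, which is the trade-off that more refined structures aim to improve.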

  5. Meet Charles, big data query advisor

    NARCIS (Netherlands)

    Sellam, T.; Kersten, M.

    2013-01-01

    In scientific data management and business analytics, the most informative queries are a holy grail. Data collection becomes increasingly simpler, yet data exploration gets significantly harder. Exploratory querying is likely to return an empty or an overwhelming result set. On the other hand, data

  7. Quantum associative memory with improved distributed queries

    CERN Document Server

    Njafa, J -P Tchapet; Woafo, Paul

    2012-01-01

    The paper proposes an improved quantum associative algorithm with distributed query based on the model proposed by Ezhov et al. We introduce two modifications of the query that optimize data retrieval of correct multi-patterns simultaneously for any ratio of the number of recognition patterns to the total number of patterns. Simulation results are given.

  8. Improving Web Search for Difficult Queries

    Science.gov (United States)

    Wang, Xuanhui

    2009-01-01

    Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines cannot answer very effectively, and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…

  9. Efficient caching for constrained skyline queries

    DEFF Research Database (Denmark)

    Mortensen, Michael Lind; Chester, Sean; Assent, Ira;

    2015-01-01

    Constrained skyline queries retrieve all points that optimize some user’s preferences subject to orthogonal range constraints, but at significant computational cost. This paper is the first to propose caching to improve constrained skyline query response time. Because arbitrary range constraints ...
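    A constrained skyline query itself can be sketched by brute force, as below. The sketch assumes that smaller values are preferred in every dimension and does not implement the caching technique of the paper, whose details are not given in this excerpt; `dominates` and `constrained_skyline` are hypothetical names.

```python
def dominates(p, q):
    """p dominates q if p is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def constrained_skyline(points, lows, highs):
    """Skyline of the points inside the axis-aligned box [lows, highs]."""
    inside = [
        p for p in points
        if all(lo <= x <= hi for x, lo, hi in zip(p, lows, highs))
    ]
    return [p for p in inside if not any(dominates(q, p) for q in inside)]


hotels = [(1, 9), (2, 2), (3, 1), (4, 4), (9, 1)]   # e.g. (price, distance)
print(constrained_skyline(hotels, (0, 0), (10, 10)))
print(constrained_skyline(hotels, (4, 0), (10, 10)))
```

    In the second call, points that were dominated in the unconstrained case enter the skyline once their dominators fall outside the range, so a result computed for one range cannot simply be filtered to answer another.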

  10. Exploring features for automatic identification of news queries through query logs

    Institute of Scientific and Technical Information of China (English)

    Xiaojuan ZHANG; Jian LI

    2014-01-01

    Purpose: Existing research on predicting queries with news intent has tried to extract classification features from external knowledge bases. This paper shows how features extracted from query logs can be applied to the automatic identification of news queries without using any external resources. Design/methodology/approach: First, we manually labeled 1,220 news queries from Sogou.com. Based on the analysis of these queries, we then identified three features of news queries in terms of query content, time of query occurrence and user click behavior. Afterwards, we used 12 effective features proposed in the literature as a baseline and conducted experiments based on the support vector machine (SVM) classifier. Finally, we compared the impacts of the features used in this paper on the identification of news queries. Findings: Compared with the baseline features, the F-score improved from 0.6414 to 0.8368 after the use of the three newly identified features, among which the burst point (bst) was the most effective in predicting news queries. In addition, query expression (qes) was more useful than query terms, and among the click behavior-based features, news URL was the most effective one. Research limitations: Analyses based on features extracted from query logs might produce limited results. The segmentation tool used in this study has been more widely applied to long texts than to short queries. Practical implications: The research will be helpful for general-purpose search engines to address search intents for news events. Originality/value: Our approach provides a new and different perspective on recognizing queries with news intent without large news corpora such as blogs or Twitter.

  11. Adding Query Privacy to Robust DHTs

    DEFF Research Database (Denmark)

    Backes, Michael; Goldberg, Ian; Kate, Aniket

    2011-01-01

    Interest in anonymous communication over distributed hash tables (DHTs) has increased in recent years. However, almost all known solutions solely aim at achieving sender or requestor anonymity in DHT queries. In many application scenarios, it is crucial that the queried key remains secret from...... intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries. We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without...... compromising spam resistance. Although our OT-based approach can work over any DHT, we concentrate on communication over robust DHTs that can tolerate Byzantine faults and resist spam. We choose the best-known robust DHT construction, and employ an efficient OT protocol well-suited for achieving our goal...

  12. An Effective Information Retrieval for Ambiguous Query

    CERN Document Server

    Roul, R K

    2012-01-01

    A search engine returns thousands of web pages for a single user query, most of which are not relevant. In this context, effective information retrieval from the expanding web is a challenging task, in particular if the query is ambiguous. The major question that arises here is how to get the relevant pages for an ambiguous query. We propose an approach for obtaining effective results for an ambiguous query by forming community vectors based on the association concept of data mining, using the vector space model and the freedictionary. We develop clusters by computing the similarity between community vectors and document vectors formed from the web pages extracted by the search engine. We use the Gensim package to implement the algorithm because of its simplicity and robust nature. Analysis shows that our approach is an effective way to form clusters for an ambiguous query.
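    The similarity step between community vectors and document vectors can be sketched in plain Python. This is an assumed illustration using raw term frequencies and cosine similarity; it does not use Gensim and does not reproduce how the paper builds its community vectors. The functions `tf_vector`, `cosine` and `assign` are hypothetical names.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign(documents, communities):
    """Map each document to the name of its most similar community vector."""
    result = {}
    for doc in documents:
        doc_vec = tf_vector(doc)
        result[doc] = max(communities, key=lambda name: cosine(doc_vec, communities[name]))
    return result


# Two assumed senses of the ambiguous query "jaguar".
communities = {
    "animal": tf_vector("jaguar cat animal wildlife"),
    "car": tf_vector("jaguar car engine vehicle"),
}
pages = ["jaguar car dealer engine", "jaguar cat habitat"]
print(assign(pages, communities))
```

    Each cluster is then the set of pages assigned to the same community, one cluster per sense of the query.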

  13. Ensuring Query Compatibility with Evolving XML Schemas

    CERN Document Server

    Genevès, Pierre; Quint, Vincent

    2008-01-01

    During the life cycle of an XML application, both schemas and queries may change from one version to another. Schema evolutions may affect query results and potentially the validity of produced data. Nowadays, a challenge is to assess and accommodate the impact of these changes in rapidly evolving XML applications. This article proposes a logical framework and tool for verifying forward/backward compatibility issues involving schemas and queries. First, it allows analyzing relations between schemas. Second, it allows XML designers to identify queries that must be reformulated in order to produce the expected results across successive schema versions. Third, it allows examining more precisely the impact of schema changes over queries, therefore facilitating their reformulation.

  14. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Alam, Maksudul [ORNL; Yoginath, Srikanth B [ORNL; Perumalla, Kalyan S [ORNL

    2016-01-01

    In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.
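    Point and range queries on a radix tree can be illustrated with a simple CPU-side sketch. It uses a fixed one-byte-per-level layout over 32-bit keys with Python dictionaries as nodes; it is neither adaptive (ART) nor GPU-based, so it only shows the query logic and not the implementation studied in the paper. The class `RadixTree` and its methods are hypothetical names.

```python
class RadixTree:
    """Radix tree over unsigned 32-bit keys, consuming one byte per level."""

    def __init__(self):
        self.root = {}

    @staticmethod
    def _bytes(key):
        return key.to_bytes(4, "big")   # most significant byte first

    def insert(self, key, value):
        node = self.root
        b = self._bytes(key)
        for byte in b[:-1]:
            node = node.setdefault(byte, {})
        node[b[-1]] = value

    def get(self, key, default=None):
        """Point query: follow the four bytes of the key down the tree."""
        node = self.root
        b = self._bytes(key)
        for byte in b[:-1]:
            node = node.get(byte)
            if node is None:
                return default
        return node.get(b[-1], default)

    def range_query(self, lo, hi):
        """Yield (key, value) pairs with lo <= key <= hi in ascending key order."""
        def walk(node, prefix, level):
            for byte in sorted(node):
                key = (prefix << 8) | byte
                if level == 3:
                    if lo <= key <= hi:
                        yield key, node[byte]
                else:
                    shift = 8 * (3 - level)
                    smallest = key << shift
                    largest = smallest | ((1 << shift) - 1)
                    if smallest > hi or largest < lo:
                        continue   # subtree lies entirely outside the range
                    yield from walk(node[byte], key, level + 1)
        yield from walk(self.root, 0, 0)


index = RadixTree()
for k in (5, 300, 70000, 2**31):
    index.insert(k, f"row-{k}")
print(index.get(300))
print(list(index.range_query(6, 70000)))
```

    Because keys are split most significant byte first, an in-order walk visits keys in sorted order, which is what makes range queries possible on this kind of index.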

  15. Querying Schemas With Access Restrictions

    CERN Document Server

    Benedikt, Michael; Ley, Clemens

    2012-01-01

    We study verification of systems whose transitions consist of accesses to a Web-based data-source. An access is a lookup on a relation within a relational database, fixing values for a set of positions in the relation. For example, a transition can represent access to a Web form, where the user is restricted to filling in values for a particular set of fields. We look at verifying properties of a schema describing the possible accesses of such a system. We present a language where one can describe the properties of an access path, and also specify additional restrictions on accesses that are enforced by the schema. Our main property language, AccLTL, is based on a first-order extension of linear-time temporal logic, interpreting access paths as sequences of relational structures. We also present a lower-level automaton model, A-automata, which AccLTL specifications can compile into. We show that AccLTL and A-automata can express static analysis problems related to "querying with limited access patterns" that h...

  16. Query-By-Keywords (QBK): Query Formulation Using Semantics and Feedback

    Science.gov (United States)

    Telang, Aditya; Chakravarthy, Sharma; Li, Chengkai

    The staples of information retrieval have been querying and search, respectively, for structured and unstructured repositories. Processing queries over known, structured repositories (e.g., databases) has been well understood, and search has become ubiquitous when it comes to unstructured repositories (e.g., the Web). Furthermore, searching structured repositories has been explored to a limited extent. However, there is not much work on querying unstructured sources. We argue that querying unstructured sources is the next step in performing focused retrievals. This paper proposes a new approach to generate queries from search-like inputs for unstructured repositories. Instead of burdening the user with schema details, we believe that pre-discovered semantic information in the form of taxonomies, relationship of keywords based on context, and attribute & operator compatibility can be used to generate query skeletons. Furthermore, progressive feedback from users can be used to improve the accuracy of the query skeletons generated.

  17. Twenty lectures on thermodynamics

    CERN Document Server

    Buchdahl, H A

    2013-01-01

    Twenty Lectures on Thermodynamics is a course of lectures, parts of which the author has given various times over the last few years. The book gives the readers a bird's eye view of phenomenological and statistical thermodynamics. The book covers many areas in thermodynamics such as states and transition; adiabatic isolation; irreversibility; the first, second, third and zeroth laws of thermodynamics; entropy and entropy law; the idea of the application of thermodynamics; pseudo-states; the quantum-statistical canonical and grand canonical ensembles; and semi-classical gaseous systems. The text

  18. Spatial information semantic query based on SPARQL

    Science.gov (United States)

    Xiao, Zhifeng; Huang, Lei; Zhai, Xiaofang

    2009-10-01

    How can the efficiency of spatial information inquiries be enhanced in today's fast-growing information age? We are rich in geospatial data but poor in up-to-date geospatial information and knowledge that are ready to be accessed by public users. This paper adopts an approach for querying spatial semantics by building an ontology in Web Ontology Language (OWL) format and introducing the SPARQL Protocol and RDF Query Language (SPARQL) to search spatial semantic relations. It is important to establish spatial semantics that support effective spatial reasoning for performing semantic queries. In contrast to earlier keyword-based and information retrieval techniques that rely on syntax, we use semantic approaches in our spatial query system. Semantic approaches need to be developed through ontologies, so we use OWL to describe spatial information extracted from the large-scale map of Wuhan. Spatial information expressed by an ontology with formal semantics is available to machines for processing and to people for understanding. The approach is illustrated by a case study that uses SPARQL to query geo-spatial ontology instances of Wuhan. The paper shows that using SPARQL to search OWL ontology instances can ensure the accuracy and applicability of the results. The results also indicate that constructing a geo-spatial semantic query system has positive effects on forming spatial queries and on retrieval.

  19. Query Optimizations over Decentralized RDF Graphs

    KAUST Repository

    Abdelaziz, Ibrahim

    2017-05-18

    Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query processing over a small number of heterogeneous data sources by utilizing schema information. In the case of schema similarity and interlinks among sources, these approaches cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a system for scalable and efficient SPARQL query processing over decentralized graphs. Lusail achieves scalability and low query response time through various optimizations at compile and run times. At compile time, we use a novel locality-aware query decomposition technique that maximizes the number of query triple patterns sent together to a source based on the actual location of the instances satisfying these triple patterns. At run time, we use selectivity-awareness and parallel query execution to reduce network latency and to increase parallelism by delaying the execution of subqueries expected to return large results. We evaluate Lusail using real and synthetic benchmarks, with data sizes up to billions of triples on an in-house cluster and a public cloud. We show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time.

  20. Earth: 15 Million Years Ago

    CERN Document Server

    Mizushima, Masataka

    2008-01-01

    In Einstein's general relativity theory the metric component gxx in the direction of motion (x-direction) of the sun deviates from unity due to a tensor potential caused by the black hole existing around the center of the galaxy. Because the solar system is orbiting around the galactic center at 200 km/s, the theory shows that the Newtonian gravitational potential due to the sun is not quite radial. At the present time, the ecliptic plane is almost perpendicular to the galactic plane, consistent with this modification of the Newtonian gravitational force. The ecliptic plane is assumed to maintain this orientation in the galactic space as it orbits around the galactic center, but the rotational angular momentum of the earth around its own axis can be assumed to be conserved. The earth is between the sun and the galactic center at the summer solstice all the time. As a consequence, the rotational axis of the earth would be parallel to the axis of the orbital rotation of the earth 15 million years ago, if the so...

  1. Complexity of temporal query abduction in DL-Lite

    CSIR Research Space (South Africa)

    Klarman, S

    2014-07-01

    Full Text Available and Temporal Query Language, based on the combination of LTL with conjunctive queries. In this defined setting, we study the complexity of temporal query abduction, assuming different restrictions on the problem and minimality criteria for abductive solutions...

  2. Query Through Heterogeneous Ontologies Using Association Matrix

    Institute of Scientific and Technical Information of China (English)

    KANG Da-zhou; XU Bao-wen; LU Jian-jiang; WANG Peng; LI Yan-hui

    2004-01-01

    This paper introduces the definition and calculation of the association matrix between ontologies. It uses the association matrix to describe the relations between concepts in different ontologies and uses concept vectors to represent queries; then it computes the vectors with the association matrix in order to rewrite queries. This paper proposes a simple method of querying through heterogeneous ontologies using the association matrix. This method is based on the correctness of approximate information filtering theory; it is simple to implement and expected to run quite fast.
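    A minimal sketch of the vector-times-matrix idea follows. The abstract does not give the paper's definitions, so everything here is an assumption for illustration: the concept names, the matrix entries, and the reading of "computing the vectors with the association matrix" as a plain product q' = q · A, where row i of A holds the association of source concept i with each target concept.

```python
def rewrite_query(query_vec, assoc):
    """Map a query vector over source concepts onto target concepts (q' = q . A)."""
    n_target = len(assoc[0])
    return [
        sum(q * row[j] for q, row in zip(query_vec, assoc))
        for j in range(n_target)
    ]


# Hypothetical ontologies and association degrees, not taken from the paper.
source_concepts = ["Car", "Truck"]
target_concepts = ["Automobile", "Lorry", "Bicycle"]
assoc = [
    [1.0, 0.0, 0.0],  # Car   -> Automobile
    [0.0, 1.0, 0.0],  # Truck -> Lorry
]

query = [1, 0]  # a query asking for "Car" only
rewritten = rewrite_query(query, assoc)
# Target concepts with a non-zero weight form the rewritten query.
selected = [c for c, w in zip(target_concepts, rewritten) if w > 0]
```

    Fractional matrix entries would express partial overlap between concepts, which is where an approximate-filtering argument would come into play.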

  3. Instant MDX queries for SQL Server 2012

    CERN Document Server

    Emond, Nicholas

    2013-01-01

    Get to grips with a new technology, understand what it is and what it can do for you, and then get to work with the most important features and tasks. This short, focused guide is a great way to get started with writing MDX queries. New developers can use this book as a reference for how to use functions and the syntax of a query as well as how to use Calculated Members and Named Sets. This book is great for new developers who want to learn the MDX query language from scratch and install SQL Server 2012 with Analysis Services

  4. Relative aggregation operator in database fuzzy querying

    Directory of Open Access Journals (Sweden)

    Luminita DUMITRIU

    2005-12-01

    Full Text Available Fuzzy selection criteria for querying relational databases include vague terms; they usually refer to linguistic values from the attribute linguistic domains, defined as fuzzy sets. Generally, when a vague query is processed, the definitions of vague terms must already exist in a knowledge base. But there are also cases when vague terms must be dynamically defined, when a particular operation is used to aggregate simple criteria in a complex selection. The paper presents a new aggregation operator and the corresponding algorithm to evaluate the fuzzy query.

  5. Provenance Storage, Querying, and Visualization in PBase

    Energy Technology Data Exchange (ETDEWEB)

    Kianmajd, Parisa [University of California, Davis; Ludascher, Bertram [University of California, Davis; Missier, Paolo [Newcastle University, UK; Chirigati, Fernando [New York University; Wei, Yaxing [ORNL; Koop, David [New York University; Dey, Saumen [University of California, Davis

    2015-01-01

    We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable since it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and with the support of SPARQL and the tree cover encoding, the repository provides a scalable infrastructure for querying the provenance data. Furthermore, through its user interface, it is possible to: visualize workflows and execution traces; visualize reachability relations within these traces; issue SPARQL queries; and visualize query results.

  6. Query Load Balancing For Visible Object Extraction

    DEFF Research Database (Denmark)

    Bukauskas, Linas; Bøhlen, Michael Hanspeter

    2004-01-01

    Interactive visual data explorations impose rigid real-time requirements on the extraction of visible objects. Often these requirements are met by deploying powerful hardware that maintains the entire data set in huge main memory structures. In this paper we propose an approach that retrieves...... objects along the path. The visible objects are retrieved incrementally, and it is possible to precisely control the query load and the number of retrieved objects. The minimal distance path method issues frequent queries and retrieves the lowest possible number of objects at each query point. The end...

  7. Evaluating Trajectory Queries over Imprecise Location Data

    DEFF Research Database (Denmark)

    Xie, Scott, Xike; Cheng, Reynold; Yiu, Man Lung

    2012-01-01

    Trajectory queries, which retrieve nearby objects for every point of a given route, can be used to identify alerts of potential threats along a vessel route, or monitor the adjacent rescuers to a travel path. However, the locations of these objects (e.g., threats, succours) may not be precisely......, the query is quite time-consuming, since all the points on the trajectory are considered. In this paper, we study how to efficiently evaluate trajectory queries over imprecise location data, by proposing a new concept called the u-bisector. In general, the u-bisector is an extension of bisector to handle...

  8. Federated query processing for the semantic web

    CERN Document Server

    Buil-Aranda, C

    2014-01-01

    During the last years, the amount of RDF data has increased exponentially over the Web, exposed via SPARQL endpoints. These SPARQL endpoints allow users to direct SPARQL queries to the RDF data. Federated SPARQL query processing allows querying several of these RDF databases as if they were a single one, integrating the results from all of them. This is a key concept in the Web of Data and it is also a hot topic in the community. Besides that, the W3C SPARQL-WG has standardized it in the new Recommendation SPARQL 1.1. This book provides a formalisation of the W3C proposed recommendation. Thi

  9. Responsive web design with jQuery

    CERN Document Server

    Carlos, Gilberto

    2013-01-01

    Responsive Web Design with jQuery follows a standard tutorial-based approach, covering various aspects of responsive web design by building a comprehensive website. "Responsive Web Design with jQuery" is aimed at web designers who are interested in building device-agnostic websites. You should have a grasp of standard HTML, CSS, and JavaScript development, and have a familiarity with graphic design. Some exposure to jQuery and HTML5 will be beneficial but isn't essential.

  10. OntoQuery: easy-to-use web-based OWL querying.

    Science.gov (United States)

    Tudose, Ilinca; Hastings, Janna; Muthukrishnan, Venkatesh; Owen, Gareth; Turner, Steve; Dekker, Adriano; Kale, Namrata; Ennis, Marcus; Steinbeck, Christoph

    2013-11-15

    The Web Ontology Language (OWL) provides a sophisticated language for building complex domain ontologies and is widely used in bio-ontologies such as the Gene Ontology. The Protégé-OWL ontology editing tool provides a query facility that allows composition and execution of queries with the human-readable Manchester OWL syntax, with syntax checking and entity label lookup. No equivalent query facility such as the Protégé Description Logics (DL) query yet exists in web form. However, many users interact with bio-ontologies such as chemical entities of biological interest and the Gene Ontology using their online Web sites, within which DL-based querying functionality is not available. To address this gap, we introduce the OntoQuery web-based query utility.  The source code for this implementation together with instructions for installation is available at http://github.com/IlincaTudose/OntoQuery. OntoQuery software is fully compatible with all OWL-based ontologies and is available for download (CC-0 license). The ChEBI installation, ChEBI OntoQuery, is available at http://www.ebi.ac.uk/chebi/tools/ontoquery. hastings@ebi.ac.uk.

  11. STBase: One Million Species Trees for Comparative Biology

    Science.gov (United States)

    McMahon, Michelle M.; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J.

    2015-01-01

    Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed

  12. STBase: one million species trees for comparative biology.

    Directory of Open Access Journals (Sweden)

    Michelle M McMahon

    Full Text Available Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies

  13. FastBLAST: homology relationships for millions of proteins.

    Directory of Open Access Journals (Sweden)

    Morgan N Price

    Full Text Available BACKGROUND: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding. METHODOLOGY/PRINCIPAL FINDINGS: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query. CONCLUSIONS/SIGNIFICANCE: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast.

  14. Twenty Years of KSHV

    Directory of Open Access Journals (Sweden)

    Yuan Chang

    2014-11-01

    Full Text Available Twenty years ago, Kaposi’s sarcoma (KS) was the oncologic counterpart to Winston Churchill’s Russia: a riddle, wrapped in a mystery, inside an enigma. First described by Moritz Kaposi in 1872, who reported it to be an aggressive skin tumor, KS became known over the next century as a slow-growing tumor of elderly men; in fact, most KS patients were expected to die with the tumor rather than from it. Nevertheless, the course and manifestations of the disease varied widely in different clinical contexts. The puzzle of KS came to the forefront as a harbinger of the AIDS epidemic. The articles in this issue of Viruses recount progress made in understanding Kaposi’s sarcoma herpesvirus (KSHV) since its initial description in 1994.

  15. Schedule Sales Query Report Generation System

    Data.gov (United States)

    General Services Administration — Schedule Sales Query presents sales volume figures as reported to GSA by contractors. The reports are generated as quarterly reports for the current year and the...

  16. Clean Air Markets - Compliance Query Wizard

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Compliance Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://ampd.epa.gov/ampd/. The Compliance module provides...

  17. Business information query expansion through semantic network

    Science.gov (United States)

    Gong, Zhiguo; Muyeba, Maybin; Guo, Jingzhi

    2010-02-01

    In this article, we propose a method for business information query expansion. In our approach, hypernym/hyponym and synonym relations in WordNet are used as the basic expansion rules. Then we use WordNet Lexical Chains and WordNet semantic similarity to assign terms in the same query into different groups with respect to their semantic similarities. For each group, we expand the highest terms in the WordNet hierarchies with hypernyms and synonyms, the lowest terms with hyponyms and synonyms, and all other terms with synonyms only. In this way, the contradictions caused by full expansion can be well controlled. Furthermore, we use a collection-related term semantic network to further improve the expansion performance. Our experiment reveals that our solution for query expansion can improve the query performance dramatically.

  18. Mining tree-query associations in graphs

    CERN Document Server

    Hoekx, Eveline

    2010-01-01

    New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for mining tree queries and tree-query associations in a large data graph. Novel about our class of patterns is that they can contain constants, and can contain existential nodes which are not counted when determining the number of occurrences of the pattern in the data graph. Our algorithms have a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions, and citation analysis.

  19. Medical Expenditure Panel Survey (MEPS) Query Tool

    Data.gov (United States)

    U.S. Department of Health & Human Services — MEPSnet HC Query Tool MEPSnet/Household Component provides easy access to nationally representative statistics of health care use, expenditures, sources of payment,...

  20. Range Query Processing in Multidisk Systems

    Institute of Scientific and Technical Information of China (English)

    李建中

    1992-01-01

    In order to reduce the disk access time, a database can be stored on several simultaneously accessible disks. In this paper, we are concerned with the dynamic d-attribute database allocation problem for range queries. An allocation method, called the coordinate modulo allocation method, is proposed to allocate data in a d-attribute database among disks so that the maximum disk accessing concurrency can be achieved for range queries. Our analysis and experiments show that the method achieves the optimum or near-optimum parallelism for range queries. The paper offers the conditions under which the method is optimal. The worst-case bounds of the performance of the method are also given. In addition, the parallel algorithm for processing range queries is described at the end of the paper. The method has been used in the statistical and scientific database management system which is being designed by us.
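    The abstract does not spell out the allocation rule, so the sketch below rests on an assumption: each cell of the d-dimensional grid is placed on the disk given by the sum of its coordinates modulo the number of disks. The function names are hypothetical. It shows why such a rule spreads the cells of a range query evenly across disks.

```python
from collections import Counter
from itertools import product


def disk_for(cell, num_disks):
    """Assumed rule: disk index = (sum of grid coordinates) mod number of disks."""
    return sum(cell) % num_disks


def disk_load(ranges, num_disks):
    """Count how many cells of a range query fall on each disk.

    ranges is one inclusive (lo, hi) pair per attribute.
    """
    cells = product(*(range(lo, hi + 1) for lo, hi in ranges))
    return Counter(disk_for(c, num_disks) for c in cells)


# A 4-cell row query and a 4x4 block query over 4 disks.
row_load = disk_load([(0, 3), (2, 2)], num_disks=4)
block_load = disk_load([(0, 3), (0, 3)], num_disks=4)
```

    The response time of a parallel range query is governed by the most heavily loaded disk, so an even spread such as the one above is the best case for that query shape.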

  1. Efficient Probabilistic Inference with Partial Ranking Queries

    CERN Document Server

    Huang, Jonathan; Guestrin, Carlos E

    2012-01-01

    Distributions over rankings are used to model data in various settings such as preference analysis and political elections. The factorial size of the space of rankings, however, typically forces one to make structural assumptions, such as smoothness, sparsity, or probabilistic independence about these underlying distributions. We approach the modeling problem from the computational principle that one should make structural assumptions which allow for efficient calculation of typical probabilistic queries. For ranking models, "typical" queries predominantly take the form of partial ranking queries (e.g., given a user's top-k favorite movies, what are his preferences over remaining movies?). In this paper, we argue that riffled independence factorizations proposed in recent literature [7, 8] are a natural structural assumption for ranking distributions, allowing for particularly efficient processing of partial ranking queries.

  2. Mobile Information Access with Spoken Query Answering

    DEFF Research Database (Denmark)

    Brøndsted, Tom; Larsen, Henrik Legind; Larsen, Lars Bo

    2006-01-01

    This paper addresses the problem of information and service accessibility in mobile devices with limited resources. A solution is developed and tested through a prototype that applies state-of-the-art Distributed Speech Recognition (DSR) and knowledge-based Information Retrieval (IR) processing...... for spoken query answering. For the DSR part, a configurable DSR system is implemented on the basis of the ETSI-DSR advanced front-end and the SPHINX IV recognizer. For the knowledge-based IR part, a distributed system solution is developed for fast retrieval of the most relevant documents, with a text...... window focused over the part which most likely contains an answer to the query. The two systems are integrated into a full spoken query answering system. The prototype can answer queries and questions within the chosen football (soccer) test domain, but the system has the flexibility for being ported...

  3. Querying temporal databases via OWL 2 QL

    CSIR Research Space (South Africa)

    Klarman, S

    2014-06-01

    Full Text Available SQL:2011, the most recently adopted version of the SQL query language, has unprecedentedly standardized the representation of temporal data in relational databases. Following the successful paradigm of ontology-based data access, we develop a...

  4. Search Result Diversification Based on Query Facets

    Institute of Scientific and Technical Information of China (English)

    胡莎; 窦志成; 王晓捷; 继荣

    2015-01-01

    In search engines, different users may search for different information by issuing the same query. To satisfy more users with limited search results, search result diversification re-ranks the results to cover as many user intents as possible. Most existing intent-aware diversification algorithms recognize user intents as subtopics, each of which is usually a word, a phrase, or a piece of description. In this paper, we leverage query facets to understand user intents in diversification, where each facet contains a group of words or phrases that explain an underlying intent of a query. We generate subtopics based on query facets and propose faceted diversification approaches. Experimental results on the public TREC 2009 dataset show that our faceted approaches outperform state-of-the-art diversification models.

  5. A Query Language for Formal Mathematical Libraries

    CERN Document Server

    Rabe, Florian

    2012-01-01

    One of the most promising applications of mathematical knowledge management is search: Even if we restrict attention to the tiny fragment of mathematics that has been formalized, the amount exceeds the comprehension of an individual human. Based on the generic representation language MMT, we introduce the mathematical query language QMT: It combines simplicity, expressivity, and scalability while avoiding a commitment to a particular logical formalism. QMT can integrate various search paradigms such as unification, semantic web, or XQuery style queries, and QMT queries can span different mathematical libraries. We have implemented QMT as a part of the MMT API. This combination provides a scalable indexing and query engine that can be readily applied to any library of mathematical knowledge. While our focus here is on libraries that are available in a content markup language, QMT naturally extends to presentation and narration markup languages.

  6. Clean Air Markets - Allowances Query Wizard

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Allowances Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Allowances...

  7. Evaluating SPARQL queries on massive RDF datasets

    KAUST Repository

    Al-Harbi, Razen

    2015-08-01

    Distributed RDF systems partition data across multiple computer nodes. Partitioning is typically based on heuristics that minimize inter-node communication and it is performed in an initial, data pre-processing phase. Therefore, the resulting partitions are static and do not adapt to changes in the query workload; as a result, existing systems are unable to consistently avoid communication for queries that are not favored by the initial data partitioning. Furthermore, for very large RDF knowledge bases, the partitioning phase becomes prohibitively expensive, leading to high startup costs. In this paper, we propose AdHash, a distributed RDF system which addresses the shortcomings of previous work. First, AdHash initially applies lightweight hash partitioning, which drastically minimizes the startup cost, while favoring the parallel processing of join patterns on subjects, without any data communication. Using a locality-aware planner, queries that cannot be processed in parallel are evaluated with minimal communication. Second, AdHash monitors the data access patterns and adapts dynamically to the query load by incrementally redistributing and replicating frequently accessed data. As a result, the communication cost for future queries is drastically reduced or even eliminated. Our experiments with synthetic and real data verify that AdHash (i) starts faster than all existing systems, (ii) processes thousands of queries before other systems come online, and (iii) gracefully adapts to the query load, being able to evaluate queries on billion-scale RDF data in sub-seconds. In this demonstration, the audience can use a graphical interface of AdHash to verify its performance superiority compared to state-of-the-art distributed RDF systems.
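    The initial hash-partitioning step can be sketched in a few lines. This is an illustration under assumptions, not AdHash's code: the choice of CRC32 as the hash, the function names, and the sample triples are all hypothetical. The point it shows is that hashing on the subject places every triple of a given subject on one worker, so a join on a shared subject needs no communication.

```python
import zlib
from collections import defaultdict


def worker_for(subject, num_workers):
    # zlib.crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(subject.encode("utf-8")) % num_workers


def partition(triples, num_workers):
    """Assign each (subject, predicate, object) triple to a worker by its subject."""
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[worker_for(s, num_workers)].append((s, p, o))
    return parts


# Hypothetical sample data.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "age", "30"),
    ("bob", "knows", "carol"),
]
parts = partition(triples, num_workers=4)

# A subject-subject join such as (?x knows ?y . ?x age ?z) can be answered
# by each worker on its own partition, because both triples share a subject.
alice_home = worker_for("alice", 4)
local_alice = [t for t in parts[alice_home] if t[0] == "alice"]
```

    Joins that connect an object to a subject (for example alice knows bob, bob knows carol) may cross workers; that is the case the abstract's locality-aware planner and adaptive redistribution are described as handling.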

  8. Nearest Neighbor Queries in Road Networks

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Kolar, Jan; Pedersen, Torben Bach

    2003-01-01

    With wireless communications and geo-positioning being widely available, it becomes possible to offer new e-services that provide mobile users with information about other mobile objects. This paper concerns active, ordered k-nearest neighbor queries for query and data objects that are moving...... for the nearest neighbor search in the prototype is presented in detail. In addition, the paper reports on results from experiments with the prototype system....

  9. Processing keyword queries under access limitations

    OpenAIRE

    Calì, Andrea; Martinenghi, D.; Torlone, R.

    2015-01-01

    The Deep Web is constituted by data accessible through Web pages, but not readily indexable by search engines, as they are returned in dynamic pages. In this paper we propose a framework for accessing Deep Web sources, represented as relational tables with so-called access limitations, with keyword-based queries. We formalize the notion of optimal answer and propose methods for query processing. To the best of our knowledge, ours is the first systematic approach to keyword search in such cont...

  10. Managing and querying whole slide images

    Science.gov (United States)

    Wang, Fusheng; Oh, Tae W.; Vergara-Niedermayr, Cristobal; Kurc, Tahsin; Saltz, Joel

    2012-02-01

    High-resolution pathology images provide rich information about the morphological and functional characteristics of biological systems, and are transforming the field of pathology into a new era. To facilitate the use of digital pathology imaging for biomedical research and clinical diagnosis, it is essential to manage and query both whole slide images (WSI) and analytical results generated from images, such as annotations made by humans and computed features and classifications made by computer algorithms. There are unique requirements on modeling, managing and querying whole slide images, including compatibility with standards, scalability, support of image queries at multiple granularities, and support of integrated queries between images and derived results from the images. In this paper, we present our work on developing the Pathology Image Database System (PIDB), which is a standard oriented image database to support retrieval of images, tiles, regions and analytical results, image visualization and experiment management through a unified interface and architecture. The system is deployed for managing and querying whole slide images for In Silico brain tumor studies at Emory University. PIDB is generic and open source, and can be easily used to support other biomedical research projects. It has the potential to be integrated into a Picture Archiving and Communications System (PACS) with powerful query capabilities to support pathology imaging.

  11. Implementing Graph Pattern Queries on a Relational Database

    Energy Technology Data Exchange (ETDEWEB)

    Kaplan, I L; Abdulla, G M; Brugger, S T; Kohn, S R

    2007-12-26

    When a graph database is implemented on top of a relational database, queries in the graph query language are translated into relational SQL queries. Graph pattern queries are an important feature of a graph query language. Translating graph pattern queries into single SQL statements results in very poor query performance. By taking into account the pattern query structure and generating multiple SQL statements, pattern query performance can be dramatically improved. The performance problems encountered with the single SQL statements generated for pattern queries reflect a problem in the SQL query planner and optimizer. Addressing this problem would allow relational databases to better support semantic graph databases. Relational database systems that provide good support for graph databases may also be more flexible platforms for data warehouses.

  12. k-Nearest Neighbor Query Processing Algorithms for a Query Region in Road Networks

    Institute of Scientific and Technical Information of China (English)

    Hyeong-Il Kim; Jae-Woo Chang

    2013-01-01

    Recent development of wireless communication technologies and the popularity of smart phones are making location-based services (LBS) popular. However, sending queries to LBS servers with users' exact locations may threaten the privacy of users. Therefore, there has been much research on generating a cloaked query region for user privacy protection. Consequently, an efficient query processing algorithm for a query region is required. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve the query processing performance and storage usage. Finally, we show by our performance analysis that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.
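
    As background for the "network expansion cost" mentioned above, the following is a minimal sketch of k-NN search by network expansion from a single node, using Dijkstra's algorithm. It is not the Island index or the kRNN algorithm; the graph representation (an adjacency dictionary) and the function name are assumptions made for illustration.

```python
import heapq


def knn_network_expansion(graph, source, pois, k):
    """Return up to k (distance, node) pairs for the POIs nearest to source.

    graph: dict mapping node -> list of (neighbor, edge_length)
    pois:  set of nodes that host a point of interest
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    found = []
    # Expand the network in order of increasing distance until k POIs are settled.
    while heap and len(found) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u in pois:
            found.append((d, u))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return found
```

    The number of settled nodes is a rough proxy for expansion cost: the search stops as soon as the k-th POI is reached, so dense POI distributions terminate early.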

  13. CDC Allocates $184 Million for Zika Protection

    Science.gov (United States)

    ... fullstory_162694.html CDC Allocates $184 Million for Zika Protection Funds are earmarked for states, territories, local ... million has been earmarked to protect Americans against Zika virus infection, the U.S. Centers for Disease Control ...

  14. An Efficient Query Rewriting Approach for Web Cached Data Management

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    With the development of the Internet, querying data on the Web has become a problem of growing interest, since it involves information from distributed and often dynamically related Web sources. Basically, some sub-queries can be effectively cached from previous queries or materialized views in order to achieve a better query performance based on the notion of rewriting queries. In this paper, we propose a novel query-rewriting model, called Hierarchical Query Tree, for representing Web queries. Hierarchical Query Tree is a labeled tree that is suitable for representing the inherent hierarchy feature of data on the Web. Based on Hierarchical Query Tree, we use a case-based approach to determine what the query results should be. The definitions of queries and query results are both represented as labeled trees. Thus, we can use the same model for representing cases, and the medium query results can also be dynamically updated by the user queries. We show that our case-based method can be used to answer a new query based on the combination of previous queries, including changes of requirements and various information sources.

  15. Top at Twenty

    CERN Document Server

    2015-01-01

    The "Top at Twenty" workshop is dedicated to the celebration of 20 years since the top quark discovery at Fermilab in 1995. Speakers from all experiments capable of studying top quark, ATLAS, CDF, CMS and DZero, will present the most recent results of the top quark studies based on Run II of the Tevatron and Run I of the LHC. Reviews of such fundamental measurements as mass of the top quark, its spin, charge and production properties are planned with some of them orders of magnitude better in precision in comparison with original CDF and DZero papers announcing the top quark discovery. Measurements of top quark production and decay that illuminate the nature of the Higgs boson and seek new phenomena will be presented. Theoretical talks on how the top quark fits into the Standard Model and its potential extensions will also be presented. This workshop will complement the yearly Top Workshop which is held in September and will benefit from many new results expected to be presented at winter conferences in 2015...

  16. An alternative database approach for management of SNOMED CT and improved patient data queries.

    Science.gov (United States)

    Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

    2015-10-01

    SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferences between data elements as well as those relationships that are explicitly stated. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model makes it difficult to implement in its entirety within electronic health record systems that largely employ object oriented or relational database architectures. The result is a reduction of data richness, limitations of query capability and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model and subsequently modeling patient data upon the semantic core of SNOMED CT could exploit the full value of the terminology to enrich and support advanced data querying capability of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and an enterprise Oracle relational database (RDBMS) of the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC as well as medication data encoded with RxNorm, and complex queries were performed using LOINC, RxNorm and SNOMED CT to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive
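
    The transitive closure table used for the integrity check can be illustrated with a small sketch. It assumes the hierarchy is given as a dictionary from each concept to its direct parents and is acyclic; the concept labels in the usage note are toy names, not actual SNOMED CT identifiers, and the function names are hypothetical.

```python
def transitive_closure(is_a):
    """Map each concept to the set of all its ancestors.

    is_a: dict mapping child -> iterable of direct parents (assumed acyclic).
    """
    closure = {}

    def ancestors(concept):
        # Memoised depth-first walk up the hierarchy.
        if concept in closure:
            return closure[concept]
        result = set()
        for parent in is_a.get(concept, ()):
            result.add(parent)
            result |= ancestors(parent)
        closure[concept] = result
        return result

    concepts = set(is_a)
    for parents in is_a.values():
        concepts.update(parents)
    for concept in concepts:
        ancestors(concept)
    return closure


def closure_table(is_a):
    """Flatten the closure into sorted (descendant, ancestor) rows."""
    closure = transitive_closure(is_a)
    return sorted((c, a) for c, ancs in closure.items() for a in ancs)
```

    With a polyhierarchy such as {"viral pneumonia": ["pneumonia", "viral infection"], "pneumonia": ["disease"], "viral infection": ["disease"]}, the concept "viral pneumonia" inherits from both parents and from "disease". Two closure tables built by different methods can then be compared row by row. The recursive walk is fine for a sketch, but a very deep hierarchy would call for an iterative traversal.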

  17. Distributed Top-k Queries in E-commerce Environment

    Institute of Scientific and Technical Information of China (English)

    JiangZhan; YiqingSong; HaixiaZhang

    2004-01-01

    This paper focuses on how to perform distributed top-k queries in an e-commerce environment through web services. We first give the query process in such an environment, then we present an algorithm for processing such queries, which is based on the query model we define. Experimental results show that the algorithm is efficient.

  18. Using Bitmap Indexing Technology for Combined Numerical and Text Queries

    Energy Technology Data Exchange (ETDEWEB)

    Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng; Rotem, Doron; Shoshani, Arie

    2006-10-16

    In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of those shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit provides not only efficient searches on a single table as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit also provides a very efficient retrieval mechanism of result records.
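
    How a bitmap index answers Boolean term queries can be shown with a minimal, uncompressed sketch. Python integers stand in for bitmaps, and the compression step that the abstract relies on is deliberately omitted; the function names are hypothetical and are not part of the FastBit API.

```python
from collections import defaultdict


def build_bitmap_index(documents):
    """Return term -> int bitmap, where bit i is set if document i contains the term."""
    index = defaultdict(int)
    for doc_id, text in enumerate(documents):
        for term in set(text.lower().split()):
            index[term] |= 1 << doc_id
    return index


def bitmap_to_ids(bitmap):
    """List the positions of the set bits of a non-negative bitmap."""
    ids = []
    doc_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(doc_id)
        bitmap >>= 1
        doc_id += 1
    return ids


def query_and(index, terms):
    """Documents containing every term: bitwise AND of the term bitmaps."""
    if not terms:
        return []
    result = index.get(terms[0], 0)
    for term in terms[1:]:
        result &= index.get(term, 0)
    return bitmap_to_ids(result)


def query_or(index, terms):
    """Documents containing at least one term: bitwise OR of the term bitmaps."""
    result = 0
    for term in terms:
        result |= index.get(term, 0)
    return bitmap_to_ids(result)
```

    A multi-term Boolean query costs one bitwise operation per term regardless of how many documents match, which is the property that makes bitmaps attractive for queries with many hits.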

  19. Fast Discovering Frequent Patterns for Incremental XML Queries

    Institute of Scientific and Technical Information of China (English)

    PENG Dun-lu; QIU Yang

    2004-01-01

    It is nontrivial to maintain such discovered frequent query patterns in a real XML-DBMS because the transaction database of queries may allow frequent updates, and such updates may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. In this paper, two incremental updating algorithms, FUXQMiner and FUFXQMiner, are proposed for efficient maintenance of discovered frequent query patterns and generation of new frequent query patterns when new XML queries are added into the database. Experimental results from our implementation show that the proposed algorithms have good performance.

  20. Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance

    Institute of Scientific and Technical Information of China (English)

    WANG Chuan; HAO Liang; ZHAO Lian-Jie

    2011-01-01

    We present a modified protocol for the realization of a quantum private query process on a classical database. Using a one-qubit query and a CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrices of the memory registers are constructed.

  1. Compressed Data Cube for Approximate OLAP Query Processing

    Institute of Scientific and Technical Information of China (English)

    冯玉; 王珊

    2002-01-01

    Approximate query processing has emerged as an approach to dealing with the huge data volume and complex queries in the environment of data warehouse. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method is based on building a compressed (approximate) data cube by a clustering technique and using this compressed data cube to provide answers to queries directly, so it improves the performance of the queries. We also provide the algorithm of the OLAP queries and the confidence intervals of query results. An extensive experimental study with the OLAP council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.

  2. Index and query methods in road networks

    CERN Document Server

    Feng, Jun

    2015-01-01

    This book presents index and query techniques for road networks and for moving objects that are constrained to road networks. Here, the road network of non-Euclidean space has its unique characteristics such that two moving objects may be very close in a straight line distance. The index used in two-dimensional Euclidean space is not always appropriate for moving objects on a road network. Therefore, the index structure needs to be improved in order to obtain suitable indexing methods, explore the shortest path and acquire nearest neighbor query and aggregation query methods under the new index structures. Chapter 1 of this book introduces the present situation of intelligent traffic and indexing in road networks; Chapter 2 introduces the relevant existing spatial indexing methods. Chapters 3-5 focus on several issues of road networks and queries: traffic road network models (see Chapter 3), index structures (see Chapter 4) and aggregate query methods (see Chapter 5). Finally, in Chapter 6, the book briefly de...

  3. Indexing Reverse Top-k Queries

    CERN Document Server

    Chester, Sean; Venkatesh, S; Whitesides, Sue

    2012-01-01

    We consider the recently introduced monochromatic reverse top-k queries which ask for, given a new tuple q and a dataset D, all possible top-k queries on D union {q} for which q is in the result. Towards this problem, we focus on designing indexes in two dimensions for repeated (or batch) querying, a novel but practical consideration. We present the insight that by representing the dataset as an arrangement of lines, a critical k-polygon can be identified and used exclusively to respond to reverse top-k queries. We construct an index based on this observation which has guaranteed worst-case query cost that is logarithmic in the size of the k-polygon. We implement our work and compare it to related approaches, demonstrating that our index is fast in practice. Furthermore, we demonstrate through our experiments that a k-polygon is comprised of a small proportion of the original data, so our index structure consumes little disk space.

  4. EHR query language (EQL)--a query language for archetype-based health records.

    Science.gov (United States)

    Ma, Chunlan; Frankel, Heath; Beale, Thomas; Heard, Sam

    2007-01-01

    OpenEHR specifications have been developed to standardise the representation of an international electronic health record (EHR). The language used for querying EHR data is not as yet part of the specification. To fill in this gap, Ocean Informatics has developed a query language currently known as EHR Query Language (EQL), a declarative language supporting queries on EHR data. EQL is neutral to EHR systems, programming languages and system environments and depends only on the openEHR archetype model and semantics. Thus, in principle, EQL can be used in any archetype-based computational context. In the EHR context described here, particular queries mention concepts from the openEHR EHR Reference Model (RM). EQL can be used as a common query language for disparate archetype-based applications. With the use of a common RM, archetypes, and a companion query language such as EQL, semantic interoperability of EHR information is much closer. This paper introduces the EQL syntax and provides example clinical queries to illustrate the syntax. Finally, current implementations and future directions are outlined.

  5. A Preliminary Mapping of Web Queries Using Existing Image Query Schemes.

    Science.gov (United States)

    Jansen, Bernard J.

    End user searching on the Web has become the primary method of locating images for many people. This study investigates the nature of Web image queries by attempting to map them to known image classification schemes. In this study, approximately 100,000 image queries from a major Web search engine were collected in 1997, 1999, and 2001. A…

  6. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    Science.gov (United States)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third party datasets and queries. Evaluations were performed on the Geonames dataset using questions from National Geographic Bee serialized into SPARQL and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.

  7. jQuery for designers beginner's guide

    CERN Document Server

    MacLees, Natalie

    2014-01-01

    A step-by-step guide that spices up your web pages and designs them in the way you want using the most widely used JavaScript library, jQuery. The beginner-friendly and easy-to-understand approach of the book will help get to grips with jQuery in no time. If you know the fundamentals of HTML and CSS, and want to extend your knowledge by learning to use JavaScript, then this is just the book for you. jQuery makes JavaScript straightforward and approachable - you'll be surprised at how easy it can be to add animations and special effects to your beautifully designed pages.

  8. Extending OLAP Querying to External Object

    DEFF Research Database (Denmark)

    Pedersen, Torben Bach; Shoshani, Arie; Gu, Junmin

    inherent in data in nonstandard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, multi-model federated system...... that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. Additionally, physical data...... integration can be avoided. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return...

  9. Optimal Planar Orthogonal Skyline Counting Queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Larsen, Kasper Green

    2014-01-01

    The skyline of a set of points in the plane is the subset of maximal points, where a point (x,y) is maximal if no other point (x',y') satisfies x'≥ x and y'≥ y. We consider the problem of preprocessing a set P of n points into a space efficient static data structure supporting orthogonal skyline...... counting queries, i.e. given a query rectangle R to report the size of the skyline of P ∩ R. We present a data structure for storing n points with integer coordinates having query time O(lg n/lglg n) and space usage O(n). The model of computation is a unit cost RAM with logarithmic word size. We prove...
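
    To make the query semantics concrete, here is a naive baseline that answers one orthogonal skyline counting query by filtering, sorting and sweeping. It uses the standard dominance definition (x' ≥ x and y' ≥ y), assumes the points are distinct, and takes time proportional to n log n per query, so it is only a reference for what the O(lg n/lglg n) data structure must return, not a sketch of that structure. The function name and the closed-rectangle convention are assumptions.

```python
def skyline_count(points, x1, x2, y1, y2):
    """Count the maximal points of `points` inside the rectangle [x1, x2] x [y1, y2]."""
    inside = [(x, y) for x, y in points if x1 <= x <= x2 and y1 <= y <= y2]
    # Sweep from right to left; among equal x, visit the highest point first.
    inside.sort(key=lambda p: (-p[0], -p[1]))
    count = 0
    best_y = float("-inf")
    for _, y in inside:
        # A point is maximal exactly when it is strictly higher than
        # every point already seen (all of which have x' >= x).
        if y > best_y:
            count += 1
            best_y = y
    return count
```

    For example, restricting the query rectangle can turn a dominated point into a maximal one, because its dominators may fall outside the rectangle; this is why the answer cannot simply be read off a precomputed global skyline.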

  10. How Do Search Engines Handle Chinese Queries?

    Directory of Open Access Journals (Sweden)

    Hong Cui

    2005-10-01

    The use of languages other than English has been growing exponentially on the Web. However, the major search engines have been lagging behind in providing indexes and search features to handle these languages. This article explores the characteristics of the Chinese language and how queries in this language are handled by different search engines. Queries were entered in two major search engines (Google and AlltheWeb) and two search engines developed for Chinese (Sohu and Baidu). Criteria such as handling word segmentation, number of retrieved documents, and correct display and identification of Chinese characters were used to examine how the search engines handled the queries. The results showed that the performance of the two major search engines was not on a par with that of the search engines developed for Chinese.

  11. jQuery Mobile Up and Running

    CERN Document Server

    Firtman, Maximiliano

    2012-01-01

    Would you like to build one mobile web application that works on iPad and Kindle Fire as well as iPhone and Android smartphones? This introductory guide to jQuery Mobile shows you how. Through a series of hands-on exercises, you'll learn the best ways to use this framework's many interface components to build customizable, multiplatform apps. You don't need any programming skills or previous experience with jQuery to get started. By the time you finish this book, you'll know how to create responsive, Ajax-based interfaces that work on a variety of smartphones and tablets, using jQuery Mobile

  12. Query strategy for sequential ontology debugging

    CERN Document Server

    Shchekotykhina, Kostyantyn; Fleiss, Philipp; Rodler, Patrick

    2011-01-01

    Debugging of ontologies is an important prerequisite for their wide-spread application, especially in areas that rely upon everyday users to create and maintain knowledge bases, as in the case of the Semantic Web. Recent approaches use diagnosis methods to identify causes of inconsistent or incoherent ontologies. However, in most debugging scenarios these methods return many alternative diagnoses, thus placing the burden of fault localization on the user. This paper demonstrates how the target diagnosis can be identified by performing a sequence of observations, that is, by querying an oracle about entailments of the target ontology. We exploit a-priori probabilities of typical user errors to formulate information-theoretic concepts for query selection. Our evaluation showed that the proposed method significantly reduces the number of required queries compared to myopic strategies. We experimented with different probability distributions of user errors and different qualities of the a-priori probabilities. Ou...

  13. Automatic Building Information Model Query Generation

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Yufei; Yu, Nan; Ming, Jiang; Lee, Sanghoon; DeGraw, Jason; Yen, John; Messner, John I.; Wu, Dinghao

    2015-12-01

    Energy efficient building design and construction calls for extensive collaboration between different subfields of the Architecture, Engineering and Construction (AEC) community. Performing building design and construction engineering raises challenges on data integration and software interoperability. Using a Building Information Modeling (BIM) data hub to host and integrate building models is a promising solution to address those challenges, which can ease building design information management. However, the partial model query mechanism of the current BIM data hub collaboration model has several limitations, which prevent designers and engineers from taking advantage of BIM. To address this problem, we propose a general and effective approach to generate query code based on a Model View Definition (MVD). This approach is demonstrated through a software prototype called QueryGenerator. Through a case study using multi-zone air flow analysis, we show how our approach and tool can help domain experts to use BIM to drive building design with less labour and lower overhead cost.

  14. Transfer active learning by querying committee

    Institute of Scientific and Technical Information of China (English)

    Hao SHAO; Feng TAO; Rui XU

    2014-01-01

    In real applications of inductive learning for classification, labeled instances are often deficient, and labeling them by an oracle is often expensive and time-consuming. Active learning on a single task aims to select only informative unlabeled instances for querying to improve the classification accuracy while decreasing the querying cost. However, an inevitable problem in active learning is that the informative measures for selecting queries are commonly based on the initial hypotheses sampled from only a few labeled instances. In such a circumstance, the initial hypotheses are not reliable and may deviate from the true distribution underlying the target task. Consequently, the informative measures will possibly select irrelevant instances. A promising way to compensate this problem is to borrow useful knowledge from other sources with abundant labeled information, which is called transfer learning. However, a significant challenge in transfer learning is how to measure the similarity between the source and the target tasks. One needs to be aware of different distributions or label assignments from unrelated source tasks; otherwise, they will lead to degenerated performance while transferring. Also, how to design an effective strategy to avoid selecting irrelevant samples to query is still an open question. To tackle these issues, we propose a hybrid algorithm for active learning with the help of transfer learning by adopting a divergence measure to alleviate the negative transfer caused by distribution differences. To avoid querying irrelevant instances, we also present an adaptive strategy which could eliminate unnecessary instances in the input space and models in the model space. Extensive experiments on both the synthetic and the real data sets show that the proposed algorithm is able to query fewer instances with a higher accuracy and that it converges faster than the state-of-the-art methods.
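
    The generic committee-based selection step that this line of work builds on can be sketched as follows. The sketch assumes committee members are plain callables returning a label, and it uses vote entropy as the disagreement measure; both choices are illustrative assumptions, and neither the divergence measure nor the adaptive elimination strategy of the proposed hybrid algorithm is shown.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the label distribution among committee votes (0 when unanimous)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_query(committee, unlabeled):
    """Return the unlabeled instance on which the committee disagrees most."""
    return max(
        unlabeled,
        key=lambda x: vote_entropy([member(x) for member in committee]),
    )
```

    With a committee of three threshold classifiers (thresholds 0.3, 0.5 and 0.7) and candidates 0.1, 0.4 and 0.9, the instance 0.4 is selected: the members split two to one there, while they agree on the other two.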

  15. A structural query system for Han characters

    DEFF Research Database (Denmark)

    Skala, Matthew

    2016-01-01

    The IDSgrep structural query system for Han character dictionaries is presented. This dictionary search system represents the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes), a data model and syntax based on the Unicode IDS concept. It includes a query...... language for EIDS databases, with a freely available implementation and format translation from popular third-party IDS and XML character databases. The system is designed to suit the needs of font developers and foreign language learners. The search algorithm includes a bit vector index inspired by Bloom...

  16. Approximate Distance Oracles with Improved Query Time

    CERN Document Server

    Wulff-Nilsen, Christian

    2012-01-01

    Given an undirected graph $G$ with $m$ edges, $n$ vertices, and non-negative edge weights, and given an integer $k\geq 2$, we show that a $(2k-1)$-approximate distance oracle for $G$ of size $O(kn^{1 + 1/k})$ and with $O(\log k)$ query time can be constructed in $O(\min\{kmn^{1/k},\sqrt{k}m + kn^{1 + c/\sqrt{k}}\})$ time for some constant $c$. This improves the $O(k)$ query time of Thorup and Zwick. For any $0 0$ and $k = O(\log n/\log\log n)$.

  17. Optimization and Evaluation of Nested Queries and Procedures

    CERN Document Server

    Guravannavar, Ravindra

    2009-01-01

    Many database applications perform complex data retrieval and update tasks. Nested queries, and queries that invoke user-defined functions, which are written using a mix of procedural and SQL constructs, are often used in such applications. A straight-forward evaluation of such queries involves repeated execution of parameterized sub-queries or blocks containing queries and procedural code. An important problem that arises while optimizing nested queries as well as queries with joins, aggregates and set operations is the problem of finding an optimal sort order from a factorial number of possible sort orders. We show that even a special case of this problem is NP-Hard, and present practical heuristics that are effective and easy to incorporate in existing query optimizers. We also consider iterative execution of queries and updates inside complex procedural blocks such as user-defined functions and stored procedures. Parameter batching is an important means of improving performance as it enables set-orientate...

  18. Hybrid Filtering in Semantic Query Processing

    Science.gov (United States)

    Jeong, Hanjo

    2011-01-01

    This dissertation presents a hybrid filtering method and a case-based reasoning framework for enhancing the effectiveness of Web search. Web search may not reflect user needs, intent, context, and preferences, because today's keyword-based search is lacking semantic information to capture the user's context and intent in posing the search query.…

  19. Beginning SQL queries from novice to professional

    CERN Document Server

    Churcher, Clare

    2016-01-01

    Anyone who does any work at all with databases needs to know something of SQL. This is a friendly and easy-to-read guide to writing queries with the all-important - in the database world - SQL language. The author writes with exceptional clarity.

  20. Anytime skyline query processing for interactive systems

    DEFF Research Database (Denmark)

    Magnani, Matteo; Assent, Ira; Mortensen, Michael L.

    In this paper we introduce the concept of anytime skyline query. The skyline database operator returns the top-1 record for every possible monotone record scoring function. However, computing a skyline can be very time-consuming depending on the size, distribution and dimensionality of the data, ma...
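
    The skyline operator described above can be illustrated with a minimal sketch. This is a plain nested-loop computation of the non-dominated records, not the anytime algorithm of the paper; the names `dominates` and `skyline` are hypothetical, and "larger is better" in every dimension is an assumption made for the example.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b in every dimension
    and strictly better in at least one (assumption: larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(records: List[Record]) -> List[Record]:
    """Keep every record that no other record dominates (quadratic nested loop)."""
    return [r for r in records if not any(dominates(o, r) for o in records)]


if __name__ == "__main__":
    data = [(1, 5), (3, 3), (5, 1), (2, 2), (1, 1)]
    print(skyline(data))
```

    Each surviving record is the best choice under at least one monotone scoring function, which is why the skyline is useful when the user's scoring function is unknown.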

  1. Parallel hierarchical evaluation of transitive closure queries

    NARCIS (Netherlands)

    Houtsma, M.A.W.; Cacace, F.; Ceri, S.

    1991-01-01

    Presents a new approach to parallel computation of transitive closure queries using a semantic data fragmentation. Tuples of a large base relation denote edges in a graph, which models a transportation network. A fragmentation algorithm is proposed which produces a partitioning of the base relation
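
    As a hedged illustration of what a transitive closure query computes over a relation whose tuples denote graph edges, the sketch below uses sequential semi-naive iteration. It is not the parallel, fragmentation-based method of the paper; the function name `transitive_closure` and the sample edges are assumptions made for the example.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Compute all (source, target) pairs connected by a path of one or more edges.

    Semi-naive evaluation: each round joins only the pairs found in the
    previous round (delta) with the base relation.
    """
    closure = set(edges)
    delta = set(edges)
    while delta:
        joined = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = joined - closure  # keep only pairs not seen before
        closure |= delta
    return closure


if __name__ == "__main__":
    base = {("A", "B"), ("B", "C"), ("C", "D")}
    print(sorted(transitive_closure(base)))
```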

  2. Parallel evaluation of multi-join queries

    NARCIS (Netherlands)

    Wilschut, A.N.; Flokstra, Jan; Apers, Peter M.G.

    1995-01-01

    A number of execution strategies for parallel evaluation of multi-join queries have been proposed in the literature; their performance was evaluated by simulation. In this paper we give a comparative performance evaluation of four execution strategies by implementing all of them on the same parallel

  3. Query term suggestion in academic search

    NARCIS (Netherlands)

    Verberne, S.; Sappelli, M.; Kraaij, W.

    2014-01-01

    In this paper, we evaluate query term suggestion in the context of academic professional search. Our overall goal is to support scientists in their information seeking tasks. We set up an interactive search system in which terms are extracted from clicked documents and suggested to the user before e

  4. Querying Source Code with Natural Language

    CERN Document Server

    Kimmig, Markus; Mezini, Mira

    2012-01-01

    One common task of developing or maintaining software is searching the source code for information like specific method calls or write accesses to certain fields. This kind of information is required to correctly implement new features and to solve bugs. This paper presents an approach for querying source code with natural language.

  5. Exploiting cost distributions for query optimization

    NARCIS (Netherlands)

    Waas, F.; Pellenkoft, A.J.

    1998-01-01

    Large-scale query optimization is, besides its practical relevance, a hard test case for optimization techniques. Since exact methods cannot be applied due to the combinatorial explosion of the search space, heuristics and probabilistic strategies have been deployed for more than a decade. However,

  6. Enabling Incremental Query Re-Optimization

    Science.gov (United States)

    Liu, Mengmeng; Ives, Zachary G.; Loo, Boon Thau

    2017-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations. PMID:28659658

  7. Cooperative Scalable Moving Continuous Query Processing

    DEFF Research Database (Denmark)

    Li, Xiaohui; Karras, Panagiotis; Jensen, Christian S.

    2012-01-01

    A range of applications call for a mobile client to continuously monitor others in close proximity. Past research on such problems has covered two extremes: It has offered totally centralized solutions, where a server takes care of all queries, and totally distributed solutions, in which there is...

  8. Using temporal bursts for query modeling

    NARCIS (Netherlands)

    Peetz, M.H.; Meij, E.; de Rijke, M.

    2014-01-01

    We present an approach to query modeling that leverages the temporal distribution of documents in an initially retrieved set of documents. In news-related document collections such distributions tend to exhibit bursts. Here, we define a burst to be a time period where unusually many documents are pu

  9. Parallel hierarchical evaluation of transitive closure queries

    NARCIS (Netherlands)

    Houtsma, M.A.W.; Cacace, F.; Ceri, S.

    1991-01-01

    Presents a new approach to parallel computation of transitive closure queries using a semantic data fragmentation. Tuples of a large base relation denote edges in a graph, which models a transportation network. A fragmentation algorithm is proposed which produces a partitioning of the base relation

  10. Adapting Query Expansion to Search Proficiency

    NARCIS (Netherlands)

    C. Boscarino (Corrado); V. Hollink (Vera); A.P. de Vries (Arjen); B. Carterette; E. Kanoulas; P. Clough; M. Sanderson

    2012-01-01

    We argue that query expansion (QE) based on the full session improves the overall search experience provided that we know how to adapt the QE weighting schema to a user's search proficiency. We propose a strategy to predict search ability from session parameters. Using an

  11. Enriching a Descriptive Grammar with Treebank Queries

    NARCIS (Netherlands)

    Bouma, G.; van Koppen, J.M.; Landsbergen, Frank; Odijk, J.E.J.M.; van der Wouden, Ton; van de Camp, Matje

    2015-01-01

    The Syntax of Dutch (SoD) is a descriptive and detailed grammar of Dutch, that provides data for many issues raised in linguistic theory. We present the results of a pilot project that investigated the possibility of enriching the online version of the text with links to queries that provide

  12. Query term suggestion in academic search

    NARCIS (Netherlands)

    Verberne, S.; Sappelli, M.; Kraaij, W.

    2014-01-01

    In this paper, we evaluate query term suggestion in the context of academic professional search. Our overall goal is to support scientists in their information seeking tasks. We set up an interactive search system in which terms are extracted from clicked documents and suggested to the user before

  13. Enabling Incremental Query Re-Optimization.

    Science.gov (United States)

    Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

    2016-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs, and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.

  14. A novel methodology for querying web images

    Science.gov (United States)

    Prabhakara, Rashmi; Lee, Ching Cheng

    2005-01-01

    Ever since the advent of the Internet, there has been immense growth in the amount of image data available on the World Wide Web. With such a magnitude of image availability, an efficient and effective image retrieval system is required to make use of this information. This research presents an effective image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query by topic and query by example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It consists of a focused crawler that lets the user enter not only the keyword for the topic-based search but also the scope in which the user wants to find the images. The second phase uses the query by example specification to perform a low-level content-based image match for the retrieval of smaller and relatively closer results of the example image. Information related to the image feature is automatically extracted from the query image by the image processing system. A color-feature-based technique that is not computationally intensive is used to perform content-based matching of images. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with this proposed hybrid search technique.

  15. Approximate Nearest Neighbor Queries among Parallel Segments

    DEFF Research Database (Denmark)

    Emiris, Ioannis Z.; Malamatos, Theocharis; Tsigaridas, Elias

    2010-01-01

    We develop a data structure for answering efficiently approximate nearest neighbor queries over a set of parallel segments in three dimensions. We connect this problem to approximate nearest neighbor searching under weight constraints and approximate nearest neighbor searching on historical data...

  16. Templates and Queries in Contextual Hypermedia

    DEFF Research Database (Denmark)

    Anderson, Kenneth Mark; Hansen, Frank Allan; Bouvin, Niels Olof

    2006-01-01

    This paper presents a new definition of context for context-aware computing based on a model that relies on dynamic queries over structured objects. This new model enables developers to flexibly specify the relationship between context and context data for their context-aware applications. We dis...

  17. Visualizing multidimensional query results using animation

    Science.gov (United States)

    Sawant, Amit P.; Healey, Christopher G.

    2008-01-01

    Effective representation of large, complex collections of information (datasets) presents a difficult challenge. Visualization is a solution that uses a visual interface to support efficient analysis and discovery within the data. Our primary goal in this paper is a technique that allows viewers to compare multiple query results representing user-selected subsets of a multidimensional dataset. We present an algorithm that visualizes multidimensional information along a space-filling spiral. Graphical glyphs that vary their position, color, and texture appearance are used to represent attribute values for the data elements in each query result. Guidelines from human perception allow us to construct glyphs that are specifically designed to support exploration, facilitate the discovery of trends and relationships both within and between data elements, and highlight exceptions. A clustering algorithm applied to a user-chosen ranking attribute bundles together similar data elements. This encapsulation is used to show relationships across different queries via animations that morph between query results. We apply our techniques to the MovieLens recommender system, to demonstrate their applicability in a real-world environment, and then conclude with a simple validation experiment to identify the strengths and limitations of our design, compared to a traditional side-by-side visualization.

  18. Outlook: The Next Twenty Years

    Energy Technology Data Exchange (ETDEWEB)

    Murayama, Hitoshi

    2003-12-07

    I present an outlook for the next twenty years in particle physics. I start with the big questions in our field, broken down into four categories: horizontal, vertical, heaven, and hell. Then I discuss how we attack the big questions in each category during the next twenty years. I argue for a synergy between many different approaches taken in our field.

  19. Children Adrift: Educating China's Millions of Migrants.

    Science.gov (United States)

    Cao, Haili

    1999-01-01

    The population of migrants moving within China's borders has reached some 80 million, including 2-3 million school-aged children. As migrant workers flock to cities, their children strain urban school systems or receive no education. But independent schools for migrants are illegal and substandard. In some rural provinces, vocational training may…

  20. Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins.

    Science.gov (United States)

    Kolker, Natali; Higdon, Roger; Broomall, William; Stanberry, Larissa; Welch, Dean; Lu, Wei; Haynes, Winston; Barga, Roger; Kolker, Eugene

    2011-01-01

    To address the monumental challenge of assigning function to millions of sequenced proteins, we completed first-of-a-kind all-versus-all sequence alignments using BLAST for 9.9 million proteins in the UniRef100 database. Microsoft Windows Azure produced over 3 billion filtered records in 6 days using 475 eight-core virtual machines. Protein classification into functional groups was then performed using Hive and custom jars implemented on top of Apache Hadoop utilizing the MapReduce paradigm. First, using the Clusters of Orthologous Genes (COG) database, a length normalized bit score (LNBS) was determined to be the best similarity measure for classification of proteins. LNBS achieved sensitivity and specificity of 98% each. Second, out of 5.1 million bacterial proteins, about two-thirds were assigned to significantly extended COG groups, encompassing 30 times more assigned proteins. Third, the remaining proteins were classified into protein functional groups using an innovative implementation of a single-linkage algorithm on an in-house Hadoop compute cluster. This implementation significantly reduces the run time for nonindexed queries and optimizes efficient clustering on a large scale. The performance was also verified on Amazon Elastic MapReduce. This clustering assigned nearly 2 million proteins to approximately half a million different functional groups. A similar approach was applied to classify 2.8 million eukaryotic sequences, resulting in over 1 million proteins being assigned to existing KOG groups and the remainder clustered into 100,000 functional groups.

  1. jQuery UI 1.10 the user interface library for jQuery

    CERN Document Server

    Libby, Alex

    2013-01-01

    This book consists of an easy-to-follow, example-based approach that leads you step-by-step through the implementation and customization of each library component. This book is for frontend designers and developers who need to learn how to use jQuery UI quickly. To get the most out of this book, you should have a good working knowledge of HTML, CSS, and JavaScript, and should ideally be comfortable using jQuery.

  2. Automated Query Learning with Wikipedia and Genetic Programming

    CERN Document Server

    Malo, Pekka; Sinha, Ankur

    2010-01-01

    Most of the existing information retrieval systems are based on bag of words model and are not equipped with common world knowledge. Work has been done towards improving the efficiency of such systems by using intelligent algorithms to generate search queries, however, not much research has been done in the direction of incorporating human-and-society level knowledge in the queries. This paper is one of the first attempts where such information is incorporated into the search queries using Wikipedia semantics. The paper presents an essential shift from conventional token based queries to concept based queries, leading to an enhanced efficiency of information retrieval systems. To efficiently handle the automated query learning problem, we propose Wikipedia-based Evolutionary Semantics (Wiki-ES) framework where concept based queries are learnt using a co-evolving evolutionary procedure. Learning concept based queries using an intelligent evolutionary procedure yields significant improvement in performance whic...

  3. An Approach to Assist Designers With Their Queries and Designs

    DEFF Research Database (Denmark)

    Ahmed, Saeema

    2006-01-01

    Recent research investigating how engineers search for information has concluded that engineering designers acquire assistance when formulating queries. An approach to assist designers with their queries is presented. This approach forms part of a knowledge management system, where indexed docume...

  4. A Revisit of Query Expansion with Different Semantic Levels

    DEFF Research Database (Denmark)

    Zhang, Ce; Cui, Bin; Cong, Gao

    2009-01-01

    Query expansion has received extensive attention in the information retrieval community. Although semantic-based query expansion appears promising for improving retrieval performance, previous research has shown that it cannot do so consistently. It is a tricky problem to...

  5. QUERY RESPONSE TIME COMPARISON NOSQLDB MONGODB WITH SQLDB ORACLE

    Directory of Open Access Journals (Sweden)

    Humasak T. A. Simanjuntak

    2015-01-01

    There are currently two types of data storage: relational databases and non-relational databases. The two types of DBMS (Database Management System) differ in various aspects, such as query execution performance, scalability, reliability, and data storage structure. This study aims to compare the performance of Oracle, as a relational database, and MongoDB, as a non-relational database, in processing structured data. Experiments were conducted to compare the performance of the two DBMSs for insert, select, update, and delete operations using both simple and complex queries on the Northwind database. To achieve the experimental goal, 18 queries were executed, consisting of 2 insert queries, 10 select queries, 2 update queries, and 2 delete queries. The queries were executed through a .Net application built as an intermediary between the user and the database. Experiments were carried out on tables with and without relations in Oracle, and on embedded and non-embedded documents in MongoDB. The response time of each query execution was compared using statistical methods. The experiments show that query response times for select, insert, and update operations are faster on MongoDB than on Oracle: MongoDB is 64.8% faster for select queries, 72.8% faster for insert queries, and 33.9% faster for update queries. For delete queries, Oracle is 96.8% faster than MongoDB for tables with relations, but MongoDB is 83.8% faster than Oracle for tables without relations. Complex queries using Map Reduce on MongoDB are 97.6% slower than complex queries using aggregate functions on Oracle.

  6. Constructing a Relational Query Optimizer for Non-Relational Languages

    OpenAIRE

    Rittinger, Jan

    2010-01-01

    Flat, unordered table data and a declarative query language established today’s success of relational database systems. Provided with the freedom to choose the evaluation order and underlying algorithms, their complex query optimizers are geared to come up with the best execution plan for a given query. With over 30 years of development and research, relational database management systems belong to the most mature and efficient query processors (especially for substantial amounts of data). ...

  7. Query Expansion Using SNOMED-CT and Weighing Schemes

    Science.gov (United States)

    2014-11-01

    recommend using the full capacity of the different Ontology that they used such as MeSH. Martinez et al. [2] from University of Melbourne, Australia and...for the first query. Query #1 58-year-old woman with hypertension and obesity presents with exercise-related episodic chest pain radiating to the...were then included with the original query as following. Query#1.0 <Summary>58-year-old woman with hypertension and obesity presents with exercise

  8. A Faceted Query Engine Applied to Archaeology

    Directory of Open Access Journals (Sweden)

    Kenneth A. Ross

    2007-04-01

    In this article we present the Faceted Query Engine, a system developed at Columbia University under the aegis of the inter-disciplinary project Computational Tools for Modeling, Visualizing and Analyzing Historic and Archaeological Sites. Our system is based on novel Database Systems research that has been published in Computer Science venues (Ross and Janevski, 2004 and Ross et al., 2005). The goal of this article is to introduce our system to the target user audience - the archaeology community. We demonstrate the use of the Faceted Query Engine on a previously unpublished dataset: the Thulamela (South Africa) collection. This dataset is comprised of iron-age finds from the Thulamela site at the Kruger National Park. Our project is the first to systematically compile and classify this dataset. We also use a larger dataset, a collection of ancient Egyptian artifacts from the Memphis site (Giddy, 1999), to demonstrate some of the features of our system.

  9. Virtual Solar Observatory Distributed Query Construction

    Science.gov (United States)

    Gurman, J. B.; Dimitoglou, G.; Bogart, R.; Davey, A.; Hill, F.; Martens, P.

    2003-01-01

    Through a prototype implementation (Tian et al., this meeting) the VSO has already demonstrated the capability of unifying geographically distributed data sources following the Web Services paradigm and utilizing mechanisms such as the Simple Object Access Protocol (SOAP). So far, four participating sites (Stanford, Montana State University, National Solar Observatory and the Solar Data Analysis Center) permit Web-accessible, time-based searches that allow browse access to a number of diverse data sets. Our latest work includes the extension of the simple, time-based queries to include numerous other searchable observation parameters. For VSO users, this extended functionality enables more refined searches. For the VSO, it is a proof of concept that more complex, distributed queries can be effectively constructed and that results from heterogeneous, remote sources can be synthesized and presented to users as a single, virtual data product.

  10. Mathematical Formula Search using Natural Language Queries

    Directory of Open Access Journals (Sweden)

    YANG, S.

    2014-11-01

    This paper presents how to search mathematical formulae written in MathML when given plain words as a query. Since the proposed method allows natural language queries, as in traditional Information Retrieval, for mathematical formula search, users do not need to enter complicated math symbols or use a formula input tool. For this, formula data is converted into plain text, and features are extracted from the converted text. In our experiments, we achieve an outstanding performance, an MRR of 0.659. In addition, we introduce how to utilize formula classification for formula search. By using class information, we finally achieve an improved performance, an MRR of 0.690.
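
    The MRR figures quoted above refer to mean reciprocal rank, a standard retrieval metric. The sketch below shows how such a score is typically computed; it is not the authors' evaluation code, and the function name, the single-relevant-item assumption, and the sample data are all hypothetical.

```python
from typing import List, Sequence


def mean_reciprocal_rank(ranked_results: Sequence[Sequence[str]],
                         relevant: Sequence[str]) -> float:
    """Average of 1/rank of the relevant item over all queries.

    Assumes exactly one relevant item per query; a query whose relevant
    item is absent from its result list contributes 0.
    """
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, target in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item == target:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


if __name__ == "__main__":
    runs: List[List[str]] = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
    gold = ["a", "y", "missing"]
    print(mean_reciprocal_rank(runs, gold))
```

    An MRR of 1.0 would mean the relevant formula is always ranked first, so values such as 0.659 and 0.690 indicate that it typically appears within the first two positions.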

  11. 16 million [pounds] investment for 'virtual supercomputer'

    CERN Multimedia

    Holland, C

    2003-01-01

    "The Particle Physics and Astronomy Research Council is to spend 16 million [pounds] to create a massive computing Grid, equivalent to the world's second largest supercomputer after Japan's Earth Simulator computer" (1/2 page)

  12. Sharing Economic Fruits with 900 Million Farmers

    Institute of Scientific and Technical Information of China (English)

    QIAOTIANBI

    2005-01-01

    The current goal of the central government is to benefit China's 900 million farmers through the development of the market economy, as there can be no harmonious society without the participation of its major body.

  13. Exploiting Conceptual Knowledge for Querying Information Systems

    OpenAIRE

    Selke, Joachim; Balke, Wolf-Tilo

    2011-01-01

    Whereas today's information systems are well-equipped for efficient query handling, their strict mathematical foundations hamper their use for everyday tasks. In daily life, people expect information to be offered in a personalized and focused way. But currently, personalization in digital systems still only takes explicit knowledge into account and does not yet process conceptual information often naturally implied by users. We discuss how to bridge the gap between users and today's systems,...

  14. Exploiting Conceptual Knowledge for Querying Information Systems

    CERN Document Server

    Selke, Joachim

    2011-01-01

    Whereas today's information systems are well-equipped for efficient query handling, their strict mathematical foundations hamper their use for everyday tasks. In daily life, people expect information to be offered in a personalized and focused way. But currently, personalization in digital systems still only takes explicit knowledge into account and does not yet process conceptual information often naturally implied by users. We discuss how to bridge the gap between users and today's systems, building on results from cognitive psychology.

  15. Date restricted queries in web search engines

    OpenAIRE

    Lewandowski, Dirk

    2004-01-01

    Search engines usually offer a date restricted search on their advanced search pages. But determining the actual update of a web page is not without problems. We conduct a study testing date restricted queries on the search engines Google, Teoma and Yahoo!. We find that these searches fail to work properly in the examined engines. We discuss implications of this for further research and search engine development.

  16. Query Reformulation for Clinical Decision Support Search

    Science.gov (United States)

    2014-11-01

    general purpose search engines: case reports are much longer than traditional queries and present a narrative structure. Our system, initially...relevance feedback (PRF). The advantage of using such technique is that it is able to expand the case report not only by adding relevant medical terms...v.4.8. The following fields were indexed and used for document retrieval (unless otherwise stated): article title, article abstract, and article text

  17. AREVA net income: 649 million euros

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2007-03-15

    This document presents the 2006 financial statements of the Areva group: net income of 649 million euros; backlog up by 24.6% to 25.6 billion euros; steady growth of sales revenue, up 7.3% to 10.863 billion euros; operating income of 407 million euros, reflecting excellent divisional performance and the constitution of a significant provision for the OL3 project in Finland; dividend proposed to the Annual General Meeting of Shareholders: 8.46 euros per share.

  18. MQ-2 A Tool for Prolog-based Model Querying

    DEFF Research Database (Denmark)

    Acretoaie, Vlad; Störrle, Harald

    2012-01-01

    MQ-2 integrates a Prolog console into the MagicDraw modeling environment and equips this console with features targeted specifically to the task of querying models. The vision of MQ-2 is to make Prolog-based model querying accessible to both student and expert modelers by offering powerful query...

  19. Efficient Processing of Multiple DTW Queries in Time Series Databases

    DEFF Research Database (Denmark)

    Kremer, Hardy; Günnemann, Stephan; Ivanescu, Anca-Maria

    2011-01-01

    ... In many of today’s applications, however, large numbers of queries arise at any given time. Existing DTW techniques do not process multiple DTW queries simultaneously, a serious limitation which slows down overall processing. In this paper, we propose an efficient processing approach for multiple DTW ... for multiple DTW queries ...

  20. Multiple Query Evaluation Based on an Enhanced Genetic Algorithm.

    Science.gov (United States)

    Tamine, Lynda; Chrisment, Claude; Boughanem, Mohand

    2003-01-01

    Explains the use of genetic algorithms to combine results from multiple query evaluations to improve relevance in information retrieval. Discusses niching techniques, relevance feedback techniques, and evolution heuristics, and compares retrieval results obtained by both genetic multiple query evaluation and classical single query evaluation…

  1. Result Diversification Based on Query-Specific Cluster Ranking

    NARCIS (Netherlands)

    J. He (Jiyin); E. Meij; M. de Rijke

    2011-01-01

    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking,

  2. Result diversification based on query-specific cluster ranking

    NARCIS (Netherlands)

    He, J.; Meij, E.; de Rijke, M.

    2011-01-01

    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification

  3. Annotating URLs with query terms: What factors predict reliable annotations?

    NARCIS (Netherlands)

    Verberne, S.; Hinne, M.; Heijden, M. van der; Kraaij, W.; D'hondt, E.; Weide, T. van der

    2009-01-01

    A number of recent studies have investigated the relation between URLs and associated query terms from search engine log files. In [5], the query terms associated with the domain of a URL were used as features for a URL classification task. The idea is that query terms that lead to successful classi

  4. A NEW TOP-K CONDITIONAL XML PREFERENCE QUERIES

    Directory of Open Access Journals (Sweden)

    Shaikhah Alhazmi

    2014-09-01

    Preference querying technology is a very important issue in a variety of applications ranging from e-commerce to personalized search engines. Much recent research has been dedicated to this topic in the Artificial Intelligence and Database fields. Several formalisms allowing preference reasoning and specification have been proposed in the Artificial Intelligence domain. In the Database field, on the other hand, interest has focused mainly on extending standard Structured Query Language (SQL) and eXtensible Markup Language (XML) with preference facilities in order to provide personalized query answering. More precisely, interest in the database context focuses on the notion of the Top-k preference query and on the development of efficient methods for evaluating these queries. A Top-k preference query returns the k data tuples that are most preferred according to the user’s preferences. Top-k preference query answering depends closely on the particular preference model underlying the semantics of the operators responsible for selecting the best tuples. In this paper, we consider Conditional Preference queries (CP-queries), where preferences are specified by a set of rules expressed in a logical formalism. We introduce Top-k conditional preference queries (Top-k CP-queries) and present the operators BestK-Match and Best-Match for evaluating these queries.

  5. A comparison of user and system query performance predictions

    NARCIS (Netherlands)

    Hauff, C.; Kelly, Diane; Azzopardi, Leif

    2010-01-01

    Query performance prediction methods are usually applied to estimate the retrieval effectiveness of queries, where the evaluation is largely system sided. However, little work has been conducted to understand query performance prediction from the user's perspective. The question we consider is,

  6. Enabling Ontology Based Semantic Queries in Biomedical Database Systems.

    Science.gov (United States)

    Zheng, Shuai; Wang, Fusheng; Lu, James

    2014-03-01

    There is a lack of tools to ease the integration and ontology based semantic queries in biomedical databases, which are often annotated with ontology concepts. We aim to provide a middle layer between ontology repositories and semantically annotated databases to support semantic queries directly in the databases with expressive standard database query languages. We have developed a semantic query engine that provides semantic reasoning and query processing, and translates the queries into ontology repository operations on NCBO BioPortal. Semantic operators are implemented in the database as user defined functions extended to the database engine, thus semantic queries can be directly specified in standard database query languages such as SQL and XQuery. The system provides caching management to boost query performance. The system is highly adaptable to support different ontologies through easy customizations. We have implemented the system DBOntoLink as open source software, which supports major ontologies hosted at BioPortal. DBOntoLink supports a set of common ontology based semantic operations and has them fully integrated with the database management system IBM DB2. The system has been deployed and evaluated with an existing biomedical database for managing and querying image annotations and markups (AIM). Our performance study demonstrates the high expressiveness of semantic queries and the high efficiency of the queries.

  7. Tomograph: Highlighting query parallelism in a multi-core system

    NARCIS (Netherlands)

    Gawade, M.M.; Kersten, M.L.

    2013-01-01

    Query parallelism improves serial query execution performance by orders of magnitude. Getting optimal performance from an already parallelized query plan is however difficult due to its dependency on run time factors such as correct operator scheduling, memory pressure, disk io performance, and oper

  8. Tomograph: highlighting query parallelism in a multi-core system

    NARCIS (Netherlands)

    M. Gawade; M. Kersten

    2013-01-01

    Query parallelism improves serial query execution performance by orders of magnitude. Getting optimal performance from an already parallelized query plan is however difficult due to its dependency on run time factors such as correct operator scheduling, memory pressure, disk io performance, and oper

  9. Result diversification based on query-specific cluster ranking

    NARCIS (Netherlands)

    He, J.; Meij, E.; de Rijke, M.

    2011-01-01

    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification

  10. Semantic vs term-based query modification analysis

    NARCIS (Netherlands)

    V. Hollink (Vera); T. Tsikrika (Theodora); A.P. de Vries (Arjen)

    2010-01-01

    Previous research has studied query modifications on a syntactic level by focusing on the addition, elimination and substitution of terms between consecutive queries that have at least one term in common. In this paper, we determine semantic relations between queries by first mapping

  11. A Relational Algebra Query Language for Programming Relational Databases

    Science.gov (United States)

    McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

    2011-01-01

    In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…

  12. Discrete-query quantum algorithm for NAND trees

    CERN Document Server

    Childs, Andrew M.; Cleve, Richard; Jordan, Stephen P.; Yeung, David

    2007-01-01

    Recently, Farhi, Goldstone, and Gutmann gave a quantum algorithm for evaluating NAND trees that runs in time O(sqrt(N log N)) in the Hamiltonian query model. In this note, we point out that their algorithm can be converted into an algorithm using O(N^{1/2 + epsilon}) queries in the conventional quantum query model, for any fixed epsilon > 0.

  13. Predicting the Effectiveness of Queries and Retrieval Systems

    NARCIS (Netherlands)

    Hauff, C.

    2010-01-01

    In this thesis we consider users' attempts to express their information needs through queries, or search requests and try to predict whether those requests will be of high or low quality. Intuitively, a query's quality is determined by the outcome of the query, that is, whether the retrieved search

  14. Optimal Succinctness for Range Minimum Queries

    CERN Document Server

    Fischer, Johannes

    2008-01-01

    For an array A of n objects from a totally ordered universe, a range minimum query (RMQ) asks for the position of the minimum element in the sub-array A[i,j]. We focus on the setting where the array A is static and known in advance, and can hence be preprocessed into a scheme in order to answer future queries faster. We make the further assumption that the input array A cannot be used at query time. Under this assumption, a natural lower bound of 2n bits for RMQ-schemes exists. We give the first truly succinct preprocessing scheme for O(1)-RMQs. Its final space consumption is 2n+o(n) bits, thus being asymptotically optimal. We also give a simple linear-time construction algorithm for this scheme that needs only n+o(n) bits of space in addition to the 2n+o(n) bits needed for the final data structure, thereby lowering the peak space consumption of previous schemes from O(n log n) to O(n) bits. We also improve on LCA-computation in BPS- and DFUDS-encoded trees.
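
    To make the RMQ definition in this record concrete, the following is a minimal Python sketch of a sparse-table RMQ. It is not the succinct 2n+o(n)-bit scheme the abstract describes: it stores O(n log n) indices and still reads the input array at query time. The function names are illustrative assumptions.

```python
def build_sparse_table(a):
    """Precompute, for every range whose length is a power of two,
    the position of its (leftmost) minimum element."""
    n = len(a)
    table = [list(range(n))]  # level 0: ranges of length 1
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(a, table, i, j):
    """Return the position of the minimum of a[i..j] (inclusive).
    Two overlapping power-of-two ranges cover [i, j]."""
    k = (j - i + 1).bit_length() - 1
    left, right = table[k][i], table[k][j - (1 << k) + 1]
    return left if a[left] <= a[right] else right
```

    Preprocessing takes O(n log n) time and each query takes O(1) time, which illustrates the preprocess-once, query-many setting but not the space bounds discussed in the abstract.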

  15. Nearest and reverse nearest neighbor queries for moving objects

    DEFF Research Database (Denmark)

    Benetis, R.; Jensen, Christian Søndergaard; Karciauskas, G.

    2006-01-01

    With the continued proliferation of wireless communications and advances in positioning technologies, algorithms for efficiently answering queries about large populations of moving objects are gaining in interest. This paper proposes algorithms for k nearest and reverse k nearest neighbor queries...... on the current and anticipated future positions of points moving continuously in the plane. The former type of query returns k objects nearest to a query object for each time point during a time interval, while the latter returns the objects that have a specified query object as one of their k closest neighbors...

  16. Algebra-Based Optimization of XML-Extended OLAP Queries

    DEFF Research Database (Denmark)

    Yin, Xuepeng; Pedersen, Torben Bach

    is desirable. This report presents a complete foundation for such OLAP-XML federations. This includes a prototypical query engine, a simplified query semantics based on previous work, and a complete physical algebra which enables precise modeling of the execution tasks of an OLAP-XML query. Effective algebra......-based and cost-based query optimization and implementation are also proposed, as well as the execution techniques. Finally, experiments with the prototypical query engine w.r.t. federation performance, optimization effectiveness, and feasibility suggest that our approach, unlike the physical integration...

  17. High-performance web services for querying gene and variant annotation.

    Science.gov (United States)

    Xin, Jiwen; Mark, Adam; Afrasiabi, Cyrus; Tsueng, Ginger; Juchler, Moritz; Gopal, Nikhil; Stupp, Gregory S; Putman, Timothy E; Ainscough, Benjamin J; Griffith, Obi L; Torkamani, Ali; Whetzel, Patricia L; Mungall, Christopher J; Mooney, Sean D; Su, Andrew I; Wu, Chunlei

    2016-05-06

    Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.

  18. OPTIMIZATION OF LOCATION BASED QUERIES USING SPATIAL INDEXING

    Directory of Open Access Journals (Sweden)

    S. Geetha

    2014-04-01

    Recent developments in technology have led to the introduction of various mobile terminals, and clients demand effective location-based services. As valid regions expand, query retrieval time also increases, which leads to poor query processing performance. Spatial indexing techniques are among the most effective optimization methods for improving the quality of service. Existing systems use NN queries and window queries, with R-tree and grid indexing to increase query efficiency. However, the grid-index technique supports low memory, so large databases cannot be handled effectively. In the proposed system, we use an Ordered grid index and an EVR-tree to minimize query retrieval time and to decrease the depth of the search index, thereby speeding up spatial query processing.

  19. A Comprehensive Trainable Error Model for Sung Music Queries

    CERN Document Server

    Birmingham, W P; 10.1613/jair.1334

    2011-01-01

    We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of query-by-humming (QBH) applications which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications.

  20. A Grammar Analysis Model for the Unified Multimedia Query Language

    Institute of Scientific and Technical Information of China (English)

    Zhong-Sheng Cao; Zong-Da Wu; Yuan-Zhen Wang

    2008-01-01

    The unified multimedia query language (UMQL) is a powerful general-purpose multimedia query language, and it is very suitable for multimedia information retrieval. The paper proposes a grammar analysis model to implement effective grammatical processing for the language. It separates the grammar analysis of a UMQL query specification into two phases, syntactic analysis and semantic analysis, and uses extended Backus-Naur form (EBNF) and logical algebra, respectively, to specify the restrictive grammar rules of each phase. As a result, the model can present error guiding information for a query specification that contains grammatical errors. The model not only suits the processing of UMQL queries well, but also offers guidance for other projects concerning query processing of descriptive query languages.

  1. Improve Performance of Data Warehouse by Query Cache

    Science.gov (United States)

    Gour, Vishal; Sarangdevot, S. S.; Sharma, Anand; Choudhary, Vinod

    2010-11-01

    The primary goal of a data warehouse is to free the information locked up in the operational database so that decision makers and business analysts can perform queries, analysis and planning regardless of data changes in the operational database. Because the number of queries is large, in certain cases there is a reasonable probability that the same query is submitted by one or multiple users at different times. Each time a query is executed, all the data in the warehouse is analyzed to generate the result of that query. In this paper we study how using a query cache improves the performance of a data warehouse and try to identify the common problems faced. Data warehouse administrators face these problems when trying to minimize response time and improve overall query efficiency, particularly when the data warehouse is updated at regular intervals.
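
    The idea of reusing results for repeated queries, and discarding them when the warehouse is refreshed, can be sketched in Python as follows. This is an illustrative assumption, not the authors' design: the class name, the `run_query` callable, and the whitespace-only normalization are all hypothetical.

```python
class QueryCache:
    """Toy result cache for repeated warehouse queries (illustrative only)."""

    def __init__(self, run_query):
        # run_query: hypothetical callable that executes SQL text
        # against the warehouse and returns its result.
        self._run_query = run_query
        self._results = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql):
        # Crude normalization: collapse runs of whitespace. Assumes the
        # query has no whitespace-sensitive string literals.
        return " ".join(sql.split())

    def execute(self, sql):
        key = self._key(sql)
        if key in self._results:
            self.hits += 1
            return self._results[key]
        self.misses += 1
        result = self._run_query(sql)
        self._results[key] = result
        return result

    def invalidate(self):
        # Call after each scheduled warehouse refresh so that
        # stale results are never served.
        self._results.clear()
```

    Clearing the whole cache on every refresh is the simplest policy; it matches the abstract's point that regular updates are where caching becomes delicate.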

  2. Performance Oriented Query Processing In GEO Based Location Search Engines

    CERN Document Server

    Umamaheswari, M

    2010-01-01

    Geographic location search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called location search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.

  3. 200 million Euros:Worthy or Not?

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    "Pierre Cardin to Sell Licenses to Chinese Companies with 200 million Euros": frankly speaking, when I first heard the news, several questions came to mind: "200 million Euros, is it too much for such a not-first-class international fashion brand?", "200 million Euros: with such a large amount of capital, could domestic enterprises not gain more by investing in developing their own brands?", or "Pierre Cardin…". Although a spokesman for the Pierre Cardin Group later explained, "This is not about the sale of the Pierre Cardin group, but the sale of certain licenses in China", this hot deal still aroused fierce comment not only from domestic industry insiders but also from the public.

  4. Querying Archetype-Based Electronic Health Records Using Hadoop and Dewey Encoding of openEHR Models.

    Science.gov (United States)

    Sundvall, Erik; Wei-Kleiner, Fang; Freire, Sergio M; Lambrix, Patrick

    2017-01-01

    Archetype-based Electronic Health Record (EHR) systems using generic reference models from e.g. openEHR, ISO 13606 or CIMI should be easy to update and reconfigure with new types (or versions) of data models or entries, ideally with very limited programming or manual database tweaking. Exploratory research (e.g. epidemiology) leading to ad-hoc querying on a population-wide scale can be a challenge in such environments. This publication describes the implementation and testing of an archetype-aware Dewey encoding optimization that can be used to produce such systems in environments supporting relational operations, e.g. RDBMSs and distributed map-reduce frameworks like Hadoop. Initial testing was done using a nine-node 2.2 GHz quad-core Hadoop cluster querying a dataset consisting of targeted extracts from 4+ million real patient EHRs; query results with sub-minute response time were obtained.

  5. EMBL pay settlement will cost millions

    CERN Multimedia

    Abott, A

    1999-01-01

    A labour dispute at EMBL, Heidelberg, was settled last week at a cost of at least DM4 million for the organisation's 16 member states. The lab has asked for clarification on whether the ruling from the ILO refers simply to a salary adjustment from 1995 or also to a backdated implementation of higher salary scales. This second option would cost considerably more - 8 percent of the budget in back pay and DM3.5 million per annum (1/2 page).

  6. Query Specific Rank Fusion for Image Retrieval.

    Science.gov (United States)

    Zhang, Shaoting; Yang, Ming; Cour, Timothee; Yu, Kai; Metaxas, Dimitris N

    2015-04-01

    Recently two lines of image retrieval algorithms demonstrate excellent scalability: 1) local features indexed by a vocabulary tree, and 2) holistic features indexed by compact hashing codes. Although both of them are able to search visually similar images effectively, their retrieval precision may vary dramatically among queries. Therefore, combining these two types of methods is expected to further enhance the retrieval precision. However, the feature characteristics and the algorithmic procedures of these methods are dramatically different, which is very challenging for the feature-level fusion. This motivates us to investigate how to fuse the ordered retrieval sets, i.e., the ranks of images, given by multiple retrieval methods, to boost the retrieval precision without sacrificing their scalability. In this paper, we model retrieval ranks as graphs of candidate images and propose a graph-based query specific fusion approach, where multiple graphs are merged and reranked by conducting a link analysis on a fused graph. The retrieval quality of an individual method is measured on-the-fly by assessing the consistency of the top candidates' nearest neighborhoods. Hence, it is capable of adaptively integrating the strengths of the retrieval methods using local or holistic features for different query images. This proposed method does not need any supervision, has few parameters, and is easy to implement. Extensive and thorough experiments have been conducted on four public datasets, i.e., the UKbench, Corel-5K, Holidays and the large-scale San Francisco Landmarks datasets. Our proposed method has achieved very competitive performance, including state-of-the-art results on several data sets, e.g., the N-S score 3.83 for UKbench.

  7. A Geospatial Semantic Enrichment and Query Service for Geotagged Photographs.

    Science.gov (United States)

    Ennis, Andrew; Nugent, Chris; Morrow, Philip; Chen, Liming; Ioannidis, George; Stan, Alexandru; Rachev, Preslav

    2015-07-20

    With the increasing abundance of technologies and smart devices, equipped with a multitude of sensors for sensing the environment around them, information creation and consumption has now become effortless. This, in particular, is the case for photographs with vast amounts being created and shared every day. For example, at the time of this writing, Instagram users upload 70 million photographs a day. Nevertheless, it still remains a challenge to discover the "right" information for the appropriate purpose. This paper describes an approach to create semantic geospatial metadata for photographs, which can facilitate photograph search and discovery. To achieve this we have developed and implemented a semantic geospatial data model by which a photograph can be enriched with geospatial metadata extracted from several geospatial data sources based on the raw low-level geo-metadata from a smartphone photograph. We present the details of our method and implementation for searching and querying the semantic geospatial metadata repository to enable a user or third party system to find the information they are looking for.

  8. A Geospatial Semantic Enrichment and Query Service for Geotagged Photographs

    Directory of Open Access Journals (Sweden)

    Andrew Ennis

    2015-07-01

    With the increasing abundance of technologies and smart devices, equipped with a multitude of sensors for sensing the environment around them, information creation and consumption has now become effortless. This, in particular, is the case for photographs with vast amounts being created and shared every day. For example, at the time of this writing, Instagram users upload 70 million photographs a day. Nevertheless, it still remains a challenge to discover the “right” information for the appropriate purpose. This paper describes an approach to create semantic geospatial metadata for photographs, which can facilitate photograph search and discovery. To achieve this we have developed and implemented a semantic geospatial data model by which a photograph can be enriched with geospatial metadata extracted from several geospatial data sources based on the raw low-level geo-metadata from a smartphone photograph. We present the details of our method and implementation for searching and querying the semantic geospatial metadata repository to enable a user or third party system to find the information they are looking for.

  9. SPARQL Assist Language-Neutral Query Composer

    CERN Document Server

    McCarthy, Luke; Wilkinson, Mark

    2010-01-01

    SPARQL query composition is difficult for the lay-person or even the experienced bioinformatician in cases where the data model is unfamiliar. Established best-practices and internationalization concerns dictate that semantic web ontologies should use terms with opaque identifiers, further complicating the task. We present SPARQL Assist: a web application that addresses these issues by providing context-sensitive type-ahead completion to existing web forms. Ontological terms are suggested using their labels and descriptions, leveraging existing XML support for internationalization and language-neutrality.

  10. Immune Algorithm For Document Query Optimization

    Institute of Scientific and Technical Information of China (English)

    WangZiqiang; FengBoqin

    2005-01-01

    To efficiently retrieve relevant documents from rapidly proliferating large information collections, a novel immune algorithm for document query optimization is proposed. The essential idea of the immune algorithm is that the crossover and mutation operators are constructed according to the characteristics of information retrieval. An immune operator is adopted to avoid degeneracy. Relevant documents retrieved are merged into a single document list according to a rank formula. Experimental results show that the novel immune algorithm can lead to substantial improvements in relevant document retrieval effectiveness.

  11. Downloading Multiple Records Using Query Strings

    Directory of Open Access Journals (Sweden)

    Adam Crymble

    2012-11-01

    Downloading a single record from a website is easy, but downloading many records at a time – an increasingly frequent need for a historian – is much more efficient using a programming language such as Python. In this lesson, we will write a program that will download a series of records from the Old Bailey Online using custom search criteria, and save them to a directory on our computer. This process involves interpreting and manipulating URL Query Strings. In this case, the tutorial will seek to download sources that contain references to people of African descent that were published in the Old Bailey Proceedings between 1700 and 1750.
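
    As a rough illustration of what interpreting and manipulating URL query strings looks like in Python, the sketch below builds paginated search URLs with the standard library. The base URL and every parameter name (`keyword`, `fromYear`, `toYear`, `start`, `count`) are placeholders chosen for this example; they are not the Old Bailey Online's actual interface, and nothing is downloaded.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint, NOT the real Old Bailey Online search URL.
BASE_URL = "https://example.org/search"


def build_search_url(keyword, from_year, to_year, start=0, count=10):
    """Encode search criteria as a URL query string (hypothetical parameters)."""
    params = {
        "keyword": keyword,
        "fromYear": from_year,
        "toYear": to_year,
        "start": start,   # offset of the first result on this page
        "count": count,   # number of results per page
    }
    return BASE_URL + "?" + urlencode(params)


def page_urls(keyword, from_year, to_year, total, count=10):
    """One URL per results page, enough to cover `total` records."""
    return [
        build_search_url(keyword, from_year, to_year, start, count)
        for start in range(0, total, count)
    ]


def read_params(url):
    """Decode a URL's query string back into a dict of value lists."""
    return parse_qs(urlsplit(url).query)
```

    Looping over `page_urls(...)` and fetching each URL would be the step that actually downloads multiple records; that part depends on the real site's parameters and is left out here.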

  12. An Optimal Labeling Scheme for Ancestry Queries

    OpenAIRE

    2009-01-01

    An ancestry labeling scheme assigns labels (bit strings) to the nodes of rooted trees such that ancestry queries between any two nodes in a tree can be answered merely by looking at their corresponding labels. The quality of an ancestry labeling scheme is measured by its label size, that is the maximal number of bits in a label of a tree node. In addition to its theoretical appeal, the design of efficient ancestry labeling schemes is motivated by applications in web search engines. For this p...
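
    The notion of answering ancestry queries from labels alone can be illustrated with the classic interval (DFS entry/exit time) labeling, sketched below in Python. This is a textbook scheme used here as an assumption for illustration, not the optimal scheme of the record above, and the function names are hypothetical.

```python
def interval_labels(children, root):
    """Label each node with (enter, exit) times from a depth-first traversal.

    children: dict mapping a node to the list of its children.
    """
    labels = {}
    clock = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants have been entered; record the exit time.
            labels[node] = (labels[node][0], clock)
            continue
        labels[node] = (clock, None)
        clock += 1
        stack.append((node, True))
        for child in reversed(children.get(node, [])):
            stack.append((child, False))
    return labels


def is_ancestor(label_u, label_v):
    """Decide ancestry from the two labels only (a node is its own ancestor)."""
    return label_u[0] <= label_v[0] and label_v[1] <= label_u[1]
```

    Each label here holds two integers, so it needs about 2 log n bits per node; the label size is exactly the quantity such schemes try to minimize.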

  13. A Querying Method over RDF-ized Health Level Seven v2.5 Messages Using Life Science Knowledge Resources

    Science.gov (United States)

    2016-01-01

    Background: Health level seven version 2.5 (HL7 v2.5) is a widespread messaging standard for information exchange between clinical information systems. By applying Semantic Web technologies for handling HL7 v2.5 messages, it is possible to integrate large-scale clinical data with life science knowledge resources. Objective: Showing feasibility of a querying method over large-scale resource description framework (RDF)-ized HL7 v2.5 messages using publicly available drug databases. Methods: We developed a method to convert HL7 v2.5 messages into the RDF. We also converted five kinds of drug databases into RDF and provided explicit links between the corresponding items among them. With those linked drug data, we then developed a method for query expansion to search the clinical data using semantic information on drug classes along with four types of temporal patterns. For evaluation purposes, medication orders and laboratory test results for a 3-year period at the University of Tokyo Hospital were used, and the query execution times were measured. Results: Approximately 650 million RDF triples for medication orders and 790 million RDF triples for laboratory test results were converted. Taking three types of query in use cases for detecting adverse events of drugs as an example, we confirmed these queries were represented in SPARQL Protocol and RDF Query Language (SPARQL) using our methods, and comparisons with conventional query expressions were performed. The measurement results confirm that the query time is feasible and increases logarithmically or linearly with the amount of data and without diverging. Conclusions: The proposed methods enabled query expressions that separate knowledge resources and clinical data, thereby suggesting the feasibility for improving the usability of clinical data by enhancing the knowledge resources. We also demonstrate that when HL7 v2.5 messages are automatically converted into RDF, searches are still possible through SPARQL without

  14. Foreclosed: Two Million Homeless Students and Counting

    Science.gov (United States)

    McKibben, Sarah

    2009-01-01

    This article reports that according to First Focus, a bipartisan advocacy organization for children and families, a predicted two million children will lose their homes over the next two years because of the foreclosure crisis. From an economy deep in recession, an entirely new population of homeless students has emerged. And with job losses at…

  15. Carleton to oversee $40 million lab grant

    CERN Multimedia

    Singer, Zev

    2003-01-01

    "Carleton University got a major gift yesterday, as the federal government announced the university will oversee a $40-million grant to run the world's deepest underground lab at the Sudbury Neutrino Observatory. Five other universities are partners in the project" (1/2 page).

  16. 730 Million Farmers Freed from Agricultural Tax

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    About 730 million farmers will benefit from tax cuts totaling more than 20 billion yuan (US$2.4 billion) this year, as 26 of China's 31 provinces will terminate the Agricultural Tax, according to Vice Minister of Agriculture Fan Xiaojian.

  17. Clustering Millions of Faces by Identity.

    Science.gov (United States)

    Otto, Charles; Wang, Dayong; Jain, Anil

    2017-03-07

    Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be of the order of hundreds of millions, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-Means and Spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0.87 on the LFW benchmark (13K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.

  18. Partial match queries in random quadtrees

    CERN Document Server

    Broutin, Nicolas; Sulzbach, Henning

    2011-01-01

    We consider the problem of recovering items matching a partially specified pattern in multidimensional trees (quad trees and k-d trees). We assume the traditional model where the data consist of independent and uniform points in the unit square. For this model, in a structure on $n$ points, it is known that the number of nodes $C_n(\xi)$ to visit in order to report the items matching an independent random query $\xi$, uniform on $[0,1]$, satisfies $\mathbb{E}[C_n(\xi)] \sim \kappa n^{\beta}$, where $\kappa$ and $\beta$ are explicit constants. We develop an approach based on the analysis of the cost $C_n(x)$ of any fixed query $x \in [0,1]$, and give precise estimates for the variance and limit distribution of the cost $C_n(x)$. Our results permit us to describe a limit process for the costs $C_n(x)$ as $x$ varies in $[0,1]$; one of the consequences is that $\mathbb{E}[\max_{x \in [0,1]} C_n(x)] \sim \gamma n^{\beta}$.

  19. Query-Time Optimization Techniques for Structured Queries in Information Retrieval

    Science.gov (United States)

    Cartright, Marc-Allen

    2013-01-01

    The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…

  20. CrossQuery: a web tool for easy associative querying of transcriptome data.

    Directory of Open Access Journals (Sweden)

    Toni U Wagner

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straightforward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.

  1. CrossQuery: a web tool for easy associative querying of transcriptome data.

    Science.gov (United States)

    Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

    2011-01-01

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straightforward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.

  2. A Study of Library Databases by Translating Those SQL Queries into Relational Algebra and Generating Query Trees

    Directory of Open Access Journals (Sweden)

    Santhi Lasya

    2011-09-01

    Even in this World Wide Web era, where there is unrestricted access to many articles and books at a mouse click, the role of an organized library is immense. It is vital to have effective software to manage the various functions of a library, and the foundation of effective software is the underlying database access and the queries used. Hence library databases are the use case for this study. The paper starts by considering a basic ER model of a typical library relational database. We also list the basic use cases in a library management system. The next part of the paper deals with the SQL queries used to perform certain functions in a library database management system. Along with the queries, we generate reports for some of the use cases. The final section forms the crux of this library database study, in which we dwell on the concepts of query processing and query optimization in the relational database domain. We analyze the above-mentioned queries by translating each query into a relational algebra expression and generating a query tree for it. By transforming the algebra expression, we look at optimizing the query, and by generating a query tree, we arrive at a cheapest-cost plan.

  3. Multidimensional Data Querying on Tree-Structured Overlay

    Institute of Scientific and Technical Information of China (English)

    XU Lizhen; WANG Shiyuan

    2006-01-01

    Multidimensional data query has been gaining much interest in database research communities in recent years, yet many of the existing studies focus mainly on centralized systems. A solution to querying in Peer-to-Peer (P2P) environment was proposed to achieve both low processing cost in terms of the number of peers accessed and search messages and balanced query loads among peers. The system is based on a balanced tree structured P2P network. By partitioning the query space intelligently, the amount of query forwarding is effectively controlled, and the number of peers involved and search messages are also limited. Dynamic load balancing can be achieved during space partitioning and query resolving. Extensive experiments confirm the effectiveness and scalability of our algorithms on P2P networks.

  4. Structured Query Translation in Peer to Peer Database Sharing Systems

    Directory of Open Access Journals (Sweden)

    Mehedi Masud

    2009-10-01

    This paper presents a query translation mechanism between heterogeneous peers in Peer to Peer Database Sharing Systems (PDSSs). A PDSS combines a database management system with P2P functionalities. The local databases on peers are called peer databases. In a PDSS, each peer chooses its own data model and schema and maintains data independently without any global coordinator. One of the problems in such a system is translating queries between peers, taking into account both the schema and data heterogeneity. Query translation is the problem of rewriting a query posed in terms of one peer schema to a query in terms of another peer schema. This paper proposes a query translation mechanism between peers where peers are acquainted in data sharing systems through data-level mappings for sharing data.

  5. Query-Based Outlier Detection in Heterogeneous Information Networks

    Science.gov (United States)

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  6. Goal Directed Relative Skyline Queries in Time Dependent Road Networks

    CERN Document Server

    Iyer, K B Priya

    2012-01-01

    Wireless GIS technology is progressing rapidly in the area of mobile communications. Location-based spatial queries are becoming an integral part of many new mobile applications. Skyline queries are among the latest applications of location-based services. In this paper we introduce Goal Directed Relative Skyline queries on Time dependent (GD-RST) road networks. The algorithm uses travel time as a metric in finding the data object by considering multiple query points (multi-source skyline) relative to the user's location and direction of travel. We design an efficient algorithm based on Filter, Heap and Refine Skyline phases. Finally, we propose a dynamic skyline caching (DSC) mechanism which helps to reduce the computation cost for future skyline queries. The experimental evaluation compares the performance of the GD-RST algorithm with the traditional branch and bound algorithm for skyline queries in real road networks.

  7. Querying and Extracting Timeline Information from Road Traffic Sensor Data.

    Science.gov (United States)

    Imawan, Ardi; Indikawati, Fitri Indra; Kwon, Joonho; Rao, Praveen

    2016-08-23

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system, a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset.

  8. Multi-Dimensional Top-k Dominating Queries

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Mamoulis, Nikos

    2009-01-01

    The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top...... attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply on indexed multi-dimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant...... of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness...
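
    The definition in the first sentence of this abstract can be illustrated with a brute-force sketch. It is not one of the paper's algorithms (those rely on indexes and are not described here); it simply counts, for every object, how many others it dominates. The sketch assumes that smaller values are better in every dimension, and the sample points are invented for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_dominating(points: List[Point], k: int) -> List[Tuple[Point, int]]:
    """Return the k points with the highest dominance score.

    Brute force, O(n^2) comparisons; ties keep input order because
    Python's sort is stable.
    """
    scored = [(p, sum(dominates(p, q) for q in points)) for p in points]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 2-D data, e.g. (price, distance), both to be minimized.
sample = [(1.0, 9.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (5.0, 5.0)]
result = top_k_dominating(sample, 2)
print(result)
```

    In this sample, (1.0, 9.0) and (4.0, 1.0) are skyline points yet dominate few or no other objects, which is the kind of difference from a plain skyline query that the dominance score is meant to capture.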

  9. A novel adaptive Cuckoo search for optimal query plan generation.

    Science.gov (United States)

    Gomathi, Ramalingam; Sharmila, Dhandapani

    2014-01-01

    The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To enhance the efficiency in the execution time for querying large RDF graphs, the evolving metaheuristic algorithms become an alternate to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying number of predicates. The experimental results have exposed that the proposed approach has provided significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.

  10. An adaptive range-query optimization technique with distributed replicas

    Institute of Scientific and Technical Information of China (English)

    Sayar Ahmet; Pierce Marlon; Fox C.Geoffrey

    2014-01-01

    Replication is an approach often used to speed up the execution of queries submitted to a large dataset. A compile-time/run-time approach is presented for minimizing the response time of 2-dimensional range queries when a distributed replica of a dataset exists. The aim is to partition the query payload (and its range) into subsets and distribute those to the replica nodes in a way that minimizes a client’s response time. However, since query size and distribution characteristics of data (data dense/sparse regions) in varying ranges are not known a priori, performing efficient load balancing and parallel processing over the unpredictable workload is difficult. A technique based on the creation and manipulation of dynamic spatial indexes for query payload estimation in distributed queries was proposed. The effectiveness of this technique was demonstrated on queries for analysis of archived earthquake-generated seismic data records.

  11. Querying Big Data: Bridging Theory and Practice

    Institute of Scientific and Technical Information of China (English)

    樊文飞; 怀进鹏

    2014-01-01

    Big data introduces challenges to query answering, from theory to practice. A number of questions arise. What queries are “tractable” on big data? How can we make big data “small” so that it is feasible to find exact query answers? When exact answers are beyond reach in practice, what approximation theory can help us strike a balance between the quality of approximate query answers and the costs of computing such answers? To get sensible query answers in big data, what else do we necessarily do in addition to coping with the size of the data? This position paper aims to provide an overview of recent advances in the study of querying big data. We propose approaches to tackling these challenging issues, and identify open problems for future research.

  12. A Novel Adaptive Cuckoo Search for Optimal Query Plan Generation

    Directory of Open Access Journals (Sweden)

    Ramalingam Gomathi

    2014-01-01

    The emergence of multiple web pages day by day leads to the development of the semantic web technology. A World Wide Web Consortium (W3C) standard for storing semantic web data is the resource description framework (RDF). To enhance the efficiency in the execution time for querying large RDF graphs, the evolving metaheuristic algorithms become an alternate to the traditional query optimization methods. This paper focuses on the problem of query optimization of semantic web data. An efficient algorithm called adaptive Cuckoo search (ACS) for querying and generating optimal query plan for large RDF graphs is designed in this research. Experiments were conducted on different datasets with varying number of predicates. The experimental results have exposed that the proposed approach has provided significant results in terms of query execution time. The extent to which the algorithm is efficient is tested and the results are documented.

  13. Querying and Extracting Timeline Information from Road Traffic Sensor Data

    Directory of Open Access Journals (Sweden)

    Ardi Imawan

    2016-08-01

    The escalation of traffic congestion in urban cities has urged many countries to use intelligent transportation system (ITS) centers to collect historical traffic sensor data from multiple heterogeneous sources. By analyzing historical traffic data, we can obtain valuable insights into traffic behavior. Many existing applications have been proposed with limited analysis results because of the inability to cope with several types of analytical queries. In this paper, we propose the QET (querying and extracting timeline information) system, a novel analytical query processing method based on a timeline model for road traffic sensor data. To address query performance, we build a TQ-index (timeline query-index) that exploits spatio-temporal features of timeline modeling. We also propose an intuitive timeline visualization method to display congestion events obtained from specified query parameters. In addition, we demonstrate the benefit of our system through a performance evaluation using a Busan ITS dataset and a Seattle freeway dataset.

  14. Emulating a million machines to investigate botnets.

    Energy Technology Data Exchange (ETDEWEB)

    Rudish, Donald W.

    2010-06-01

    Researchers at Sandia National Laboratories in Livermore, California are creating what is in effect a vast digital petri dish able to hold one million operating systems at once in an effort to study the behavior of rogue programs known as botnets. Botnets are used extensively by malicious computer hackers to steal computing power from Internet-connected computers. The hackers harness the stolen resources into a scattered but powerful computer that can be used to send spam, execute phishing scams or steal digital information. These remote-controlled 'distributed computers' are difficult to observe and track. Botnets may take over parts of tens of thousands or in some cases even millions of computers, making them among the world's most powerful computers for some applications.

  15. Towards the Formalization of Fuzzy Relational Database Queries

    Directory of Open Access Journals (Sweden)

    Aleksandar Perović

    2009-03-01

    The aim of this paper is to give guidelines on how to formalize fuzzy relational database queries using LΠ½ fuzzy logic. After a short introduction, we give an overview of the LΠ½ logic. We then give a brief overview of FRDB queries and the query-database similarity relation. We conclude the paper with a description of FRDB query formalization using the presented definitions.

  16. Learning from minimum entropy queries in a large committee machine

    CERN Document Server

    Sollich, P

    1996-01-01

    In supervised learning, the redundancy contained in random examples can be avoided by learning from queries. Using statistical mechanics, we study learning from minimum entropy queries in a large tree-committee machine. The generalization error decreases exponentially with the number of training examples, providing a significant improvement over the algebraic decay for random examples. The connection between entropy and generalization error in multi-layer networks is discussed, and a computationally cheap algorithm for constructing queries is suggested and analysed.

  17. Nearly Four Million Californians Are Food Insecure

    OpenAIRE

    Chaparro, M. Pia; Langellier, Brent; Birnbach, Kerry; Sharp, Kerry; Harrison, Gail

    2012-01-01

    Food insecurity has increased significantly among low-income Californians over the last decade. According to data from the 2009 California Health Interview Survey, 3.8 million adults in households with incomes at or below 200% of the Federal Poverty Level (FPL) could not afford enough food at least once in the previous year. Low-income households with children and Spanish-speaking households suffered from the worst levels of food insecurity. Expanding nutrition assistance programs, such as th...

  18. Error Checking for Chinese Query by Mining Web Log

    Directory of Open Access Journals (Sweden)

    Jianyong Duan

    2015-01-01

    For the search engine, error-input query is a common phenomenon. This paper uses web log as the training set for the query error checking. Through the n-gram language model that is trained by web log, the queries are analyzed and checked. Some features including query words and their number are introduced into the model. At the same time data smoothing algorithm is used to solve data sparseness problem. It will improve the overall accuracy of the n-gram model. The experimental results show that it is effective.
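
    The general idea of scoring queries with an n-gram model trained on a query log can be sketched as follows. The abstract does not state the model order, the smoothing method, or the extra features used, so this sketch assumes a bigram model with add-one (Laplace) smoothing and a tiny invented log. It also assumes queries are already tokenized; real Chinese queries would need word segmentation first.

```python
import math
from collections import Counter
from typing import List


class BigramModel:
    """Bigram language model with add-one smoothing (an assumption;
    the paper's own smoothing algorithm is not specified here)."""

    def __init__(self, queries: List[List[str]]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for tokens in queries:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, tokens: List[str]) -> float:
        """Average log-probability per transition; lower values mean the
        query looks less like the training log."""
        padded = ["<s>"] + tokens + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            # Add-one smoothing keeps unseen bigrams from scoring zero.
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total / (len(padded) - 1)


# Hypothetical, pre-tokenized query log.
log = [
    ["weather", "beijing"],
    ["weather", "beijing"],
    ["weather", "shanghai"],
    ["train", "beijing"],
]
model = BigramModel(log)
seen_order = model.log_prob(["weather", "beijing"])
unseen_order = model.log_prob(["beijing", "weather"])
print(seen_order, unseen_order)
```

    A checker built on this would flag queries whose score falls below some threshold; choosing that threshold is left open here because the source gives no value.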

  19. The effect of query complexity on Web searching results

    Directory of Open Access Journals (Sweden)

    B.J. Jansen

    2000-01-01

    This paper presents findings from a study of the effects of query structure on retrieval by Web search services. Fifteen queries were selected from the transaction log of a major Web search service in simple query form with no advanced operators (e.g., Boolean operators, phrase operators, etc.) and submitted to 5 major search engines - Alta Vista, Excite, FAST Search, Infoseek, and Northern Light. The results from these queries became the baseline data. The original 15 queries were then modified using the various search operators supported by each of the 5 search engines for a total of 210 queries. Each of these 210 queries was also submitted to the applicable search service. The results obtained were then compared to the baseline results. A total of 2,768 search results were returned by the set of all queries. In general, increasing the complexity of the queries had little effect on the results with a greater than 70% overlap in results, on average. Implications for the design of Web search services and directions for future research are discussed.

  20. Wild Card Queries for Searching Resources on the Web

    CERN Document Server

    Rafiei, Davood

    2009-01-01

    We propose a domain-independent framework for searching and retrieving facts and relationships within natural language text sources. In this framework, an extraction task over a text collection is expressed as a query that combines text fragments with wild cards, and the query result is a set of facts in the form of unary, binary and general $n$-ary tuples. A significance of our querying mechanism is that, despite being both simple and declarative, it can be applied to a wide range of extraction tasks. A problem in querying natural language text though is that a user-specified query may not retrieve enough exact matches. Unlike term queries which can be relaxed by removing some of the terms (as is done in search engines), removing terms from a wild card query without ruining its meaning is more challenging. Also, any query expansion has the potential to introduce false positives. In this paper, we address the problem of query expansion, and also analyze a few ranking alternatives to score the results and to r...
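
    A rough sketch of what a wild card query over text might look like is given below. The abstract does not show the framework's query syntax or matching rules, so everything here is an assumption made for illustration: `*` is used as the wild card, each wild card binds to a run of capitalized words, and matching is done with a regular expression rather than the paper's own machinery.

```python
import re
from typing import List, Tuple

# Assumption: a wild card binds to one or more capitalized words.
WILDCARD = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"


def wildcard_to_regex(query: str) -> re.Pattern:
    """Turn 'A * B *' into a regex with one capture group per wild card."""
    literals = [re.escape(fragment) for fragment in query.split("*")]
    return re.compile(WILDCARD.join(literals))


def run_query(query: str, text: str) -> List[Tuple[str, ...]]:
    """Return one tuple per match; its arity equals the number of wild cards."""
    pattern = wildcard_to_regex(query)
    return [match.groups() for match in pattern.finditer(text)]


# Invented sample text.
text = (
    "Paris is the capital of France. "
    "Ottawa is the capital of Canada. "
    "Rice is a staple food."
)
facts = run_query("* is the capital of *", text)
print(facts)
```

    A query with two wild cards yields binary tuples, and one with a single wild card yields unary tuples. The exact-match limitation mentioned in the abstract is visible even here: a sentence phrased as "France's capital, Paris" would not be retrieved without some form of query expansion.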

  1. Web Database Schema Identification through Simple Query Interface

    Science.gov (United States)

    Lin, Ling; Zhou, Lizhu

    Web databases provide different types of query interfaces to access the data records stored in the backend databases. While most existing works exploit a complex query interface with multiple input fields to perform schema identification of the Web databases, little attention has been paid to how to identify the schema of web databases through a simple query interface (SQI), which has only one single query text input field. This paper proposes a new method of instance-based query probing to identify WDBs' interface and result schema for SQI. The interface schema identification problem is defined as generating the full-condition query of SQI and a novel query probing strategy is proposed. The result schema is also identified based on the result webpages of SQI's full-condition query, and an extended identification of the non-query attributes is proposed to improve the attribute recall rate. Experimental results on web databases of online shopping for books, movies and mobile phones show that our method is effective and efficient.

  2. QVIZ: A FRAMEWORK FOR QUERYING AND VISUALIZING DATA

    Energy Technology Data Exchange (ETDEWEB)

    T. KEAHEY; P. MCCORMICK; ET AL

    2000-12-01

    Qviz is a lightweight, modular, and easy-to-use parallel system for interactive analytical query processing and visual presentation of large datasets. Qviz allows queries of arbitrary complexity to be easily constructed using a specialized scripting language. Visual presentation of the results is also easily achieved via simple scripted and interactive commands to our query-specific visualization tools. This paper describes our initial experiences with the Qviz system for querying and visualizing scientific datasets, showing how Qviz has been used in two different applications: ocean modeling and linear accelerator simulations.

  3. A Grammar Analysis Model for the Unified Multimedia Query Language

    Institute of Scientific and Technical Information of China (English)

    Zhong-Sheng Cao; Zong-Da Wu; Yuan-Zhen Wang

    2008-01-01

    The unified multimedia query language (UMQL) is a powerful general-purpose multimedia query language, and it is very suitable for multimedia information retrieval. The paper proposes a grammar analysis model to implement effective grammatical processing for the language. It separates the grammar analysis of a UMQL query specification into two phases, syntactic analysis and semantic analysis, and then uses extended Backus-Naur form (EBNF) and logical algebra, respectively, to specify the restrictive grammar rules of each phase. As a result, the model can present error guiding information for a query specification that contains grammatical errors. The model not only suits the processing of UMQL queries well, but also has guiding significance for other projects concerning query processing of descriptive query languages.

  4. AQBE — QBE Style Queries for Archetyped Data

    Science.gov (United States)

    Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash

    Large-scale adoption of electronic healthcare applications requires semantic interoperability. The new proposals propose an advanced (multi-level) DBMS architecture for repository services for health records of patients. These also require query interfaces at multiple levels and at the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.

  5. Distributed query plan generation using multiobjective genetic algorithm.

    Science.gov (United States)

    Panicker, Shina; Kumar, T V Vijay

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.
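
    The difference between the single-objective formulation (minimize LPC + CC) and the biobjective one (minimize LPC and CC separately) can be illustrated with a small sketch. It is not NSGA-II and not the paper's algorithm; it only shows the Pareto-dominance test that biobjective methods are built on. The plan names and cost values are invented.

```python
from typing import Dict, List, Tuple

# (total local processing cost, total communication cost); both minimized.
Cost = Tuple[float, float]


def dominates(a: Cost, b: Cost) -> bool:
    """a dominates b if it is no worse on both costs and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front(plans: Dict[str, Cost]) -> List[str]:
    """Names of plans that no other plan dominates."""
    return sorted(
        name
        for name, cost in plans.items()
        if not any(dominates(other, cost) for other in plans.values())
    )


def best_single_objective(plans: Dict[str, Cost]) -> str:
    """Plan with the smallest summed cost, as in the single-objective view."""
    return min(plans, key=lambda name: sum(plans[name]))


# Hypothetical candidate query plans.
plans = {
    "plan_a": (10.0, 40.0),
    "plan_b": (20.0, 20.0),
    "plan_c": (25.0, 30.0),
    "plan_d": (40.0, 10.0),
}
front = pareto_front(plans)
single = best_single_objective(plans)
print(front, single)
```

    The summed-cost view returns one plan, while the biobjective view keeps every non-dominated trade-off between local processing and communication, leaving the final choice to the optimizer's caller.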

  6. Distributed Query Plan Generation Using Multiobjective Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Shina Panicker

    2014-01-01

    A distributed query processing strategy, which is a key performance determinant in accessing distributed databases, aims to minimize the total query processing cost. One way to achieve this is by generating efficient distributed query plans that involve fewer sites for processing a query. In the case of distributed relational databases, the number of possible query plans increases exponentially with respect to the number of relations accessed by the query and the number of sites where these relations reside. Consequently, computing optimal distributed query plans becomes a complex problem. This distributed query plan generation (DQPG) problem has already been addressed using single objective genetic algorithm, where the objective is to minimize the total query processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC). In this paper, this DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize total LPC and minimize total CC. These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-II. Experimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows that the former performs comparatively better and converges quickly towards optimal solutions for an observed crossover and mutation probability.

  7. PAQ: Persistent Adaptive Query Middleware for Dynamic Environments

    Science.gov (United States)

    Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin

    Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.

  8. Measuring persistence of implementation: QUERI Series

    Directory of Open Access Journals (Sweden)

    Asch Steven M

    2008-04-01

    As more quality improvement programs are implemented to achieve gains in performance, the need to evaluate their lasting effects has become increasingly evident. However, such long-term follow-up evaluations are scarce in healthcare implementation science, being largely relegated to the "need for further research" section of most project write-ups. This article explores the variety of conceptualizations of implementation sustainability, as well as behavioral and organizational factors that influence the maintenance of gains. It highlights the finer points of design considerations and draws on our own experiences with measuring sustainability, framed within the rich theoretical and empirical contributions of others. In addition, recommendations are made for designing sustainability analyses. This article is one in a Series of articles documenting implementation science frameworks and approaches developed by the U.S. Department of Veterans Affairs Quality Enhancement Research Initiative (QUERI).

  9. ODQ: A Fluid Office Document Query Language

    Directory of Open Access Journals (Sweden)

    Xuhong Liu

    2015-06-01

    Fluid office documents, as semi-structured data often represented by Extensible Markup Language (XML), are important parts of Big Data. These office documents have different formats, and their matching Application Programming Interfaces (APIs) depend on the development platform and version, which causes difficulty in custom development and information retrieval from them. To solve this problem, we have been developing an office document query (ODQ) language which provides a uniform method to retrieve content from documents with different formats and versions. ODQ builds a common document model ontology to conceal the format details of documents and provides a uniform operation interface to handle office documents with different formats. The results show that ODQ has advantages in format independence, and can help users develop document processing systems with good interoperability.

  10. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration.

    Science.gov (United States)

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-04

    Linked Data (LD) aims to achieve interconnected data by representing entities using Uniform Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Hashing hyperplane queries to near points with applications to large-scale active learning.

    Science.gov (United States)

    Vijayanarasimhan, Sudheendra; Jain, Prateek; Grauman, Kristen

    2014-02-01

    We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the entire database. For this problem, we propose two hashing-based solutions. Our first approach maps the data to 2-bit binary keys that are locality sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sublinear time. Our first method's preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: Taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods' tradeoffs and show that they make it practical to perform active selection with millions of unlabeled points.
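
The 2-bit keys mentioned in the abstract above can be illustrated with a small sketch. The abstract does not spell out the construction, so the following is an assumption: each key uses two random Gaussian directions `u` and `v`, a database point `x` is hashed as `(sign(u·x), sign(v·x))`, and a hyperplane with normal `w` is hashed as `(sign(u·w), sign(-v·w))`. All function names here are hypothetical. Under this construction a point lying on the hyperplane collides with the query far more often than a point parallel to the normal.

```python
import random

def sign_bit(value):
    # Map a real number to one bit: 1 for non-negative, 0 for negative.
    return 1 if value >= 0 else 0

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_hash(dim, rng):
    # Draw two independent random Gaussian directions for one 2-bit key.
    u = [rng.gauss(0, 1) for _ in range(dim)]
    v = [rng.gauss(0, 1) for _ in range(dim)]

    def hash_point(x):
        # Database points use the same vector for both bits.
        return (sign_bit(dot(u, x)), sign_bit(dot(v, x)))

    def hash_query(w):
        # The hyperplane normal is negated for the second bit (assumed scheme).
        return (sign_bit(dot(u, w)), sign_bit(-dot(v, w)))

    return hash_point, hash_query

def collision_rate(x, w, trials=2000, seed=0):
    # Estimate how often point x and hyperplane normal w get the same key.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        hash_point, hash_query = make_hash(len(x), rng)
        hits += hash_point(x) == hash_query(w)
    return hits / trials

normal = [1.0, 0.0, 0.0]
on_plane = collision_rate([0.0, 1.0, 0.0], normal)   # perpendicular to the normal
far_away = collision_rate([1.0, 0.0, 0.0], normal)   # parallel to the normal
print(on_plane, far_away)
```

In a real index, several such 2-bit keys would be concatenated and used as bucket addresses, so that only the colliding bucket is scanned for a query.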

  12. Million Dollar Baby (2004) and Palliative Care

    Directory of Open Access Journals (Sweden)

    José Elías García Sánchez

    2008-10-01

    Full Text Available The worst misfortune that can befall an old, tormented and fearful boxing trainer is that the pupil he is training and of whom he is very fond should have a lesion as serious as a quadriplegia. This is the crux of the plot in Million Dollar Baby. A person who suffers a quadriplegia sees how most of her physical and sensorial abilities disappear and habitually suffers psychological disturbances requiring palliative medical care. Relatives are subjected to great stress and suffering. All these aspects are reflected, in general accurately, in the film.

  13. Classifying queries submitted to a vertical search engine

    NARCIS (Netherlands)

    Berendsen, R.; Kovachev, B.; Meij, E.; de Rijke, M.; Weerkamp, W.

    2011-01-01

    We propose and motivate a scheme for classifying queries submitted to a people search engine. We specify a number of features for automatically classifying people queries into the proposed classes and examine the effectiveness of these features. Our main finding is that classification is feasible and that

  14. The Acoi Algebra: a Query Algebra for Image Retrieval Systems

    NARCIS (Netherlands)

    Nes, N.J.; Kersten, M.L.

    1998-01-01

    Content-based image retrieval systems rely on a query-by-example technique often using a limited set of global image features. This leads to a rather coarse-grain approach to locate images. The next step is to concentrate on queries over spatial relations amongst objects within the images. This call

  15. Group-by Skyline Query Processing in Relational Engines

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Luk, Ming-Hay; Lo, Eric

    2009-01-01

    the missing cost model for the BBS algorithm. Experimental results show that our techniques are able to devise the best query plans for a variety of group-by skyline queries. Our focus is on algorithms that can be directly implemented in today's commercial database systems without the addition of new access...

  16. Ontology Based Queries - Investigating a Natural Language Interface

    NARCIS (Netherlands)

    van der Sluis, Ielka; Hielkema, F.; Mellish, C.; Doherty, G.

    2010-01-01

    In this paper we look at what may be learned from a comparative study examining non-technical users with a background in social science browsing and querying metadata. Four query tasks were carried out with a natural language interface and with an interface that uses a web paradigm with hyperlinks.

  17. Dynamic Query Optimization Approach for Semantic Database Grid

    Institute of Scientific and Technical Information of China (English)

    Xiao-Qing Zheng; Hua-Jun Chen; Zhao-Hui Wu; Yu-Xin Mao

    2006-01-01

    Fundamentally, semantic grid database is about bringing globally distributed databases together in order to coordinate resource sharing and problem solving in which information is given well-defined meaning, and DartGrid Ⅱ is the implemented database grid system whose goal is to provide a semantic solution for integrating database resources on the Web. Although many algorithms have been proposed for optimizing query processing in order to minimize the costs and/or response time associated with obtaining the answer to a query in a distributed database system, the database grid query optimization problem is fundamentally different from traditional distributed query optimization. These differences are shown to be the consequences of the autonomy and heterogeneity of database nodes in a database grid. Therefore, more challenges have arisen for query optimization in a database grid than in a traditional distributed database. Following this observation, the design of a query optimizer in DartGrid Ⅱ is presented, and a heuristic, dynamic and parallel query optimization approach to processing queries in a database grid is proposed. A set of semantic tools supporting relational database integration and semantic-based information browsing has also been implemented to realize the above vision.

  18. Multidimensional indexing structure for use with linear optimization queries

    Science.gov (United States)

    Bergman, Lawrence David (Inventor); Castelli, Vittorio (Inventor); Chang, Yuan-Chi (Inventor); Li, Chung-Sheng (Inventor); Smith, John Richard (Inventor)

    2002-01-01

    Linear optimization queries, which usually arise in various decision support and resource planning applications, are queries that retrieve top N data records (where N is an integer greater than zero) which satisfy a specific optimization criterion. The optimization criterion is to either maximize or minimize a linear equation. The coefficients of the linear equation are given at query time. Methods and apparatus are disclosed for constructing, maintaining and utilizing a multidimensional indexing structure of database records to improve the execution speed of linear optimization queries. Database records with numerical attributes are organized into a number of layers and each layer represents a geometric structure called convex hull. Such linear optimization queries are processed by searching from the outer-most layer of this multi-layer indexing structure inwards. At least one record per layer will satisfy the query criterion and the number of layers needed to be searched depends on the spatial distribution of records, the query-issued linear coefficients, and N, the number of records to be returned. When N is small compared to the total size of the database, answering the query typically requires searching only a small fraction of all relevant records, resulting in a tremendous speedup as compared to linearly scanning the entire dataset.
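
The layered indexing idea described above can be sketched in two dimensions. This is a simplified illustration under stated assumptions, not the patented method: points are integer pairs, layers are built by repeatedly peeling the convex hull, and a top-N query simply scans the first N layers (a linear function over a point set attains its maximum at a hull vertex, so the N best values always lie within the N outermost layers). The function names `build_layers` and `top_n` are hypothetical.

```python
import random

def cross(o, a, b):
    # Z-component of (a - o) x (b - o); positive for a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices only.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def build_layers(points):
    # Peel hulls from the outside in; each hull becomes one layer of the index.
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers

def top_n(layers, coeffs, n):
    # Maximize a*x + b*y by scanning only the n outermost layers.
    a, b = coeffs
    candidates = [p for layer in layers[:n] for p in layer]
    candidates.sort(key=lambda p: a * p[0] + b * p[1], reverse=True)
    return candidates[:n]

rng = random.Random(42)
records = [(rng.randint(-50, 50), rng.randint(-50, 50)) for _ in range(500)]
index = build_layers(records)
print(len(index), "layers;", top_n(index, (2, -3), 3))
```

To minimize instead of maximize, the coefficients can be negated at query time. A fuller implementation would stop scanning inward as soon as the next layer cannot improve the current result, which is where the dependence on the data distribution and on N mentioned in the abstract would come from.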

  19. Persuasive Elements of 100 Successful Magazine Query Letters.

    Science.gov (United States)

    Jolliffe, Lee

    Building from scholarly works on persuasion and compliance-gaining, a study investigated magazine query letters that attempt to persuade an editor to buy the article offered, examining what message elements make them successful. Forty magazine editors provided copies of 100 recently accepted magazine query letters, which were compared with 50…

  20. Time-sensitive personalized query auto-completion

    NARCIS (Netherlands)

    Cai, F.; Liang, S.; de Rijke, M.; Li, J.; Wang, X.S.

    2014-01-01

    Query auto-completion (QAC) is a prominent feature of modern search engines. It is aimed at saving user's time and enhancing the search experience. Current QAC models mostly rank matching QAC candidates according to their past popularity, i.e., frequency. However, query popularity changes over time

  1. The Imposed Query: Implications for Library Service Evaluation.

    Science.gov (United States)

    Gross, Melissa

    1998-01-01

    Explores the potential impact of imposed query, a new model of information-seeking behavior, on current approaches to library service and system evaluation. Discusses reference service evaluation, user studies, output measures, and relevance as an evaluation tool. Argues that imposed query broadens understanding of the user and of the role that…

  2. Real SQL queries 50 challenges : practice for reporting and analysis

    CERN Document Server

    Cohen, Brian; Mishra, Neerja

    2015-01-01

    Queries improve when challenges are authentic. This book sets your learning on the fast track with realistic problems to solve. Topics span sales, marketing, human resources, purchasing, and production. Real SQL Queries: 50 Challenges is perfect for analysts, report writers, or anyone searching for a hands-on approach to learning SQL Server.

  3. Topology-free querying of protein interaction networks.

    Science.gov (United States)

    Bruckner, Sharon; Hüffner, Falk; Karp, Richard M; Shamir, Ron; Sharan, Roded

    2010-03-01

    In the network querying problem, one is given a protein complex or pathway of species A and a protein-protein interaction network of species B; the goal is to identify subnetworks of B that are similar to the query in terms of sequence, topology, or both. Existing approaches mostly depend on knowledge of the interaction topology of the query in the network of species A; however, in practice, this topology is often not known. To address this problem, we develop a topology-free querying algorithm, which we call Torque. Given a query, represented as a set of proteins, Torque seeks a matching set of proteins that are sequence-similar to the query proteins and span a connected region of the network, while allowing both insertions and deletions. The algorithm uses alternatively dynamic programming and integer linear programming for the search task. We test Torque with queries from yeast, fly, and human, where we compare it to the QNet topology-based approach, and with queries from less studied species, where only topology-free algorithms apply. Torque detects many more matches than QNet, while giving results that are highly functionally coherent.

  4. Low Redundancy in Static Dictionaries with Constant Query Time

    DEFF Research Database (Denmark)

    Pagh, Rasmus

    2001-01-01

    A static dictionary is a data structure for storing subsets of a finite universe U, so that membership queries can be answered efficiently. We study this problem in a unit cost RAM model with word size Ω(log |U|), and show that for n-element subsets, constant worst case query time can be obtained...

  5. Query Classification and Study of University Students' Search Trends

    Science.gov (United States)

    Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

    2012-01-01

    Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…

  6. On the query complexity of finding a local maximum point

    NARCIS (Netherlands)

    Rastsvelaev, A.L.; Beklemishev, L.D.

    2008-01-01

    We calculate the minimal number of queries sufficient to find a local maximum point of a function on a discrete interval for a model with M parallel queries, M≥1. Matching upper and lower bounds are obtained. The bounds are formulated in terms of certain Fibonacci type sequences of numbers.
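
To make the query model concrete, here is a minimal sketch for the sequential case (M = 1). It is not the optimal Fibonacci-type strategy that the abstract refers to; it is a plain bisection that compares two adjacent points per step, and it counts distinct evaluations of the function as queries. The name `find_local_max` is hypothetical.

```python
def find_local_max(f, lo, hi):
    """Return (x, queries): a local maximum of f on the integers lo..hi
    and the number of distinct points at which f was evaluated."""
    cache = {}

    def query(x):
        # Each distinct point costs one query; repeats are served from the cache.
        if x not in cache:
            cache[x] = f(x)
        return cache[x]

    while lo < hi:
        mid = (lo + hi) // 2
        if query(mid) < query(mid + 1):
            lo = mid + 1   # the function rises to the right, so a local max lies there
        else:
            hi = mid       # f(mid) >= f(mid + 1), so a local max lies at mid or left of it
    return lo, len(cache)

values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
x, used = find_local_max(lambda i: values[i], 0, len(values) - 1)
print(x, values[x], used)
```

Bisection spends up to two queries per halving, which is why strategies that reuse earlier evaluations, such as Fibonacci-style splitting, can get by with fewer queries.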

  7. On the Suitability of Skyline Queries for Data Exploration

    DEFF Research Database (Denmark)

    Chester, Sean; Mortensen, Michael Lind; Assent, Ira

    2014-01-01

    The skyline operator has been studied in database research for multi-criteria decision making. Until now the focus has been on the efficiency or accuracy of single queries. In practice, however, users are increasingly confronted with unknown data collections, where precise query formulation prove...

  8. How many functions can be distinguished with k quantum queries?

    CERN Document Server

    Farhi, E; Gutmann, S; Sipser, M

    1999-01-01

    Suppose an oracle is known to hold one of a given set of D two-valued functions. To successfully identify which function the oracle holds with k classical queries, it must be the case that D is at most 2^k. In this paper we derive a bound for how many functions can be distinguished with k quantum queries.

  9. Approximate furthest neighbor with application to annulus query

    DEFF Research Database (Denmark)

    Pagh, Rasmus; Silvestri, Francesco; Sivertsen, Johan von Tangen

    2016-01-01

    -dimensional Euclidean space. The method builds on the technique of Indyk (SODA 2003), storing random projections to provide sublinear query time for AFN. However, we introduce a different query algorithm, improving on Indyk's approximation factor and reducing the running time by a logarithmic factor. We also present...

  10. LHC collars - 12 million high technology gems

    CERN Multimedia

    2001-01-01

    Some 12 million steel collars will keep the LHC dipole magnet structures rigid. Their production has just begun. A huge job began last week: the high speed manufacturing of twelve million steel collars for the 1250 dipole magnets of the future Large Hadron Collider, LHC. The challenge is not only a matter of quantity: these collars are very high technology components because of the important role they play in the way the collider works. One of the main difficulties with the accelerator is that the magnetic field that keeps particles in orbit must have the same configuration and intensity in all the dipoles. But when the 8.33 tesla magnetic field is on (100,000 times the Earth's magnetic field), it produces a very strong force that can deform the 'soft' parts of the magnets, such as superconducting coils. The force loading one metre of dipole is almost comparable with the weight of a Boeing 747 - about 400 tonnes - so a huge deformation would occur without a mechanical component to keep the whole structure rigid...

  11. Interactive Graph Layout of a Million Nodes

    Directory of Open Access Journals (Sweden)

    Peng Mi

    2016-12-01

    Full Text Available Sensemaking of large graphs, specifically those with millions of nodes, is a crucial task in many fields. Automatic graph layout algorithms, augmented with real-time human-in-the-loop interaction, can potentially support sensemaking of large graphs. However, designing interactive algorithms to achieve this is challenging. In this paper, we tackle the scalability problem of interactive layout of large graphs, and contribute a new GPU-based force-directed layout algorithm that exploits graph topology. This algorithm can interactively lay out graphs with millions of nodes, and support real-time interaction to explore alternative graph layouts. Users can directly manipulate the layout of vertices in a force-directed fashion. The complexity of traditional repulsive force computation is reduced by approximating calculations based on the hierarchical structure of multi-level clustered graphs. We evaluate the algorithm performance, and demonstrate human-in-the-loop layout in two sensemaking case studies. Moreover, we summarize lessons learned for designing interactive large graph layout algorithms on the GPU.

  12. Query log analysis of an electronic health record search engine.

    Science.gov (United States)

    Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A

    2011-01-01

    We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users' information-seeking behavior. The results suggest that information needs in medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR.

  13. Query Recommendation by Coupling Personalization with Clustering for Search Engine

    Directory of Open Access Journals (Sweden)

    Dhiliphanrajkumar.Thambidurai

    2016-11-01

    Full Text Available Internet and web search engines have become an important part of day-to-day life. For a single user query, thousands of web pages are retrieved, but most of them are irrelevant. A major problem for search engines is that user queries are usually short and ambiguous, and they are not sufficient to express precise user needs. In addition, long result lists force users to spend a large amount of time looking for the desired results. To overcome these problems, an effective approach is developed that captures users' click-through and bookmarking data to provide personalized query recommendation. The Google API is used to retrieve the results. Experimental results show that the proposed method provides better query recommendations than existing query suggestion methods.

  14. Pareto-depth for multiple-query image retrieval.

    Science.gov (United States)

    Hsiao, Ko-Jen; Calder, Jeff; Hero, Alfred O

    2015-02-01

    Most content-based image retrieval systems consider either one single query, or multiple queries that include the same object or represent the same semantic information. In this paper, we consider the content-based image retrieval problem for multiple query images corresponding to different image semantics. We propose a novel multiple-query information retrieval algorithm that combines the Pareto front method with efficient manifold ranking. We show that our proposed algorithm outperforms state of the art multiple-query retrieval algorithms on real-world image databases. We attribute this performance improvement to concavity properties of the Pareto fronts, and prove a theoretical result that characterizes the asymptotic concavity of the fronts.

  15. Evaluation of Query Generators for Entity Search Engines

    CERN Document Server

    Endrullis, Stefan; Rahm, Erhard

    2010-01-01

    Dynamic web applications such as mashups need efficient access to web data that is only accessible via entity search engines (e.g. product or publication search engines). However, most current mashup systems and applications only support simple keyword searches for retrieving data from search engines. We propose the use of more powerful search strategies building on so-called query generators. For a given set of entities query generators are able to automatically determine a set of search queries to retrieve these entities from an entity search engine. We demonstrate the usefulness of query generators for on-demand web data integration and evaluate the effectiveness and efficiency of query generators for a challenging real-world integration scenario.

  16. Multiple k Nearest Neighbor Query Processing in Spatial Network Databases

    DEFF Research Database (Denmark)

    Xuegang, Huang; Jensen, Christian Søndergaard; Saltenis, Simonas

    2006-01-01

    This paper concerns the efficient processing of multiple k nearest neighbor queries in a road-network setting. The assumed setting covers a range of scenarios such as the one where a large population of mobile service users that are constrained to a road network issue nearest-neighbor queries for points of interest that are accessible via the road network. Given multiple k nearest neighbor queries, the paper proposes progressive techniques that selectively cache query results in main memory and subsequently reuse these for query processing. The paper initially proposes techniques for the case where an upper bound on k is known a priori and then extends the techniques to the case where this is not so. Based on empirical studies with real-world data, the paper offers insight into the circumstances under which the different proposed techniques can be used with advantage for multiple k nearest...

  17. Cross Lingual Information Retrieval With SMT And Query Mining

    Directory of Open Access Journals (Sweden)

    Suneet Kumar Gupta

    2011-10-01

    Full Text Available In this paper, we take an English corpus and queries in both translated and transliterated form. We use a statistical machine translation (SMT) system to obtain results for the translated and transliterated queries and then analyze those results. The query-wise results are then mined to create a new list of queries. We design an experimental setup whose steps calculate Mean Average Precision, using the Terrier open-source platform for information retrieval. On the basis of the new query list, we calculate the Mean Average Precision and obtain a significant result, i.e. 93.24%, which is very close to the monolingual results calculated for English.
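
Mean Average Precision, the measure reported above, can be computed as in the following sketch. It uses the standard definition (precision at each rank where a relevant document appears, averaged over the relevant documents, then averaged over queries); the document identifiers are made up for illustration and the function names are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a set of relevant ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant document
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of the average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Illustrative data only: two queries with invented document ids.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6"}),
]
print(round(mean_average_precision(runs), 4))
```

Relevant documents that never appear in the ranked list still count in the denominator, so missing them lowers the score.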

  18. A new approach to query expansion in information retrieval

    Institute of Scientific and Technical Information of China (English)

    Li Weijiang; Zhao Tiejun; Wang Xiangang

    2008-01-01

    To eliminate the mismatch between the words of relevant documents and the user's query, and the serious negative effects this mismatch has on information retrieval performance, a method of query expansion based on a new term co-occurrence representation was put forward by analyzing the process of producing a query. The expansion terms were selected according to their correlation to the whole query. At the same time, the position information between terms was considered. Experimental results on the Text REtrieval Conference (TREC) data collection show that the proposed method consistently improves on the language modeling method without expansion by 5% to 19%. Compared to the popular query expansion approach of pseudo feedback, the precision of the proposed method is competitive.

  19. Query Intent Disambiguation of Keyword-Based Semantic Entity Search in Dataspaces

    Institute of Scientific and Technical Information of China (English)

    Dan Yang; De-Rong Shen; Ge Yu; Yue Kou; Tie-Zheng Nie

    2013-01-01

    Keyword query has attracted much research attention due to its simplicity and wide applications. The inherent ambiguity of keyword queries is prone to produce unsatisfactory query results. Moreover, some existing techniques for Web queries and for keyword queries in relational databases and XML databases cannot be completely applied to keyword queries in dataspaces. We therefore propose KeymanticES, a novel keyword-based semantic entity search mechanism in dataspaces which combines both keyword query and semantic query features. We focus on the query intent disambiguation problem and propose a novel three-step approach to resolve it. Extensive experimental results show the effectiveness and correctness of our proposed approach.

  20. Twenty-first century vaccines

    Science.gov (United States)

    Rappuoli, Rino

    2011-01-01

    In the twentieth century, vaccination has been possibly the greatest revolution in health. Together with hygiene and antibiotics, vaccination led to the elimination of many childhood infectious diseases and contributed to the increase in disability-free life expectancy that in Western societies rose from 50 to 78–85 years (Crimmins, E. M. & Finch, C. E. 2006 Proc. Natl Acad. Sci. USA 103, 498–503; Kirkwood, T. B. 2008 Nat. Med 10, 1177–1185). In the twenty-first century, vaccination will be expected to eliminate the remaining childhood infectious diseases, such as meningococcal meningitis, respiratory syncytial virus, group A streptococcus, and will address the health challenges of this century such as those associated with ageing, antibiotic resistance, emerging infectious diseases and poverty. However, for this to happen, we need to increase the public trust in vaccination so that vaccines can be perceived as the best insurance against most diseases across all ages. PMID:21893537

  1. Query-Driven Visualization and Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Ruebel, Oliver; Bethel, E. Wes; Prabhat, Mr.; Wu, Kesheng

    2012-11-01

    This report focuses on an approach to high performance visualization and analysis, termed query-driven visualization and analysis (QDV). QDV aims to reduce the amount of data that needs to be processed by the visualization, analysis, and rendering pipelines. The goal of the data reduction process is to separate out data that is "scientifically interesting" and to focus visualization, analysis, and rendering on that interesting subset. The premise is that for any given visualization or analysis task, the data subset of interest is much smaller than the larger, complete data set. This strategy (extracting smaller data subsets of interest and focusing the visualization processing on these subsets) is complementary to the approach of increasing the capacity of the visualization, analysis, and rendering pipelines through parallelism. This report discusses the fundamental concepts in QDV, their relationship to different stages in the visualization and analysis pipelines, and presents QDV's application to problems in diverse areas, ranging from forensic cybersecurity to high energy physics.

  2. Query by image example: The CANDID approach

    Energy Technology Data Exchange (ETDEWEB)

    Kelly, P.M.; Cannon, M. [Los Alamos National Lab., NM (United States). Computer Research and Applications Group; Hush, D.R. [Univ. of New Mexico, Albuquerque, NM (United States). Dept. of Electrical and Computer Engineering

    1995-02-01

    CANDID (Comparison Algorithm for Navigating Digital Image Databases) was developed to enable content-based retrieval of digital imagery from large databases using a query-by-example methodology. A user provides an example image to the system, and images in the database that are similar to that example are retrieved. The development of CANDID was inspired by the N-gram approach to document fingerprinting, where a "global signature" is computed for every document in a database and these signatures are compared to one another to determine the similarity between any two documents. CANDID computes a global signature for every image in a database, where the signature is derived from various image features such as localized texture, shape, or color information. A distance between probability density functions of feature vectors is then used to compare signatures. In this paper, the authors present CANDID and highlight two results from their current research: subtracting a "background" signature from every signature in a database in an attempt to improve system performance when using inner-product similarity measures, and visualizing the contribution of individual pixels in the matching process. These ideas are applicable to any histogram-based comparison technique.

  3. An Optimal Labeling Scheme for Ancestry Queries

    CERN Document Server

    Fraigniaud, Pierre

    2009-01-01

    An ancestry labeling scheme assigns labels (bit strings) to the nodes of rooted trees such that ancestry queries between any two nodes in a tree can be answered merely by looking at their corresponding labels. The quality of an ancestry labeling scheme is measured by its label size, that is the maximal number of bits in a label of a tree node. In addition to its theoretical appeal, the design of efficient ancestry labeling schemes is motivated by applications in web search engines. For this purpose, even small improvements in the label size are important. In fact, the literature about this topic is interested in the exact label size rather than just its order of magnitude. As a result, following the proposal of a simple interval-based ancestry scheme with label size $2\log_2 n$ bits (Kannan et al., STOC '88), a considerable amount of work was devoted to improve the bound on the size of a label. The current state of the art upper bound is $\log_2 n + O(\sqrt{\log n})$ bits (Abiteboul et al., SODA '02) which is...
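
The simple interval-based scheme mentioned above can be sketched as follows. This is one common way to realize it, given here as an assumption rather than the exact construction of the cited work: a depth-first traversal numbers the nodes from 1 to n, and each node's label is the pair (its own number, the largest number in its subtree). Two numbers in the range 1..n take about $2\log_2 n$ bits. The names `interval_labels` and `is_ancestor` are hypothetical.

```python
def interval_labels(children, root):
    """Label every node with (dfs_number, largest dfs_number in its subtree)."""
    start = {}
    labels = {}
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            # All descendants are numbered; counter is the last number in the subtree.
            labels[node] = (start[node], counter)
        else:
            counter += 1
            start[node] = counter
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return labels

def is_ancestor(label_u, label_v):
    # u is an ancestor of v (or v itself) iff v's number falls inside u's interval.
    return label_u[0] <= label_v[0] <= label_u[1]

tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
labels = interval_labels(tree, "r")
print(labels)
print(is_ancestor(labels["a"], labels["d"]), is_ancestor(labels["a"], labels["e"]))
```

The query uses nothing but the two labels, which is the defining property of a labeling scheme; the tree itself is not consulted.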

  4. The RC4 Algorithm for Protecting the Transmission of Queries and Query Results in the PostgreSQL ORDBMS

    Directory of Open Access Journals (Sweden)

    Yuri Ariyanto

    2009-01-01

    Full Text Available This research discusses how to implement the RC4 cryptographic algorithm to protect queries and query results; security is provided by encrypting and decrypting both while they are in transit over the network. The implementation consists of client-side software that accesses a database located on the server side. The software can encrypt and decrypt the queries and query results sent from the client to the server and vice versa, so that the transmission of queries and query results is secured. Transmission security can be considered achieved if the software succeeds in encrypting the transmitted queries and query results, so that an eavesdropper who intercepts them cannot understand the data. The study concludes that the software successfully encrypts the queries and query results transmitted between the client application and the database server.
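
For readers unfamiliar with RC4, the following is a minimal sketch of the cipher itself (key scheduling followed by keystream generation), applied to a query string. It is an illustration only and not the software described above: the key and the SQL text are invented, and the function name `rc4` is hypothetical. Because RC4 is a stream cipher that XORs a keystream with the data, the same function both encrypts and decrypts.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 under the given key."""
    # Key-scheduling algorithm: permute the state 0..255 using the key bytes.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation: emit one keystream byte per data byte and XOR it in.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)

# Hypothetical key and query, for illustration only.
key = b"shared-secret"
query = b"SELECT name FROM staff WHERE id = 7"
ciphertext = rc4(key, query)            # what would travel over the network
print(ciphertext.hex())
print(rc4(key, ciphertext).decode())    # the receiver recovers the query
```

Both sides must hold the same key, and a key should not be reused for two different messages, since XORing two ciphertexts produced with the same keystream reveals the XOR of the plaintexts.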

  5. Keyword Query over Error-Tolerant Knowledge Bases

    Institute of Scientific and Technical Information of China (English)

    Yu-Rong Cheng; Ye Yuan; Jia-Yu Li; Lei Chen; Guo-Ren Wang

    2016-01-01

    With more and more knowledge provided by the WWW, querying and mining knowledge bases have attracted much research attention. Among all the queries over knowledge bases, which are usually modelled as graphs, the keyword query is the most widely used. Although the problem of keyword query over graphs has been studied in depth for years, knowledge bases are special error-tolerant graphs, so the results of traditionally defined keyword queries often fail to satisfy users. Thus, in this paper, we define a new keyword query, called confident r-clique, specific to knowledge bases and based on the r-clique definition for keyword query on general graphs, which has been proved to be the best one. However, as we prove in the paper, finding the confident r-cliques is #P-hard. We propose a filtering-and-verification framework to improve the search efficiency. In the filtering phase, we develop the tightest upper bound of the confident r-clique, and design an index together with its search algorithm, which suits the large scale of knowledge bases well. In the verification phase, we develop an efficient sampling method to verify the final answers from the candidates remaining after the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy users’ requirements better than the traditional r-clique definition, and that our algorithms are efficient.

  6. An Architecture for Handling Fuzzy Queries in Data Warehouses

    Science.gov (United States)

    Singh, Manu Pratap; Tiwari, Rajdev; Mahajan, Manish; Dani, Diksha

    This paper presents an augmented Data Warehouse architecture for fuzzy query handling to improve the performance of the Data Mining process. The performance of Data Mining may deteriorate when mining fuzzy information from large Data Warehouses. A number of preprocessing steps have been suggested and implemented so far to support the mining process, but querying large Data Warehouses for fuzzy information is still a challenging task for the research community. The model proposed here may provide a more realistic and powerful technique for handling vague queries directly. The basic idea behind the creation of Data Warehouses is to integrate a large amount of pre-fetched data and information from distributed sources for direct querying and analysis. But end users’ queries contain a great deal of fuzziness, and handling those queries directly may not yield the desired response. So the model proposed here creates a fuzzy extension of the Data Warehouse by applying a Neuro-Fuzzy technique, and the fuzzy queries are then handled directly by this extension.

  7. Research in Mobile Database Query Optimization and Processing

    Directory of Open Access Journals (Sweden)

    Agustinus Borgy Waluyo

    2005-01-01

    Full Text Available The emergence of mobile computing provides the ability to access information at any time and place. However, as mobile computing environments have inherent factors like power, storage, asymmetric communication cost, and bandwidth limitations, efficient query processing and minimum query response time are definitely of great interest. This survey groups a variety of query optimization and processing mechanisms in mobile databases into two main categories, namely: (i) query processing strategy, and (ii) caching management strategy. Query processing includes both pull and push operations (broadcast mechanisms). We further classify push operation into on-demand broadcast and periodic broadcast. Push operation (on-demand broadcast) relates to designing techniques that enable the server to accommodate multiple requests so that the request can be processed efficiently. Push operation (periodic broadcast) corresponds to data dissemination strategies. In this scheme, several techniques to improve the query performance by broadcasting data to a population of mobile users are described. A caching management strategy defines a number of methods for maintaining cached data items in clients' local storage. This strategy considers critical caching issues such as caching granularity, caching coherence strategy and caching replacement policy. Finally, this survey concludes with several open issues relating to mobile query optimization and processing strategy.

  8. Enhancing the view of a million galaxies

    Science.gov (United States)

    2004-06-01

    XMM-Newton X-ray spectral colour composite image of the Subaru/XMM-Newton Deep Field (credits: ESA/Univ. of Leicester/I. Stewart and M. Watson). The view gives an X-ray pseudo-colour representation of all the sources, coded according to their X-ray energy. More energetic sources are shown in blue and less energetic ones in red. This mosaic image, composed of 7 partially overlapping pointings, maps the full extent of the SXDF and corresponds to an exposure time exceeding one hundred hours. These data form the largest contiguous area over which deep X-ray observations have been performed. XMM-Newton/Subaru colour composite image (credits: NAOJ/Subaru Telescope): a colour composite image obtained by combining data taken with the Subaru Telescope in blue, red and near-infrared light. The image, worth over two hundred hours of exposure time, covers an area of sky seven times larger than the full moon. The images in blue light show details several hundred million times fainter than what can be seen with the naked eye. SXDS field (credits: NAOJ/Subaru Telescope): a detail of the SXDS field. The teardrop-shaped galaxy in the upper right portion of the frame is likely to have suffered from a collision with another galaxy. SXDS field (credits: NAOJ/Subaru Telescope): a detail of the SXDS field. The prominent spiral galaxy near the centre may be interacting with a less-conspicuous dwarf galaxy to its lower right. One of the fundamental goals of modern astronomy is understanding the history of the Universe, and in particular learning about the processes that shape the formation and evolution of galaxies. To observe these processes as they unfold, astronomers must survey galaxies near and far, spanning a large enough volume of the Universe, so that local variations in the

  9. AbIx: An Approach to Content-Based Approximate Query Processing in Peer-to-Peer Data Systems

    Institute of Scientific and Technical Information of China (English)

    Chao-Kun Wang; Jian-Min Wang; Jia-Guang Sun; Sheng-Fei Shi; Hong Gao

    2007-01-01

    In recent years there has been a significant interest in peer-to-peer (P2P) environments in the community of data management. However, almost all work, so far, is focused on exact query processing in current P2P data systems. The autonomy of peers also is not considered enough. In addition, the system cost is very high because the information publishing method of shared data is based on each document instead of document set. In this paper, abstract indices (AbIx) are presented to implement content-based approximate queries in centralized, distributed and structured P2P data systems. It can be used to search as few peers as possible but get as many returns satisfying users' queries as possible on the guarantee of high autonomy of peers. Also, abstract indices have low system cost, can improve the query processing speed, and support very frequent updates and the set information publishing method. In order to verify the effectiveness of abstract indices, a simulator of 10,000 peers, over 3 million documents is made, and several metrics are proposed. The experimental results show that abstract indices work well in various P2P data systems.

  10. jQuery UI 1.7 the user interface library for jQuery

    CERN Document Server

    Wellman, Dan

    2009-01-01

    An example-based approach leads you step-by-step through the implementation and customization of each library component and its associated resources in turn. To emphasize the way that jQuery UI takes the difficulty out of user interface design and implementation, each chapter ends with a 'fun with' section that puts together what you've learned throughout the chapter to make a usable and fun page. In these sections you'll often get to experiment with the latest associated technologies like AJAX and JSON. This book is for front-end designers and developers who need to quickly learn how to use t

  11. jQuery 2.0 animation techniques beginner's guide

    CERN Document Server

    Culpepper, Adam

    2013-01-01

    This book is a guide to help you create attractive web page animations using jQuery. Written in a friendly and engaging style, this book is designed to be placed alongside your computer as a mentor. If you are a web designer or a frontend developer, or if you want to learn how to animate the user interface of your web applications with jQuery, this book is for you. Experience with jQuery or JavaScript would be helpful, but a solid knowledge of HTML and CSS is assumed.

  12. Intelligent query processing for semantic mediation of information systems

    Directory of Open Access Journals (Sweden)

    Saber Benharzallah

    2011-11-01

    Full Text Available We propose an intelligent and efficient query processing approach for semantic mediation of information systems. We also propose a generic multi-agent architecture that supports our approach. Our approach focuses on the exploitation of intelligent agents for query reformulation and the use of a new technology for the semantic representation. The algorithm adapts itself to changes in the environment, offers a wide aptitude and solves the various data conflicts in a dynamic way; it also reformulates the query using the schema mediation method for the discovered systems and the context mediation for the other systems.

  13. Joint Top-K Spatial Keyword Query Processing

    DEFF Research Database (Denmark)

    Wu, Dingming; Yiu, Man Lung; Cong, Gao

    2012-01-01

    Web users and content are increasingly being geopositioned, and increased focus is being given to serving local content in response to web queries. This development calls for spatial keyword queries that take into account both the locations and textual descriptions of content. We study...... keyword queries. Empirical studies show that the proposed solution is efficient on real data sets. We also offer analytical studies on synthetic data sets to demonstrate the efficiency of the proposed solution. Index Terms (IEEE Terms): Electronic mail, Google, Indexes, Joints, Mobile communication...

  14. Design of Intelligent layer for flexible querying in databases

    CERN Document Server

    Nihalani, Mrs Neelu; Motwani, Dr Mahesh

    2009-01-01

    Computer-based information technologies have been extensively used to help many organizations, private companies, and academic and education institutions manage their processes, and information systems have thereby become their nerve centre. The explosion of massive data sets created by businesses, science and governments necessitates intelligent and more powerful computing paradigms so that users can benefit from this data. Therefore most new-generation database applications demand intelligent information management to enhance efficient interactions between the database and its users. Database systems support only a Boolean query model. A selection query on an SQL database returns all those tuples that satisfy the conditions in the query.

  15. Processing Constrained K Closest Pairs Query in Spatial Databases

    Institute of Scientific and Technical Information of China (English)

    LIU Xiaofeng; LIU Yunsheng; XIAO Yingyuan

    2006-01-01

    In this paper, the constrained K closest pairs query is introduced, which retrieves the K closest pairs satisfying a given spatial constraint from two datasets. For data sets indexed by R-trees in spatial databases, three algorithms are presented for answering this kind of query. Among them, the two-phase Range+Join and Join+Range algorithms adopt the strategy of changing the execution order of the range and closest pairs queries, and the constrained heap-based algorithm utilizes extended distance functions to prune the search space and minimize the pruning distance. Experimental results show that the constrained heap-based algorithm has better applicability and performance than the two-phase algorithms.
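To make the query semantics concrete, here is a brute-force sketch in the spirit of the Range+Join ordering: filter both datasets by the constraint first, then rank the surviving pairs. It does not use R-trees and is not any of the paper's three algorithms. It also assumes that the spatial constraint is an axis-aligned rectangle that both points of a pair must fall inside; the abstract does not specify this.

```python
import heapq
from itertools import product
from math import dist


def constrained_k_closest_pairs(ps, qs, region, k):
    """Return the k closest (p, q) pairs whose points both lie inside region.

    region is (xmin, ymin, xmax, ymax); this rectangular form is an assumption.
    """
    xmin, ymin, xmax, ymax = region

    def inside(point):
        return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax

    # Range phase: keep only the points that satisfy the spatial constraint.
    ps_in = [p for p in ps if inside(p)]
    qs_in = [q for q in qs if inside(q)]
    # Join phase: rank all remaining cross pairs by Euclidean distance.
    return heapq.nsmallest(k, product(ps_in, qs_in), key=lambda pair: dist(*pair))


# Hypothetical data: the pair ((9, 9), (9, 8)) is close but outside the region.
ps = [(0, 0), (2, 2), (9, 9)]
qs = [(1, 0), (2, 3.5), (9, 8)]
pairs = constrained_k_closest_pairs(ps, qs, region=(0, 0, 5, 5), k=2)
```

The exhaustive join costs time proportional to the product of the two filtered set sizes, which is the cost that index-based pruning is meant to avoid.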

  16. Materialized View Selection by Query Clustering in XML Data Warehouses

    CERN Document Server

    Mahboubi, Hadj; Darmont, Jérôme

    2008-01-01

    XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native XML database management systems currently offer limited performance, and it is necessary to design strategies to optimize them. In this paper, we propose an automatic strategy for the selection of XML materialized views that exploits a data mining technique, more precisely the clustering of the query workload. To validate our strategy, we implemented an XML warehouse modeled along the XCube specifications. We executed a workload of XQuery decision-support queries on this warehouse, with and without using our strategy. Our experimental results demonstrate its efficiency, even when queries are complex.

  17. Preliminary study into query translation for patent retrieval

    DEFF Research Database (Denmark)

    Jochim, C.; Lioma, Christina; Schütze, H.

    2010-01-01

    boundaries do not hinder their accessibility. This multilinguality of patent collections offers opportunities for improving patent retrieval. In this work we exploit these opportunities by applying query translation to patent retrieval. We expand monolingual patent queries with their translations, using......, but not always, and without great improvement compared to standard statistical monolingual query expansion (Rocchio). The improvement is greater when the source language is English, as opposed to French or German, a finding partly due to the effect of the complex French and German morphology upon translation...

  18. WISE photometry for 400 million SDSS sources

    CERN Document Server

    Lang, Dustin; Schlegel, David J

    2014-01-01

    We present photometry of images from the Wide-Field Infrared Survey Explorer (WISE; Wright et al. 2010) of over 400 million sources detected by the Sloan Digital Sky Survey (SDSS; York et al. 2000). We use a "forced photometry" technique, using measured SDSS source positions, star-galaxy separation and galaxy profiles to define the sources whose fluxes are to be measured in the WISE images. We perform photometry with The Tractor image modeling code, working on our "unWISE" coadds and taking account of the WISE point-spread function and a noise model. The result is a measurement of the flux of each SDSS source in each WISE band. Many sources have little flux in the WISE bands, so often the measurements we report are consistent with zero. However, for many sources we get three- or four-sigma measurements; these sources would not be reported by the WISE pipeline and will not appear in the WISE catalog, yet they can be highly informative for some scientific questions. In addition, these small-signal measurements...

  19. Statistic analysis of millions of digital photos

    Science.gov (United States)

    Wueller, Dietmar; Fageth, Reiner

    2008-02-01

    The analysis of images has always been an important aspect in the quality enhancement of photographs and photographic equipment. Due to the lack of metadata it was mostly limited to images taken by experts under predefined conditions, and the analysis was also done by experts or required psychophysical tests. With digital photography and the EXIF metadata stored in the images, a lot of information can be gained from a semiautomatic or automatic image analysis if one has access to a large number of images. Although home printing is becoming more and more popular, the European market still has a few photofinishing companies who have access to a large number of images. All printed images are stored for a certain period of time, adding up to several million images on servers every day. We have utilized the images to answer numerous questions and think that these answers are useful for increasing image quality by optimizing the image processing algorithms. Test methods can be modified to fit typical user conditions and future developments can be pointed towards ideal directions.

  20. Human Cell and Tissue Establishment Registration Public Query

    Data.gov (United States)

    U.S. Department of Health & Human Services — This application provides Human Cell and Tissue registration information for registered, inactive, and pre-registered firms. Query options are by Establishment Name,...

  1. Determinacy in Static Analysis of jQuery

    DEFF Research Database (Denmark)

    Andreasen, Esben; Møller, Anders

    2014-01-01

    Static analysis for JavaScript can potentially help programmers find errors early during development. Although much progress has been made on analysis techniques, a major obstacle is the prevalence of libraries, in particular jQuery, which apply programming patterns that have detrimental...... present a static dataflow analysis for JavaScript that infers and exploits determinacy information on-the-fly, to enable analysis of some of the most complex parts of jQuery. The techniques are implemented in the TAJS analysis tool and evaluated on a collection of small programs that use jQuery. Our...

  2. An introduction to XML query processing and keyword search

    CERN Document Server

    Lu, Jiaheng

    2013-01-01

    This book systematically and comprehensively covers the latest advances in XML data searching. It presents an extensive overview of the current query processing and keyword search techniques on XML data.

  3. An Efficient Data Dissemination Scheme for Spatial Query Processing

    Institute of Scientific and Technical Information of China (English)

    Kwangjin Park; Hyunseung Choo; Chong-Sun Hwang

    2007-01-01

    Due to the personal portable devices and advances in wireless communication technologies, Location Dependent Information Services (LDISs) have received a lot of attention from both the industrial and academic communities. In LDISs, it is important to reduce the query response time, since a late query response may contain out-of-date information. In this paper, we study the issue of LDISs using a Voronoi Diagram. We introduce a new NN search method, called the Exponential Sequence Scheme (ESS), to support NN query processing in periodic broadcast environment. This paper aims to provide research directions towards minimizing both the access latency and energy consumption for the NN-query processing.

  4. An Adaptive Mechanism for Accurate Query Answering under Differential Privacy

    CERN Document Server

    Li, Chao

    2012-01-01

    We propose a novel mechanism for answering sets of counting queries under differential privacy. Given a workload of counting queries, the mechanism automatically selects a different set of "strategy" queries to answer privately, using those answers to derive answers to the workload. The main algorithm proposed in this paper approximates the optimal strategy for any workload of linear counting queries. With no cost to the privacy guarantee, the mechanism improves significantly on prior approaches and achieves near-optimal error for many workloads, when applied under (ε, δ)-differential privacy. The result is an adaptive mechanism which can help users achieve good utility without requiring that they reason carefully about the best formulation of their task.

  5. Comparing and Combining Methods for Automatic Query Expansion

    CERN Document Server

    Pérez-Agüera, José R

    2008-01-01

    Query expansion is a well known method to improve the performance of information retrieval systems. In this work we have tested different approaches to extract the candidate query terms from the top ranked documents returned by the first-pass retrieval. One of them is the cooccurrence approach, based on measures of cooccurrence of the candidate and the query terms in the retrieved documents. The other one, the probabilistic approach, is based on the probability distribution of terms in the collection and in the top ranked set. We compare the retrieval improvement achieved by expanding the query with terms obtained with different methods belonging to both approaches. Besides, we have developed a naïve combination of both kinds of method, with which we have obtained results that improve those obtained with any of them separately. This result confirms that the information provided by each approach is of a different nature and, therefore, can be used in a combined manner.

  6. Cluster Analysis and Fuzzy Query in Ship Maintenance and Design

    Science.gov (United States)

    Che, Jianhua; He, Qinming; Zhao, Yinggang; Qian, Feng; Chen, Qi

    Cluster analysis and fuzzy query are widely applied in modern intelligent information processing. In view of the features of ship maintenance data, a variant of the hypergraph-based clustering algorithm, i.e., Correlation Coefficient-based Minimal Spanning Tree (CC-MST), is proposed to analyze the bulky data originating in the ship maintenance process, discover unknown rules and help ship maintainers decide on the causes of various device faults. At the same time, revising or renewing an existing design of a ship or device may be necessary to eliminate those device faults. To offer ship designers some valuable hints, a fuzzy query mechanism is designed to retrieve useful information from large-scale, complicated and redundant ship technical and testing data. Finally, two experiments based on a real ship device fault statistical dataset validate the flexibility and efficiency of the CC-MST algorithm. A fuzzy query prototype demonstrates the usability of our fuzzy query mechanism.

  7. Capturing the Meaning of Internet Search Queries by Taxonomy Mapping

    Science.gov (United States)

    Tikk, Domonkos; Kardkovács, Zsolt T.; Bánsághi, Zoltán

    Capturing the meaning of internet search queries can significantly improve the effectiveness of search retrieval. Users often have trouble finding relevant answers to their queries, particularly when the posted query is ambiguous. The orientation of the user can be greatly facilitated if answers are grouped into topics of a fixed subject taxonomy. In this manner, the original problem becomes one of labelling queries, and consequently the answers, with the topic names, that is, a classification set-up. This paper introduces our Ferrety algorithm, which performs topic assignment and also works when there is no directly available training data that describes the semantics of the subject taxonomy. The approach is presented via the example of the ACM KDD Cup 2005 problem, where Ferrety received awards for precision and creativity.

  8. Human Cell and Tissue Establishment Registration Public Query

    Data.gov (United States)

    U.S. Department of Health & Human Services — This application provides Human Cell and Tissue registration information for registered, inactive, and pre-registered firms. Query options are by Establishment Name,...

  9. Optimizing Aggregate SPARQL Queries Using Materialized RDF Views

    DEFF Research Database (Denmark)

    Ibragimov, Dilshod; Hose, Katja; Pedersen, Torben Bach;

    2016-01-01

    During recent years, more and more data has been published as native RDF datasets. In this setup, both the size of the datasets and the need to process aggregate queries represent challenges for standard SPARQL query processing techniques. To overcome these limitations, materialized views can......, this paper proposes MARVEL (MAterialized Rdf Views with Entailment and incompLetness). The approach consists of a view selection algorithm based on an associated RDF-specific cost model, a view definition syntax, and an algorithm for rewriting SPARQL queries using materialized RDF views. The experimental...... be created and used as a source of precomputed partial results during query processing. However, materialized view techniques as proposed for relational databases do not support RDF specifics, such as incompleteness and the need to support implicit (derived) information. To overcome these challenges...

  10. Enhanced Distributed Dynamic Skyline Query for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Khandakar Ahmed

    2016-02-01

    Full Text Available Dynamic skyline query is one of the most popular and significant variants of skyline query in the field of multi-criteria decision-making. However, designing a distributed dynamic skyline query poses a greater challenge, especially for distributed data centric storage within wireless sensor networks (WSNs). In this paper, a novel Enhanced Distributed Dynamic Skyline (EDDS) approach is proposed and implemented in a Disk Based Data Centric Storage (DBDCS) architecture. DBDCS is an adaptation of a magnetic disk storage platter consisting of tracks and sectors. In DBDCS, the disc track and sector analogy is used to map data locations. A distance based indexing method is used for storing and querying multi-dimensional similar data. EDDS applies a threshold based hierarchical approach, which uses temporal correlation among sectors and sector segments to calculate a dynamic skyline. The efficiency and effectiveness of EDDS have been evaluated in terms of latency, energy consumption and accuracy through a simulation model developed in Castalia.
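For readers unfamiliar with the query type, the sketch below computes a dynamic skyline in a single process: every point is mapped to its per-dimension absolute distance from the query point, and the points whose mapped coordinates are not dominated are returned. It is a centralised illustration of the usual definition only, not the distributed EDDS approach or the DBDCS storage layout; the sensor readings and function name are hypothetical.

```python
def dynamic_skyline(points, query):
    """Return the points not dominated with respect to the query point."""

    def transformed(point):
        # Map a point into the query-relative space of absolute differences.
        return tuple(abs(a - b) for a, b in zip(point, query))

    def dominates(u, v):
        # u dominates v if it is no worse in every dimension and better in one.
        return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

    mapped = [(p, transformed(p)) for p in points]
    return [p for p, tp in mapped if not any(dominates(tq, tp) for _, tq in mapped)]


# Hypothetical (temperature, humidity) readings and a query reading.
readings = [(20, 50), (25, 40), (24, 60), (30, 30)]
skyline = dynamic_skyline(readings, query=(24, 45))
```

This pairwise comparison is quadratic in the number of points, which is acceptable for a small illustration but not for a sensor network where communication cost dominates.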

  11. External Data Structures for Shortest Path Queries on Planar Digraphs

    DEFF Research Database (Denmark)

    Arge, Lars; Toma, Laura

    2005-01-01

    In this paper we present space-query trade-offs for external memory data structures that answer shortest path queries on planar directed graphs. For any S = Ω(N^(1+ε)) and S = O(N^2/B), our main result is a family of structures that use S space and answer queries in O(N^2/(SB)) I/Os, thus obtaining...... optimal space-query product O(N^2/B). An S space structure can be constructed in O(√S · sort(N)) I/Os, where sort(N) is the number of I/Os needed to sort N elements, B is the disk block size, and N is the size of the graph....

  12. Queries, Influencers and Vocational Interests of Junior High School Students

    Science.gov (United States)

    Woal, S. Theodore

    1974-01-01

    The study, based on questionnaire results from 207 ninth grade students, investigates: student familiarity with occupations; influencers of their tentative occupational choices; post high school plans; and student queries and concerns pertinent to preparation for entry into a job. (MW)

  13. WISE Photometry for 400 Million SDSS Sources

    Science.gov (United States)

    Lang, Dustin; Hogg, David W.; Schlegel, David J.

    2016-02-01

    We present photometry of images from the Wide-Field Infrared Survey Explorer (WISE) of over 400 million sources detected by the Sloan Digital Sky Survey (SDSS). We use a “forced photometry” technique, using measured SDSS source positions, star-galaxy classification, and galaxy profiles to define the sources whose fluxes are to be measured in the WISE images. We perform photometry with The Tractor image modeling code, working on our “unWISE” coadds and taking account of the WISE point-spread function and a noise model. The result is a measurement of the flux of each SDSS source in each WISE band. Many sources have little flux in the WISE bands, so often the measurements we report are consistent with zero given our uncertainties. However, for many sources we get 3σ or 4σ measurements; these sources would not be reported by the “official” WISE pipeline and will not appear in the WISE catalog, yet they can be highly informative for some scientific questions. In addition, these small-signal measurements can be used in stacking analyses at the catalog level. The forced photometry approach has the advantage that we measure a consistent set of sources between SDSS and WISE, taking advantage of the resolution and depth of the SDSS images to interpret the WISE images; objects that are resolved in SDSS but blended together in WISE still have accurate measurements in our photometry. Our results, and the code used to produce them, are publicly available at http://unwise.me.

  14. WISE PHOTOMETRY FOR 400 MILLION SDSS SOURCES

    Energy Technology Data Exchange (ETDEWEB)

    Lang, Dustin [Department of Astronomy and Astrophysics and Dunlap Institute, University of Toronto, 50 Saint George Street, Toronto, ON, M5S 3H4 (Canada); Hogg, David W. [Center for Cosmology and Particle Physics, Department of Physics, New York University, 4 Washington Place, New York, NY 10003 (United States); Schlegel, David J., E-mail: dstndstn@gmail.com [Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720 (United States)

    2016-02-15

    We present photometry of images from the Wide-Field Infrared Survey Explorer (WISE) of over 400 million sources detected by the Sloan Digital Sky Survey (SDSS). We use a “forced photometry” technique, using measured SDSS source positions, star–galaxy classification, and galaxy profiles to define the sources whose fluxes are to be measured in the WISE images. We perform photometry with The Tractor image modeling code, working on our “unWISE” coadds and taking account of the WISE point-spread function and a noise model. The result is a measurement of the flux of each SDSS source in each WISE band. Many sources have little flux in the WISE bands, so often the measurements we report are consistent with zero given our uncertainties. However, for many sources we get 3σ or 4σ measurements; these sources would not be reported by the “official” WISE pipeline and will not appear in the WISE catalog, yet they can be highly informative for some scientific questions. In addition, these small-signal measurements can be used in stacking analyses at the catalog level. The forced photometry approach has the advantage that we measure a consistent set of sources between SDSS and WISE, taking advantage of the resolution and depth of the SDSS images to interpret the WISE images; objects that are resolved in SDSS but blended together in WISE still have accurate measurements in our photometry. Our results, and the code used to produce them, are publicly available at http://unwise.me.

  15. Alternative Development for Data Migration Using Dynamic Query Generation

    Directory of Open Access Journals (Sweden)

    Romero-Ramírez Johan Alfredo

    2016-05-01

    Full Text Available This article presents an ETL (Extract, Transform, Load) prototype called Valery as an alternative approach to the migration process, which includes a compiler for dynamic generation of SQL queries. Its main features are dynamic SQL generation, a set of configuration commands, and an environment for file uploading. The tests use the Northwind academic database and an individual environment. The model implementation uses flat files and SQL as the query language. Finally, there is an analysis of the results obtained.

  16. MOCQL: A Declarative Language for Ad-Hoc Model Querying

    DEFF Research Database (Denmark)

    Störrle, Harald

    2013-01-01

    This paper starts from the observation that existing model query facilities are not easy to use, and are thus not suitable for users without substantial IT/Computer Science background. In an attempt to highlight this issue and explore alternatives, we have created the Model Constraint and Query L...... with MOCQL than when working with OCL. While MOCQL is currently only implemented and validated for the different notations defined by UML, its concepts should be universally applicable....

  17. Study on consistent query answering in inconsistent databases

    Institute of Scientific and Technical Information of China (English)

    XIE Dong; YANG Luming

    2007-01-01

    Consistent query answering is an approach to retrieving consistent answers over databases that might be inconsistent with respect to some given integrity constraints. The approach is based on the concept of repair. This paper surveys recent research on obtaining consistent information from inconsistent databases, covering the underlying semantic model, a number of approaches to computing consistent query answers, and the computational complexity of this problem. Furthermore, the work outlines potential research directions in this area.

  18. Two Dimensional Range Minimum Queries and Fibonacci Lattices

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Davoodi, Pooya; Lewenstein, Moshe;

    2012-01-01

    technique, the discrepancy properties of Fibonacci lattices, we give an indexing data structure for 2D-RMQs that uses O(N/c) bits additional space with O(c log c (log log c)^2) query time, for any parameter c, 4 ≤ c ≤ N. Also, when the entries of the input matrix are from {0,1}, we show that the query time can...

  19. A distributed query execution engine of big attributed graphs.

    Science.gov (United States)

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path, where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation, and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL in querying massive attributed graph datasets, in addition to its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.

  20. Online Query Scheduling on Source Permutation for Big Data Integration

    OpenAIRE

    Yuan, Zimu; Guo, Shusheng

    2015-01-01

    Big data integration could involve a large number of sources with unpredictable redundancy information between them. The approach of building a central warehouse to integrate big data from all sources then becomes infeasible because of the large number of sources and their continuous updates. A practical approach is to apply online query scheduling that requests data from sources at runtime upon receiving a query. In this paper, we address the Time-Cost Minimization Problem for online qu...

  1. The Query Complexity of Finding a Hidden Permutation

    DEFF Research Database (Denmark)

    Afshani, Peyman; Agrawal, Manindra; Benjamin, Doerr;

    2012-01-01

    the score f_{z,π}(x) defined as f_{z,π}(x) := max{ i ∈ [0..n] | ∀ j ≤ i : z_{π(j)} = x_{π(j)} }; i.e., the length of the longest common prefix of x and z with respect to π. The goal is to minimize the number of queries asked. Our main results are matching upper and lower bounds for this problem, both for deterministic and randomized query schemes...
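
    The score described above can be illustrated with a small sketch. It assumes 0-based positions and represents the permutation as a list `pi` giving the order in which positions are compared; the function name `prefix_score` and the sample values are hypothetical and are not taken from the paper.

```python
def prefix_score(x, z, pi):
    """Length of the longest common prefix of x and z in the order given by pi.

    x, z: bit sequences of equal length n.
    pi:   a permutation of range(n); pi[0] is the position compared first.
    """
    score = 0
    for position in pi:
        if x[position] != z[position]:
            break  # first disagreement ends the common prefix
        score += 1
    return score


# Hypothetical hidden pair (z, pi) and one query x.
z = [1, 0, 1, 1]
pi = [2, 0, 3, 1]
print(prefix_score([1, 1, 1, 0], z, pi))  # prints 2
```

    A query strategy only ever sees such scores, so the number of calls to this function is the quantity being minimized.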

  2. Optimizing Aggregate SPARQL Queries Using Materialized RDF Views

    DEFF Research Database (Denmark)

    Ibragimov, Dilshod; Hose, Katja; Pedersen, Torben Bach

    2016-01-01

    , this paper proposes MARVEL (MAterialized Rdf Views with Entailment and incompLetness). The approach consists of a view selection algorithm based on an associated RDF-specific cost model, a view definition syntax, and an algorithm for rewriting SPARQL queries using materialized RDF views. The experimental...... evaluation shows that MARVEL can improve query response time by more than an order of magnitude while effectively handling RDF specifics....

  3. NCBI GEO: mining tens of millions of expression profiles--database and tools update.

    Science.gov (United States)

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Edgar, Ron

    2007-01-01

    The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely disseminates microarray and other forms of high-throughput data generated by the scientific community. The database has a minimum information about a microarray experiment (MIAME)-compliant infrastructure that captures fully annotated raw and processed data. Several data deposit options and formats are supported, including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). In addition to data storage, a collection of user-friendly web-based interfaces and applications are available to help users effectively explore, visualize and download the thousands of experiments and tens of millions of gene expression patterns stored in GEO. This paper provides a summary of the GEO database structure and user facilities, and describes recent enhancements to database design, performance, submission format options, data query and retrieval utilities. GEO is accessible at http://www.ncbi.nlm.nih.gov/geo/

  4. Privately Releasing Conjunctions and the Statistical Query Barrier

    CERN Document Server

    Gupta, Anupam; Roth, Aaron; Ullman, Jonathan

    2010-01-01

    Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern. We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. In doing so we also give a new learning algorithm for submodular functions that improves upon recent results in a different context. While interesting from a learning theoretic point of v...

  5. Largest Empty Circle Centered on a Query Line

    CERN Document Server

    Augustine, John; Roy, Sasanka

    2008-01-01

    The Largest Empty Circle problem seeks the largest circle centered within the convex hull of a set $P$ of $n$ points in $\mathbb{R}^2$ and devoid of points from $P$. In this paper, we introduce a query version of this well-studied problem. In our query version, we are required to preprocess $P$ so that when given a query line $Q$, we can quickly compute the largest empty circle centered at some point on $Q$ and within the convex hull of $P$. We present solutions for two special cases and the general case; all our queries run in $O(\log n)$ time. We restrict the query line to be horizontal in the first special case, which we preprocess in $O(n \alpha(n) \log n)$ time and space, where $\alpha(n)$ is the slow-growing inverse of Ackermann's function. When the query line is restricted to pass through a fixed point, the second special case, our preprocessing takes $O(n \alpha(n)^{O(\alpha(n))} \log n)$ time and space. We use insights from the two special cases to solve the general version of the problem with pr...

  6. A comparison of peer-to-peer query response modes

    CERN Document Server

    Hoschek, W

    2002-01-01

    In a large distributed system spanning many administrative domains such as a Grid (Foster et al., 2001), it is desirable to maintain and query dynamic and timely information about active participants such as services, resources and user communities. However, in such a database system, the set of information tuples in the universe is partitioned over one or more distributed nodes, for reasons including autonomy, scalability, availability, performance and security. This suggests the use of peer-to-peer (P2P) query technology. A variety of query response modes can be used to return matching query results from P2P nodes to an originator. Although from the functional perspective all response modes are equivalent, no mode is optimal under all circumstances. Which query response modes allow suitable trade-offs to be expressed for a wide range of P2P applications? We answer this question by systematically describing and characterizing four query response modes for the unified peer-to-peer database framework (UPDF) proposed ...

  7. QUESEM: Towards building a Meta Search Service utilizing Query Semantics

    Directory of Open Access Journals (Sweden)

    Neelam Duhan

    2011-01-01

    Current Web search engines are built to serve the needs of all users, independent of the special needs of any individual. Documents are returned by matching user queries against available documents, with no emphasis on the semantics of the query. As a result, the returned information is often very large and inaccurate, which increases user-perceived latency. In this paper, a Semantic Search Service is developed to help users gather relevant documents more efficiently than with traditional Web search engines. The approach relies on online Web resources such as dictionary-based sites to retrieve possible semantics of the query keywords, which are stored in a definition repository. The service works as a meta-layer above the keyword-based search engine to generate sub-queries based on different meanings of the user query, which in turn are sent to the keyword-based search engine to perform the Web search. This approach helps the user find the desired information content and improves the search quality for certain types of complex queries. Experiments demonstrate its efficiency, as it results in a reduced search space.

  8. Efficient external memory structures for range-aggregate queries

    DEFF Research Database (Denmark)

    Agarwal, P.K.; Yang, J.; Arge, L.;

    2013-01-01

    We present external memory data structures for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in R^d, compute the aggregate of the weights of the points that lie inside a d-dimensional orthogonal query rectangle. The aggregates we consider in this paper include count, sum, and max. First, we develop a structure for answering two-dimensional range-count queries that uses O(N/B) disk blocks and answers a query in [formula missing in source] I/Os, where N is the number of input points and B is the disk block size. The structure can be extended to obtain a near-linear-size structure for answering range-sum queries using [formula missing in source] I/Os, and a linear-size structure for answering range-max queries in [formula missing in source] I/Os. Our structures can be made dynamic and extended to higher dimensions....
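
    The problem definition above can be made concrete with a brute-force reference sketch. This is not the external memory structure from the paper: it scans every point in memory and is only meant to pin down what a range-aggregate query returns. The function name `range_aggregate` and the sample points are hypothetical.

```python
def range_aggregate(points, lower, upper, aggregate="sum"):
    """Aggregate the weights of points inside an orthogonal query rectangle.

    points: iterable of (coords, weight), where coords is a d-tuple.
    lower, upper: d-tuples giving the closed rectangle [lower, upper].
    aggregate: "count", "sum", or "max".
    """
    inside = [
        weight
        for coords, weight in points
        if all(lo <= c <= hi for c, lo, hi in zip(coords, lower, upper))
    ]
    if aggregate == "count":
        return len(inside)
    if aggregate == "sum":
        return sum(inside)
    if aggregate == "max":
        return max(inside) if inside else None  # no maximum over an empty set
    raise ValueError(f"unknown aggregate: {aggregate}")


# Three weighted points in the plane and the query rectangle [0,3] x [0,3].
points = [((1, 1), 5), ((2, 3), 2), ((4, 4), 7)]
print(range_aggregate(points, (0, 0), (3, 3), "sum"))  # prints 7
```

    The linear scan costs time proportional to the number of points per query, which is exactly the cost that an index structure for this problem is designed to avoid.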

  9. Improving query services of web map by web mining

    Science.gov (United States)

    Huang, Maojun

    2007-11-01

    A web map is a hybrid of a map and the World Wide Web (the Web), usually created with WebGIS techniques. With rapid social development, public-oriented web maps are under pressure because they fail to satisfy growing demand. The geocoding database plays a key role in supporting query services effectively, but the traditional geocoding method is laborious and time-consuming. Meanwhile, a great deal of spatial information is available online and could serve as a supplementary source for geocoding. Therefore, this paper discusses how to improve query services by web mining. The improvement has three facets: first, improving location queries by discovering and extracting address information from the Web to extend the geocoding database; second, enhancing optimum-path queries for public transport and buffer queries through spatial analysis and reasoning on the extended geocoding database; third, adjusting data collection strategies according to patterns discovered by mining web map queries. Finally, this paper presents the design of the application system and experimental results.

  10. Measuring the achievable error of query sets under differential privacy

    CERN Document Server

    Li, Chao

    2012-01-01

    A common goal of privacy research is to release synthetic data that satisfies a formal privacy guarantee and can be used by an analyst in place of the original data. To achieve reasonable accuracy, a synthetic data set must be tuned to support a specified set of queries accurately, sacrificing fidelity for other queries. This work considers methods for producing synthetic data under differential privacy and investigates what makes a set of queries "easy" or "hard" to answer. We consider answering sets of linear counting queries using the matrix mechanism, a recent differentially-private mechanism that can reduce error by adding complex correlated noise adapted to a specified workload. Our main result is a novel lower bound on the minimum total error required to simultaneously release answers to a set of workload queries. The bound reveals that the hardness of a query workload is related to the spectral properties of the workload when it is represented in matrix form. The bound is tight and, because it satisfi...

  11. Twenty-year trends in the prevalence of Down syndrome and other trisomies in Europe

    DEFF Research Database (Denmark)

    Loane, Maria; Morris, Joan K; Addor, Marie-Claude

    2013-01-01

    This study examines trends and geographical differences in total and live birth prevalence of trisomies 21, 18 and 13 with regard to increasing maternal age and prenatal diagnosis in Europe. Twenty-one population-based EUROCAT registries covering 6.1 million births between 1990 and 2009 participa...

  12. Teaching Middle School Language Arts: Incorporating Twenty-First Century Literacies

    Science.gov (United States)

    Small Roseboro, Anna J.

    2010-01-01

    "Teaching Middle School Language Arts" is the first book on teaching middle school language arts for multiple intelligences and related twenty-first-century literacies in technologically and ethnically diverse communities. More than 670,000 middle school teachers (grades six through eight) are responsible for educating nearly 13 million students…

  13. Complex dynamics of our economic life on different scales: insights from search engine query data.

    Science.gov (United States)

    Preis, Tobias; Reith, Daniel; Stanley, H Eugene

    2010-12-28

    Search engine query data deliver insight into the behaviour of individuals who are the smallest possible scale of our economic life. Individuals are submitting several hundred million search engine queries around the world each day. We study weekly search volume data for various search terms from 2004 to 2010 that are offered by the search engine Google for scientific use, providing information about our economic life on an aggregated collective level. We ask the question whether there is a link between search volume data and financial market fluctuations on a weekly time scale. Both collective 'swarm intelligence' of Internet users and the group of financial market participants can be regarded as a complex system of many interacting subunits that react quickly to external changes. We find clear evidence that weekly transaction volumes of S&P 500 companies are correlated with weekly search volume of corresponding company names. Furthermore, we apply a recently introduced method for quantifying complex correlations in time series with which we find a clear tendency that search volume time series and transaction volume time series show recurring patterns.

  14. Image-based query-by-example for big databases of galaxy images

    Science.gov (United States)

    Shamir, Lior; Kuminski, Evan

    2017-01-01

    Very large astronomical databases containing millions or even billions of galaxy images have become increasingly important tools in astronomy research. However, in many cases the very large size makes it more difficult to analyze these data manually, reinforcing the need for computer algorithms that can automate the data analysis process. An example of such a task is the identification of galaxies of a certain morphology of interest. For instance, if a rare galaxy is identified it is reasonable to expect that more galaxies of similar morphology exist in the database, but it is virtually impossible to manually search these databases to identify such galaxies. Here we describe computer vision and pattern recognition methodology that receives a galaxy image as an input, and automatically searches a large dataset of galaxies to return a list of galaxies that are visually similar to the query galaxy. The returned list is not necessarily complete or clean, but it provides a substantial reduction of the original database into a smaller dataset, in which the frequency of objects visually similar to the query galaxy is much higher. Experimental results show that the algorithm can identify rare galaxies such as ring galaxies among datasets of 10,000 astronomical objects.

  15. Restricted natural language based querying of clinical databases.

    Science.gov (United States)

    Safari, Leila; Patrick, Jon D

    2014-12-01

    To elevate the level of care to the community it is essential to provide usable tools for healthcare professionals to extract knowledge from clinical data. In this paper a generic translation algorithm is proposed to translate a restricted natural language query (RNLQ) to a standard query language like SQL (Structured Query Language). A special purpose clinical data analytics language (CliniDAL) has been introduced which provides a scheme of six classes of clinical questioning templates. A translation algorithm is proposed to translate the RNLQ of users to SQL queries based on a similarity-based Top-k algorithm which is used in the mapping process of CliniDAL. Also, a two-layer rule-based method is used to interpret the temporal expressions of the query, based on the proposed temporal model. The mapping and translation algorithms are generic and thus able to work with clinical databases in three data design models, including Entity-Relationship (ER), Entity-Attribute-Value (EAV) and XML; however, they are only implemented for the ER and EAV design models in the current work. It is easy to compose an RNLQ via CliniDAL's interface, in which query terms are automatically mapped to the underlying data models of a Clinical Information System (CIS) with an accuracy of more than 84%, and the temporal expressions of the query comprising absolute times, relative times or relative events can be automatically mapped to time entities of the underlying CIS and to normalized temporal comparative values. The proposed CliniDAL solution, which uses the generic mapping and translation algorithms and is enhanced by a temporal analyzer component, provides a simple mechanism for composing RNLQs to extract knowledge from CISs with different data design models for analytics purposes.

  16. 500 Million Yuan Textile Machinery Project Located in Laiyang

    Institute of Scientific and Technical Information of China (English)

    2012-01-01

    A textile machinery project with an investment of 500 million Yuan was formally signed in the Yantai Laiyang Development Zone on November 15. It is the 10th project in the development zone with an investment of more than 100 million Yuan, and the 5th with an investment of more than 500 million Yuan.

  17. Evaluating XML-Extended OLAP Queries Based on a Physical Algebra

    DEFF Research Database (Denmark)

    Yin, Xuepeng; Pedersen, Torben Bach

    2006-01-01

    . In this paper, we extend previous work on the logical federation of OLAP and XML data sources by presenting a simplified query semantics, a physical query algebra and a robust OLAP-XML query engine as well as the query evaluation techniques. Performance experiments with a prototypical implementation suggest...

  18. Constraint-based query distribution framework for an integrated global schema

    DEFF Research Database (Denmark)

    Malik, Ahmad Kamran; Qadir, Muhammad Abdul; Iftikhar, Nadeem

    2009-01-01

    Distributed heterogeneous data sources need to be queried uniformly using a global schema. A query on the global schema is reformulated so that it can be executed on local data sources. Constraints in the global schema and mappings are used for source selection, query optimization, and querying partitioned...

  19. Accelerating SPARQL Queries and Analytics on RDF Data

    KAUST Repository

    Al-Harbi, Razen

    2016-11-09

    The complexity of SPARQL queries and RDF applications poses great challenges on distributed RDF management systems. SPARQL workloads are dynamic and consist of queries with variable complexities. Hence, systems that use static partitioning suffer from communication overhead for workloads that generate excessive communication. Concurrently, RDF applications are becoming more sophisticated, mandating analytical operations that extend beyond SPARQL queries. Being primarily designed and optimized to execute SPARQL queries, which lack procedural capabilities, existing systems are not suitable for rich RDF analytics. This dissertation tackles the problem of accelerating SPARQL queries and RDF analytics on distributed shared-nothing RDF systems. First, a distributed RDF engine, coined AdPart, is introduced. AdPart uses lightweight hash partitioning for sharding triples using their subject values, rendering its startup overhead very low. The locality-aware query optimizer of AdPart takes full advantage of the partitioning to (i) support the fully parallel processing of join patterns on subjects and (ii) minimize data communication for general queries by applying hash distribution of intermediate results instead of broadcasting, wherever possible. By exploiting hash-based locality, AdPart achieves better or comparable performance to systems that employ sophisticated partitioning schemes. To cope with workloads dynamism, AdPart is extended to dynamically adapt to workload changes. AdPart monitors the data access patterns and dynamically redistributes and replicates the instances of the most frequent patterns among workers. Consequently, the communication cost for future queries is drastically reduced or even eliminated. Experiments with synthetic and real data verify that AdPart starts faster than all existing systems and gracefully adapts to the query load.
Finally, to support and accelerate rich RDF analytical tasks, a vertex-centric RDF analytics framework is
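
    The subject-based hash partitioning mentioned in this abstract can be sketched as follows. This is a minimal illustration, not AdPart's implementation: the CRC32 hash, the function names `worker_for` and `partition_triples`, and the sample triples are all assumptions made for the example.

```python
import zlib


def worker_for(subject, num_workers):
    """Map a subject to a worker index with a deterministic hash (CRC32 assumed)."""
    return zlib.crc32(subject.encode("utf-8")) % num_workers


def partition_triples(triples, num_workers):
    """Shard (subject, predicate, object) triples by hashing the subject."""
    shards = [[] for _ in range(num_workers)]
    for subject, predicate, obj in triples:
        shards[worker_for(subject, num_workers)].append((subject, predicate, obj))
    return shards


# Hypothetical triples; all triples sharing a subject land on the same worker.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("carol", "livesIn", "paris"),
]
for index, shard in enumerate(partition_triples(triples, 3)):
    print(index, shard)
```

    Because every triple with a given subject is stored on one worker, a join whose patterns share the same subject variable can be evaluated on each worker independently, which is the locality property the abstract refers to.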

  20. A Semantic Cache Framework for Secure XML Queries

    Institute of Scientific and Technical Information of China (English)

    Jian-Hua Feng; Guo-Liang Li; Na Ta

    2008-01-01

    Secure XML query answering to protect data privacy and semantic caching to speed up XML query answering are two hot spots in current research on XML database systems. While both issues have been explored separately in depth, they have not been studied together; that is, the problem of semantic caching for secure XML query answering has not yet been addressed. In this paper, we combine these two aspects and propose an efficient framework of semantic cache for secure XML query answering, which can improve the performance of XML database systems under secure circumstances. Our framework combines access control, user privilege management over XML data and state-of-the-art semantic XML query cache techniques to ensure that data are presented only to authorized users in an efficient way. To the best of our knowledge, the approach we propose here is among the first efforts to combine caching and security for XML databases to improve system performance. The efficiency of our framework is verified by comprehensive experiments.

  1. Ontology-based geospatial data query and integration

    Science.gov (United States)

    Zhao, T.; Zhang, C.; Wei, M.; Peng, Z.-R.

    2008-01-01

    Geospatial data sharing is an increasingly important subject as large amounts of data are produced by a variety of sources, stored in incompatible formats, and accessible through different GIS applications. Past efforts to enable sharing have produced standardized data formats such as GML and data access protocols such as Web Feature Service (WFS). While these standards help client applications gain access to heterogeneous data stored in different formats from diverse sources, the usability of the access is limited due to the lack of data semantics encoded in the WFS feature types. Past research has used ontology languages to describe the semantics of geospatial data, but ontology-based queries cannot be applied directly to legacy data stored in databases or shapefiles, or to feature data in WFS services. This paper presents a method to enable ontology queries on spatial data available from WFS services and on data stored in databases. We do not create ontology instances explicitly and thus avoid the problems of data replication. Instead, user queries are rewritten to WFS getFeature requests and SQL queries to the database. The method also has the benefit of being able to utilize existing tools for databases, WFS, and GML while enabling queries based on ontology semantics.

  2. Fast Private Data Release Algorithms for Sparse Queries

    CERN Document Server

    Blum, Avrim

    2011-01-01

    We revisit the problem of accurately answering large classes of statistical queries while preserving differential privacy. Previous approaches to this problem have either been very general but have not had run-time polynomial in the size of the database, have applied only to very limited classes of queries, or have relaxed the notion of worst-case error guarantees. In this paper we consider the large class of sparse queries, which take non-zero values on only polynomially many universe elements. We give efficient query release algorithms for this class, in both the interactive and the non-interactive setting. Our algorithms also achieve better accuracy bounds than previous general techniques do when applied to sparse queries: our bounds are independent of the universe size. In fact, even the runtime of our interactive mechanism is independent of the universe size, and so can be implemented in the "infinite universe" model in which no finite universe need be specified by the data curator.

  3. Evaluation methodology for query-based scene understanding systems

    Science.gov (United States)

    Huster, Todd P.; Ross, Timothy D.; Culbertson, Jared L.

    2015-05-01

    In this paper, we are proposing a method for the principled evaluation of scene understanding systems in a query-based framework. We can think of a query-based scene understanding system as a generalization of typical sensor exploitation systems where instead of performing a narrowly defined task (e.g., detect, track, classify, etc.), the system can perform general user-defined tasks specified in a query language. Examples of this type of system have been developed as part of DARPA's Mathematics of Sensing, Exploitation, and Execution (MSEE) program. There is a body of literature on the evaluation of typical sensor exploitation systems, but the open-ended nature of the query interface introduces new aspects to the evaluation problem that have not been widely considered before. In this paper, we state the evaluation problem and propose an approach to efficiently learn about the quality of the system under test. We consider the objective of the evaluation to be to build a performance model of the system under test, and we rely on the principles of Bayesian experiment design to help construct and select optimal queries for learning about the parameters of that model.

  4. Parallel Index and Query for Large Scale Data Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver; Howison, Mark; Qiang, Ji; Prabhat,; Austin, Brian; Bethel, E. Wes; Ryne, Rob D.; Shoshani, Arie

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.

  5. Using medline queries to generate image retrieval tasks for benchmarking.

    Science.gov (United States)

    Müller, Henning; Kalpathy-Cramer, Jayashree; Hersh, William; Geissbuhler, Antoine

    2008-01-01

    Medical visual information retrieval has been a very active research area over the past ten years as an increasing amount of images is produced digitally and made available in the electronic patient record. Tools are required to give access to the images and exploit the information inherently stored in medical cases including images. To compare image retrieval techniques of research prototypes based on the same data and tasks, ImageCLEF was started in 2003 and a medical task was added in 2004. Since then, every year a database was distributed, tasks developed, and systems compared based on realistic search tasks and large databases. For the year 2007 a set of almost 68,000 images was distributed among 38 research groups registered for the medical retrieval task. Realistic query topics were developed based on a log file of Medline. This log file contains the queries performed on Pubmed during 24 hours. Most queries could not be used as search topics directly as they do not contain image-related themes, but a few thousand do. Other types of queries had to be filtered out as well, as many stated information needs are very vague; for evaluation on the other hand clear and focused topics are necessary to obtain a limited number of relevant documents and limit ambiguity in the evaluation process. In the end, 30 queries were developed and 13 research groups submitted a total of 149 runs using a large variety of techniques, from textual to purely visual retrieval and multi-modal approaches.

  6. Secure Nearest Neighbor Query on Crowd-Sensing Data.

    Science.gov (United States)

    Cheng, Ke; Wang, Liangmin; Zhong, Hong

    2016-09-22

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes.

  7. Secure Nearest Neighbor Query on Crowd-Sensing Data

    Directory of Open Access Journals (Sweden)

    Ke Cheng

    2016-09-01

    Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operations due to computation and storage capability constraints. In light of the Multi Owners and Multi Users (MOMU) situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes.

  8. Efficient Execution of Multiple Queries on Deep Memory Hierarchy

    Institute of Scientific and Technical Information of China (English)

    Yan Zhang; Zhi-Feng Chen; Yuan-Yuan Zhou

    2007-01-01

    This paper proposes a complementary novel idea, called MiniTasking, to further reduce the number of cache misses by improving the data temporal locality for multiple concurrent queries. Our idea is based on the observation that, in many workloads such as decision support systems (DSS), there is usually a significant amount of data sharing among different concurrent queries. MiniTasking exploits such data sharing to improve data temporal locality by scheduling query execution at three levels: query level batching, operator level grouping and mini-task level scheduling. The experimental results with various types of concurrent TPC-H query workloads show that, with the traditional N-ary Storage Model (NSM) layout, MiniTasking significantly reduces the L2 cache misses by up to 83%, and thereby achieves 24% reduction in execution time. With the Partition Attributes Across (PAX) layout, MiniTasking further reduces the cache misses by 65% and the execution time by 9%. For the TPC-H throughput test workload, MiniTasking improves the end performance up to 20%.

  9. Twenty Questions Games Always End With Yes

    CERN Document Server

    Gill, John T

    2010-01-01

    Huffman coding is often presented as the optimal solution to Twenty Questions. However, a caveat is that Twenty Questions games always end with a reply of "Yes," whereas Huffman codewords need not obey this constraint. We bring resolution to this issue, and prove that the average number of questions still lies between H(X) and H(X)+1.
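
    The classical bound mentioned above can be checked numerically. The following is a minimal sketch of ordinary Huffman coding only; it does not implement the constrained variant in which every game ends with "Yes". The function names and the example distribution are illustrative assumptions, not taken from the paper.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return the codeword length (number of yes/no questions) per symbol."""
    # Heap entries: (probability, tie-breaker, symbol indices in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node sits one level deeper.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def entropy(probs):
    """Shannon entropy H(X) in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_length(probs):
    """Expected number of questions under a Huffman code."""
    return sum(p * l for p, l in zip(probs, huffman_lengths(probs)))


probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical distribution over four objects
h = entropy(probs)
avg = average_length(probs)
print(f"H(X) = {h:.3f}, average questions = {avg:.3f}")
```

    For any distribution, the printed average should fall in the interval from H(X) to H(X)+1.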

  10. Capital in the Twenty-First Century

    DEFF Research Database (Denmark)

    Hansen, Per H.

    2014-01-01

    Review essay on: Capital in the Twenty-First Century. By Thomas Piketty. Translated by Arthur Goldhammer. Cambridge, Mass.: The Belknap Press of Harvard University Press, 2014. viii + 685 pp...

  11. Differential Privacy and the Fat-Shattering Dimension of Linear Queries

    CERN Document Server

    Roth, Aaron

    2010-01-01

    In this paper, we consider the task of answering linear queries under the constraint of differential privacy. This is a general and well-studied class of queries that captures other commonly studied classes, including predicate queries and histogram queries. We show that the accuracy to which a set of linear queries can be answered is closely related to its fat-shattering dimension, a property that characterizes the learnability of real-valued functions in the agnostic-learning setting.
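
    As background for what answering a linear query under differential privacy can look like, here is a minimal sketch of the standard Laplace mechanism. It is not the method of the paper; it assumes the query maps each record into [0, 1] and that the answer is the average over the dataset. All names and the example data are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_linear_query(records, query, epsilon, rng):
    """Answer the average of query(x) over records with Laplace noise.

    Assumes query(x) lies in [0, 1], so changing one record moves the
    average by at most 1/n (the sensitivity of the query).
    """
    n = len(records)
    true_answer = sum(query(x) for x in records) / n
    sensitivity = 1.0 / n
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
ages = list(range(100))  # hypothetical dataset


def is_senior(age):
    """A predicate query, which is one special case of a linear query."""
    return 1.0 if age >= 50 else 0.0


noisy = private_linear_query(ages, is_senior, epsilon=0.5, rng=rng)
print(noisy)
```

    Smaller values of epsilon give stronger privacy and a noisier answer.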

  12. ConQuR-Bio: Consensus Ranking with Query Reformulation for Biological Data

    OpenAIRE

    2014-01-01

    This paper introduces ConQuR-Bio which aims at assisting scientists when they query public biological databases. Various reformulations of the user query are generated using medical terminologies. Such alternative reformulations are then used to rank the query results using a new consensus ranking strategy. The originality of our approach thus lies in using consensus ranking techniques within the context of query reformulation. The ConQuR-Bio system is able to query t...

  13. Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data

    Science.gov (United States)

    2014-11-01

    Karthika Mohan and Judea Pearl, Cognitive Systems Laboratory. ...well as causal queries of the form P(y|do(x)). We show that causal queries may be recoverable even when the factors in their identifying estimands are not...

  14. Optimization Query Process of Mediators Interrogation Based On Combinatorial Storage

    Directory of Open Access Journals (Sweden)

    L. Cherrat

    2013-05-01

    In a distributed environment where a query involves several heterogeneous sources, communication costs must be taken into consideration. In this paper we describe a query optimization approach using a dynamic programming technique for a set of integrated heterogeneous sources. The objective of the optimization is to minimize the total processing time, including load processing, request rewriting and communication costs, to facilitate inter-site communication and to optimize the time of data transfer from one site to others. Moreover, the ability to store data in more than one centre site provides more flexibility in terms of security/safety and network overload. In contrast to optimizers which consider a restricted search space, the proposed optimizer searches the closed subsets of sources and independency relationships, which may be deep linear or hierarchical trees. In particular, the execution of the queries can start traversal anywhere over any subset and not only from a specific source.

  15. Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

    DEFF Research Database (Denmark)

    Jakobsen, Kim Ahlstrøm; Andersen, Alex B.; Hose, Katja

    2015-01-01

    In today’s data-driven world, analytical querying, typically based on the data cube concept, is the cornerstone of answering important business questions and making data-driven decisions. Traditionally, the underlying analytical data was mostly internal to the organization and stored in relational data warehouses and data cubes. Today, external data sources are essential for analytics and, as the Semantic Web gains popularity, more and more external sources are available in native RDF. With the recent SPARQL 1.1 standard, performing analytical queries over RDF data sources has finally become feasible. However, unlike their relational counterparts, RDF data cube stores lack optimizations that enable fast querying. In this paper, we present an approach to optimizing RDF data cubes that is based on three novel cube patterns that optimize RDF data cubes, as well as associated algorithms...

  16. A natural language user interface for fuzzy scope queries

    Institute of Scientific and Technical Information of China (English)

    黄艳; 俞宏峰; 耿卫东; 潘云鹤

    2003-01-01

    This paper presents a two-agent framework to build a natural language query interface for an IC information system, focusing on scope queries expressed in a single English sentence. The first agent, the parsing agent, syntactically processes and semantically interprets a natural language sentence to construct a fuzzy structured query language (SQL) statement. The second agent, the defuzzifying agent, defuzzifies the imprecise part of the fuzzy SQL statement into its equivalent executable precise SQL statement based on fuzzy rules. The first agent can also actively ask the user some necessary questions when it manages to disambiguate the vague retrieval requirements. The adaptive defuzzification approach employed in the defuzzifying agent is discussed in detail. A prototype interface has been implemented to demonstrate the effectiveness.

  17. Generating and Executing Complex Natural Language Queries across Linked Data.

    Science.gov (United States)

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.

  18. A Quantum Query Expansion Approach for Session Search

    Directory of Open Access Journals (Sweden)

    Peng Zhang

    2016-04-01

    Recently, Quantum Theory (QT) has been employed to advance the theory of Information Retrieval (IR). Various analogies between QT and IR have been established. Among them, a typical one is applying the idea of photon polarization in IR tasks, e.g., for document ranking and query expansion. In this paper, we aim to further extend this work by constructing a new superposed state of each document in the information need space, based on which we can incorporate the quantum interference idea in query expansion. We then apply the new quantum query expansion model to session search, which is a typical Web search task. Empirical evaluation on the large-scale Clueweb12 dataset has shown that the proposed model is effective in the session search tasks, demonstrating the potential of developing novel and effective IR models based on intuitions and formalisms of QT.

  19. A Distributed DB Architecture for Processing cPIR Queries

    Directory of Open Access Journals (Sweden)

    Sultan.M

    2013-06-01

    Information retrieval is the process of obtaining materials, usually documents, from a huge volume of unstructured data. Several protocols are available to retrieve bit information from distributed databases. A Cloud framework provides a platform for private information retrieval. In this article, we combine the artifacts of the distributed system with a Cloud framework for extracting information from unstructured databases. The process involves distributing the database to a number of co-operative peers, which reduces the response time of the query by drawing on the computational resources of the peers. A single query is subdivided into multiple queries that are processed in parallel across the distributed sites. Our simulation results using Cloud Sim show that this distributed database architecture reduces the cost of computational Private Information Retrieval, with reduced response time and processor overload in peer sites.

  20. Determinacy in Static Analysis of jQuery

    DEFF Research Database (Denmark)

    Andreasen, Esben; Møller, Anders

    2014-01-01

    Static analysis for JavaScript can potentially help programmers find errors early during development. Although much progress has been made on analysis techniques, a major obstacle is the prevalence of libraries, in particular jQuery, which apply programming patterns that have detrimental consequences on the analysis precision and performance. Previous work on dynamic determinacy analysis has demonstrated how information about program expressions that always resolve to a fixed value in some call context may lead to significant scalability improvements of static analysis for such code. We present a static dataflow analysis for JavaScript that infers and exploits determinacy information on-the-fly, to enable analysis of some of the most complex parts of jQuery. The techniques are implemented in the TAJS analysis tool and evaluated on a collection of small programs that use jQuery. Our...

  1. On the Fly Query Entity Decomposition Using Snippets

    CERN Document Server

    Brenes, David J; Garcia, Rodrigo

    2010-01-01

    One of the most important issues in Information Retrieval is inferring the intents underlying users' queries. Thus, any tool to enrich or to better contextualize queries can prove extremely valuable. Entity extraction, provided it is done fast, can be one such tool. Such techniques usually rely on a prior training phase involving large datasets. That training is costly, especially in environments which are increasingly moving towards real-time scenarios where the latency to retrieve fresh information should be minimal. In this paper an 'on-the-fly' query decomposition method is proposed. It uses snippets which are mined by means of a naive statistical algorithm. An initial evaluation of such a method is provided, in addition to a discussion on its applicability to different scenarios.

  2. Query Translation on the Fly in Deep Web Integration

    Institute of Scientific and Technical Information of China (English)

    JIANG Fangjiao; JIA Linlin; MENG Xiaofeng

    2007-01-01

    To help users access the desired information, much research has been dedicated to Deep Web (i.e., Web database) integration. We focus on query translation, which is an important part of Deep Web integration. Our aim is to automatically construct a set of constraint mapping rules so that the system can translate a query from the integrated interface to the Web database interfaces based on them. We construct a concept hierarchy for the attributes of the query interfaces; in particular, we store the synonyms and the types (e.g., Number, Text, etc.) for every concept. At the same time, we construct data hierarchies for some concepts if necessary. Then we present an algorithm to generate the constraint mapping rules based on these hierarchies. The approach scales well for such applications and, being domain independent, can easily be extended from one domain to another. The experimental results show its effectiveness and efficiency.

  3. An Optimal Dynamic Data Structure for Stabbing-Semigroup Queries

    DEFF Research Database (Denmark)

    Agarwal, Pankaj K.; Arge, Lars; Kaplan, Haim;

    2012-01-01

    ...the stabbing-semigroup query asks for computing $\sum_{s \in S(q)} \omega(s)$. We propose a linear-size dynamic data structure, under the pointer-machine model, that answers queries in worst-case $O(\log n)$ time and supports both insertions and deletions of intervals in amortized $O(\log n)$ time. It is the first data structure that attains the optimal $O(\log n)$ bound for all three operations. Furthermore, our structure can easily be adapted to external memory, where we obtain a linear-size structure that answers queries and supports updates in $O(\log_B n)$ I/Os, where B is the disk block size...
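
    To make the query itself concrete, the sketch below evaluates a stabbing-semigroup query by brute force in linear time per query. It is only a reference definition, not the logarithmic-time data structure of the paper. It assumes that S(q) denotes the set of closed intervals containing the point q and that each interval carries a weight; the abstract snippet is truncated, so both assumptions are labeled as such, and all names are hypothetical.

```python
from functools import reduce
import operator


def stabbing_semigroup_query(intervals, q, combine):
    """Combine the weights of all intervals [lo, hi] that contain q.

    `intervals` is a list of (lo, hi, weight) tuples and `combine` is an
    associative binary operation. Returns None when no interval is
    stabbed, because a semigroup need not have an identity element.
    """
    weights = [w for (lo, hi, w) in intervals if lo <= q <= hi]
    if not weights:
        return None
    return reduce(combine, weights)


intervals = [(1, 5, 3), (2, 8, 10), (6, 9, 4)]  # hypothetical weighted intervals
print(stabbing_semigroup_query(intervals, 4, operator.add))  # sum of stabbed weights
print(stabbing_semigroup_query(intervals, 4, max))           # max of stabbed weights
```

    Using `max` as the operation shows why the semigroup setting matters: there is no inverse, so deletions cannot be handled by simply subtracting a weight.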

  4. Nowcasting Mobile Games Ranking Using Web Search Query Data

    Directory of Open Access Journals (Sweden)

    Yoones A. Sekhavat

    2016-01-01

    In recent years, the Internet has become embedded into the purchasing decision of consumers. The purpose of this paper is to study whether the Internet behavior of users correlates with their actual behavior in the computer games market. Rather than proposing the most accurate model for computer game sales, we aim to investigate to what extent web search query data can be exploited to nowcast (a contraction of “now” and “forecasting”, referring to techniques used to make short-term forecasts), that is, to predict the present status of the ranking of mobile games in the world. Google search query data is used for this purpose, since this data can provide a real-time view on the topics of interest. Various statistical techniques are used to show the effectiveness of using web search query data to nowcast mobile games ranking.

  5. Investigation in Query System Framework for High Energy Physics

    CERN Document Server

    Jatuphattharachat, Thanat

    2017-01-01

    We summarize an investigation of a query system framework for HEP (High Energy Physics). Our work investigated the distributed server part of Femtocode, a query language that gives physicists the ability to make plots and other aggregations in real time. To make the system more robust and capable of processing large amounts of data quickly, it is necessary to deploy the system on a redundant and distributed computing cluster. This project aims to investigate third-party coordination and resource management frameworks which fit into the design of a real-time distributed query system. Zookeeper, Mesos and Marathon are the main frameworks for this investigation. The results indicate that Zookeeper is good for job coordination and job tracking, as it provides a robust, fast, simple and transparent read and write process for all connecting clients across distributed Zookeeper servers. Furthermore, it also supports high-availability access and a consistency guarantee within a specific time bound.

  6. On (dynamic) range minimum queries in external memory

    DEFF Research Database (Denmark)

    Arge, L.; Fischer, Johannes; Sanders, Peter

    2013-01-01

    We study the one-dimensional range minimum query (RMQ) problem in the external memory model. We provide the first space-optimal solution to the batched static version of the problem. On an instance with N elements and Q queries, our solution takes Θ(sort(N + Q)) = Θ(((N+Q)/B) log_{M/B} ((N+Q)/B)) I/O complexity and O(N + Q) space, where M is the size of the main memory and B is the block size. This is a factor of O(log_{M/B} N) improvement in space complexity over the previous solutions. We also show that an instance of the batched dynamic RMQ problem with N updates and Q queries can be solved in O ( N...
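
    For readers unfamiliar with the RMQ problem, the following sketch shows a common in-memory technique, the sparse table, which answers static range minimum queries in constant time after O(N log N) preprocessing. It is a different and simpler technique than the external-memory, space-optimal solution described in the abstract, and the function names are illustrative assumptions.

```python
def build_sparse_table(values):
    """table[j][i] holds the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(values)]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        # A window of length 2**j is the union of two windows of length 2**(j-1).
        table.append([min(prev[i], prev[i + half])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table


def range_min(table, lo, hi):
    """Minimum of values[lo..hi], both ends inclusive."""
    j = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
    # Two overlapping windows of length 2**j cover the whole range.
    return min(table[j][lo], table[j][hi - (1 << j) + 1])


values = [5, 2, 7, 1, 9, 3]  # hypothetical input array
table = build_sparse_table(values)
print(range_min(table, 0, 2), range_min(table, 4, 5))
```

    The sparse table uses O(N log N) space, which is exactly the kind of overhead a space-optimal solution avoids.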

  7. Regular paths in SparQL: querying the NCI Thesaurus.

    Science.gov (United States)

    Detwiler, Landon T; Suciu, Dan; Brinkley, James F

    2008-11-06

    OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have led to a highly computable language, it has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.

  8. Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses

    Science.gov (United States)

    Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans making use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments that we have conducted with different workloads and virtual resource configurations, we draw notable conclusions about alternative deployments as well as the advantages and disadvantages of the multiobjective algorithms we propose. PMID:24892048
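
    The notion of Pareto-optimality over response time and monetary cost can be illustrated with a small filter. This is a sketch of the definition only, not the branch-and-bound or genetic algorithm proposed in the paper. The plan names and numbers are made up for illustration.

```python
def pareto_front(plans):
    """Return the plans that no other plan dominates.

    Each plan is (name, response_time, monetary_cost); lower is better
    for both objectives. Plan A dominates plan B when A is no worse on
    both objectives and strictly better on at least one.
    """
    front = []
    for name, time_s, cost in plans:
        dominated = any(
            t <= time_s and c <= cost and (t < time_s or c < cost)
            for _, t, c in plans
        )
        if not dominated:
            front.append((name, time_s, cost))
    return front


# Hypothetical deployments: (name, response time in seconds, cost in dollars)
plans = [("A", 10, 5.0), ("B", 20, 2.0), ("C", 25, 3.0), ("D", 8, 9.0)]
print(pareto_front(plans))
```

    In this example plan C is dropped because plan B is both faster and cheaper, while A, B and D each represent a different trade-off.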

  9. Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses

    Directory of Open Access Journals (Sweden)

    Tansel Dokeroglu

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find Pareto-optimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans making use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments that we have conducted with different workloads and virtual resource configurations, we draw notable conclusions about alternative deployments as well as the advantages and disadvantages of the multiobjective algorithms we propose.

  10. On (dynamic) range minimum queries in external memory

    DEFF Research Database (Denmark)

    Arge, L.; Fischer, Johannes; Sanders, Peter

    2013-01-01

    We study the one-dimensional range minimum query (RMQ) problem in the external memory model. We provide the first space-optimal solution to the batched static version of the problem. On an instance with N elements and Q queries, our solution takes Θ(sort(N + Q)) = Θ(((N+Q)/B) log_{M/B} ((N+Q)/B)) I/O complexity and O(N + Q) space, where M is the size of the main memory and B is the block size. This is a factor of O(log_{M/B} N) improvement in space complexity over the previous solutions. We also show that an instance of the batched dynamic RMQ problem with N updates and Q queries can be solved in O ( N...

  11. Semantic Annotations and Querying of Web Data Sources

    Science.gov (United States)

    Hornung, Thomas; May, Wolfgang

    A large part of the Web, actually holding a significant portion of the useful information throughout the Web, consists of views on hidden databases, provided by numerous heterogeneous interfaces that are partly human-oriented via Web forms ("Deep Web"), and partly based on Web Services (only machine accessible). In this paper we present an approach for annotating these sources in a way that makes them citizens of the Semantic Web. We illustrate how queries can be stated in terms of the ontology, and how the annotations are used to select and access appropriate sources and to answer the queries.

  12. Tag cloud generation for results of multiple keywords queries

    DEFF Research Database (Denmark)

    2013-01-01

    In this paper we study tag cloud generation for retrieved results of multiple keyword queries. It is motivated by many real world scenarios such as personalization tasks, surveillance systems and information retrieval tasks defined with multiple keywords. We adjust the state-of-the-art tag cloud generation techniques for multiple keywords query results. Consequently, we conduct an extensive evaluation on top of three distinct collaborative tagging systems. The graph-based methods perform significantly better for the Movielens and Bibsonomy datasets. Tag cloud generation based on maximal coverage...

  13. Instant jQuery Flot visual data analysis

    CERN Document Server

    Peiris, Brian

    2013-01-01

    Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. A quick, instruction-based guide full of examples that details the various aspects of Flot and how users can apply it to data groups for interactive data representation techniques. If you are a data visualization developer, mapping and presentation software developer, or anyone with an interest in jQuery visualization, this book is ideal for you. If you have a working knowledge of jQuery and JavaScript, you can use this book to add sophisticated visualizations to your web applicat

  14. Exponential Lower Bounds and Separation for Query Rewriting

    CERN Document Server

    Kikot, Stanislav; Podolskii, Vladimir; Zakharyaschev, Michael

    2012-01-01

    We establish connections between the size of circuits and formulas computing monotone Boolean functions and the size of first-order and nonrecursive Datalog rewritings for conjunctive queries over OWL 2 QL ontologies. We use known lower bounds and separation results from circuit complexity to prove similar results for the size of rewritings that do not use non-signature constants. For example, we show that, in the worst case, positive existential and nonrecursive Datalog rewritings are exponentially longer than the original queries; nonrecursive Datalog rewritings are in general exponentially more succinct than positive existential rewritings; while first-order rewritings can be superpolynomially more succinct than positive existential rewritings.

  15. Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries

    DEFF Research Database (Denmark)

    Jakobsen, Kim Ahlstrøm; Andersen, Alex B.; Hose, Katja

    2015-01-01

    In today’s data-driven world, analytical querying, typically based on the data cube concept, is the cornerstone of answering important business questions and making data-driven decisions. Traditionally, the underlying analytical data was mostly internal to the organization and stored in relational data warehouses and data cubes. Today, external data sources are essential for analytics and, as the Semantic Web gains popularity, more and more external sources are available in native RDF. With the recent SPARQL 1.1 standard, performing analytical queries over RDF data sources has finally become...

  16. Approximate ad-hoc query engine for simulation data

    Energy Technology Data Exchange (ETDEWEB)

    Abdulla, G; Baldwin, C; Critchlow, T; Kamimura, R; Lozares, I; Musick, R; Tang, N; Lee, B S; Snapp, R

    2001-02-01

    In this paper, we describe AQSim, an ongoing effort to design and implement a system to manage terabytes of scientific simulation data. The goal of this project is to reduce data storage requirements and access times while permitting ad-hoc queries using statistical and mathematical models of the data. In order to facilitate data exchange between models based on different representations, we are evaluating using the ASCI common data model which is comprised of several layers of increasing semantic complexity. To support queries over the spatial-temporal mesh structured data we are in the process of defining and implementing a grammar for MeshSQL.

  17. I/O-Efficient Dynamic Planar Range Skyline Queries

    DEFF Research Database (Denmark)

    Kejlberg-Rasmussen, Casper; Tsakalidis, Konstantinos; Tsichlas, Kostas

    We present the first fully dynamic worst case I/O-efficient data structures that support planar orthogonal 3-sided range skyline reporting queries in $O(\log_{2B^\epsilon} n + \frac{t}{B^{1-\epsilon}})$ I/Os and updates in $O(\log_{2B^\epsilon} n)$ I/Os, using ... $O(\log^{O(1)} n + t)$ worst case time must occupy $\Omega(n \frac{\log n}{\log \log n})$ space, by adapting a similar lower bounding argument for planar 4-sided range reporting queries....

  18. Application of Bees Algorithm in Multi-Join Query Optimization

    Directory of Open Access Journals (Sweden)

    Mohammad Alamery

    2012-09-01

    Multi-join query optimization is an important technique for designing and implementing database management systems. It is a crucial factor that affects the capability of a database. This paper proposes a Bees algorithm that simulates the foraging behavior of a honey bee swarm to solve the multi-join query optimization problem. The performance of the Bees algorithm and the Ant Colony Optimization algorithm is compared with respect to computational time, and the simulation results indicate that the Bees algorithm is more effective and efficient.

  19. Modeling Large Time Series for Efficient Approximate Query Processing

    DEFF Research Database (Denmark)

    Perera, Kasun S; Hahmann, Martin; Lehner, Wolfgang

    2015-01-01

    Evolving customer requirements and increasing competition force business organizations to store increasing amounts of data and query them for information at any given time. Due to the current growth of data volumes, timely extraction of relevant information becomes more and more difficult...... these issues, compression techniques have been introduced in many areas of data processing. In this paper, we outline a new system that does not query complete datasets but instead utilizes models to extract the requested information. For time series data we use Fourier and Cosine transformations and piece...
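
    The idea of answering queries from a compact model instead of the full time series can be sketched with a truncated discrete Fourier transform. The snippet below is an illustration under stated assumptions, not the system described in the abstract: it uses a direct quadratic-time DFT in pure Python, keeps only the largest coefficients, and all function names and the sample series are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def compress(x, keep):
    """Model the series by its `keep` largest-magnitude DFT coefficients."""
    spectrum = dft(x)
    top = sorted(range(len(spectrum)),
                 key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
    return {k: spectrum[k] for k in top}


def reconstruct(model, n):
    """Approximate the original n values from the stored coefficients."""
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in model.items()).real / n
            for t in range(n)]


# Hypothetical series: a constant level plus one seasonal component.
series = [3 + 2 * math.cos(2 * math.pi * t / 16) for t in range(16)]
model = compress(series, keep=3)
approx = reconstruct(model, len(series))
print(max(abs(a - b) for a, b in zip(series, approx)))
```

    Here three coefficients represent sixteen values almost exactly; real series need more coefficients, and the number kept controls the trade-off between model size and approximation error.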

  20. Efficient Path Query and Reasoning Method Based on Rare Axis

    Institute of Scientific and Technical Information of China (English)

    姜洋; 冯志勇; 王鑫马晓宁

    2015-01-01

    A new concept of rare axis based on statistical facts is proposed, and an evaluation algorithm is designed thereafter. For the nested regular expressions containing rare axes, the proposed algorithm can reduce its evaluation complexity from polynomial time to nearly linear time. The distributed technique is also employed to construct the navigation axis indexes for resource description framework (RDF) graph data. Experiment results in DrugBank and BioGRID show that this method can improve the query efficiency significantly while ensuring the accuracy and meet the query requirements on Web-scale RDF graph data.

  1. Mobile Database System: Role of Mobility on the Query Processing

    CERN Document Server

    Sharma, Samidha Dwivedi

    2010-01-01

    The rapidly expanding technology of mobile communication will give mobile users the capability of accessing information from anywhere at any time. Wireless technology has made it possible to achieve continuous connectivity in a mobile environment. When a query is specified as continuous, the requesting mobile user can obtain a continuously changing result. In order to provide accurate and timely results to the requesting mobile user, the locations of moving objects have to be closely monitored. The objective of the paper is to discuss problems related to the role of personal and terminal mobility and to query processing in the mobile environment.

  2. Analysis of DNS cache effects on query distribution.

    Science.gov (United States)

    Wang, Zheng

    2013-01-01

    This paper studies the DNS cache effects that occur on query distribution at the CN top-level domain (TLD) server. We first filter out malformed DNS queries, grouped into six categories, to clean the polluted log data. A model for DNS resolution, more specifically DNS caching, is presented. We demonstrate the presence and magnitude of DNS cache effects and cache sharing effects on the request distribution through an analytic model and simulation. CN TLD log data results are provided and analyzed based on the cache model. The approximate TTL distribution for domain names is inferred quantitatively.
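
    The cache effect described above can be illustrated with a minimal sketch. It is not the paper's model: it assumes a single resolver cache, a single domain name, and a fixed TTL, and the function name `upstream_queries` is hypothetical. It shows how a cache thins the stream of client queries before they reach the authoritative (for example TLD) server.

```python
def upstream_queries(arrival_times, ttl):
    """Return the client query times that miss the cache and reach the server.

    Assumption (not from the paper): one cache, one name, fixed TTL in seconds.
    """
    forwarded = []
    expires_at = float("-inf")  # the cache starts empty
    for t in sorted(arrival_times):
        if t >= expires_at:
            # Cache miss: the query is forwarded and the answer is cached.
            forwarded.append(t)
            expires_at = t + ttl
        # Otherwise the cached answer is served and the server sees nothing.
    return forwarded


client_times = [0, 1, 2, 5, 6, 11]
seen_by_server = upstream_queries(client_times, ttl=5)
```

    With a longer TTL the server sees fewer of the client queries, which is why the request distribution in server logs differs from the distribution of client demand.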

  3. Improve Big Data Query Performance by Applying Distributed Indexing

    Institute of Scientific and Technical Information of China (English)

    窦晓峰; 陈胜; 王熠航; 麦联叨; 由建宏

    2014-01-01

    In telecommunications precision marketing and ad hoc query services, there are many scenarios involving random queries on one or more wide tables (tables with more than 50 fields). In the traditional approach, where queries run directly against the database, the response time can be optimized to between a few seconds and tens of seconds when the database holds fewer than 10 million records. When the data size reaches tens of millions, hundreds of millions, or even more than one billion records, no optimization, including changes to the indexing mechanism, can meet the requirement of answering concurrent queries within seconds. The new query system introduces a distributed Solr index layer to solve this problem. The layer indexes the database records in advance, and queries access the Solr index layer rather than the database directly, which greatly improves query performance. In a comparison of the two processing patterns in the same environment, for wide-table queries over 50 million records with 20 concurrent accesses per second, the traditional queries all timed out, while the queries in the new system all succeeded and returned within 2 seconds.

  4. Annual Offshore Oil Yield Tops 10 Million Tons

    Institute of Scientific and Technical Information of China (English)

    1996-01-01

    China's offshore oil output had already exceeded 10 million tons in September 1996, compared with last year's total of 8.7 million tons. Oil industry executives said production for the whole year is likely to exceed 13 million tons or even 14 million tons. That means the China National Offshore Oil Corp. (CNOOC), established in 1982, will set a record in reaching such an annual output. The United States and the former Soviet Union spent 20 and 25 years, respectively, to reach a similar output.

  5. RESEARCH ON EXTENSION OF SPARQL ONTOLOGY QUERY LANGUAGE CONSIDERING THE COMPUTATION OF INDOOR SPATIAL RELATIONS

    Directory of Open Access Journals (Sweden)

    C. Li

    2015-05-01

    A method suitable for complex indoor semantic queries, which considers the computation of indoor spatial relations, is provided according to the characteristics of indoor space. This paper designs an ontology model describing the space-related information of humans, events, and indoor space objects (e.g. Storey and Room) as well as their relations, to support indoor semantic queries. The ontology concepts are used in the IndoorSPARQL query language, which extends SPARQL syntax for representing and querying indoor space. Four types of specific primitives for indoor queries, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL to support quantitative spatial computations. A method is also proposed to analyze the query language. Finally, this paper adopts this method to realize indoor semantic queries on the study area by constructing the ontology model for the study building. The experimental results show that the proposed method can effectively support complex indoor space semantic queries.

  6. Research on Extension of Sparql Ontology Query Language Considering the Computation of Indoor Spatial Relations

    Science.gov (United States)

    Li, C.; Zhu, X.; Guo, W.; Liu, Y.; Huang, H.

    2015-05-01

    A method suitable for complex indoor semantic queries, which considers the computation of indoor spatial relations, is provided according to the characteristics of indoor space. This paper designs an ontology model describing the space-related information of humans, events, and indoor space objects (e.g. Storey and Room) as well as their relations, to support indoor semantic queries. The ontology concepts are used in the IndoorSPARQL query language, which extends SPARQL syntax for representing and querying indoor space. Four types of specific primitives for indoor queries, "Adjacent", "Opposite", "Vertical" and "Contain", are defined as query functions in IndoorSPARQL to support quantitative spatial computations. A method is also proposed to analyze the query language. Finally, this paper adopts this method to realize indoor semantic queries on the study area by constructing the ontology model for the study building. The experimental results show that the proposed method can effectively support complex indoor space semantic queries.

  7. A Knowledge Based Approach for Query Optimization in Preferential Mapping Relational Databases

    Directory of Open Access Journals (Sweden)

    P.Ranjani

    2014-10-01

    Relational query databases provide a high-level declarative interface to access data stored in relational databases. Two key components of query evaluation in a SQL database system are the query optimizer and the query execution engine. The System R optimization framework is considered, since it was a remarkably elegant approach that helped fuel much of the subsequent work in optimization. Relational database systems allow transparent and efficient evaluation of preferential queries. An extensive experimental evaluation on two real-world data sets illustrates the feasibility and advantages of the framework. Combining the prefer operator with the rank and rank-join operators enables early pruning of results based on score or confidence during query processing. During preference evaluation, both the conditional part and the scoring part of a preference are used. The conditional part acts as a soft constraint that determines which records are scored, without disqualifying any duplicates from the query result. The paper introduces a preference-mapping relational data model that extends the database with profile preferences for query optimization, and an extended algebra that captures the essence of processing queries with a ranking method. Based on a set of algebraic properties and a proposed cost model, several query optimization strategies for extended query plans are provided. A query execution algorithm is described that blends preference evaluation with query execution while making effective use of the native query engine.

  8. 'What's in the NIDDK CDR?'--public query tools for the NIDDK central data repository.

    Science.gov (United States)

    Pan, Huaqin; Ardini, Mary-Anne; Bakalov, Vesselina; DeLatte, Michael; Eggers, Paul; Ganapathi, Laxminarayana; Hollingsworth, Craig R; Levy, Joshua; Li, Sheping; Pratt, Joseph; Pugh, Norma; Qin, Ying; Rasooly, Rebekah; Ray, Helen; Richardson, Jean E; Flynn Riley, Amanda; Rogers, Susan M; Tan, Sylvia; Turner, Charles F; White, Stacie; Cooley, Philip C

    2013-01-01

    The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Data Repository (CDR) is a web-enabled resource available to researchers and the general public. The CDR warehouses clinical data and study documentation from NIDDK funded research, including such landmark studies as The Diabetes Control and Complications Trial (DCCT, 1983-93) and the Epidemiology of Diabetes Interventions and Complications (EDIC, 1994-present) follow-up study which has been ongoing for more than 20 years. The CDR also houses data from over 7 million biospecimens representing 2 million subjects. To help users explore the vast amount of data stored in the NIDDK CDR, we developed a suite of search mechanisms called the public query tools (PQTs). Five individual tools are available to search data from multiple perspectives: study search, basic search, ontology search, variable summary and sample by condition. PQT enables users to search for information across studies. Users can search for data such as number of subjects, types of biospecimens and disease outcome variables without prior knowledge of the individual studies. This suite of tools will increase the use and maximize the value of the NIDDK data and biospecimen repositories as important resources for the research community. Database URL: https://www.niddkrepository.org/niddk/home.do.

  9. Most Recent Match Queries in On-Line Suffix Trees

    DEFF Research Database (Denmark)

    Larsson, N. Jesper

    2014-01-01

    A suffix tree is able to efficiently locate a pattern in an indexed string, but not in general the most recent copy of the pattern in an online stream, which is desirable in some applications. We study the most general version of the problem of locating a most recent match: supporting queries...

  10. Query recommendation in the information domain of children

    NARCIS (Netherlands)

    Duarte Torres, Sergio Raúl; Hiemstra, Djoerd; Weber, Ingmar; Serdyukov, Pavel

    2014-01-01

    Children represent an increasing part of web users. One of the key problems that hamper their search experience is their limited vocabulary, their difficulty to use the right keywords, and the inappropriateness of general-purpose query suggestions. In this work, we propose a method that uses tags f

  11. The Islands Approach to Nearest Neighbor Querying in Spatial Networks

    DEFF Research Database (Denmark)

    Huang, Xuegang; Jensen, Christian Søndergaard; Saltenis, Simonas

    2005-01-01

    Much research has recently been devoted to the data management foundations of location-based mobile services. In one important scenario, the service users are constrained to a transportation network. As a result, query processing in spatial road networks is of interest. We propose a versatile app...

  12. Using clinicians' search query data to monitor influenza epidemics.

    Science.gov (United States)

    Santillana, Mauricio; Nsoesie, Elaine O; Mekaru, Sumiko R; Scales, David; Brownstein, John S

    2014-11-15

    Search query information from a clinician's database, UpToDate, is shown to predict influenza epidemics in the United States in a timely manner. Our results show that digital disease surveillance tools based on experts' databases may be able to provide an alternative, reliable, and stable signal for accurate predictions of influenza outbreaks.

  13. Optimizing XML Information Retrieval Query Execution at the Physical Level

    NARCIS (Netherlands)

    Os, van R.

    2007-01-01

    XML is emerging as a standard format for information interchange and storage of structured information. The wide-spread use of XML has sparked the interest of both the database and information retrieval research communities. XML databases are designed to store and query large volumes of XML data. St

  14. Storing and Querying Probabilistic XML Using a Probabilistic Relational DBMS

    NARCIS (Netherlands)

    Hollander, E.S.; Keulen, van M.

    2010-01-01

    This work explores the feasibility of storing and querying probabilistic XML in a probabilistic relational database. Our approach is to adapt known techniques for mapping XML to relational data such that the possible worlds are preserved. We show that this approach can work for any XML-to-relational

  15. Efficient Data Access for Location-Dependent Spatial Queries

    Institute of Scientific and Technical Information of China (English)

    Kwangjin Park

    2014-01-01

    When the mobile environment consists of light-weight devices, the loss of network connectivity and scarce resources, e.g., low battery power and limited memory, become primary issues of concern in order to efficiently support portable wireless devices. In this paper, we propose an index-based peer-to-peer data access method that uses a new Hierarchical Location-Based Sequential (HLBS) index. We then propose a novel distributed Nearest First Broadcast (NFB) algorithm. Both HLBS and NFB are specifically designed for mobile peer-to-peer service in wireless broadcast environments. The system has a lower response time, because the client only contacts a qualified service provider by accessing the HLBS and quickly retrieves the data to answer the query by using NFB. HLBS and NFB design the index for spatial objects according to the positions of individual clients and transfer the index in the order arranged so that the spatial query can be processed even after the user tunes the partial index. Hence, this design can support rapid and energy-efficient service. A performance evaluation is conducted to compare the proposed algorithms with algorithms based on R-tree and Hilbert-curve air indexes. The results show that the proposed data dissemination algorithm with the HLBS index is scalable and energy efficient in both range queries and nearest neighbor queries.

  16. Using Clinicians’ Search Query Data to Monitor Influenza Epidemics

    Science.gov (United States)

    Santillana, Mauricio; Nsoesie, Elaine O.; Mekaru, Sumiko R.; Scales, David; Brownstein, John S.

    2014-01-01

    Search query information from a clinician's database, UpToDate, is shown to predict influenza epidemics in the United States in a timely manner. Our results show that digital disease surveillance tools based on experts' databases may be able to provide an alternative, reliable, and stable signal for accurate predictions of influenza outbreaks. PMID:25115873

  17. Efficient Mining of Frequent Closed XML Query Pattern

    Institute of Scientific and Technical Information of China (English)

    Jian-Hua Feng; Qian Qian; Jian-Yong Wang; Li-Zhu Zhou

    2007-01-01

    Previous research has presented convincing arguments that a frequent pattern mining algorithm should mine not all frequent patterns but only the closed ones, because the latter lead not only to a more compact yet complete result set but also to better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most previous algorithms for finding frequent patterns basically used a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. An efficient sequence mining algorithm is used to discover frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA* deeply prunes the unrelated search space for frequent pattern enumeration by means of a parent-child relationship constraint. Through a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA* over the previously known alternative. SOLARIA* is also linearly scalable in terms of the size of the XML queries.

  18. Approaches for parallel data loading and data querying

    Directory of Open Access Journals (Sweden)

    Vlad DIACONITA

    2015-07-01

    This paper aims to contribute to data loading and data querying using products from the Apache Hadoop ecosystem. Currently, we talk about Big Data at up to zettabyte scale (10^21 bytes). Research in this area is usually interdisciplinary, combining elements from statistics, system integration, parallel processing and cloud computing.

  19. MRA Based Efficient Database Storing and Fast Querying Technique

    Directory of Open Access Journals (Sweden)

    Mitko Kostov

    2017-02-01

    In this paper we consider a specific way of organizing databases of 1D signals or 2D images such that more efficient storage and faster querying are achieved. A multiresolution data processing technique is used in order to save the most significant processed data.

  20. A Simple Blueprint for Automatic Boolean Query Processing.

    Science.gov (United States)

    Salton, G.

    1988-01-01

    Describes a new Boolean retrieval environment in which an extended soft Boolean logic is used to automatically construct queries from original natural language formulations provided by users. Experimental results that compare the retrieval effectiveness of this method to conventional Boolean and vector processing are discussed. (27 references)…
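
    As a rough illustration of what a "soft" Boolean operator can look like, the sketch below uses the p-norm form of the extended Boolean model with unweighted query terms. Whether this is exactly the logic used in the article is an assumption; the function names and the default p = 2 are illustrative only. Each input is a document's term weight in [0, 1].

```python
def p_norm_or(weights, p=2.0):
    """Soft OR: high when at least one term weight is high."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def p_norm_and(weights, p=2.0):
    """Soft AND: penalized by how far each term weight is from 1."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


# A document that contains only the first of two query terms.
doc = [1.0, 0.0]
or_score = p_norm_or(doc)    # partial credit instead of a hard 1
and_score = p_norm_and(doc)  # partial credit instead of a hard 0
```

    With p = 1 both operators reduce to the mean of the term weights (vector-style matching), and as p grows they approach the strict Boolean max and min, so p controls how strictly the operators are interpreted.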

  1. QUrdPro: Query processing system for Urdu Language

    Directory of Open Access Journals (Sweden)

    Rukhsana Thaker

    2015-06-01

    The tremendous increase in multilingual data on the internet has increased the demand for efficient information retrieval. Urdu is one of the widely spoken and written languages of South Asia. Due to the unstructured format of Urdu text, information retrieval is a big challenge. Question answering systems aim to retrieve point-to-point answers rather than flooding the user with documents. They are needed when the user wants in-depth knowledge of a particular domain. When the user needs some information, the system must give the relevant answer. Question-answer retrieval over an ontology knowledge base provides a convenient way to obtain knowledge, but the natural language question needs to be mapped to a query statement of the ontology. This paper describes QUrdPro, a query processing system based on ontology. The system is a combination of NLP and ontology, and it makes use of the ontology in several phases for efficient query processing. Our focus is on the knowledge derived from the concepts used in the ontology and the relationships between these concepts. The architecture of QUrdPro, a query processing system for Urdu, and the process model for the system are also discussed in detail.

  2. A Dynamic Extension of ATLAS Run Query Service

    CERN Document Server

    Buliga, Alexandru

    2015-01-01

    The ATLAS RunQuery is a primarily web-based service for the ATLAS community to access meta information about the data taking in a concise format. In order to provide a better user experience, the service was moved to use a new technology, involving concepts such as: Web Sockets, on demand data, client-side scripting, memory caching and parallelizing execution.

  3. Developing responsive web applications with Ajax and jQuery

    CERN Document Server

    Patel, Sandeep Kumar

    2014-01-01

    This book is a standard tutorial for web application developers presented in a comprehensive, step-by-step manner to explain the nuances involved. It has an abundance of code and examples supporting explanations of each feature. This book is intended for Java developers wanting to create rich and responsive applications using AJAX. Basic experience of using jQuery is assumed.

  4. Tag cloud generation for results of multiple keywords queries

    DEFF Research Database (Denmark)

    2013-01-01

    In this paper we study tag cloud generation for retrieved results of multiple keyword queries. It is motivated by many real world scenarios such as personalization tasks, surveillance systems and information retrieval tasks defined with multiple keywords. We adjust the state-of-the-art tag cloud...

  5. Boolean Queries and Term Dependencies in Probabilistic Retrieval Models.

    Science.gov (United States)

    Croft, W. Bruce

    1986-01-01

    Proposes approach to integrating Boolean and statistical systems where Boolean queries are interpreted as a means of specifying term dependencies in relevant set of documents. Highlights include series of retrieval experiments designed to test retrieval strategy based on term dependence model and relation of results to other work. (18 references)…

  6. VIGOR: Interactive Visual Exploration of Graph Query Results.

    Science.gov (United States)

    Pienta, Robert; Hohman, Fred; Endert, Alex; Tamersoy, Acar; Roundy, Kevin; Gates, Chris; Navathe, Shamkant; Chau, Duen Horng

    2017-08-29

    Finding patterns in graphs has become a vital challenge in many domains from biological systems, network security, to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts' sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results or starts with only the structure (i.e., without node value constraints), and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.

  7. Just-In-Time Data Distribution for Analytical Query Processing

    NARCIS (Netherlands)

    Ivanova, M.; Kersten, M.; Groffen, F.

    2012-01-01

    Distributed processing commonly requires data spread across machines using a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database, and a variable number of worker nodes for delegated query pr

  8. A Survey of Query Auto Completion in Information Retrieval

    NARCIS (Netherlands)

    Cai, F.; de Rijke, M.

    2016-01-01

    In information retrieval, query auto completion (QAC), also known as type-ahead [Xiao et al., 2013, Cai et al., 2014b] and auto-complete suggestion [Jain and Mishne, 2010], refers to the following functionality: given a prefix consisting of a number of characters entered into a search box, the user i
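
    A minimal sketch of the basic idea follows, assuming the simplest ranking strategy: return the past queries that start with the typed prefix, most frequent first. The function name `complete` and the toy query log are hypothetical and not taken from the survey.

```python
from collections import Counter


def complete(prefix, query_log, k=3):
    """Return up to k logged queries starting with `prefix`, most frequent first."""
    counts = Counter(q for q in query_log if q.startswith(prefix))
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ranked[:k]]


log = ["weather today", "weather today", "web mail", "weather map", "news"]
suggestions = complete("we", log, k=2)
```

    Scanning the whole log on every keystroke is only workable for a toy example; a prefix tree or a sorted list with binary search would be the usual choice at scale.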

  9. Applying Genetic Algorithms To Query Optimization in Document Retrieval.

    Science.gov (United States)

    Horng, Jorng-Tzong; Yeh, Ching-Chang

    2000-01-01

    Proposes a novel approach to automatically retrieve keywords and then uses genetic algorithms to adapt the keyword weights. Discusses Chinese text retrieval, term frequency rating formulas, vector space models, bigrams, the PAT-tree structure for information retrieval, query vectors, and relevance feedback. (Author/LRW)

  10. In-route skyline querying for location-based services

    DEFF Research Database (Denmark)

    Xuegang, Huang; Jensen, Kristian S.

    2005-01-01

    With the emergence of an infrastructure for location-aware mobile services, the processing of advanced, location-based queries that are expected to underlie such services is gaining in relevance. While much work has assumed that users move in Euclidean space, this paper assumes that movement is c...

  11. Accelerating Network Traffic Analytics Using Query-Driven Visualization

    Energy Technology Data Exchange (ETDEWEB)

    Bethel, E. Wes; Campbell, Scott; Dart, Eli; Stockinger, Kurt; Wu, Kesheng

    2006-07-29

    Realizing operational analytics solutions where large and complex data must be analyzed in a time-critical fashion entails integrating many different types of technology. This paper focuses on an interdisciplinary combination of scientific data management and visualization/analysis technologies targeted at reducing the time required for data filtering, querying, hypothesis testing and knowledge discovery in the domain of network connection data analysis. We show that use of compressed bitmap indexing can quickly answer queries in an interactive visual data analysis application, and compare its performance with two alternatives for serial and parallel filtering/querying on 2.5 billion records worth of network connection data collected over a period of 42 weeks. Our approach to visual network connection data exploration centers on two primary factors: interactive ad-hoc and multiresolution query formulation and execution over n dimensions and visual display of the n-dimensional histogram results. This combination is applied in a case study to detect a distributed network scan and to then identify the set of remote hosts participating in the attack. Our approach is sufficiently general to be applied to a diverse set of data understanding problems as well as used in conjunction with a diverse set of analysis and visualization tools.
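
    The principle behind bitmap indexing can be sketched in a few lines. This is a simplified illustration, not the system described in the record: it uses uncompressed Python integers as bit vectors and omits the compression step entirely, and the column names and sample records are invented for the example.

```python
from collections import defaultdict


def build_bitmap_index(records, column):
    """Map each distinct value of `column` to an int whose bit i marks row i."""
    index = defaultdict(int)
    for row_id, record in enumerate(records):
        index[record[column]] |= 1 << row_id
    return dict(index)


def rows_of(bitmap):
    """Decode a bit vector into the list of matching row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical connection records (illustrative only).
records = [
    {"port": 22, "proto": "tcp"},
    {"port": 80, "proto": "tcp"},
    {"port": 22, "proto": "udp"},
    {"port": 22, "proto": "tcp"},
]
port_idx = build_bitmap_index(records, "port")
proto_idx = build_bitmap_index(records, "proto")

# A two-dimensional query is a bitwise AND of one bitmap per condition.
hits = port_idx[22] & proto_idx["tcp"]
matching_rows = rows_of(hits)
```

    Because each condition is answered by one precomputed bitmap, a query over n dimensions costs n - 1 bitwise operations rather than a scan of every record.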

  12. Project Lefty: More Bang for the Search Query

    Science.gov (United States)

    Varnum, Ken

    2010-01-01

    This article describes the Project Lefty, a search system that, at a minimum, adds a layer on top of traditional federated search tools that will make the wait for results more worthwhile for researchers. At best, Project Lefty improves search queries and relevance rankings for web-scale discovery tools to make the results themselves more relevant…

  14. Storage, Querying and Visualization of Clinical Dental Records

    Science.gov (United States)

    2001-10-25

    extractions for orthodontic treatment), the design of the TOOTH table allows the quadrant to be ignored and therefore search for the presence of the first...query to find all records where all four 4’s (first permanent premolars) have been extracted but other teeth are present (possibly indicating

  15. Learning from the History of Distributed Query Processing

    DEFF Research Database (Denmark)

    Betz, Heiko; Gropengießer, Francis; Hose, Katja

    2012-01-01

    The vision of the Semantic Web has triggered the development of various new applications and opened up new directions in research. Recently, much effort has been put into the development of techniques for query processing over Linked Data. Being based upon techniques originally developed for dist...

  16. Algebra-Based Optimization of XML-Extended OLAP Queries

    DEFF Research Database (Denmark)

    Yin, Xuepeng; Pedersen, Torben Bach

    2006-01-01

    In today’s OLAP systems, integrating fast changing data physically into a cube is complex and time-consuming. Our solution, the “OLAP-XML Federation System,” makes it possible to reference the fast changing data in XML format in OLAP queries without physical integration. In this paper, we introdu...

  17. Restructuring Large Data Hierarchies for Scientific Query Tools

    Energy Technology Data Exchange (ETDEWEB)

    Thomas, M

    2005-02-08

    Today's large-scale scientific simulations produce data sets tens to hundreds of terabytes in size. The DataFoundry project is developing querying and analysis tools for these data sets. The Approximate Ad-Hoc Query Engine for Simulation Data (AQSIM) uses a multi-resolution, tree-shaped data structure that allows users to place runtime limits on queries over scientific simulation data. In this AQSIM data hierarchy, each node in the tree contains an abstract model describing all of the information contained in the subtree below that node. AQSIM is able to create the data hierarchy in a single pass. However, the nodes in the hierarchy frequently have low node fanout, which leads to inefficient I/O behavior during query processing. Low node fanout is a common problem in tree-shaped indices. This paper presents a set of one-pass tree "pruning" algorithms that efficiently restructure the data hierarchy by removing inner nodes, thereby increasing node fanout. As our experimental results show, the best approach is a combination of two algorithms, one that focuses on increasing node fanout and one that attempts to reduce the maximum tree height.
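
    The general idea of removing inner nodes to raise fanout can be sketched as follows. This is not one of the report's algorithms: the `Node` class, the `min_fanout` threshold, and the rule of promoting the children of any low-fanout inner node are assumptions made for illustration, and the per-node abstract models of AQSIM are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def prune_low_fanout(node, min_fanout=2):
    """Single post-order pass: splice out inner nodes with too few children."""
    new_children = []
    for child in node.children:
        prune_low_fanout(child, min_fanout)
        if child.children and len(child.children) < min_fanout:
            # Low-fanout inner node: drop it and promote its children.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def height(node):
    """Number of levels in the tree rooted at `node`."""
    return 1 + max((height(c) for c in node.children), default=0)


# root -> a -> b -> (x, y), plus a leaf c under root.
root = Node("root", [Node("a", [Node("b", [Node("x"), Node("y")])]), Node("c")])
height_before = height(root)
prune_low_fanout(root)
height_after = height(root)
child_names = [c.name for c in root.children]
```

    In this example the single-child node `a` is removed, so the tree loses one level and a query descending from the root reads one node fewer.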

  18. Relaxing RDF queries based on user and domain preferences

    DEFF Research Database (Denmark)

    Dolog, Peter; Stueckenschmidt, Heiner; Wache, Holger

    2009-01-01

    knowledge and user preferences. We describe a framework for information access that combines query refinement and relaxation in order to provide robust, personalized access to heterogeneous resource description framework data as well as an implementation in terms of rewriting rules and explain its...

  19. A Foundation for Efficient Indoor Distance-Aware Query Processing

    DEFF Research Database (Denmark)

    Lu, Hua; Cao, Xin; Jensen, Christian Søndergaard

    2012-01-01

    indoor distances. However, existing indoor space models do not account well for indoor distances. To address this shortcoming, we propose a data management infrastructure that captures indoor distance and facilitates distance-aware query processing. In particular, we propose a distance-aware indoor space...

  20. Adapting to the Shifting Intent of Search Queries

    CERN Document Server

    Syed, Umar; Mishra, Nina

    2010-01-01

    Search engines today present results that are often oblivious to abrupt shifts in intent. For example, the query 'independence day' usually refers to a US holiday, but the intent of this query abruptly changed during the release of a major film by that name. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, etc. account for 50% of all search queries. This paper shows that the signals a search engine receives can be used to both determine that a shift in intent has happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-...