WorldWideScience

Sample records for identifying related queries

  1. Improving accuracy for identifying related PubMed queries by an integrated approach.

    Science.gov (United States)

    Lu, Zhiyong; Wilbur, W John

    2009-10-01

    PubMed is the most widely used tool for searching biomedical literature online. As with many other online search tools, a user often types a series of multiple related queries before retrieving satisfactory results to fulfill a single information need. Meanwhile, it is also a common phenomenon to see a user type queries on unrelated topics in a single session. In order to study PubMed users' search strategies, it is necessary to be able to automatically separate unrelated queries and group together related queries. Here, we report a novel approach combining both lexical and contextual analyses for segmenting PubMed query sessions and identifying related queries and compare its performance with the previous approach based solely on concept mapping. We experimented with our integrated approach on sample data consisting of 1539 pairs of consecutive user queries in 351 user sessions. The prediction results of 1396 pairs agreed with the gold-standard annotations, achieving an overall accuracy of 90.7%. This demonstrates that our approach is significantly better than the previously published method. By applying this approach to a one day query log of PubMed, we found that a significant proportion of information needs involved more than one PubMed query, and that most of the consecutive queries for the same information need are lexically related. Finally, the proposed PubMed distance is shown to be an accurate and meaningful measure for determining the contextual similarity between biological terms. The integrated approach can play a critical role in handling real-world PubMed query log data as is demonstrated in our experiments.

  2. Relative aggregation operator in database fuzzy querying

    Directory of Open Access Journals (Sweden)

    Luminita DUMITRIU

    2005-12-01

    Full Text Available Fuzzy selection criteria querying relational databases include vague terms; they usually refer linguistic values form the attribute linguistic domains, defined as fuzzy sets. Generally, when a vague query is processed, the definitions of vague terms must already exist in a knowledge base. But there are also cases when vague terms must be dynamically defined, when a particular operation is used to aggregate simple criteria in a complex selection. The paper presents a new aggregation operator and the corresponding algorithm to evaluate the fuzzy query.

  3. A Relational Algebra Query Language for Programming Relational Databases

    Science.gov (United States)

    McMaster, Kirby; Sambasivam, Samuel; Anderson, Nicole

    2011-01-01

    In this paper, we describe a Relational Algebra Query Language (RAQL) and Relational Algebra Query (RAQ) software product we have developed that allows database instructors to teach relational algebra through programming. Instead of defining query operations using mathematical notation (the approach commonly taken in database textbooks), students…

  4. Persistent Identifiers for Improved Accessibility for Linked Data Querying

    Science.gov (United States)

    Shepherd, A.; Chandler, C. L.; Arko, R. A.; Fils, D.; Jones, M. B.; Krisnadhi, A.; Mecum, B.

    2016-12-01

    The adoption of linked open data principles within the geosciences has increased the amount of accessible information available on the Web. However, this data is difficult to consume for those who are unfamiliar with Semantic Web technologies such as Web Ontology Language (OWL), Resource Description Framework (RDF) and SPARQL - the RDF query language. Consumers would need to understand the structure of the data and how to efficiently query it. Furthermore, understanding how to query doesn't solve problems of poor precision and recall in search results. For consumers unfamiliar with the data, full-text searches are most accessible, but not ideal as they arrest the advantages of data disambiguation and co-reference resolution efforts. Conversely, URI searches across linked data can deliver improved search results, but knowledge of these exact URIs may remain difficult to obtain. The increased adoption of Persistent Identifiers (PIDs) can lead to improved linked data querying by a wide variety of consumers. Because PIDs resolve to a single entity, they are an excellent data point for disambiguating content. At the same time, PIDs are more accessible and prominent than a single data provider's linked data URI. When present in linked open datasets, PIDs provide balance between the technical and social hurdles of linked data querying as evidenced by the NSF EarthCube GeoLink project. The GeoLink project, funded by NSF's EarthCube initiative, have brought together data repositories include content from field expeditions, laboratory analyses, journal publications, conference presentations, theses/reports, and funding awards that span scientific studies from marine geology to marine ecosystems and biogeochemistry to paleoclimatology.

  5. Group-by Skyline Query Processing in Relational Engines

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Luk, Ming-Hay; Lo, Eric

    2009-01-01

    the missing cost model for the BBS algorithm. Experimental results show that our techniques are able to devise the best query plans for a variety of group-by skyline queries. Our focus is on algorithms that can be directly implemented in today's commercial database systems without the addition of new access......The skyline operator was first proposed in 2001 for retrieving interesting tuples from a dataset. Since then, 100+ skyline-related papers have been published; however, we discovered that one of the most intuitive and practical type of skyline queries, namely, group-by skyline queries remains...

  6. Representing and querying now-relative relational medical data.

    Science.gov (United States)

    Anselma, Luca; Piovesan, Luca; Stantic, Bela; Terenziani, Paolo

    2018-03-01

    Temporal information plays a crucial role in medicine. Patients' clinical records are intrinsically temporal. Thus, in Medical Informatics there is an increasing need to store, support and query temporal data (particularly in relational databases), in order, for instance, to supplement decision-support systems. In this paper, we show that current approaches to relational data have remarkable limitations in the treatment of "now-relative" data (i.e., data holding true at the current time). This can severely compromise their applicability in general, and specifically in the medical context, where "now-relative" data are essential to assess the current status of the patients. We propose a theoretically grounded and application-independent relational approach to cope with now-relative data (which can be paired, e.g., with different decision support systems) overcoming such limitations. We propose a new temporal relational representation, which is the first relational model coping with the temporal indeterminacy intrinsic in now-relative data. We also propose new temporal algebraic operators to query them, supporting the distinction between possible and necessary time, and Allen's temporal relations between data. We exemplify the impact of our approach, and study the theoretical and computational properties of the new representation and algebra. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. Scale-Independent Relational Query Processing

    Science.gov (United States)

    2013-10-04

    source options are also available, including Postgresql, MySQL , and SQLite. These mod- ern relational databases are generally very complex software systems...and Their Application to Data Stream Management. IGI Global, 2010. [68] George Reese. Database Programming with JDBC and Java , Second Edition. Ed. by

  8. Web page sorting algorithm based on query keyword distance relation

    Science.gov (United States)

    Yang, Han; Cui, Hong Gang; Tang, Hao

    2017-08-01

    In order to optimize the problem of page sorting, according to the search keywords in the web page in the relationship between the characteristics of the proposed query keywords clustering ideas. And it is converted into the degree of aggregation of the search keywords in the web page. Based on the PageRank algorithm, the clustering degree factor of the query keyword is added to make it possible to participate in the quantitative calculation. This paper proposes an improved algorithm for PageRank based on the distance relation between search keywords. The experimental results show the feasibility and effectiveness of the method.

  9. Keyword Query Expansion Paradigm Based on Recommendation and Interpretation in Relational Databases

    Directory of Open Access Journals (Sweden)

    Yingqi Wang

    2017-01-01

    Full Text Available Due to the ambiguity and impreciseness of keyword query in relational databases, the research on keyword query expansion has attracted wide attention. Existing query expansion methods expose users’ query intention to a certain extent, but most of them cannot balance the precision and recall. To address this problem, a novel two-step query expansion approach is proposed based on query recommendation and query interpretation. First, a probabilistic recommendation algorithm is put forward by constructing a term similarity matrix and Viterbi model. Second, by using the translation algorithm of triples and construction algorithm of query subgraphs, query keywords are translated to query subgraphs with structural and semantic information. Finally, experimental results on a real-world dataset demonstrate the effectiveness and rationality of the proposed method.

  10. Age-related differences in the accuracy of web query-based predictions of influenza-like illness.

    Directory of Open Access Journals (Sweden)

    Alexander Domnich

    Full Text Available Web queries are now widely used for modeling, nowcasting and forecasting influenza-like illness (ILI. However, given that ILI attack rates vary significantly across ages, in terms of both magnitude and timing, little is known about whether the association between ILI morbidity and ILI-related queries is comparable across different age-groups. The present study aimed to investigate features of the association between ILI morbidity and ILI-related query volume from the perspective of age.Since Google Flu Trends is unavailable in Italy, Google Trends was used to identify entry terms that correlated highly with official ILI surveillance data. All-age and age-class-specific modeling was performed by means of linear models with generalized least-square estimation. Hold-out validation was used to quantify prediction accuracy. For purposes of comparison, predictions generated by exponential smoothing were computed.Five search terms showed high correlation coefficients of > .6. In comparison with exponential smoothing, the all-age query-based model correctly predicted the peak time and yielded a higher correlation coefficient with observed ILI morbidity (.978 vs. .929. However, query-based prediction of ILI morbidity was associated with a greater error. Age-class-specific query-based models varied significantly in terms of prediction accuracy. In the 0-4 and 25-44-year age-groups, these did well and outperformed exponential smoothing predictions; in the 15-24 and ≥ 65-year age-classes, however, the query-based models were inaccurate and highly overestimated peak height. In all but one age-class, peak timing predicted by the query-based models coincided with observed timing.The accuracy of web query-based models in predicting ILI morbidity rates could differ among ages. Greater age-specific detail may be useful in flu query-based studies in order to account for age-specific features of the epidemiology of ILI.

  11. Discovering biomedical semantic relations in PubMed queries for information retrieval and database curation.

    Science.gov (United States)

    Huang, Chung-Chi; Lu, Zhiyong

    2016-01-01

    Identifying relevant papers from the literature is a common task in biocuration. Most current biomedical literature search systems primarily rely on matching user keywords. Semantic search, on the other hand, seeks to improve search accuracy by understanding the entities and contextual relations in user keywords. However, past research has mostly focused on semantically identifying biological entities (e.g. chemicals, diseases and genes) with little effort on discovering semantic relations. In this work, we aim to discover biomedical semantic relations in PubMed queries in an automated and unsupervised fashion. Specifically, we focus on extracting and understanding the contextual information (or context patterns) that is used by PubMed users to represent semantic relations between entities such as 'CHEMICAL-1 compared to CHEMICAL-2' With the advances in automatic named entity recognition, we first tag entities in PubMed queries and then use tagged entities as knowledge to recognize pattern semantics. More specifically, we transform PubMed queries into context patterns involving participating entities, which are subsequently projected to latent topics via latent semantic analysis (LSA) to avoid the data sparseness and specificity issues. Finally, we mine semantically similar contextual patterns or semantic relations based on LSA topic distributions. Our two separate evaluation experiments of chemical-chemical (CC) and chemical-disease (CD) relations show that the proposed approach significantly outperforms a baseline method, which simply measures pattern semantics by similarity in participating entities. The highest performance achieved by our approach is nearly 0.9 and 0.85 respectively for the CC and CD task when compared against the ground truth in terms of normalized discounted cumulative gain (nDCG), a standard measure of ranking quality. These results suggest that our approach can effectively identify and return related semantic patterns in a ranked order

  12. Usage of the Jess Engine, Rules and Ontology to Query a Relational Database

    Science.gov (United States)

    Bak, Jaroslaw; Jedrzejek, Czeslaw; Falkowski, Maciej

    We present a prototypical implementation of a library tool, the Semantic Data Library (SDL), which integrates the Jess (Java Expert System Shell) engine, rules and ontology to query a relational database. The tool extends functionalities of previous OWL2Jess with SWRL implementations and takes full advantage of the Jess engine, by separating forward and backward reasoning. The optimization of integration of all these technologies is an advancement over previous tools. We discuss the complexity of the query algorithm. As a demonstration of capability of the SDL library, we execute queries using crime ontology which is being developed in the Polish PPBW project.

  13. An algorithm to transform natural language into SQL queries for relational databases

    Directory of Open Access Journals (Sweden)

    Garima Singh

    2016-09-01

    Full Text Available Intelligent interface, to enhance efficient interactions between user and databases, is the need of the database applications. Databases must be intelligent enough to make the accessibility faster. However, not every user familiar with the Structured Query Language (SQL queries as they may not aware of structure of the database and they thus require to learn SQL. So, non-expert users need a system to interact with relational databases in their natural language such as English. For this, Database Management System (DBMS must have an ability to understand Natural Language (NL. In this research, an intelligent interface is developed using semantic matching technique which translates natural language query to SQL using set of production rules and data dictionary. The data dictionary consists of semantics sets for relations and attributes. A series of steps like lower case conversion, tokenization, speech tagging, database element and SQL element extraction is used to convert Natural Language Query (NLQ to SQL Query. The transformed query is executed and the results are obtained by the user. Intelligent Interface is the need of database applications to enhance efficient interaction between user and DBMS.

  14. Generalized query-based active learning to identify differentially methylated regions in DNA.

    Science.gov (United States)

    Haque, Md Muksitul; Holder, Lawrence B; Skinner, Michael K; Cook, Diane J

    2013-01-01

    Active learning is a supervised learning technique that reduces the number of examples required for building a successful classifier, because it can choose the data it learns from. This technique holds promise for many biological domains in which classified examples are expensive and time-consuming to obtain. Most traditional active learning methods ask very specific queries to the Oracle (e.g., a human expert) to label an unlabeled example. The example may consist of numerous features, many of which are irrelevant. Removing such features will create a shorter query with only relevant features, and it will be easier for the Oracle to answer. We propose a generalized query-based active learning (GQAL) approach that constructs generalized queries based on multiple instances. By constructing appropriately generalized queries, we can achieve higher accuracy compared to traditional active learning methods. We apply our active learning method to find differentially DNA methylated regions (DMRs). DMRs are DNA locations in the genome that are known to be involved in tissue differentiation, epigenetic regulation, and disease. We also apply our method on 13 other data sets and show that our method is better than another popular active learning technique.

  15. CellMiner: a relational database and query tool for the NCI-60 cancer cell lines

    Directory of Open Access Journals (Sweden)

    Reinhold William C

    2009-06-01

    Full Text Available Abstract Background Advances in the high-throughput omic technologies have made it possible to profile cells in a large number of ways at the DNA, RNA, protein, chromosomal, functional, and pharmacological levels. A persistent problem is that some classes of molecular data are labeled with gene identifiers, others with transcript or protein identifiers, and still others with chromosomal locations. What has lagged behind is the ability to integrate the resulting data to uncover complex relationships and patterns. Those issues are reflected in full form by molecular profile data on the panel of 60 diverse human cancer cell lines (the NCI-60 used since 1990 by the U.S. National Cancer Institute to screen compounds for anticancer activity. To our knowledge, CellMiner is the first online database resource for integration of the diverse molecular types of NCI-60 and related meta data. Description CellMiner enables scientists to perform advanced querying of molecular information on NCI-60 (and additional types through a single web interface. CellMiner is a freely available tool that organizes and stores raw and normalized data that represent multiple types of molecular characterizations at the DNA, RNA, protein, and pharmacological levels. Annotations for each project, along with associated metadata on the samples and datasets, are stored in a MySQL database and linked to the molecular profile data. Data can be queried and downloaded along with comprehensive information on experimental and analytic methods for each data set. A Data Intersection tool allows selection of a list of genes (proteins in common between two or more data sets and outputs the data for those genes (proteins in the respective sets. In addition to its role as an integrative resource for the NCI-60, the CellMiner package also serves as a shell for incorporation of molecular profile data on other cell or tissue sample types. Conclusion CellMiner is a relational database tool for

  16. Relationship Between Time Consumption and Quality of Responses to Drug-related Queries

    DEFF Research Database (Denmark)

    Amundstuen Reppe, Linda; Lydersen, Stian; Schjøtt, Jan

    2016-01-01

    in score, –0.05 per hour of work; 95% CI, –0.08 to –0.01; P = 0.005). No such associations were found for the internal experts’ assessment. Implications To our knowledge, this is the first study of the association between time consumption and quality of responses to drug-related queries in DICs......Purpose The aims of this study were to assess the quality of responses produced by drug information centers (DICs) in Scandinavia, and to study the association between time consumption processing queries and the quality of the responses. Methods We posed six identical drug-related queries to seven...... DICs in Scandinavia, and the time consumption required for processing them was estimated. Clinical pharmacologists (internal experts) and general practitioners (external experts) reviewed responses individually. We used mixed model linear regression analyses to study the associations between time...

  17. Relative expressive power of navigational querying on graphs using transitive closure

    OpenAIRE

    Surinx, Dimitri; Fletcher, George H. L.; Gyssens, Marc; Leinders, Dirk; Van den Bussche, Jan; Van Gucht, Dirk; Vansummeren, Stijn; Wu, Yuqing

    2015-01-01

    Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; coprojection; converse; transitive closure; and the diversity relation. All these operators map binary relations to binary relations. We compare the ex...

  18. Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs

    Science.gov (United States)

    Shin, Sung-Hyun; Moon, Yang-Sae; Kim, Jinho; Kim, Sang-Wook

    In recent years, a horizontal table with a large number of attributes is widely used in OLAP or e-business applications to analyze multidimensional data efficiently. For efficient storing and querying of horizontal tables, recent works have tried to transform a horizontal table to a traditional vertical table. Existing works, however, have the drawback of not considering an optimized PIVOT operation provided (or to be provided) in recent commercial RDBMSs. In this paper we propose a formal approach that exploits the optimized PIVOT operation of commercial RDBMSs for storing and querying of horizontal tables. To achieve this goal, we first provide an overall framework that stores and queries a horizontal table using an equivalent vertical table. Under the proposed framework, we then formally define 1) a method that stores a horizontal table in an equivalent vertical table and 2) a PIVOT operation that converts a stored vertical table to an equivalent horizontal view. Next, we propose a novel method that transforms a user-specified query on horizontal tables to an equivalent PIVOT-included query on vertical tables. In particular, by providing transformation rules for all five elementary operations in relational algebra as theorems, we prove our method is theoretically applicable to commercial RDBMSs. Experimental results show that, compared with the earlier work, our method reduces storage space significantly and also improves average performance by several orders of magnitude. These results indicate that our method provides an excellent framework to maximize performance in handling horizontal tables by exploiting the optimized PIVOT operation in commercial RDBMSs.

  19. Querying clinical data in HL7 RIM based relational model with morph-RDB.

    Science.gov (United States)

    Priyatna, Freddy; Alonso-Calvo, Raul; Paraiso-Medina, Sergio; Corcho, Oscar

    2017-10-05

    Semantic interoperability is essential when carrying out post-genomic clinical trials where several institutions collaborate, since researchers and developers need to have an integrated view and access to heterogeneous data sources. One possible approach to accommodate this need is to use RDB2RDF systems that provide RDF datasets as the unified view. These RDF datasets may be materialized and stored in a triple store, or transformed into RDF in real time, as virtual RDF data sources. Our previous efforts involved materialized RDF datasets, hence losing data freshness. In this paper we present a solution that uses an ontology based on the HL7 v3 Reference Information Model and a set of R2RML mappings that relate this ontology to an underlying relational database implementation, and where morph-RDB is used to expose a virtual, non-materialized SPARQL endpoint over the data. By applying a set of optimization techniques on the SPARQL-to-SQL query translation algorithm, we can now issue SPARQL queries to the underlying relational data with generally acceptable performance.

  20. A high performance, ad-hoc, fuzzy query processing system for relational databases

    Science.gov (United States)

    Mansfield, William H., Jr.; Fleischman, Robert M.

    1992-01-01

    Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.

  1. Query deforestation

    OpenAIRE

    Grust, Torsten; Scholl, Marc H.

    1998-01-01

    The construction of a declarative query engine for a DBMS includes the challenge of compiling algebraic queries into efficient execution plans that can be run on top of the persistent storage. This work pursues the goal of employing foldr-build deforestation for the derivation of efficient streaming programs - programs that do not allocate intermediate data structures to perform their task - from algebraic (combinator) query plans. The query engine is based on the insertion representation of ...

  2. Superfund Query

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Superfund Query allows users to retrieve data from the Comprehensive Environmental Response, Compensation, and Liability Information System (CERCLIS) database.

  3. [Limiting a Medline/PubMed query to the "best" articles using the JCR relative impact factor].

    Science.gov (United States)

    Avillach, P; Kerdelhué, G; Devos, P; Maisonneuve, H; Darmoni, S J

    2014-12-01

    Medline/PubMed is the most frequently used medical bibliographic research database. The aim of this study was to propose a new generic method to limit any Medline/PubMed query based on the relative impact factor and the A & B categories of the SIGAPS score. The entire PubMed corpus was used for the feasibility study, then ten frequent diseases in terms of PubMed indexing and the citations of four Nobel prize winners. The relative impact factor (RIF) was calculated by medical specialty defined in Journal Citation Reports. The two queries, which included all the journals in category A (or A OR B), were added to any Medline/PubMed query as a central point of the feasibility study. Limitation using the SIGAPS category A was larger than the when using the Core Clinical Journals (CCJ): 15.65% of PubMed corpus vs 8.64% for CCJ. The response time of this limit applied to the entire PubMed corpus was less than two seconds. For five diseases out of ten, limiting the citations with the RIF was more effective than with the CCJ. For the four Nobel prize winners, limiting the citations with the RIF was more effective than the CCJ. The feasibility study to apply a new filter based on the relative impact factor on any Medline/PubMed query was positive. Copyright © 2014 Elsevier Masson SAS. All rights reserved.

  4. AmbientDB: relational query processing in a P2P network

    NARCIS (Netherlands)

    P.A. Boncz (Peter); C. Treijtel

    2003-01-01

    textabstractA new generation of applications running on a network of nodes, that share data on an ad-hoc basis, will benefit from data management services including powerful querying facilities. In this paper, we introduce the goals, assumptions and architecture of AmbientDB, a new peer-to-peer

  5. AmbientDB : relational query processing in a P2P network

    NARCIS (Netherlands)

    P.A. Boncz (Peter); C. Treijtel

    2003-01-01

    textabstractA new generation of applications running on a network of nodes, that share data on an ad-hoc basis, will benefit from data management services including powerful querying facilities. In this paper, we introduce the goals, assumptions and architecture of AmbientDB, a new peer-to-peer

  6. Comparing relational model transformation technologies: implementing Query/View/Transformation with Triple Graph Grammars

    DEFF Research Database (Denmark)

    Greenyer, Joel; Kindler, Ekkart

    2010-01-01

    and for model-based software engineering approaches in general. QVT (Query/View/Transformation) is the transformation technology recently proposed for this purpose by the OMG. TGGs (Triple Graph Grammars) are another transformation technology proposed in the mid-nineties, used for example in the FUJABA CASE...

  7. Query responses

    Directory of Open Access Journals (Sweden)

    Paweł Łupkowski

    2017-05-01

    Full Text Available In this article we consider the phenomenon of answering a query with a query. Although such answers are common, no large scale, corpus-based characterization exists, with the exception of clarification requests. After briefly reviewing different theoretical approaches on this subject, we present a corpus study of query responses in the British National Corpus and develop a taxonomy for query responses. We point at a variety of response categories that have not been formalized in previous dialogue work, particularly those relevant to adversarial interaction. We show that different response categories have significantly different rates of subsequent answer provision. We provide a formal analysis of the response categories in the framework of KoS.

  8. Query optimization over crowdsourced data

    KAUST Repository

    Park, Hyunjung; Widom, Jennifer

    2013-01-01

    Deco is a comprehensive system for answering declarative queries posed over stored relational data together with data obtained on-demand from the crowd. In this paper we describe Deco's cost-based query optimizer, building on Deco's data model

  9. Fingerprinting Keywords in Search Queries over Tor

    Directory of Open Access Journals (Sweden)

    Oh Se Eun

    2017-10-01

    Full Text Available Search engine queries contain a great deal of private and potentially compromising information about users. One technique to prevent search engines from identifying the source of a query, and Internet service providers (ISPs from identifying the contents of queries is to query the search engine over an anonymous network such as Tor.

  10. Knowledge Query Language (KQL)

    Science.gov (United States)

    2016-02-12

    described as a sparse, distributed multidimensional sorted map. Unlike a relational database , BigTable has no multicolumn primary keys or constraints. The...in query languages such as SQL. Figure 3. Address expression-based querying. Each circled step in Figure 3 is described below. Datastore/ Database ...implementation we describe in later sections stores the instance of registry ontology in JSON files. 7 Throughout the rest of this report, we use the

  11. Query optimization over crowdsourced data

    KAUST Repository

    Park, Hyunjung

    2013-08-26

    Deco is a comprehensive system for answering declarative queries posed over stored relational data together with data obtained on-demand from the crowd. In this paper we describe Deco\\'s cost-based query optimizer, building on Deco\\'s data model, query language, and query execution engine presented earlier. Deco\\'s objective in query optimization is to find the best query plan to answer a query, in terms of estimated monetary cost. Deco\\'s query semantics and plan execution strategies require several fundamental changes to traditional query optimization. Novel techniques incorporated into Deco\\'s query optimizer include a cost model distinguishing between "free" existing data versus paid new data, a cardinality estimation algorithm coping with changes to the database state during query execution, and a plan enumeration algorithm maximizing reuse of common subplans in a setting that makes reuse challenging. We experimentally evaluate Deco\\'s query optimizer, focusing on the accuracy of cost estimation and the efficiency of plan enumeration.

  12. Multi-weighted tree based query optimization method for parallel relational database systems%基于多重加权树的并行关系数据库系统的查询优化方法

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    The author investigates the query optimization problem for parallel relational databases. A multi-weighted tree based query optimization method is proposed. The method consists of a multi-weighted tree based parallel query plan model, a cost model for parallel qury plans and a query optimizer. The parallel query plan model is the first one to model all basic relational operations, all three types of parallelism of query execution, processor and memory allocation to operations, memory allocation to the buffers between operations in pipelines and data redistribution among processors.The cost model takes the waiting time of the operations in pipelining execution into consideration and is computable in a bottom-up fashion. The query optimizer addresses the query optimization problem in the context of Select-Project-Join queries that are widely used in commercial DBMSs. Several heuristics determining the processor allocation to operations are derived and used in the query optimizer. The query optimizer is aware of memory resources in order to generate good-quality plans. It includes the heuristics for determining the memory allocation to operations and buffers between operations in pipelines so that the memory resourse is fully exploit. In addition, multiple algorithms for implementing join operations are consided in the query optimizer. The query optimizer can make an optimal choice of join algorithm for each join operation in a query. The proposed query optimization method has been used in a prototype parallel database management system designed and implemented by the author.

  13. A method to implement fine-grained access control for personal health records through standard relational database queries.

    Science.gov (United States)

    Sujansky, Walter V; Faus, Sam A; Stone, Ethan; Brennan, Patricia Flatley

    2010-10-01

    Online personal health records (PHRs) enable patients to access, manage, and share certain of their own health information electronically. This capability creates the need for precise access-controls mechanisms that restrict the sharing of data to that intended by the patient. The authors describe the design and implementation of an access-control mechanism for PHR repositories that is modeled on the eXtensible Access Control Markup Language (XACML) standard, but intended to reduce the cognitive and computational complexity of XACML. The authors implemented the mechanism entirely in a relational database system using ANSI-standard SQL statements. Based on a set of access-control rules encoded as relational table rows, the mechanism determines via a single SQL query whether a user who accesses patient data from a specific application is authorized to perform a requested operation on a specified data object. Testing of this query on a moderately large database has demonstrated execution times consistently below 100ms. The authors include the details of the implementation, including algorithms, examples, and a test database as Supplementary materials. Copyright © 2010 Elsevier Inc. All rights reserved.

  14. Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases.

    Science.gov (United States)

    Wollbrett, Julien; Larmande, Pierre; de Lamotte, Frédéric; Ruiz, Manuel

    2013-04-15

    In recent years, a large amount of "-omics" data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic.

  15. Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases

    Science.gov (United States)

    2013-01-01

    Background In recent years, a large amount of “-omics” data have been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for these data and assembling them is a time-consuming task. The Semantic Web helps to facilitate interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. Results We developed a framework, named BioSemantic, for the creation of Semantic Web Services that are applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of SPARQL queries and their integration into Semantic Web Services backbones. We have used our framework to integrate genomic data from different plant databases. Conclusions BioSemantic is a framework that was designed to speed integration of relational databases. We present how it can be used to speed the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic is downloadable at http://southgreen.cirad.fr/?q=content/Biosemantic. PMID:23586394

  16. CellMiner: a relational database and query tool for the NCI-60 cancer cell lines

    OpenAIRE

    Shankavaram, Uma T; Varma, Sudhir; Kane, David; Sunshine, Margot; Chary, Krishna K; Reinhold, William C; Pommier, Yves; Weinstein, John N

    2009-01-01

    Abstract Background Advances in the high-throughput omic technologies have made it possible to profile cells in a large number of ways at the DNA, RNA, protein, chromosomal, functional, and pharmacological levels. A persistent problem is that some classes of molecular data are labeled with gene identifiers, others with transcript or protein identifiers, and still others with chromosomal locations. What has lagged behind is the ability to integrate the resulting data to uncover complex relatio...

  17. Integrating query of relational and textual data in clinical databases: a case study.

    Science.gov (United States)

    Fisk, John M; Mutalik, Pradeep; Levin, Forrest W; Erdos, Joseph; Taylor, Caroline; Nadkarni, Prakash

    2003-01-01

    The authors designed and implemented a clinical data mart composed of an integrated information retrieval (IR) and relational database management system (RDBMS). Using commodity software, which supports interactive, attribute-centric text and relational searches, the mart houses 2.8 million documents that span a five-year period and supports basic IR features such as Boolean searches, stemming, and proximity and fuzzy searching. Results are relevance-ranked using either "total documents per patient" or "report type weighting." Non-curated medical text has a significant degree of malformation with respect to spelling and punctuation, which creates difficulties for text indexing and searching. Presently, the IR facilities of RDBMS packages lack the features necessary to handle such malformed text adequately. A robust IR+RDBMS system can be developed, but it requires integrating RDBMSs with third-party IR software. RDBMS vendors need to make their IR offerings more accessible to non-programmers.

  18. The CMS DBS query language

    International Nuclear Information System (INIS)

    Kuznetsov, Valentin; Riley, Daniel; Afaq, Anzar; Sekhri, Vijay; Guo Yuyi; Lueking, Lee

    2010-01-01

    The CMS experiment has implemented a flexible and powerful system enabling users to find data within the CMS physics data catalog. The Dataset Bookkeeping Service (DBS) comprises a database and the services used to store and access metadata related to CMS physics data. To this, we have added a generalized query system in addition to the existing web and programmatic interfaces to the DBS. This query system is based on a query language that hides the complexity of the underlying database structure by discovering the join conditions between database tables. This provides a way of querying the system that is simple and straightforward for CMS data managers and physicists to use without requiring knowledge of the database tables or keys. The DBS Query Language uses the ANTLR tool to build the input query parser and tokenizer, followed by a query builder that uses a graph representation of the DBS schema to construct the SQL query sent to underlying database. We will describe the design of the query system, provide details of the language components and overview of how this component fits into the overall data discovery system architecture.

  19. Optimizing Temporal Queries: Efficient Handling of Duplicates

    DEFF Research Database (Denmark)

    Toman, David; Bowman, Ivan Thomas

    2001-01-01

    , these query languages are implemented by translating temporal queries into standard relational queries. However, the compiled queries are often quite cumbersome and expensive to execute even using state-of-the- art relational products. This paper presents an optimization technique that produces more efficient...... translated SQL queries by taking into account the properties of the encoding used for temporal attributes. For concreteness, this translation technique is presented in the context of SQL/TP; however, these techniques are also applicable to other temporal query languages....

  20. Identifying genetic relatives without compromising privacy.

    Science.gov (United States)

    He, Dan; Furlotte, Nicholas A; Hormozdiari, Farhad; Joo, Jong Wha J; Wadia, Akshay; Ostrovsky, Rafail; Sahai, Amit; Eskin, Eleazar

    2014-04-01

    The development of high-throughput genomic technologies has impacted many areas of genetic research. While many applications of these technologies focus on the discovery of genes involved in disease from population samples, applications of genomic technologies to an individual's genome or personal genomics have recently gained much interest. One such application is the identification of relatives from genetic data. In this application, genetic information from a set of individuals is collected in a database, and each pair of individuals is compared in order to identify genetic relatives. An inherent issue that arises in the identification of relatives is privacy. In this article, we propose a method for identifying genetic relatives without compromising privacy by taking advantage of novel cryptographic techniques customized for secure and private comparison of genetic information. We demonstrate the utility of these techniques by allowing a pair of individuals to discover whether or not they are related without compromising their genetic information or revealing it to a third party. The idea is that individuals only share enough special-purpose cryptographically protected information with each other to identify whether or not they are relatives, but not enough to expose any information about their genomes. We show in HapMap and 1000 Genomes data that our method can recover first- and second-order genetic relationships and, through simulations, show that our method can identify relationships as distant as third cousins while preserving privacy.

  1. Manchester visual query language

    Science.gov (United States)

    Oakley, John P.; Davis, Darryl N.; Shann, Richard T.

    1993-04-01

    We report a database language for visual retrieval which allows queries on image feature information which has been computed and stored along with images. The language is novel in that it provides facilities for dealing with feature data which has actually been obtained from image analysis. Each line in the Manchester Visual Query Language (MVQL) takes a set of objects as input and produces another, usually smaller, set as output. The MVQL constructs are mainly based on proven operators from the field of digital image analysis. An example is the Hough-group operator which takes as input a specification for the objects to be grouped, a specification for the relevant Hough space, and a definition of the voting rule. The output is a ranked list of high scoring bins. The query could be directed towards one particular image or an entire image database, in the latter case the bins in the output list would in general be associated with different images. We have implemented MVQL in two layers. The command interpreter is a Lisp program which maps each MVQL line to a sequence of commands which are used to control a specialized database engine. The latter is a hybrid graph/relational system which provides low-level support for inheritance and schema evolution. In the paper we outline the language and provide examples of useful queries. We also describe our solution to the engineering problems associated with the implementation of MVQL.

  2. Estimating Influenza Outbreaks Using Both Search Engine Query Data and Social Media Data in South Korea.

    Science.gov (United States)

    Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan

    2016-07-04

    As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; Psearch queries to enhance influenza surveillance in South Korea. In addition, an approach for query selection using social media data seems ideal for supporting influenza surveillance based on search query data.

  3. Identifying modular relations in complex brain networks

    DEFF Research Database (Denmark)

    Andersen, Kasper Winther; Mørup, Morten; Siebner, Hartwig

    2012-01-01

    We evaluate the infinite relational model (IRM) against two simpler alternative nonparametric Bayesian models for identifying structures in multi subject brain networks. The models are evaluated for their ability to predict new data and infer reproducible structures. Prediction and reproducibility...... and obtains comparable reproducibility and predictability. For resting state functional magnetic resonance imaging data from 30 healthy controls the IRM model is also superior to the two simpler alternatives, suggesting that brain networks indeed exhibit universal complex relational structure...

  4. Identifying Adverse Drug Events by Relational Learning.

    Science.gov (United States)

    Page, David; Costa, Vítor Santos; Natarajan, Sriraam; Barnard, Aubrey; Peissig, Peggy; Caldwell, Michael

    2012-07-01

    The pharmaceutical industry, consumer protection groups, users of medications and government oversight agencies are all strongly interested in identifying adverse reactions to drugs. While a clinical trial of a drug may use only a thousand patients, once a drug is released on the market it may be taken by millions of patients. As a result, in many cases adverse drug events (ADEs) are observed in the broader population that were not identified during clinical trials. Therefore, there is a need for continued, post-marketing surveillance of drugs to identify previously-unanticipated ADEs. This paper casts this problem as a reverse machine learning task , related to relational subgroup discovery and provides an initial evaluation of this approach based on experiments with an actual EMR/EHR and known adverse drug events.

  5. External phenome analysis enables a rational federated query strategy to detect changing rates of treatment-related complications associated with multiple myeloma.

    Science.gov (United States)

    Warner, Jeremy L; Alterovitz, Gil; Bodio, Kelly; Joyce, Robin M

    2013-01-01

    Electronic health records (EHRs) are increasingly useful for health services research. For relatively uncommon conditions, such as multiple myeloma (MM) and its treatment-related complications, a combination of multiple EHR sources is essential for such research. The Shared Health Research Information Network (SHRINE) enables queries for aggregate results across participating institutions. Development of a rational search strategy in SHRINE may be augmented through analysis of pre-existing databases. We developed a SHRINE query for likely non-infectious treatment-related complications of MM, based upon an analysis of the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database. Using this query strategy, we found that the rate of likely treatment-related complications significantly increased from 2001 to 2007, by an average of 6% a year (p=0.01), across the participating SHRINE institutions. This finding is in keeping with increasingly aggressive strategies in the treatment of MM. This proof of concept demonstrates that a staged approach to federated queries, using external EHR data, can yield potentially clinically meaningful results.

  6. Approximate dictionary queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Gasieniec, Leszek

    1996-01-01

    Given a set of n binary strings of length m each. We consider the problem of answering d-queries. Given a binary query string of length m, a d-query is to report if there exists a string in the set within Hamming distance d of . We present a data structure of size O(nm) supporting 1-queries in ti...

  7. Multi-Dimensional Path Queries

    DEFF Research Database (Denmark)

    Bækgaard, Lars

    1998-01-01

    to create nested path structures. We present an SQL-like query language that is based on path expressions and we show how to use it to express multi-dimensional path queries that are suited for advanced data analysis in decision support environments like data warehousing environments......We present the path-relationship model that supports multi-dimensional data modeling and querying. A path-relationship database is composed of sets of paths and sets of relationships. A path is a sequence of related elements (atoms, paths, and sets of paths). A relationship is a binary path...

  8. Truth Space Method for Caching Database Queries

    Directory of Open Access Journals (Sweden)

    S. V. Mosin

    2015-01-01

    Full Text Available We propose a new method of client-side data caching for relational databases with a central server and distant clients. Data are loaded into the client cache based on queries executed on the server. Every query has the corresponding DB table – the result of the query execution. These queries have a special form called "universal relational query" based on three fundamental Relational Algebra operations: selection, projection and natural join. We have to mention that such a form is the closest one to the natural language and the majority of database search queries can be expressed in this way. Besides, this form allows us to analyze query correctness by checking lossless join property. A subsequent query may be executed in a client’s local cache if we can determine that the query result is entirely contained in the cache. For this we compare truth spaces of the logical restrictions in a new user’s query and the results of the queries execution in the cache. Such a comparison can be performed analytically , without need in additional Database queries. This method may be used to define lacking data in the cache and execute the query on the server only for these data. To do this the analytical approach is also used, what distinguishes our paper from the existing technologies. We propose four theorems for testing the required conditions. The first and the third theorems conditions allow us to define the existence of required data in cache. The second and the fourth theorems state conditions to execute queries with cache only. The problem of cache data actualizations is not discussed in this paper. However, it can be solved by cataloging queries on the server and their serving by triggers in background mode. The article is published in the author’s wording.

  9. Learning via Query Synthesis

    KAUST Repository

    Alabdulmohsin, Ibrahim

    2017-01-01

    Active learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthe- sis, where the learning agent constructs artificial queries from scratch in order

  10. Incremental Query Rewriting with Resolution

    Science.gov (United States)

    Riazanov, Alexandre; Aragão, Marcelo A. T.

    We address the problem of semantic querying of relational databases (RDB) modulo knowledge bases using very expressive knowledge representation formalisms, such as full first-order logic or its various fragments. We propose to use a resolution-based first-order logic (FOL) reasoner for computing schematic answers to deductive queries, with the subsequent translation of these schematic answers to SQL queries which are evaluated using a conventional relational DBMS. We call our method incremental query rewriting, because an original semantic query is rewritten into a (potentially infinite) series of SQL queries. In this chapter, we outline the main idea of our technique - using abstractions of databases and constrained clauses for deriving schematic answers, and provide completeness and soundness proofs to justify the applicability of this technique to the case of resolution for FOL without equality. The proposed method can be directly used with regular RDBs, including legacy databases. Moreover, we propose it as a potential basis for an efficient Web-scale semantic search technology.

  11. Learning semantic query suggestions

    NARCIS (Netherlands)

    Meij, E.; Bron, M.; Hollink, L.; Huurnink, B.; de Rijke, M.

    2009-01-01

    An important application of semantic web technology is recognizing human-defined concepts in text. Query transformation is a strategy often used in search engines to derive queries that are able to return more useful search results than the original query and most popular search engines provide

  12. Recommending Multidimensional Queries

    Science.gov (United States)

    Giacometti, Arnaud; Marcel, Patrick; Negre, Elsa

    Interactive analysis of datacube, in which a user navigates a cube by launching a sequence of queries is often tedious since the user may have no idea of what the forthcoming query should be in his current analysis. To better support this process we propose in this paper to apply a Collaborative Work approach that leverages former explorations of the cube to recommend OLAP queries. The system that we have developed adapts Approximate String Matching, a technique popular in Information Retrieval, to match the current analysis with the former explorations and help suggesting a query to the user. Our approach has been implemented with the open source Mondrian OLAP server to recommend MDX queries and we have carried out some preliminary experiments that show its efficiency for generating effective query recommendations.

  13. Unemployment Insurance Query (UIQ)

    Data.gov (United States)

    Social Security Administration — The Unemployment Insurance Query (UIQ) provides State Unemployment Insurance agencies real-time online access to SSA data. This includes SSN verification and Title...

  14. Querying Natural Logic Knowledge Bases

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik; Jensen, Per Anker

    2017-01-01

    This paper describes the principles of a system applying natural logic as a knowledge base language. Natural logics are regimented fragments of natural language employing high level inference rules. We advocate the use of natural logic for knowledge bases dealing with querying of classes...... in ontologies and class-relationships such as are common in life-science descriptions. The paper adopts a version of natural logic with recursive restrictive clauses such as relative clauses and adnominal prepositional phrases. It includes passive as well as active voice sentences. We outline a prototype...... for partial translation of natural language into natural logic, featuring further querying and conceptual path finding in natural logic knowledge bases....

  15. Development of a user friendly interface for database querying in natural language by using concepts and means related to artificial intelligence

    International Nuclear Information System (INIS)

    Pujo, Pascal

    1989-01-01

    This research thesis reports the development of a user-friendly interface in natural language for querying a relational database. The developed system differs from usual approaches for its integrated architecture as the relational model management is totally controlled by the interface. The author first addresses the way to store data in order to make them accessible through an interface in natural language, and more precisely to store data with an organisation which would result in the less possible constraints in query formulation. The author then briefly presents techniques related to automatic processing in natural language, and discusses the implications of a better user-friendliness and for error processing. The next part reports the study of the developed interface: selection of data processing tools, interface development, data management at the interface level, information input by the user. The last chapter proposes an overview of possible evolutions for the interface: use of deductive functionalities, use of an extensional base and of an intentional base to deduce facts from knowledge stores in the extensional base, and handling of complex objects [fr

  16. Optimizing Temporal Queries

    DEFF Research Database (Denmark)

    Toman, David; Bowman, Ivan Thomas

    2003-01-01

    Recent research in the area of temporal databases has proposed a number of query languages that vary in their expressive power and the semantics they provide to users. These query languages represent a spectrum of solutions to the tension between clean semantics and efficient evaluation. Often, t...

  17. Mastering jQuery

    CERN Document Server

    Libby, Alex

    2015-01-01

    If you are a developer who is already familiar with using jQuery and wants to push your skill set further, then this book is for you. The book assumes an intermediate knowledge level of jQuery, JavaScript, HTML5, and CSS.

  18. Query recommendation for children

    NARCIS (Netherlands)

    Duarte Torres, Sergio; Hiemstra, Djoerd; Weber, Ingmar; Serdyukov, Pavel

    2012-01-01

    One of the biggest problems that children experience while searching the web occurs during the query formulation process. Children have been found to struggle formulating queries based on keywords given their limited vocabulary and their difficulty to choose the right keywords. In this work we

  19. Collective spatial keyword querying

    DEFF Research Database (Denmark)

    Cao, Xin; Cong, Gao; Jensen, Christian S.

    2011-01-01

    With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the quer......With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However......, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query......'s keywords and such that objects are nearest to the query location and have the lowest inter-object distances. Specifically, we study two variants of this problem, both of which are NP-complete. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. We...

  20. Range-clustering queries

    NARCIS (Netherlands)

    Abrahamsen, M.; de Berg, M.T.; Buchin, K.A.; Mehr, M.; Mehrabi, A.D.

    2017-01-01

    In a geometric k -clustering problem the goal is to partition a set of points in R d into k subsets such that a certain cost function of the clustering is minimized. We present data structures for orthogonal range-clustering queries on a point set S : given a query box Q and an integer k>2 , compute

  1. Querying Workflow Logs

    Directory of Open Access Journals (Sweden)

    Yan Tang

    2018-01-01

    Full Text Available A business process or workflow is an assembly of tasks that accomplishes a business goal. Business process management is the study of the design, configuration/implementation, enactment and monitoring, analysis, and re-design of workflows. The traditional methodology for the re-design and improvement of workflows relies on the well-known sequence of extract, transform, and load (ETL, data/process warehousing, and online analytical processing (OLAP tools. In this paper, we study the ad hoc queryiny of process enactments for (data-centric business processes, bypassing the traditional methodology for more flexibility in querying. We develop an algebraic query language based on “incident patterns” with four operators inspired from Business Process Model and Notation (BPMN representation, allowing the user to formulate ad hoc queries directly over workflow logs. A formal semantics of this query language, a preliminary query evaluation algorithm, and a group of elementary properties of the operators are provided.

  2. Indexing for summary queries

    DEFF Research Database (Denmark)

    Yi, Ke; Wang, Lu; Wei, Zhewei

    2014-01-01

    ), of a particular attribute of these records. Aggregation queries are especially useful in business intelligence and data analysis applications where users are interested not in the actual records, but some statistics of them. They can also be executed much more efficiently than reporting queries, by embedding...... returned by reporting queries. In this article, we design indexing techniques that allow for extracting a statistical summary of all the records in the query. The summaries we support include frequent items, quantiles, and various sketches, all of which are of central importance in massive data analysis....... Our indexes require linear space and extract a summary with the optimal or near-optimal query cost. We illustrate the efficiency and usefulness of our designs through extensive experiments and a system demonstration....

  3. SPARQL Query Re-writing Using Partonomy Based Transformation Rules

    Science.gov (United States)

    Jain, Prateek; Yeh, Peter Z.; Verma, Kunal; Henson, Cory A.; Sheth, Amit P.

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontology's containing spatial information, the precise relationships between spatial entities has to be specified in the basic graph pattern of SPARQL query which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query constraints and knowledge base. Our experiments were performed on completely third party datasets and queries. Evaluations were performed on Geonames dataset using questions from National Geographic Bee serialized into SPARQL and British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.

  4. A general approach to query flattening

    NARCIS (Netherlands)

    van Ruth, J.

    The translation of queries from complex data models to simpler data models is a recurring theme in the construction of efficient data management systems. In this paper we propose a general framework to guide the translation from data models with nested types to a flat relational model (query

  5. jQuery cookbook

    CERN Document Server

    2010-01-01

    jQuery simplifies building rich, interactive web frontends. Getting started with this JavaScript library is easy, but it can take years to fully realize its breadth and depth; this cookbook shortens the learning curve considerably. With these recipes, you'll learn patterns and practices from 19 leading developers who use jQuery for everything from integrating simple components into websites and applications to developing complex, high-performance user interfaces. Ideal for newcomers and JavaScript veterans alike, jQuery Cookbook starts with the basics and then moves to practical use cases w

  6. KoralQuery -- A General Corpus Query Protocol

    DEFF Research Database (Denmark)

    Bingel, Joachim; Diewald, Nils

    2015-01-01

    . In this paper, we present KoralQuery, a JSON-LD based general corpus query protocol, aiming to be independent of particular QLs, tasks and corpus formats. In addition to describing the system of types and operations that KoralQuery is built on, we exemplify the representation of corpus queries in the serialized...

  7. Query containment in entity SQL

    OpenAIRE

    Rull Fort, Guillem; Bernstein, Philip A.; Garcia dos Santos, Ivo; Katsis, Yannis; Melnik, Sergey; Teniente López, Ernest

    2013-01-01

    We describe a software architecture we have developed for a constructive containment checker of Entity SQL queries defined over extended ER schemas expressed in Microsoft's Entity Data Model. Our application of interest is compilation of object-to-relational mappings for Microsoft's ADO.NET Entity Framework, which has been shipping since 2007. The supported language includes several features which have been individually addressed in the past but, to the best of our knowledge, they have not be...

  8. jQuery Mobile

    CERN Document Server

    Reid, Jon

    2011-01-01

    Native apps have distinct advantages, but the future belongs to mobile web apps that function on a broad range of smartphones and tablets. Get started with jQuery Mobile, the touch-optimized framework for creating apps that look and behave consistently across many devices. This concise book provides HTML5, CSS3, and JavaScript code examples, screen shots, and step-by-step guidance to help you build a complete working app with jQuery Mobile. If you're already familiar with the jQuery JavaScript library, you can use your existing skills to build cross-platform mobile web apps right now. This b

  9. Code query by example

    Science.gov (United States)

    Vaucouleur, Sebastien

    2011-02-01

    We introduce code query by example for customisation of evolvable software products in general and of enterprise resource planning systems (ERPs) in particular. The concept is based on an initial empirical study on practices around ERP systems. We motivate our design choices based on those empirical results, and we show how the proposed solution helps with respect to the infamous upgrade problem: the conflict between the need for customisation and the need for upgrade of ERP systems. We further show how code query by example can be used as a form of lightweight static analysis, to detect automatically potential defects in large software products. Code query by example as a form of lightweight static analysis is particularly interesting in the context of ERP systems: it is often the case that programmers working in this field are not computer science specialists but more of domain experts. Hence, they require a simple language to express custom rules.

  10. User perspectives on query difficulty

    DEFF Research Database (Denmark)

    Lioma, Christina; Larsen, Birger; Schütze, Hinrich

    2011-01-01

    be difficult for the system to address? (2) Are users aware of specific features in their query (e.g., domain-specificity, vagueness) that may render their query difficult for an IR system to address? A study of 420 queries from a Web search engine query log that are pre-categorised as easy, medium, hard...

  11. A Methodolgy, Based on Analytical Modeling, for the Design of Parallel and Distributed Architectures for Relational Database Query Processors.

    Science.gov (United States)

    1987-12-01

    Application Programs Intelligent Disk Database Controller Manangement System Operating System Host .1’ I% Figure 2. Intelligent Disk Controller Application...8217. /- - • Database Control -% Manangement System Disk Data Controller Application Programs Operating Host I"" Figure 5. Processor-Per- Head data. Therefore, the...However. these ad- ditional properties have been proven in classical set and relation theory [75]. These additional properties are described here

  12. Flexible Query Answering Systems

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 10th International Conference on Flexible Query Answering Systems, FQAS 2013, held in Granada, Spain, in September 2013. The 59 full papers included in this volume were carefully reviewed and selected from numerous submissions. The papers...... are organized in a general session train and a parallel special session track. The general session train covers the following topics: querying-answering systems; semantic technology; patterns and classification; personalization and recommender systems; searching and ranking; and Web and human...

  13. Learning jQuery

    CERN Document Server

    Chaffer, Jonathan

    2013-01-01

    Step through each of the core concepts of the jQuery library, building an overall picture of its capabilities. Once you have thoroughly covered the basics, the book returns to each concept to cover more advanced examples and techniques.This book is for web designers who want to create interactive elements for their designs, and for developers who want to create the best user interface for their web applications. Basic JavaScript programming and knowledge of HTML and CSS is required. No knowledge of jQuery is assumed, nor is experience with any other JavaScript libraries.

  14. Spatial Keyword Query Processing

    DEFF Research Database (Denmark)

    Chen, Lisi; Jensen, Christian S.; Wu, Dingming

    2013-01-01

    Geo-textual indices play an important role in spatial keyword query- ing. The existing geo-textual indices have not been compared sys- tematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. We provide...... an all-around survey of 12 state- of-the-art geo-textual indices. We propose a benchmark that en- ables the comparison of the spatial keyword query performance. We also report on the findings obtained when applying the bench- mark to the indices, thus uncovering new insights that may guide index...

  15. Clean Air Markets - Allowances Query Wizard

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Allowances Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://camddataandmaps.epa.gov/gdm/index.cfm. The Allowances...

  16. Clean Air Markets - Compliance Query Wizard

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Compliance Query Wizard is part of a suite of Clean Air Markets-related tools that are accessible at http://ampd.epa.gov/ampd/. The Compliance module provides...

  17. Querying temporal databases via OWL 2 QL

    CSIR Research Space (South Africa)

    Klarman, S

    2014-06-01

    Full Text Available SQL:2011, the most recently adopted version of the SQL query language, has unprecedentedly standardized the representation of temporal data in relational databases. Following the successful paradigm of ontology-based data access, we develop a...

  18. QUERY RESPONSE TIME COMPARISON NOSQLDB MONGODB WITH SQLDB ORACLE

    Directory of Open Access Journals (Sweden)

    Humasak T. A. Simanjuntak

    2015-01-01

    Full Text Available Penyimpanan data saat ini terdapat dua jenis yakni relational database dan non-relational database. Kedua jenis DBMS (Database Managemnet System tersebut berbeda dalam berbagai aspek seperti per-formansi eksekusi query, scalability, reliability maupun struktur penyimpanan data. Kajian ini memiliki tujuan untuk mengetahui perbandingan performansi DBMS antara Oracle sebagai jenis relational data-base dan MongoDB sebagai jenis non-relational database dalam mengolah data terstruktur. Eksperimen dilakukan untuk mengetahui perbandingan performansi kedua DBMS tersebut untuk operasi insert, select, update dan delete dengan menggunakan query sederhana maupun kompleks pada database Northwind. Untuk mencapai tujuan eksperimen, 18 query yang terdiri dari 2 insert query, 10 select query, 2 update query dan 2 delete query dieksekusi. Query dieksekusi melalui sebuah aplikasi .Net yang dibangun sebagai perantara antara user dengan basis data. Eksperimen dilakukan pada tabel dengan atau tanpa relasi pada Oracle dan embedded atau bukan embedded dokumen pada MongoDB. Response time untuk setiap eksekusi query dibandingkan dengan menggunakan metode statistik. Eksperimen menunjukkan response time query untuk proses select, insert, dan update pada MongoDB lebih cepatdaripada Oracle. MongoDB lebih cepat 64.8 % untuk select query;MongoDB lebihcepat 72.8 % untuk insert query dan MongoDB lebih cepat 33.9 % untuk update query. Pada delete query, Oracle lebih cepat 96.8 % daripada MongoDB untuk table yang berelasi, tetapi MongoDB lebih cepat 83.8 % daripada Oracle untuk table yang tidak memiliki relasi.Untuk query kompleks dengan Map Reduce pada MongoDB lebih lambat 97.6% daripada kompleks query dengan aggregate function pada Oracle.

  19. Spatial Keyword Querying

    DEFF Research Database (Denmark)

    Cao, Xin; Chen, Lisi; Cong, Gao

    2012-01-01

    The web is increasingly being used by mobile users. In addition, it is increasingly becoming possible to accurately geo-position mobile users and web content. This development gives prominence to spatial web data management. Specifically, a spatial keyword query takes a user location and user-sup...... different kinds of functionality as well as the ideas underlying their definition....

  20. Approximating terminological queries

    NARCIS (Netherlands)

    Stuckenschmidt, Heiner; Van Harmelen, Frank

    2002-01-01

    Current proposals for languages to encode terminological knowledge in intelligent systems support logical reasoning for answering user queries about objects and classes. An application of these languages on the World Wide Web, however, is hampered by the limitations of logical reasoning in terms

  1. Flexible Query Answering Systems

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 12th International Conference on Flexible Query Answering Systems, FQAS 2017, held in London, UK, in June 2017. The 21 full papers presented in this book together with 4 short papers were carefully reviewed and selected from 43 submissions...

  2. Cumulative query method for influenza surveillance using search engine data.

    Science.gov (United States)

    Seo, Dong-Woo; Jo, Min-Woo; Sohn, Chang Hwan; Shin, Soo-Yong; Lee, JaeHo; Yu, Maengsoo; Kim, Won Young; Lim, Kyoung Soo; Lee, Sang-Il

    2014-12-16

    Internet search queries have become an important data source in syndromic surveillance system. However, there is currently no syndromic surveillance system using Internet search query data in South Korea. The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data. Our study was based on the local search engine, Daum (approximately 25% market share), and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development Set 2 and 2011/12 for validation Set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries for which the correlation coefficients were .7 or higher and listed them in descending order. Then, we created a cumulative query method n representing the number of cumulative combined queries in descending order of the correlation coefficient. In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, but 4 of 13 combined queries had an r value of ≥.7. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, but 6 of 15 combined queries had an r value of ≥.7. Cumulative query method showed relatively higher correlation with national influenza surveillance data than combined queries in the development and validation set.

  3. Learning via Query Synthesis

    KAUST Repository

    Alabdulmohsin, Ibrahim Mansour

    2017-05-07

    Active learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthe- sis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas, such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has, generally, focused on finite concept classes or toy problems, with a limited extension to real-world applications. In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The first algorithm is a maximum-determinant convex optimization method while the second algorithm is a Markovian method that relies on Khachiyan’s classical update formulas for solving linear programs. The general theme of these methods is to construct an ellipsoidal approximation of the version space and to synthesize queries, afterward, via spectral decomposition. Moreover, I also describe how these algorithms can be extended to other settings as well, such as pool-based active learning. Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can be used to render query synthesis algorithms ineffective is to implement a randomized response. In this thesis, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distributions provides reliable predictions with a high probability. This algorithm is, then, justified both theoretically and empirically. A second approach is to use a non-parametric classification method, such as similarity-based classification. In this

  4. QUERY SUPPORT FOR GMZ

    Directory of Open Access Journals (Sweden)

    A. Khandelwal

    2017-07-01

    Full Text Available Generic text-based compression models are simple and fast but there are two issues that needs to be addressed. They cannot leverage the structure that exists in data to achieve better compression and there is an unnecessary decompression step before the user can actually use the data. To address these issues, we came up with GMZ, a lossless compression model aimed at achieving high compression ratios. The decision to design GMZ (Khandelwal and Rajan, 2017 exclusively for GML's Simple Features Profile (SFP seems fair because of the high use of SFP in WFS and that it facilitates high optimisation of the compression model. This is an extension of our work on GMZ. In a typical server-client model such as Web Feature Service, the server is the primary creator and provider of GML, and therefore, requires compression and query capabilities. On the other hand, the client is the primary consumer of GML, and therefore, requires decompression and visualisation capabilities. In the first part of our work, we demonstrated compression using a python script that can be plugged in a server architecture, and decompression and visualisation in a web browser using a Firefox addon. The focus of this work is to develop the already existing tools to provide query capability to server. Our model provides the ability to decompress individual features in isolation, which is an essential requirement for realising query in compressed state. We con - struct an R-Tree index for spatial data and a custom index for non-spatial data and store these in a separate index file to prevent alter - ing the compression model. This facilitates independent use of compressed GMZ file where index can be constructed when required. The focus of this work is the bounding-box or range query commonly used in webGIS with provision for other spatial and non-spatial queries. The decrement in compression ratios due to the new index file is in the range of 1–3 percent which is trivial considering

  5. Analysis of queries sent to PubMed at the point of care: Observation of search behaviour in a medical teaching hospital

    Science.gov (United States)

    Hoogendam, Arjen; Stalenhoef, Anton FH; Robbé, Pieter F de Vries; Overbeke, A John PM

    2008-01-01

    Background The use of PubMed to answer daily medical care questions is limited because it is challenging to retrieve a small set of relevant articles and time is restricted. Knowing what aspects of queries are likely to retrieve relevant articles can increase the effectiveness of PubMed searches. The objectives of our study were to identify queries that are likely to retrieve relevant articles by relating PubMed search techniques and tools to the number of articles retrieved and the selection of articles for further reading. Methods This was a prospective observational study of queries regarding patient-related problems sent to PubMed by residents and internists in internal medicine working in an Academic Medical Centre. We analyzed queries, search results, query tools (Mesh, Limits, wildcards, operators), selection of abstract and full-text for further reading, using a portal that mimics PubMed. Results PubMed was used to solve 1121 patient-related problems, resulting in 3205 distinct queries. Abstracts were viewed in 999 (31%) of these queries, and in 126 (39%) of 321 queries using query tools. The average term count per query was 2.5. Abstracts were selected in more than 40% of queries using four or five terms, increasing to 63% if the use of four or five terms yielded 2–161 articles. Conclusion Queries sent to PubMed by physicians at our hospital during daily medical care contain fewer than three terms. Queries using four to five terms, retrieving less than 161 article titles, are most likely to result in abstract viewing. PubMed search tools are used infrequently by our population and are less effective than the use of four or five terms. Methods to facilitate the formulation of precise queries, using more relevant terms, should be the focus of education and research. PMID:18816391

  6. Algebraic Optimization of Recursive Database Queries

    DEFF Research Database (Denmark)

    Hansen, Michael Reichhardt

    1988-01-01

    Queries are expressed by relational algebra expressions including a fixpoint operation. A condition is presented under which a natural join commutes with a fixpoint operation. This condition is a simple check of attribute sets of sub-expressions of the query. The work may be considered a generali......Queries are expressed by relational algebra expressions including a fixpoint operation. A condition is presented under which a natural join commutes with a fixpoint operation. This condition is a simple check of attribute sets of sub-expressions of the query. The work may be considered...... a generalization of Aho and Ullman, (1979). The result is interpreted in function free logic database terms as a transformation of the recursively defined predicate involving: (a) elimination of an argument, and (b) propagation of selections (instantiations) to the extensionally defined predicates. A collection...

  7. Evaluating Trajectory Queries over Imprecise Location Data

    DEFF Research Database (Denmark)

    Xie, Scott, Xike; Cheng, Reynold; Yiu, Man Lung

    2012-01-01

    Trajectory queries, which retrieve nearby objects for every point of a given route, can be used to identify alerts of potential threats along a vessel route, or monitor the adjacent rescuers to a travel path. However, the locations of these objects (e.g., threats, succours) may not be precisely...... obtained due to hardware limitations of measuring devices, as well as the constantly-changing nature of the external environment. Ignoring data uncertainty can render low query quality, and cause undesirable consequences such as missing alerts of threats and poor response time in rescue operations. Also......, the query is quite time-consuming, since all the points on the trajectory are considered. In this paper, we study how to efficiently evaluate trajectory queries over imprecise location data, by proposing a new concept called the u-bisector. In general, the u-bisector is an extension of bisector to handle...

  8. Google BigQuery analytics

    CERN Document Server

    Tigani, Jordan

    2014-01-01

    How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute engine, AppEngine datastore integration, and using GViz with Tableau to generate charts of query results. In addit

  9. Querying on Federated Sensor Networks

    Directory of Open Access Journals (Sweden)

    Zuhal Can

    2016-09-01

    Full Text Available A Federated Sensor Network (FSN is a network of geographically distributed Wireless Sensor Networks (WSNs called islands. For querying on an FSN, we introduce the Layered Federated Sensor Network (L-FSN Protocol. For layered management, L-FSN provides communication among islands by its inter-island querying protocol by which a query packet routing path is determined according to some path selection policies. L-FSN allows autonomous management of each island by island-specific intra-island querying protocols that can be selected according to island properties. We evaluate the applicability of L-FSN and compare the L-FSN protocol with various querying protocols running on the flat federation model. Flat federation is a method to federate islands by running a single querying protocol on an entire FSN without distinguishing communication among and within islands. For flat federation, we select a querying protocol from geometrical, hierarchical cluster-based, hash-based, and tree-based WSN querying protocol categories. We found that a layered federation of islands by L-FSN increases the querying performance with respect to energy-efficiency, query resolving distance, and query resolving latency. Moreover, L-FSN’s flexibility of choosing intra-island querying protocols regarding the island size brings advantages on energy-efficiency and query resolving latency.

  10. Conceptual querying through ontologies

    DEFF Research Database (Denmark)

    Andreasen, Troels; Bulskov, Henrik

    2009-01-01

    is motivated by an obvious need for users to survey huge volumes of objects in query answers. An ontology formalism and a special notion of-instantiated ontology" are introduced. The latter is a structure reflecting the content in the document collection in that; it is a restriction of a general world......We present here ail approach to conceptual querying where the aim is, given a collection of textual database objects or documents, to target an abstraction of the entire database content in terms of the concepts appearing in documents, rather than the documents in the collection. The approach...... knowledge ontology to the concepts instantiated in the collection. The notion of ontology-based similarity is briefly described, language constructs for direct navigation and retrieval of concepts in the ontology are discussed and approaches to conceptual summarization are presented....

  11. Heuristic query optimization for query multiple table and multiple clausa on mobile finance application

    Science.gov (United States)

    Indrayana, I. N. E.; P, N. M. Wirasyanti D.; Sudiartha, I. KG

    2018-01-01

    Mobile application allow many users to access data from the application without being limited to space, space and time. Over time the data population of this application will increase. Data access time will cause problems if the data record has reached tens of thousands to millions of records.The objective of this research is to maintain the performance of data execution for large data records. One effort to maintain data access time performance is to apply query optimization method. The optimization used in this research is query heuristic optimization method. The built application is a mobile-based financial application using MySQL database with stored procedure therein. This application is used by more than one business entity in one database, thus enabling rapid data growth. In this stored procedure there is an optimized query using heuristic method. Query optimization is performed on a “Select” query that involves more than one table with multiple clausa. Evaluation is done by calculating the average access time using optimized and unoptimized queries. Access time calculation is also performed on the increase of population data in the database. The evaluation results shown the time of data execution with query heuristic optimization relatively faster than data execution time without using query optimization.

  12. Mastering jQuery mobile

    CERN Document Server

    Lambert, Chip

    2015-01-01

    You've started down the path of jQuery Mobile, now begin mastering some of jQuery Mobile's higher level topics. Go beyond jQuery Mobile's documentation and master one of the hottest mobile technologies out there. Previous JavaScript and PHP experience can help you get the most out of this book.

  13. Instant Cassandra query language

    CERN Document Server

    Singh, Amresh

    2013-01-01

    Get to grips with a new technology, understand what it is and what it can do for you, and then get to work with the most important features and tasks. It's an Instant Starter guide.Instant Cassandra Query Language is great for those who are working with Cassandra databases and who want to either learn CQL to check data from the console or build serious applications using CQL. If you're looking for something that helps you get started with CQL in record time and you hate the idea of learning a new language syntax, then this book is for you.

  14. CUFID-query: accurate network querying through random walk based network flow estimation.

    Science.gov (United States)

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive

  15. Evaluation of Sub Query Performance in SQL Server

    Science.gov (United States)

    Oktavia, Tanty; Sujarwo, Surya

    2014-03-01

    The paper explores several sub query methods used in a query and their impact on the query performance. The study uses experimental approach to evaluate the performance of each sub query methods combined with indexing strategy. The sub query methods consist of in, exists, relational operator and relational operator combined with top operator. The experimental shows that using relational operator combined with indexing strategy in sub query has greater performance compared with using same method without indexing strategy and also other methods. In summary, for application that emphasized on the performance of retrieving data from database, it better to use relational operator combined with indexing strategy. This study is done on Microsoft SQL Server 2012.

  16. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases.

    Science.gov (United States)

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-03-19

    This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form.

  17. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases

    Science.gov (United States)

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-01-01

    This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form. PMID:29608174

  18. Moving Spatial Keyword Queries

    DEFF Research Database (Denmark)

    Wu, Dingming; Yiu, Man Lung; Jensen, Christian S.

    2013-01-01

    propose two algorithms for computing safe zones that guarantee correct results at any time and that aim to optimize the server-side computation as well as the communication between the server and the client. We exploit tight and conservative approximations of safe zones and aggressive computational space...... text data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within the safe zone associated with a result. However, existing safe-zone methods focus solely on spatial locations and ignore text relevancy. We...... pruning. We present techniques that aim to compute the next safe zone efficiently, and we present two types of conservative safe zones that aim to reduce the communication cost. Empirical studies with real data suggest that the proposals are efficient. To understand the effectiveness of the proposed safe...

  19. From Questions to Queries

    Directory of Open Access Journals (Sweden)

    M. Drlík

    2007-12-01

    Full Text Available The extension of (Internet databases forceseveryone to become more familiar with techniques of datastorage and retrieval because users’ success often dependson their ability to pose right questions and to be able tointerpret their answers. University programs pay moreattention to developing database programming skills than todata exploitation skills. To educate our students to become“database users”, the authors intensively exploit supportivetools simplifying the production of database elements astables, queries, forms, reports, web pages, and macros.Videosequences demonstrating “standard operations” forcompleting them have been prepared to enhance out-ofclassroomlearning. The use of SQL and other professionaltools is reduced to the cases when the wizards are unable togenerate the intended construct.

  20. Enabling Semantic Queries Against the Spatial Database

    Directory of Open Access Journals (Sweden)

    PENG, X.

    2012-02-01

    Full Text Available The spatial database based upon the object-relational database management system (ORDBMS has the merits of a clear data model, good operability and high query efficiency. That is why it has been widely used in spatial data organization and management. However, it cannot express the semantic relationships among geospatial objects, making the query results difficult to meet the user's requirement well. Therefore, this paper represents an attempt to combine the Semantic Web technology with the spatial database so as to make up for the traditional database's disadvantages. In this way, on the one hand, users can take advantages of ORDBMS to store and manage spatial data; on the other hand, if the spatial database is released in the form of Semantic Web, the users could describe a query more concisely with the cognitive pattern which is similar to that of daily life. As a consequence, this methodology enables the benefits of both Semantic Web and the object-relational database (ORDB available. The paper discusses systematically the semantic enriched spatial database's architecture, key technologies and implementation. Subsequently, we demonstrate the function of spatial semantic queries via a practical prototype system. The query results indicate that the method used in this study is feasible.

  1. SPARQL Assist language-neutral query composer

    Science.gov (United States)

    2012-01-01

    Background SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. PMID:22373327

  2. SPARQL assist language-neutral query composer.

    Science.gov (United States)

    McCarthy, Luke; Vandervalk, Ben; Wilkinson, Mark

    2012-01-25

    SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi-lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources.

  3. Ranking Queries on Uncertain Data

    CERN Document Server

    Hua, Ming

    2011-01-01

    Uncertain data is inherent in many important applications, such as environmental surveillance, market analysis, and quantitative economics research. Due to the importance of those applications and rapidly increasing amounts of uncertain data collected and accumulated, analyzing large collections of uncertain data has become an important task. Ranking queries (also known as top-k queries) are often natural and useful in analyzing uncertain data. Ranking Queries on Uncertain Data discusses the motivations/applications, challenging problems, the fundamental principles, and the evaluation algorith

  4. Research Issues in Mobile Querying

    DEFF Research Database (Denmark)

    Breunig, M.; Jensen, Christian Søndergaard; Klein, M.

    2004-01-01

    This document reports on key aspects of the discussions conducted within the working group. In particular, the document aims to offer a structured and somewhat digested summary of the group's discussions. The document first offers concepts that enable characterization of "mobile queries" as well...... as the types of systems that enable such queries. It explores the notion of context in mobile queries. The document ends with a few observations, mainly regarding challenges....

  5. Optimizing queries in distributed systems

    Directory of Open Access Journals (Sweden)

    Ion LUNGU

    2006-01-01

    Full Text Available This research presents the main elements of query optimizations in distributed systems. First, data architecture according with system level architecture in a distributed environment is presented. Then the architecture of a distributed database management system (DDBMS is described on conceptual level followed by the presentation of the distributed query execution steps on these information systems. The research ends with presentation of some aspects of distributed database query optimization and strategies used for that.

  6. The role of economics in the QUERI program: QUERI Series.

    Science.gov (United States)

    Smith, Mark W; Barnett, Paul G

    2008-04-22

    The United States (U.S.) Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses). Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  7. Semi-Supervised Learning to Identify UMLS Semantic Relations.

    Science.gov (United States)

    Luo, Yuan; Uzuner, Ozlem

    2014-01-01

    The UMLS Semantic Network is constructed by experts and requires periodic expert review to update. We propose and implement a semi-supervised approach for automatically identifying UMLS semantic relations from narrative text in PubMed. Our method analyzes biomedical narrative text to collect semantic entity pairs, and extracts multiple semantic, syntactic and orthographic features for the collected pairs. We experiment with seeded k-means clustering with various distance metrics. We create and annotate a ground truth corpus according to the top two levels of the UMLS semantic relation hierarchy. We evaluate our system on this corpus and characterize the learning curves of different clustering configuration. Using KL divergence consistently performs the best on the held-out test data. With full seeding, we obtain macro-averaged F-measures above 70% for clustering the top level UMLS relations (2-way), and above 50% for clustering the second level relations (7-way).

  8. Smart query answering for marine sensor data.

    Science.gov (United States)

    Shahriar, Md Sumon; de Souza, Paulo; Timms, Greg

    2011-01-01

    We review existing query answering systems for sensor data. We then propose an extended query answering approach termed smart query, specifically for marine sensor data. The smart query answering system integrates pattern queries and continuous queries. The proposed smart query system considers both streaming data and historical data from marine sensor networks. The smart query also uses query relaxation technique and semantics from domain knowledge as a recommender system. The proposed smart query benefits in building data and information systems for marine sensor networks.

  9. Smart Query Answering for Marine Sensor Data

    Directory of Open Access Journals (Sweden)

    Paulo de Souza

    2011-03-01

    Full Text Available We review existing query answering systems for sensor data. We then propose an extended query answering approach termed smart query, specifically for marine sensor data. The smart query answering system integrates pattern queries and continuous queries. The proposed smart query system considers both streaming data and historical data from marine sensor networks. The smart query also uses query relaxation technique and semantics from domain knowledge as a recommender system. The proposed smart query benefits in building data and information systems for marine sensor networks.

  10. Dataflow Query Execution in a Parallel Main-Memory Environment

    NARCIS (Netherlands)

    Wilschut, A.N.; Apers, Peter M.G.

    1991-01-01

    The performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results are a step in the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others, synchronization issues are identified

  11. Enhancing Recall in Semantic Querying

    DEFF Research Database (Denmark)

    Rouces, Jacobo

    2013-01-01

    lexically and structurally different, which we will introduce in the next section. As RDF graphs from different sources are expected to be linked, the modeling heterogeneities will make the federated graph become sparser and inconsistent. This is detrimental to the recall of SPARQL queries, as the query...

  12. jQuery Pocket Reference

    CERN Document Server

    Flanagan, David

    2010-01-01

    "As someone who uses jQuery on a regular basis, it was surprising to discover how much of the library I'm not using. This book is indispensable for anyone who is serious about using jQuery for non-trivial applications."-- Raffaele Cecco, longtime developer of video games, including Cybernoid, Exolon, and Stormlord jQuery is the "write less, do more" JavaScript library. Its powerful features and ease of use have made it the most popular client-side JavaScript framework for the Web. This book is jQuery's trusty companion: the definitive "read less, learn more" guide to the library. jQuery P

  13. jQuery UI cookbook

    CERN Document Server

    Boduch, Adam

    2013-01-01

    Filled with a practical collection of recipes, jQuery UI Cookbook is full of clear, step-by-step instructions that will help you harness the powerful UI framework in jQuery. Depending on your needs, you can dip in and out of the Cookbook and its recipes, or follow the book from start to finish.If you are a jQuery UI developer looking to improve your existing applications, extract ideas for your new application, or to better understand the overall widget architecture, then jQuery UI Cookbook is a must-have for you. The reader should at least have a rudimentary understanding of what jQuery UI is

  14. Instant jQuery selectors

    CERN Document Server

    De Rosa, Aurelio

    2013-01-01

    Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. Instant jQuery Selectors follows a simple how-to format with recipes aimed at making you well versed with the wide range of selectors that jQuery has to offer through a myriad of examples.Instant jQuery Selectors is for web developers who want to delve into jQuery from its very starting point: selectors. Even if you're already familiar with the framework and its selectors, you could find several tips and tricks that you aren't aware of, especially about performance and how jQuery ac

  15. Location-Dependent Query Processing Under Soft Real-Time Constraints

    Directory of Open Access Journals (Sweden)

    Zoubir Mammeri

    2009-01-01

    Full Text Available In recent years, mobile devices and applications achieved an increasing development. In database field, this development required methods to consider new query types like location-dependent queries (i.e. the query results depend on the query issuer location. Although several researches addressed problems related to location-dependent query processing, a few works considered timing requirements that may be associated with queries (i.e., the query results must be delivered to mobile clients on time. The main objective of this paper is to propose a solution for location-dependent query processing under soft real-time constraints. Hence, we propose methods to take into account client location-dependency and to maximize the percentage of queries respecting their deadlines. We validate our proposal by implementing a prototype based on Oracle DBMS. Performance evaluation results show that the proposed solution optimizes the percentage of queries meeting their deadlines and the communication cost.

  16. Advanced SPARQL querying in small molecule databases.

    Science.gov (United States)

    Galgonek, Jakub; Hurt, Tomáš; Michlíková, Vendula; Onderka, Petr; Schwarz, Jan; Vondrášek, Jiří

    2016-01-01

    In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.

  17. In-context query reformulation for failing SPARQL queries

    Science.gov (United States)

    Viswanathan, Amar; Michaelis, James R.; Cassidy, Taylor; de Mel, Geeth; Hendler, James

    2017-05-01

    Knowledge bases for decision support systems are growing increasingly complex, through continued advances in data ingest and management approaches. However, humans do not possess the cognitive capabilities to retain a bird's-eyeview of such knowledge bases, and may end up issuing unsatisfiable queries to such systems. This work focuses on the implementation of a query reformulation approach for graph-based knowledge bases, specifically designed to support the Resource Description Framework (RDF). The reformulation approach presented is instance-and schema-aware. Thus, in contrast to relaxation techniques found in the state-of-the-art, the presented approach produces in-context query reformulation.

  18. An Experimental Investigation of Complexity in Database Query Formulation Tasks

    Science.gov (United States)

    Casterella, Gretchen Irwin; Vijayasarathy, Leo

    2013-01-01

    Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…

  19. The role of economics in the QUERI program: QUERI Series

    Directory of Open Access Journals (Sweden)

    Smith Mark W

    2008-04-01

    Full Text Available Abstract Background The United States (U.S. Department of Veterans Affairs (VA Quality Enhancement Research Initiative (QUERI has implemented economic analyses in single-site and multi-site clinical trials. To date, no one has reviewed whether the QUERI Centers are taking an optimal approach to doing so. Consistent with the continuous learning culture of the QUERI Program, this paper provides such a reflection. Methods We present a case study of QUERI as an example of how economic considerations can and should be integrated into implementation research within both single and multi-site studies. We review theoretical and applied cost research in implementation studies outside and within VA. We also present a critique of the use of economic research within the QUERI program. Results Economic evaluation is a key element of implementation research. QUERI has contributed many developments in the field of implementation but has only recently begun multi-site implementation trials across multiple regions within the national VA healthcare system. These trials are unusual in their emphasis on developing detailed costs of implementation, as well as in the use of business case analyses (budget impact analyses. Conclusion Economics appears to play an important role in QUERI implementation studies, only after implementation has reached the stage of multi-site trials. Economic analysis could better inform the choice of which clinical best practices to implement and the choice of implementation interventions to employ. QUERI economics also would benefit from research on costing methods and development of widely accepted international standards for implementation economics.

  20. Recommendation Sets and Choice Queries

    DEFF Research Database (Denmark)

    Viappiani, Paolo Renato; Boutilier, Craig

    2011-01-01

    Utility elicitation is an important component of many applications, such as decision support systems and recommender systems. Such systems query users about their preferences and offer recommendations based on the system's belief about the user's utility function. We analyze the connection between...... the problem of generating optimal recommendation sets and the problem of generating optimal choice queries, considering both Bayesian and regret-based elicitation. Our results show that, somewhat surprisingly, under very general circumstances, the optimal recommendation set coincides with the optimal query....

  1. Identifying job characteristics related to employed women's breastfeeding behaviors.

    Science.gov (United States)

    Spitzmueller, Christiane; Zhang, Jing; Thomas, Candice L; Wang, Zhuxi; Fisher, Gwenith G; Matthews, Russell A; Strathearn, Lane

    2018-05-14

    For employed mothers of infants, reconciliation of work demands and breastfeeding constitutes a significant challenge. The discontinuation of breastfeeding has the potential to result in negative outcomes for the mother (e.g., higher likelihood of obesity), her employer (e.g., increased absenteeism), and her infant (e.g., increased risk of infection). Given previous research findings identifying return to work as a major risk factor for breastfeeding cessation, we investigate what types of job characteristics relate to women's intentions to breastfeed shortly after giving birth and women's actual breastfeeding initiation and duration. Using job titles and job descriptors contained in a large Australian longitudinal cohort data set (N = 809), we coded job titles using the U.S. Department of Labor (DOL)'s Occupational Information Network (O*NET) database and extracted job characteristics. Hazardous working conditions and job autonomy were identified as significant determinants of women's breastfeeding intentions, their initiation of breastfeeding, and ultimately their breastfeeding continuation. Hence, we recommend that human resource professionals, managers, and public health initiatives provide breastfeeding-supportive resources to women who, based on their job characteristics, are at high risk to prematurely discontinue breastfeeding to ensure these mothers have equal opportunity to reap the benefits of breastfeeding. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  2. Querying XML Data with SPARQL

    Science.gov (United States)

    Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

    SPARQL is today the standard access language for Semantic Web data. In the recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.

  3. Robust Optimization of Database Queries

    Indian Academy of Sciences (India)

    JAYANT

    2011-07-06

    Jul 6, 2011 ... Based on first-order logic. ○ Edgar ... Cost-based Query Optimizer s choice of execution plan ... Determines the values of goods shipped between nations in a time period select ..... Born: 1881 Elected: 1934 Section: Medicine.

  4. Schedule Sales Query Raw Data

    Data.gov (United States)

    General Services Administration — Schedule Sales Query presents sales volume figures as reported to GSA by contractors. The reports are generated as quarterly reports for the current year and the...

  5. jQuery For Dummies

    CERN Document Server

    Beighley, Lynn

    2010-01-01

    Learn how jQuery can make your Web page or blog stand out from the crowd!. jQuery is free, open source software that allows you to extend and customize Joomla!, Drupal, AJAX, and WordPress via plug-ins. Assuming no previous programming experience, Lynn Beighley takes you through the basics of jQuery from the very start. You'll discover how the jQuery library separates itself from other JavaScript libraries through its ease of use, compactness, and friendliness if you're a beginner programmer. Written in the easy-to-understand style of the For Dummies brand, this book demonstrates how you can a

  6. Flexible Query Answering Systems 2006

    DEFF Research Database (Denmark)

    -computer interaction. The overall theme of the FQAS conferences is innovative query systems aimed at providing easy, flexible, and intuitive access to information. Such systems are intended to facilitate retrieval from information repositories such as databases, libraries, and the World-Wide Web. These repositories......This volume constitutes the proceedings of the Seventh International Conference on Flexible Query Answering Systems, FQAS 2006, held in Milan, Italy, on June 7--10, 2006. FQAS is the premier conference for researchers and practitioners concerned with the vital task of providing easy, flexible...... are typically equipped with standard query systems which are often inadequate, and the focus of FQAS is the development of query systems that are more expressive, informative, cooperative, and productive. These proceedings contain contributions from invited speakers and 53 original papers out of about 100...

  7. SkyQuery - A Prototype Distributed Query and Cross-Matching Web Service for the Virtual Observatory

    Science.gov (United States)

    Thakar, A. R.; Budavari, T.; Malik, T.; Szalay, A. S.; Fekete, G.; Nieto-Santisteban, M.; Haridas, V.; Gray, J.

    2002-12-01

    We have developed a prototype distributed query and cross-matching service for the VO community, called SkyQuery, which is implemented with hierarchichal Web Services. SkyQuery enables astronomers to run combined queries on existing distributed heterogeneous astronomy archives. SkyQuery provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. The SkyQuery client connects to the portal Web Service, which farms the query out to the individual archives, which are also Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with a HTM index for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. SkyQuery also returns the image cutout corresponding to the query result. SkyQuery finds not only matches between the various catalogs, but also dropouts - objects that exist in some of the catalogs but not in others. This is often as important as finding matches. We demonstrate the utility of SkyQuery with a brown-dwarf search between SDSS and 2MASS, and a search for radio-quiet quasars in SDSS, 2MASS and FIRST. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: data on the same objects in various archives is mapped in different wavelength ranges and looks very different due to different errors, instrument sensitivities and other peculiarities of each archive. Our cross-matching algorithm preforms a fuzzy spatial join across multiple catalogs. This type of cross-matching is currently often done by eye, one object at a time. A static cross-identification table for a set of archives would become obsolete by the time it was built - the exponential growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. SkyQuery was funded by a

  8. Mining the SDSS SkyServer SQL queries log

    Science.gov (United States)

    Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani

    2016-05-01

    SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomic catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc. and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements on the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.

  9. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining

    Directory of Open Access Journals (Sweden)

    S. Sadesh

    2015-01-01

    Full Text Available Web with tremendous volume of information retrieves result for user related queries. With the rapid growth of web page recommendation, results retrieved based on data mining techniques did not offer higher performance filtering rate because relationships between user profile and queries were not analyzed in an extensive manner. At the same time, existing user profile based prediction in web data mining is not exhaustive in producing personalized result rate. To improve the query result rate on dynamics of user behavior over time, Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP framework is proposed. HFRS-UQP framework is split into two processes, where filtering and switching are carried out. The data mining based filtering in our research work uses the Hamilton Filtering framework to filter user result based on personalized information on automatic updated profiles through search engine. Maximized result is fetched, that is, filtered out with respect to user behavior profiles. The switching performs accurate filtering updated profiles using regime switching. The updating in profile change (i.e., switches regime in HFRS-UQP framework identifies the second- and higher-order association of query result on the updated profiles. Experiment is conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio.

  10. Linking Health Records for Federated Query Processing

    Directory of Open Access Journals (Sweden)

    Dewri Rinku

    2016-07-01

    Full Text Available A federated query portal in an electronic health record infrastructure enables large epidemiology studies by combining data from geographically dispersed medical institutions. However, an individual’s health record has been found to be distributed across multiple carrier databases in local settings. Privacy regulations may prohibit a data source from revealing clear text identifiers, thereby making it non-trivial for a query aggregator to determine which records correspond to the same underlying individual. In this paper, we explore this problem of privately detecting and tracking the health records of an individual in a distributed infrastructure. We begin with a secure set intersection protocol based on commutative encryption, and show how to make it practical on comparison spaces as large as 1010 pairs. Using bigram matching, precomputed tables, and data parallelism, we successfully reduced the execution time to a matter of minutes, while retaining a high degree of accuracy even in records with data entry errors. We also propose techniques to prevent the inference of identifier information when knowledge of underlying data distributions is known to an adversary. Finally, we discuss how records can be tracked utilizing the detection results during query processing.

  11. Dynamic Planar Range Maxima Queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Tsakalidis, Konstantinos

    2011-01-01

    We consider the dynamic two-dimensional maxima query problem. Let P be a set of n points in the plane. A point is maximal if it is not dominated by any other point in P. We describe two data structures that support the reporting of the t maximal points that dominate a given query point, and allow...... for insertions and deletions of points in P. In the pointer machine model we present a linear space data structure with O(logn + t) worst case query time and O(logn) worst case update time. This is the first dynamic data structure for the planar maxima dominance query problem that achieves these bounds...... are integers in the range U = {0, …,2 w  − 1 }. We present a linear space data structure that supports 3-sided range maxima queries in O(logn/loglogn+t) worst case time and updates in O(logn/loglogn) worst case time. These are the first sublogarithmic worst case bounds for all operations in the RAM model....

  12. Searching for rare diseases in PubMed: a blind comparison of Orphanet expert query and query based on terminological knowledge.

    Science.gov (United States)

    Griffon, N; Schuers, M; Dhombres, F; Merabti, T; Kerdelhué, G; Rollin, L; Darmoni, S J

    2016-08-02

    Despite international initiatives like Orphanet, it remains difficult to find up-to-date information about rare diseases. The aim of this study is to propose an exhaustive set of queries for PubMed based on terminological knowledge and to evaluate it versus the queries based on expertise provided by the most frequently used resource in Europe: Orphanet. Four rare disease terminologies (MeSH, OMIM, HPO and HRDO) were manually mapped to each other permitting the automatic creation of expended terminological queries for rare diseases. For 30 rare diseases, 30 citations retrieved by Orphanet expert query and/or query based on terminological knowledge were assessed for relevance by two independent reviewers unaware of the query's origin. An adjudication procedure was used to resolve any discrepancy. Precision, relative recall and F-measure were all computed. For each Orphanet rare disease (n = 8982), there was a corresponding terminological query, in contrast with only 2284 queries provided by Orphanet. Only 553 citations were evaluated due to queries with 0 or only a few hits. There were no significant differences between the Orpha query and terminological query in terms of precision, respectively 0.61 vs 0.52 (p = 0.13). Nevertheless, terminological queries retrieved more citations more often than Orpha queries (0.57 vs. 0.33; p = 0.01). Interestingly, Orpha queries seemed to retrieve older citations than terminological queries (p < 0.0001). The terminological queries proposed in this study are now currently available for all rare diseases. They may be a useful tool for both precision or recall oriented literature search.

  13. Man vs. Machine: Differences in SPARQL Queries

    NARCIS (Netherlands)

    Rietveld, L.; Hoekstra, R.

    2014-01-01

    Server-side SPARQL query logs have been a topic of study for some time now. The USEWOD collection of query logs is currently the primary source of information for researchers. A recurring problem is that these logs leave application queries and queries created by humans indistinguishable. In this

  14. Macromolecular query language (MMQL): prototype data model and implementation.

    Science.gov (United States)

    Shindyalov, I N; Chang, W; Pu, C; Bourne, P E

    1994-11-01

    Macromolecular query language (MMQL) is an extensible interpretive language in which to pose questions concerning the experimental or derived features of the 3-D structure of biological macromolecules. MMQL portends to be intuitive with a simple syntax, so that from a user's perspective complex queries are easily written. A number of basic queries and a more complex query--determination of structures containing a five-strand Greek key motif--are presented to illustrate the strengths and weaknesses of the language. The predominant features of MMQL are a filter and pattern grammar which are combined to express a wide range of interesting biological queries. Filters permit the selection of object attributes, for example, compound name and resolution, whereas the patterns currently implemented query primary sequence, close contacts, hydrogen bonding, secondary structure, conformation and amino acid properties (volume, polarity, isoelectric point, hydrophobicity and different forms of exposure). MMQL queries are processed by MMQLlib; a C++ class library, to which new query methods and pattern types are easily added. The prototype implementation described uses PDBlib, another C(++)-based class library from representing the features of biological macromolecules at the level of detail parsable from a PDB file. Since PDBlib can represent data stored in relational and object-oriented databases, as well as PDB files, once these data are loaded they too can be queried by MMQL. Performance metrics are given for queries of PDB files for which all derived data are calculated at run time and compared to a preliminary version of OOPDB, a prototype object-oriented database with a schema based on a persistent version of PDBlib which offers more efficient data access and the potential to maintain derived information. MMQLlib, PDBlib and associated software are available via anonymous ftp from cuhhca.hhmi.columbia.edu.

  15. Head First jQuery

    CERN Document Server

    Benedetti, Ryan

    2011-01-01

    Want to add more interactivity and polish to your websites? Discover how jQuery can help you build complex scripting functionality in just a few lines of code. With Head First jQuery, you'll quickly get up to speed on this amazing JavaScript library by learning how to navigate HTML documents while handling events, effects, callbacks, and animations. By the time you've completed the book, you'll be incorporating Ajax apps, working seamlessly with HTML and CSS, and handling data with PHP, MySQL and JSON. If you want to learn-and understand-how to create interactive web pages, unobtrusive scrip

  16. SPARK: Adapting Keyword Query to Semantic Search

    Science.gov (United States)

    Zhou, Qi; Wang, Chong; Xiong, Miao; Wang, Haofen; Yu, Yong

    Semantic search promises to provide more accurate result than present-day keyword search. However, progress with semantic search has been delayed due to the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the semantic web: the approach automatically translates keyword queries into formal logic queries so that end users can use familiar keywords to perform semantic search. A prototype system named 'SPARK' has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiment, SPARK achieved an encouraging translation result.

  17. Research in Mobile Database Query Optimization and Processing

    Directory of Open Access Journals (Sweden)

    Agustinus Borgy Waluyo

    2005-01-01

    Full Text Available The emergence of mobile computing provides the ability to access information at any time and place. However, as mobile computing environments have inherent factors like power, storage, asymmetric communication cost, and bandwidth limitations, efficient query processing and minimum query response time are definitely of great interest. This survey groups a variety of query optimization and processing mechanisms in mobile databases into two main categories, namely: (i query processing strategy, and (ii caching management strategy. Query processing includes both pull and push operations (broadcast mechanisms. We further classify push operation into on-demand broadcast and periodic broadcast. Push operation (on-demand broadcast relates to designing techniques that enable the server to accommodate multiple requests so that the request can be processed efficiently. Push operation (periodic broadcast corresponds to data dissemination strategies. In this scheme, several techniques to improve the query performance by broadcasting data to a population of mobile users are described. A caching management strategy defines a number of methods for maintaining cached data items in clients' local storage. This strategy considers critical caching issues such as caching granularity, caching coherence strategy and caching replacement policy. Finally, this survey concludes with several open issues relating to mobile query optimization and processing strategy.

  18. Identifying Indicators Related to Constructs for Engineering Design Outcome

    Science.gov (United States)

    Wilhelmsen, Cheryl A.; Dixon, Raymond A.

    2016-01-01

    This study ranked constructs articulated by Childress and Rhodes (2008) and identified the key indicators for each construct as a starting point to explore what should be included on an instrument to measure the engineering design process and outcomes of students in high schools that use the PLTW and EbDTM curricula in Idaho. A case-study design…

  19. A study of medical and health queries to web search engines.

    Science.gov (United States)

    Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

    2004-03-01

    This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.

  20. Fuzzy Querying: Issues and Perspectives..

    Czech Academy of Sciences Publication Activity Database

    Kacprzyk, J.; Pasi, G.; Vojtáš, Peter; Zadrozny, S.

    2000-01-01

    Roč. 36, č. 6 (2000), s. 605-616 ISSN 0023-5954 Institutional research plan: AV0Z1030915 Keywords : flexible querying * information retrieval * fuzzy databases Subject RIV: BA - General Mathematics http://dml.cz/handle/10338.dmlcz/135376

  1. Automatically Preparing Safe SQL Queries

    Science.gov (United States)

    Bisht, Prithvi; Sistla, A. Prasad; Venkatakrishnan, V. N.

    We present the first sound program source transformation approach for automatically transforming the code of a legacy web application to employ PREPARE statements in place of unsafe SQL queries. Our approach therefore opens the way for eradicating the SQL injection threat vector from legacy web applications.

  2. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods has resulted in increasing amount of genetic interaction data to be generated every day. Biological networks are used to store genetic interaction data gathered. Increasing amount of data available requires fast large scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  3. Identifying Fracture Types and Relative Ages Using Fluid Inclusion Stratigraphy

    Energy Technology Data Exchange (ETDEWEB)

    Dilley, Lorie M.; Norman, David; Owens, Lara

    2008-06-30

    Enhanced Geothermal Systems (EGS) are designed to recover heat from the subsurface by mechanically creating fractures in subsurface rocks. Understanding the life cycle of a fracture in a geothermal system is fundamental to the development of techniques for creating fractures. Recognizing the stage of a fracture, whether it is currently open and transmitting fluids; if it recently has closed; or if it is an ancient fracture would assist in targeting areas for further fracture stimulation. Identifying dense fracture areas as well as large open fractures from small fracture systems will also assist in fracture stimulation selection. Geothermal systems are constantly generating fractures, and fluids and gases passing through rocks in these systems leave small fluid and gas samples trapped in healed microfractures. Fluid inclusions trapped in minerals as the fractures heal are characteristic of the fluids that formed them, and this signature can be seen in fluid inclusion gas analysis. Our hypothesis is that fractures over their life cycle have different chemical signatures that we can see in fluid inclusion gas analysis and by using the new method of fluid inclusion stratigraphy (FIS) the different stages of fractures, along with an estimate of fracture size can be identified during the well drilling process. We have shown with this study that it is possible to identify fracture locations using FIS and that different fractures have different chemical signatures however that signature is somewhat dependent upon rock type. Open, active fractures correlate with increase concentrations of CO2, N2, Ar, and to a lesser extent H2O. These fractures would be targets for further enhancement. The usefulness of this method is that it is low cost alternative to current well logging techniques and can be done as a well is being drilled.

  4. Fragger: a protein fragment picker for structural queries.

    Science.gov (United States)

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.

  5. Identifying Dyscalculia Symptoms Related to Magnocellular Reasoning Using Smartphones.

    Science.gov (United States)

    Knudsen, Greger Siem; Babic, Ankica

    2016-01-01

    This paper presents a study that has developed a mobile software application for assisting diagnosis of learning disabilities in mathematics, called dyscalculia, and measuring correlations between dyscalculia symptoms and magnocellular reasoning. Usually, software aids for dyscalculic individuals are focused on both assisting diagnosis and teaching the material. The software developed in this study however maintains a specific focus on the former, and in the process attempts to capture alleged correlations between dyscalculia symptoms and possible underlying causes of the condition. Classification of symptoms is performed by k-Nearest Neighbor algorithm classifying five parameters evaluating user's skills, returning calculated performance in each category as well as correlation strength between detected symptoms and magnocellular reasoning abilities. Expert evaluations has found the application to be appropriate and productive for its intended purpose, proving that mobile software is a suitable and valuable tool for assisting dyscalculia diagnosis and identifying root causes of developing the condition.

  6. Identifying Patterns in Extreme Precipitation Risk and the Related Impacts

    Science.gov (United States)

    Schroeer, K.; Tye, M. R.

    2017-12-01

    Extreme precipitation can harm human life and assets through flooding, hail, landslides, or debris flows. Flood risk assessments typically concentrate on river or mountain torrent channels, using water depth, flow velocity, and/or sediment deposition to quantify the risk. In addition, extreme events with high recurrence intervals are often the main focus. However, damages from short-term and localized convective showers often occur away from watercourses. Also, damages from more frequent small scale extremes, although usually less disastrous, can accumulate to considerable financial burdens. Extreme convective precipitation is expected to intensify in a warmer climate, and vulnerability patterns might change in tandem with changes in the character of precipitation and flood types. This has consequences for adaptation planners who want to establish effective protection measures and reduce the cost from natural hazards. Here we merge hydrological and exposure data to identify patterns of risk under varying synoptic conditions. Exposure is calculated from a database of 76k damage claims reported to the national disaster fund in 480 municipalities in south eastern Austria from 1990-2015. Hydrological data comprise sub-daily precipitation (59 gauges) and streamflow (62 gauges) observations. We use synoptic circulation types to identify typical precipitation patterns. They indicate the character of precipitation even if a gauge is not in close proximity, facilitating potential future research with regional climate model data. Results show that more claims are reported under synoptic conditions favouring convective precipitation (on average 1.5-3 times more than on other days). For agrarian municipalities, convective precipitation damages are among the costliest after long low-intensity precipitation events. In contrast, Alpine communities are particularly vulnerable to convective high-intensity rainfall. In addition to possible observational error, uncertainty is present

  7. Identifying diseases-related metabolites using random walk.

    Science.gov (United States)

    Hu, Yang; Zhao, Tianyi; Zhang, Ningyi; Zang, Tianyi; Zhang, Jun; Cheng, Liang

    2018-04-11

    Metabolites disrupted by abnormal state of human body are deemed as the effect of diseases. In comparison with the cause of diseases like genes, these markers are easier to be captured for the prevention and diagnosis of metabolic diseases. Currently, a large number of metabolic markers of diseases need to be explored, which drive us to do this work. The existing metabolite-disease associations were extracted from Human Metabolome Database (HMDB) using a text mining tool NCBO annotator as priori knowledge. Next we calculated the similarity of a pair-wise metabolites based on the similarity of disease sets of them. Then, all the similarities of metabolite pairs were utilized for constructing a weighted metabolite association network (WMAN). Subsequently, the network was utilized for predicting novel metabolic markers of diseases using random walk. Totally, 604 metabolites and 228 diseases were extracted from HMDB. From 604 metabolites, 453 metabolites are selected to construct the WMAN, where each metabolite is deemed as a node, and the similarity of two metabolites as the weight of the edge linking them. The performance of the network is validated using the leave one out method. As a result, the high area under the receiver operating characteristic curve (AUC) (0.7048) is achieved. The further case studies for identifying novel metabolites of diabetes mellitus were validated in the recent studies. In this paper, we presented a novel method for prioritizing metabolite-disease pairs. The superior performance validates its reliability for exploring novel metabolic markers of diseases.

  8. Querying Sentiment Development over Time

    DEFF Research Database (Denmark)

    Andreasen, Troels; Christiansen, Henning; Have, Christian Theil

    2013-01-01

    A new language is introduced for describing hypotheses about fluctuations of measurable properties in streams of timestamped data, and as prime example, we consider trends of emotions in the constantly flowing stream of Twitter messages. The language, called EmoEpisodes, has a precise semantics...... that measures how well a hypothesis characterizes a given time interval; the semantics is parameterized so it can be adjusted to different views of the data. EmoEpisodes is extended to a query language with variables standing for unknown topics and emotions, and the query-answering mechanism will return...... instantiations for topics and emotions as well as time intervals that provide the largest deflections in this measurement. Experiments are performed on a selection of Twitter data to demonstrates the usefulness of the approach....

  9. Identifying Sentiment of Hookah-Related Posts on Twitter

    Science.gov (United States)

    Ramanujam, Jagannathan; Lerman, Kristina; Chu, Kar-Hai; Boley Cruz, Tess; Unger, Jennifer B

    2017-01-01

    Background The increasing popularity of hookah (or waterpipe) use in the United States and elsewhere has consequences for public health because it has similar health risks to that of combustible cigarettes. While hookah use rapidly increases in popularity, social media data (Twitter, Instagram) can be used to capture and describe the social and environmental contexts in which individuals use, perceive, discuss, and are marketed this tobacco product. These data may allow people to organically report on their sentiment toward tobacco products like hookah unprimed by a researcher, without instrument bias, and at low costs. Objective This study describes the sentiment of hookah-related posts on Twitter and describes the importance of debiasing Twitter data when attempting to understand attitudes. Methods Hookah-related posts on Twitter (N=986,320) were collected from March 24, 2015, to December 2, 2016. Machine learning models were used to describe sentiment on 20 different emotions and to debias the data so that Twitter posts reflected sentiment of legitimate human users and not of social bots or marketing-oriented accounts that would possibly provide overly positive or overly negative sentiment of hookah. Results From the analytical sample, 352,116 tweets (59.50%) were classified as positive while 177,537 (30.00%) were classified as negative, and 62,139 (10.50%) neutral. Among all positive tweets, 218,312 (62.00%) were classified as highly positive emotions (eg, active, alert, excited, elated, happy, and pleasant), while 133,804 (38.00%) positive tweets were classified as passive positive emotions (eg, contented, serene, calm, relaxed, and subdued). Among all negative tweets, 95,870 (54.00%) were classified as subdued negative emotions (eg, sad, unhappy, depressed, and bored) while the remaining 81,667 (46.00%) negative tweets were classified as highly negative emotions (eg, tense, nervous, stressed, upset, and unpleasant). Sentiment changed drastically when

  10. Identifying Sentiment of Hookah-Related Posts on Twitter.

    Science.gov (United States)

    Allem, Jon-Patrick; Ramanujam, Jagannathan; Lerman, Kristina; Chu, Kar-Hai; Boley Cruz, Tess; Unger, Jennifer B

    2017-10-18

    The increasing popularity of hookah (or waterpipe) use in the United States and elsewhere has consequences for public health because it has similar health risks to that of combustible cigarettes. While hookah use rapidly increases in popularity, social media data (Twitter, Instagram) can be used to capture and describe the social and environmental contexts in which individuals use, perceive, discuss, and are marketed this tobacco product. These data may allow people to organically report on their sentiment toward tobacco products like hookah unprimed by a researcher, without instrument bias, and at low costs. This study describes the sentiment of hookah-related posts on Twitter and describes the importance of debiasing Twitter data when attempting to understand attitudes. Hookah-related posts on Twitter (N=986,320) were collected from March 24, 2015, to December 2, 2016. Machine learning models were used to describe sentiment on 20 different emotions and to debias the data so that Twitter posts reflected sentiment of legitimate human users and not of social bots or marketing-oriented accounts that would possibly provide overly positive or overly negative sentiment of hookah. From the analytical sample, 352,116 tweets (59.50%) were classified as positive while 177,537 (30.00%) were classified as negative, and 62,139 (10.50%) neutral. Among all positive tweets, 218,312 (62.00%) were classified as highly positive emotions (eg, active, alert, excited, elated, happy, and pleasant), while 133,804 (38.00%) positive tweets were classified as passive positive emotions (eg, contented, serene, calm, relaxed, and subdued). Among all negative tweets, 95,870 (54.00%) were classified as subdued negative emotions (eg, sad, unhappy, depressed, and bored) while the remaining 81,667 (46.00%) negative tweets were classified as highly negative emotions (eg, tense, nervous, stressed, upset, and unpleasant). Sentiment changed drastically when comparing a corpus of tweets with social bots

  11. Query Optimizations over Decentralized RDF Graphs

    KAUST Repository

    Abdelaziz, Ibrahim; Mansour, Essam; Ouzzani, Mourad; Aboulnaga, Ashraf; Kalnis, Panos

    2017-01-01

    Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query

  12. Concept-based query language approach to enterprise information systems

    Science.gov (United States)

    Niemi, Timo; Junkkari, Marko; Järvelin, Kalervo

    2014-01-01

    In enterprise information systems (EISs) it is necessary to model, integrate and compute very diverse data. In advanced EISs the stored data often are based both on structured (e.g. relational) and semi-structured (e.g. XML) data models. In addition, the ad hoc information needs of end-users may require the manipulation of data-oriented (structural), behavioural and deductive aspects of data. Contemporary languages capable of treating this kind of diversity suit only persons with good programming skills. In this paper we present a concept-oriented query language approach to manipulate this diversity so that the programming skill requirements are considerably reduced. In our query language, the features which need technical knowledge are hidden in application-specific concepts and structures. Therefore, users need not be aware of the underlying technology. Application-specific concepts and structures are represented by the modelling primitives of the extended RDOOM (relational deductive object-oriented modelling) which contains primitives for all crucial real world relationships (is-a relationship, part-of relationship, association), XML documents and views. Our query language also supports intensional and extensional-intensional queries, in addition to conventional extensional queries. In its query formulation, the end-user combines available application-specific concepts and structures through shared variables.

  13. A distributed query execution engine of big attributed graphs.

    Science.gov (United States)

    Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

    2016-01-01

    A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such type of graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation in addition to the performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform the performance of Apache Giraph, a popular distributed graph processing system, by orders of magnitudes.

  14. On tractable query evaluation for SPARQL

    OpenAIRE

    Mengel, Stefan; Skritek, Sebastian

    2017-01-01

    Despite much work within the last decade on foundational properties of SPARQL - the standard query language for RDF data - rather little is known about the exact limits of tractability for this language. In particular, this is the case for SPARQL queries that contain the OPTIONAL-operator, even though it is one of the most intensively studied features of SPARQL. The aim of our work is to provide a more thorough picture of tractable classes of SPARQL queries. In general, SPARQL query evaluatio...

  15. Nearest Neighbor Queries in Road Networks

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Kolar, Jan; Pedersen, Torben Bach

    2003-01-01

    in road networks. Such queries may be of use in many services. Specifically, we present an easily implementable data model that serves well as a foundation for such queries. We also present the design of a prototype system that implements the queries based on the data model. The algorithm used...

  16. Advanced Query Formulation in Deductive Databases.

    Science.gov (United States)

    Niemi, Timo; Jarvelin, Kalervo

    1992-01-01

    Discusses deductive databases and database management systems (DBMS) and introduces a framework for advanced query formulation for end users. Recursive processing is described, a sample extensional database is presented, query types are explained, and criteria for advanced query formulation from the end user's viewpoint are examined. (31…

  17. SCRY: Enabling quantitative reasoning in SPARQL queries

    NARCIS (Netherlands)

    Meroño-Peñuela, A.; Stringer, Bas; Loizou, Antonis; Abeln, Sanne; Heringa, Jaap

    2015-01-01

    The inability to include quantitative reasoning in SPARQL queries slows down the application of Semantic Web technology in the life sciences. SCRY, our SPARQL compatible service layer, improves this by executing services at query time and making their outputs query-accessible, generating RDF data on

  18. On the formulation of performant sparql queries

    NARCIS (Netherlands)

    Loizou, A.; Angles, R.; Groth, P.T.

    2014-01-01

    Abstract The combination of the flexibility of RDF and the expressiveness of SPARQL provides a powerful mechanism to model, integrate and query data. However, these properties also mean that it is nontrivial to write performant SPARQL queries. Indeed, it is quite easy to create queries that tax even

  19. How Good Are Query Optimizers, Really?

    NARCIS (Netherlands)

    Leis, Viktor; Gubichev, Andrey; Mirchev, Atanas; Boncz, Peter; Kemper, Alfons; Neumann, Thomas

    2016-01-01

    Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark (JOB) and experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. We investigate the

  20. Predecessor queries in dynamic integer sets

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting

    1997-01-01

    We consider the problem of maintaining a set of n integers in the range 0.2w–1 under the operations of insertion, deletion, predecessor queries, minimum queries and maximum queries on a unit cost RAM with word size w bits. Let f (n) be an arbitrary nondecreasing smooth function satisfying n...

  1. Mobile Information Access with Spoken Query Answering

    DEFF Research Database (Denmark)

    Brøndsted, Tom; Larsen, Henrik Legind; Larsen, Lars Bo

    2006-01-01

    window focused over the part which most likely contains an answer to the query. The two systems are integrated into a full spoken query answering system. The prototype can answer queries and questions within the chosen football (soccer) test domain, but the system has the flexibility for being ported...

  2. Path Index Based Keywords to SPARQL Query Transformation for Semantic Data Federations

    Directory of Open Access Journals (Sweden)

    Thilini Cooray

    2016-06-01

    Full Text Available Semantic web is a highly emerging research domain. Enhancing the ability of keyword query processing on Semantic Web data provides a huge support for familiarizing the usefulness of Semantic Web to the general public. Most of the existing approaches focus on just user keyword matching to RDF graphs and output the connecting elements as results. Semantic Web consists of SPARQL query language which can process queries more accurately and efficiently than general keyword matching. There are only about a couple of approaches available for transforming keyword queries to SPARQL. They basically rely on real time graph traversals? for identifying subgraphs which can connect user keywords. Those approaches are either limited to query processing on a single data store or a set of interlinked data sets. They have not focused on query processing on a federation of independent data sets which belongs to the same domain. This research proposes a Path Index based approach eliminating real time graph traversal for transforming keyword queries to SPARQL. We have introduced an ontology alignment based approach for keyword query transforming on a federation of RDF data stored using multiple heterogeneous vocabularies. Evaluation shows that the proposed approach have the ability to generate SPARQL queries which can provide highly relevant results for user keyword queries. The Path Index based query transformation approach has also achieved high efficiency compared to the existing approach.

  3. Labeling RDF Graphs for Linear Time and Space Querying

    Science.gov (United States)

    Furche, Tim; Weinzierl, Antonius; Bry, François

    Indices and data structures for web querying have mostly considered tree shaped data, reflecting the view of XML documents as tree-shaped. However, for RDF (and when querying ID/IDREF constraints in XML) data is indisputably graph-shaped. In this chapter, we first study existing indexing and labeling schemes for RDF and other graph datawith focus on support for efficient adjacency and reachability queries. For XML, labeling schemes are an important part of the widespread adoption of XML, in particular for mapping XML to existing (relational) database technology. However, the existing indexing and labeling schemes for RDF (and graph data in general) sacrifice one of the most attractive properties of XML labeling schemes, the constant time (and per-node space) test for adjacency (child) and reachability (descendant). In the second part, we introduce the first labeling scheme for RDF data that retains this property and thus achieves linear time and space processing of acyclic RDF queries on a significantly larger class of graphs than previous approaches (which are mostly limited to tree-shaped data). Finally, we show how this labeling scheme can be applied to (acyclic) SPARQL queries to obtain an evaluation algorithm with time and space complexity linear in the number of resources in the queried RDF graph.

  4. A Framework for WWW Query Processing

    Science.gov (United States)

    Wu, Binghui Helen; Wharton, Stephen (Technical Monitor)

    2000-01-01

    Query processing is the most common operation in a DBMS. Sophisticated query processing has been mainly targeted at a single enterprise environment providing centralized control over data and metadata. Submitting queries by anonymous users on the web is different in such a way that load balancing or DBMS' accessing control becomes the key issue. This paper provides a solution by introducing a framework for WWW query processing. The success of this framework lies in the utilization of query optimization techniques and the ontological approach. This methodology has proved to be cost effective at the NASA Goddard Space Flight Center Distributed Active Archive Center (GDAAC).

  5. Web development with jQuery

    CERN Document Server

    York, Richard

    2015-01-01

    Newly revised and updated resource on jQuery's many features and advantages Web Development with jQuery offers a major update to the popular Beginning JavaScript and CSS Development with jQuery from 2009. More than half of the content is new or updated, and reflects recent innovations with regard to mobile applications, jQuery mobile, and the spectrum of associated plugins. Readers can expect thorough revisions with expanded coverage of events, CSS, AJAX, animation, and drag and drop. New chapters bring developers up to date on popular features like jQuery UI, navigation, tables, interacti

  6. A Pragmatic Evaluation of the National Cancer Institute Physician Data Query (PDQ)®-Based Brief Counseling on Cancer-Related Fatigue among Patients Undergoing Radiation Therapy

    Science.gov (United States)

    Bauml, Joshua; Xie, Sharon X; Penn, Courtney; Desai, Krupali; Dong, Kimberly W; Bruner, Deborah Watkins; Vapiwala, Neha; Mao, Jun James

    2018-01-01

    Purpose Cancer-Related Fatigue (CRF) negatively affects quality of life among cancer patients. This study seeks to evaluate the outcome and patient receptiveness of a brief counseling program based on National Cancer Institute (NCI) PDQ® information to manage CRF when integrated into Radiation Therapy (RT). Methods We conducted a prospective cohort study among patients undergoing non-palliative RT. Patients with stage I–III tumors and with Karnofsky score 60 or better were given a ten-minute behavioral counseling session during the first two weeks of RT. The Brief Fatigue Inventory (BFI) was administered at baseline/end of RT. Results Of 93 patients enrolled, 89% found the counseling useful and practical. By the end of RT, 59% reported increased exercise, 41.6% sought nutrition counseling, 72.7% prioritized daily activities, 74.4% took daytime naps, and 70.5% talked with other cancer patients. Regarding counseling, patients who had received chemotherapy prior to RT had no change in fatigue (−0.2), those who received RT alone had mild increase in fatigue (0.7, p=0.02), and those who received concurrent chemotherapy experienced a substantial increase in fatigue (3.0 to 5.2, p=0.05). Higher baseline fatigue and receipt of chemotherapy were predictive of worsened fatigue in a multivariate model (both p<0.01). Conclusion Our data suggests that brief behavioral counseling based on NCI guidelines is well accepted by patients showing an uptake in many activities to cope with CRF. Those who receive concurrent chemotherapy and with higher baseline fatigue are at risk for worsening fatigue despite of guideline-based therapy. PMID:29479490

  7. Querying Large Physics Data Sets Over an Information Grid

    CERN Document Server

    Baker, N; Kovács, Z; Le Goff, J M; McClatchey, R

    2001-01-01

    Optimising use of the Web (WWW) for LHC data analysis is a complex problem and illustrates the challenges arising from the integration of and computation across massive amounts of information distributed worldwide. Finding the right piece of information can, at times, be extremely time-consuming, if not impossible. So-called Grids have been proposed to facilitate LHC computing and many groups have embarked on studies of data replication, data migration and networking philosophies. Other aspects such as the role of 'middleware' for Grids are emerging as requiring research. This paper positions the need for appropriate middleware that enables users to resolve physics queries across massive data sets. It identifies the role of meta-data for query resolution and the importance of Information Grids for high-energy physics analysis rather than just Computational or Data Grids. This paper identifies software that is being implemented at CERN to enable the querying of very large collaborating HEP data-sets, initially...

  8. Predicting Drug Recalls From Internet Search Engine Queries.

    Science.gov (United States)

    Yom-Tov, Elad

    2017-01-01

    Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here, we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine, which mentioned one of the 5195 pharmaceutical drugs during 2015 and all recall notifications issued by the Food and Drug Administration (FDA) during that year. By using attributes that quantify the change in query volume at the state level, we attempted to predict if a recall of a specific drug will be ordered by FDA in a time horizon ranging from 1 to 40 days in future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall occurring one day ahead. This performance degrades as prediction is made for longer periods ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium-risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to facilitate in early warning of faulty batches of medicines.

  9. Scale-Independent Relational Query Processing

    Science.gov (United States)

    Armbrust, Michael Paul

    2013-01-01

    An increasingly common pattern is for newly-released web applications to succumb to a "Success Disaster". In this scenario, overloaded database machines and resultant high response times destroy a previously good user experience, just as a site is becoming popular. Unfortunately, the data independence provided by a traditional relational…

  10. Semantic Query Processing : Estimating Relational Purity

    NARCIS (Netherlands)

    Kalo, Jan-Christoph; Lofi, C.; Maseli, René Pascal; Balke, Wolf-Tilo; Leyer, M.

    2017-01-01

    The use of semantic information found in structured knowledge bases has become an integral part of the processing pipeline of modern intelligent in-
    formation systems. However, such semantic information is frequently insuffi-cient to capture the rich semantics demanded by the applications, and

  11. An Adaptive Genetic Algorithm with Dynamic Population Size for Optimizing Join Queries

    OpenAIRE

    Vellev, Stoyan

    2008-01-01

    The problem of finding the optimal join ordering executing a query to a relational database management system is a combinatorial optimization problem, which makes deterministic exhaustive solution search unacceptable for queries with a great number of joined relations. In this work an adaptive genetic algorithm with dynamic population size is proposed for optimizing large join queries. The performance of the algorithm is compared with that of several classical non-determinis...

  12. EquiX-A Search and Query Language for XML.

    Science.gov (United States)

    Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander

    2002-01-01

    Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)

  13. Adding Conflict Resolution Features to a Query Language for Database Federations

    Directory of Open Access Journals (Sweden)

    Kai-Uwe Sattler

    2000-11-01

    Full Text Available A main problem of data integration is the treatment of conflicts caused by different modeling of realworld entities, different data models or simply by different representations of one and the same object. During the integration phase these conflicts have to be identified and resolved as part of the mapping between local and global schemata. Therefore, conflict resolution affects the definition of the integrated view as well as query transformation and evaluation, in this paper we present a SQL extension for defining and querying database federations. This language addresses in particular the resolution of integration conflicts by providing mechanisms for mapping attributes, restructuring relations as well as extended integration operations. Finally, the application of these resolution strategies is briefly explained by presenting a simple conflict resolution method.

  14. Method of and device for querying of protected structured data

    NARCIS (Netherlands)

    Jonker, Willem; Brinkman, Richard; Doumen, J.M.; Schoenmakers, Berry

    2005-01-01

    Method of and device for querying of protected data structured in the form of a tree. A corresponding tree of node polynomials is constructed such that each node polynomial evaluates to zero for an input equal to an identifier assigned to a node name occurring in a branch of the data tree starting

  15. Method of and device for querying of protected structured data

    NARCIS (Netherlands)

    Brinkman, Richard; Doumen, J.M.; Jonker, Willem; Schoenmakers, B.

    Method of and device for querying of protected data structured in the form of a tree. A corresponding tree of node polynomials is constructed such that each node polynomial evaluates to zero for an input equal to an identifier assigned to a node name occurring in a branch of the data tree starting

  16. Analysing Twitter and web queries for flu trend prediction.

    Science.gov (United States)

    Santos, José Carlos; Matos, Sérgio

    2014-05-07

    Social media platforms encourage people to share diverse aspects of their daily life. Among these, shared health related information might be used to infer health status and incidence rates for specific conditions or symptoms. In this work, we present an infodemiology study that evaluates the use of Twitter messages and search engine query logs to estimate and predict the incidence rate of influenza like illness in Portugal. Based on a manually classified dataset of 2704 tweets from Portugal, we selected a set of 650 textual features to train a Naïve Bayes classifier to identify tweets mentioning flu or flu-like illness or symptoms. We obtained a precision of 0.78 and an F-measure of 0.83, based on cross validation over the complete annotated set. Furthermore, we trained a multiple linear regression model to estimate the health-monitoring data from the Influenzanet project, using as predictors the relative frequencies obtained from the tweet classification results and from query logs, and achieved a correlation ratio of 0.89 (puser-generated content have mostly focused on the english language. Our results further validate those studies and show that by changing the initial steps of data preprocessing and feature extraction and selection, the proposed approaches can be adapted to other languages. Additionally, we investigated whether the predictive model created can be applied to data from the subsequent flu season. In this case, although the prediction result was good, an initial phase to adapt the regression model could be necessary to achieve more robust results.

  17. Optimal Planar Orthogonal Skyline Counting Queries

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Larsen, Kasper Green

    2014-01-01

    counting queries, i.e. given a query rectangle R to report the size of the skyline of P\\cap R. We present a data structure for storing n points with integer coordinates having query time O(lg n/lglg n) and space usage O(n). The model of computation is a unit cost RAM with logarithmic word size. We prove...

  18. Adding query privacy to robust DHTs

    DEFF Research Database (Denmark)

    Backes, Michael; Goldberg, Ian; Kate, Aniket

    2012-01-01

    intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries. We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without...... privacy over robust DHTs. Finally, we compare the performance of our privacy-preserving protocols with their more privacy-invasive counterparts. We observe that there is no increase in the message complexity...

  19. jQuery Tools UI Library

    CERN Document Server

    Libby, Alex

    2012-01-01

    A practical tutorial with powerful yet simple projects that are quick to implement. This book is aimed at developers who have prior jQuery knowledge, but may not have any prior experience with jQuery Tools. It is possible that they may have started with the basics of jQuery Tools, but want to learn more about how it can be used, as well as get ideas for future projects.

  20. Research on presentation and query service of geo-spatial data based on ontology

    Science.gov (United States)

    Li, Hong-wei; Li, Qin-chao; Cai, Chang

    2008-10-01

    The paper analyzed the deficiency on presentation and query of geo-spatial data existed in current GIS, discussed the advantages that ontology possessed in formalization of geo-spatial data and the presentation of semantic granularity, taken land-use classification system as an example to construct domain ontology, and described it by OWL; realized the grade level and category presentation of land-use data benefited from the thoughts of vertical and horizontal navigation; and then discussed query mode of geo-spatial data based on ontology, including data query based on types and grade levels, instances and spatial relation, and synthetic query based on types and instances; these methods enriched query mode of current GIS, and is a useful attempt; point out that the key point of the presentation and query of spatial data based on ontology is to construct domain ontology that can correctly reflect geo-concept and its spatial relation and realize its fine formalization description.

  1. Secure Skyline Queries on Cloud Platform.

    Science.gov (United States)

    Liu, Jinfei; Yang, Juncheng; Xiong, Li; Pei, Jian

    2017-04-01

    Outsourcing data and computation to cloud server provides a cost-effective way to support large scale data storage and query processing. However, due to security and privacy concerns, sensitive data (e.g., medical records) need to be protected from the cloud server and other unauthorized users. One approach is to outsource encrypted data to the cloud server and have the cloud server perform query processing on the encrypted data only. It remains a challenging task to support various queries over encrypted data in a secure and efficient way such that the cloud server does not gain any knowledge about the data, query, and query result. In this paper, we study the problem of secure skyline queries over encrypted data. The skyline query is particularly important for multi-criteria decision making but also presents significant challenges due to its complex computations. We propose a fully secure skyline query protocol on data encrypted using semantically-secure encryption. As a key subroutine, we present a new secure dominance protocol, which can be also used as a building block for other queries. Finally, we provide both serial and parallelized implementations and empirically study the protocols in terms of efficiency and scalability under different parameter settings, verifying the feasibility of our proposed solutions.

  2. A structural query system for Han characters

    DEFF Research Database (Denmark)

    Skala, Matthew

    2016-01-01

    The IDSgrep structural query system for Han character dictionaries is presented. This dictionary search system represents the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes), a data model and syntax based on the Unicode IDS concept. It includes a query...... language for EIDS databases, with a freely available implementation and format translation from popular third-party IDS and XML character databases. The system is designed to suit the needs of font developers and foreign language learners. The search algorithm includes a bit vector index inspired by Bloom...... filters to support faster query operations. Experimental results are presented, evaluating the effect of the indexing on query performance....

  3. Adding Query Privacy to Robust DHTs

    DEFF Research Database (Denmark)

    Backes, Michael; Goldberg, Ian; Kate, Aniket

    2011-01-01

    intermediate peers that (help to) route the queries towards their destinations. In this paper, we satisfy this requirement by presenting an approach for providing privacy for the keys in DHT queries. We use the concept of oblivious transfer (OT) in communication over DHTs to preserve query privacy without...... of obtaining query privacy over robust DHTs. Finally, we compare the performance of our privacy-preserving protocols with their more privacy-invasive counterparts. We observe that there is no increase in the message complexity and only a small overhead in the computational complexity....

  4. Towards Intelligible Query Processing in Relevance Feedback-Based Image Retrieval Systems

    OpenAIRE

    Mohammed, Belkhatir

    2008-01-01

    We have specified within the scope of this paper a framework combining semantics and relational (spatial) characterizations within a coupled architecture in order to address the semantic gap. This framework is instantiated by an operational model based on a sound logic-based formalism, allowing to define a representation for image documents and a matching function to compare index and query structures. We have specified a query framework coupling keyword-based querying with a relevance feedba...

  5. Secure count query on encrypted genomic data.

    Science.gov (United States)

    Hasan, Mohammad Zahidul; Mahdi, Md Safiur Rahman; Sadat, Md Nazmus; Mohammed, Noman

    2018-05-01

    Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also develop more effective prevention strategies [1]. To reduce the sampling error and to increase the statistical accuracy of this type of research projects, data from different sources need to be brought together since a single organization does not necessarily possess required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third party cloud service provider, equipped with enormous storage and computation resources. However, outsourcing data to a third party is associated with a potential risk of privacy violation of the participants, whose genomic sequence or clinical profile is used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. In particular, there are two main contributions. Firstly, the proposed method can handle biomedical data containing both genotype and phenotype. Secondly, our proposed index tree scheme reduces the computational overhead significantly for executing secure count query operation. In our proposed method, the confidentiality of shared data is ensured through encryption, while making the entire computation process efficient and scalable for cutting-edge biomedical applications. We evaluated our proposed method in terms of efficiency on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50,000 records is approximately 5 s, where each record

  6. GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark

    Directory of Open Access Journals (Sweden)

    Zhou Huang

    2017-09-01

    Full Text Available In the era of big data, Internet-based geospatial information services such as various LBS apps are deployed everywhere, followed by an increasing number of queries against the massive spatial data. As a result, the traditional relational spatial database (e.g., PostgreSQL with PostGIS and Oracle Spatial cannot adapt well to the needs of large-scale spatial query processing. Spark is an emerging outstanding distributed computing framework in the Hadoop ecosystem. This paper aims to address the increasingly large-scale spatial query-processing requirement in the era of big data, and proposes an effective framework GeoSpark SQL, which enables spatial queries on Spark. On the one hand, GeoSpark SQL provides a convenient SQL interface; on the other hand, GeoSpark SQL achieves both efficient storage management and high-performance parallel computing through integrating Hive and Spark. In this study, the following key issues are discussed and addressed: (1 storage management methods under the GeoSpark SQL framework, (2 the spatial operator implementation approach in the Spark environment, and (3 spatial query optimization methods under Spark. Experimental evaluation is also performed and the results show that GeoSpark SQL is able to achieve real-time query processing. It should be noted that Spark is not a panacea. It is observed that the traditional spatial database PostGIS/PostgreSQL performs better than GeoSpark SQL in some query scenarios, especially for the spatial queries with high selectivity, such as the point query and the window query. In general, GeoSpark SQL performs better when dealing with compute-intensive spatial queries such as the kNN query and the spatial join query.

  7. Regular paths in SparQL: querying the NCI Thesaurus.

    Science.gov (United States)

    Detwiler, Landon T; Suciu, Dan; Brinkley, James F

    2008-11-06

    OWL, the Web Ontology Language, provides syntax and semantics for representing knowledge for the semantic web. Many of the constructs of OWL have a basis in the field of description logics. While the formal underpinnings of description logics have lead to a highly computable language, it has come at a cognitive cost. OWL ontologies are often unintuitive to readers lacking a strong logic background. In this work we describe GLEEN, a regular path expression library, which extends the RDF query language SparQL to support complex path expressions over OWL and other RDF-based ontologies. We illustrate the utility of GLEEN by showing how it can be used in a query-based approach to defining simpler, more intuitive views of OWL ontologies. In particular we show how relatively simple GLEEN-enhanced SparQL queries can create views of the OWL version of the NCI Thesaurus that match the views generated by the web-based NCI browser.

  8. Genetic algorithms for RDF chain query optimization

    NARCIS (Netherlands)

    Hogenboom, A.C.; Milea, D.V.; Frasincar, F.; Kaymak, U.; Calders, T.; Tuyls, K.; Pechenizkiy, M.

    2009-01-01

    The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are required for efficient real-time querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL

  9. How Do Children Reformulate Their Search Queries?

    Science.gov (United States)

    Rutter, Sophie; Ford, Nigel; Clough, Paul

    2015-01-01

    Introduction: This paper investigates techniques used by children in year 4 (age eight to nine) of a UK primary school to reformulate their queries, and how they use information retrieval systems to support query reformulation. Method: An in-depth study analysing the interactions of twelve children carrying out search tasks in a primary school…

  10. Towards Verbalizing SPARQL Queries in Arabic

    Directory of Open Access Journals (Sweden)

    I. Al Agha

    2016-04-01

    Full Text Available With the wide spread of Open Linked Data and Semantic Web technologies, a larger amount of data has been published on the Web in the RDF and OWL formats. This data can be queried using SPARQL, the Semantic Web Query Language. SPARQL cannot be understood by ordinary users and is not directly accessible to humans, and thus they will not be able to check whether the retrieved answers truly correspond to the intended information need. Driven by this challenge, natural language generation from SPARQL data has recently attracted a considerable attention. However, most existing solutions to verbalize SPARQL in natural language focused on English and Latin-based languages. Little effort has been made on the Arabic language which has different characteristics and morphology. This work aims to particularly help Arab users to perceive SPARQL queries on the Semantic Web by translating SPARQL to Arabic. It proposes an approach that gets a SPARQL query as an input and generates a query expressed in Arabic as an output. The translation process combines both morpho-syntactic analysis and language dependencies to generate a legible and understandable Arabic query. The approach was preliminary assessed with a sample query set, and results indicated that 75% of the queries were correctly translated into Arabic.

  11. The Data Cyclotron query processing scheme

    NARCIS (Netherlands)

    Goncalves, R.; Kersten, M.

    2011-01-01

    A grand challenge of distributed query processing is to devise a self-organizing architecture which exploits all hardware resources optimally to manage the database hot set, minimize query response time, and maximize throughput without single point global coordination. The Data Cyclotron

  12. The Data Cyclotron query processing scheme.

    NARCIS (Netherlands)

    R.A. Goncalves (Romulo); M.L. Kersten (Martin)

    2011-01-01

    htmlabstractA grand challenge of distributed query processing is to devise a self-organizing architecture which exploits all hardware resources optimally to manage the database hot set, minimize query response time, and maximize throughput without single point global coordination. The Data Cyclotron

  13. Querying and Mining Strings Made Easy

    KAUST Repository

    Sahli, Majed

    2017-10-13

    With the advent of large string datasets in several scientific and business applications, there is a growing need to perform ad-hoc analysis on strings. Currently, strings are stored, managed, and queried using procedural codes. This limits users to certain operations supported by existing procedural applications and requires manual query planning with limited tuning opportunities. This paper presents StarQL, a generic and declarative query language for strings. StarQL is based on a native string data model that allows StarQL to support a large variety of string operations and provide semantic-based query optimization. String analytic queries are too intricate to be solved on one machine. Therefore, we propose a scalable and efficient data structure that allows StarQL implementations to handle large sets of strings and utilize large computing infrastructures. Our evaluation shows that StarQL is able to express workloads of application-specific tools, such as BLAST and KAT in bioinformatics, and to mine Wikipedia text for interesting patterns using declarative queries. Furthermore, the StarQL query optimizer shows an order of magnitude reduction in query execution time.

  14. Exploiting External Collections for Query Expansion

    NARCIS (Netherlands)

    Weerkamp, W.; Balog, K.; de Rijke, M.

    2012-01-01

    A persisting challenge in the field of information retrieval is the vocabulary mismatch between a user’s information need and the relevant documents. One way of addressing this issue is to apply query modeling: to add terms to the original query and reweigh the terms. In social media, where

  15. Improving Web Search for Difficult Queries

    Science.gov (United States)

    Wang, Xuanhui

    2009-01-01

    Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…

  16. A semantic perspective on query log analysis

    NARCIS (Netherlands)

    Hofmann, K.; de Rijke, M.; Huurnink, B.; Meij, E.

    2009-01-01

    We present our views on the CLEF log file analysis task. We argue for a task definition that focuses on the semantic enrichment of query logs. In addition, we discuss how additional information about the context in which queries are being made could further our understanding of users’ information

  17. A Multi-Query Optimizer for Monet

    NARCIS (Netherlands)

    S. Manegold (Stefan); A.J. Pellenkoft (Jan); M.L. Kersten (Martin)

    2000-01-01

    textabstractDatabase systems allow for concurrent use of several applications (and query interfaces). Each application generates an ``optimal'' plan---a sequence of low-level database operators---for accessing the database. The queries posed by users through the same application can be optimized

  18. A multi-query optimizer for Monet

    NARCIS (Netherlands)

    S. Manegold (Stefan); A.J. Pellenkoft (Jan); M.L. Kersten (Martin)

    2000-01-01

    textabstractDatabase systems allow for concurrent use of several applications (and query interfaces). Each application generates an ``optimal'' plan---a sequence of low-level database operators---for accessing the database. The queries posed by users through the same application can be optimized

  19. Path Minima Queries in Dynamic Weighted Trees

    DEFF Research Database (Denmark)

    Davoodi, Pooya; Brodal, Gerth Stølting; Satti, Srinivasa Rao

    2011-01-01

    In the path minima problem on a tree, each edge is assigned a weight and a query asks for the edge with minimum weight on a path between two nodes. For the dynamic version of the problem, where the edge weights can be updated, we give data structures that achieve optimal query time\\todo{what about...

  20. Querying Business Process Models with VMQL

    DEFF Research Database (Denmark)

    Störrle, Harald; Acretoaie, Vlad

    2013-01-01

    The Visual Model Query Language (VMQL) has been invented with the objectives (1) to make it easier for modelers to query models effectively, and (2) to be universally applicable to all modeling languages. In previous work, we have applied VMQL to UML, and validated the first of these two claims. ...

  1. Efficient Approximate OLAP Querying Over Time Series

    DEFF Research Database (Denmark)

    Perera, Kasun Baruhupolage Don Kasun Sanjeewa; Hahmann, Martin; Lehner, Wolfgang

    2016-01-01

    The ongoing trend for data gathering not only produces larger volumes of data, but also increases the variety of recorded data types. Out of these, especially time series, e.g. various sensor readings, have attracted attention in the domains of business intelligence and decision making. As OLAP...... queries play a major role in these domains, it is desirable to also execute them on time series data. While this is not a problem on the conceptual level, it can become a bottleneck with regards to query run-time. In general, processing OLAP queries gets more computationally intensive as the volume...... of data grows. This is a particular problem when querying time series data, which generally contains multiple measures recorded at fine time granularities. Usually, this issue is addressed either by scaling up hardware or by employing workload based query optimization techniques. However, these solutions...

  2. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration

    Science.gov (United States)

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-01

    Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. PMID:27733503

  3. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration.

    Science.gov (United States)

    Ong, Edison; Xiang, Zuoshuang; Zhao, Bin; Liu, Yue; Lin, Yu; Zheng, Jie; Mungall, Chris; Courtot, Mélanie; Ruttenberg, Alan; He, Yongqun

    2017-01-04

    Linked Data (LD) aims to achieve interconnected data by representing entities using Unified Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Query Optimizations over Decentralized RDF Graphs

    KAUST Repository

    Abdelaziz, Ibrahim

    2017-05-18

    Applications in life sciences, decentralized social networks, Internet of Things, and statistical linked dataspaces integrate data from multiple decentralized RDF graphs via SPARQL queries. Several approaches have been proposed to optimize query processing over a small number of heterogeneous data sources by utilizing schema information. In the case of schema similarity and interlinks among sources, these approaches cause unnecessary data retrieval and communication, leading to poor scalability and response time. This paper addresses these limitations and presents Lusail, a system for scalable and efficient SPARQL query processing over decentralized graphs. Lusail achieves scalability and low query response time through various optimizations at compile and run times. At compile time, we use a novel locality-aware query decomposition technique that maximizes the number of query triple patterns sent together to a source based on the actual location of the instances satisfying these triple patterns. At run time, we use selectivity-awareness and parallel query execution to reduce network latency and to increase parallelism by delaying the execution of subqueries expected to return large results. We evaluate Lusail using real and synthetic benchmarks, with data sizes up to billions of triples on an in-house cluster and a public cloud. We show that Lusail outperforms state-of-the-art systems by orders of magnitude in terms of scalability and response time.

  5. An XML-Enabled Data Mining Query Language XML-DMQL

    NARCIS (Netherlands)

    Feng, L.; Dillon, T.

    2005-01-01

    Inspired by the good work of Han et al. (1996) and Elfeky et al. (2001) on the design of data mining query languages for relational and object-oriented databases, in this paper, we develop an expressive XML-enabled data mining query language by extension of XQuery. We first describe some

  6. Federated query processing for the semantic web

    CERN Document Server

    Buil-Aranda, C

    2014-01-01

    During the last years, the amount of RDF data has increased exponentially over the Web, exposed via SPARQL endpoints. These SPARQL endpoints allow users to direct SPARQL queries to the RDF data. Federated SPARQL query processing allows to query several of these RDF databases as if they were a single one, integrating the results from all of them. This is a key concept in the Web of Data and it is also a hot topic in the community. Besides of that, the W3C SPARQL-WG has standardized it in the new Recommendation SPARQL 1.1.This book provides a formalisation of the W3C proposed recommendation. Thi

  7. Experimental quantum private queries with linear optics

    International Nuclear Information System (INIS)

    De Martini, Francesco; Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo; Nagali, Eleonora; Sansoni, Linda; Sciarrino, Fabio

    2009-01-01

    The quantum private query is a quantum cryptographic protocol to recover information from a database, preserving both user and data privacy: the user can test whether someone has retained information on which query was asked and the database provider can test the amount of information released. Here we discuss a variant of the quantum private query algorithm that admits a simple linear optical implementation: it employs the photon's momentum (or time slot) as address qubits and its polarization as bus qubit. A proof-of-principle experimental realization is implemented.

  8. Instant MDX queries for SQL Server 2012

    CERN Document Server

    Emond, Nicholas

    2013-01-01

    Get to grips with a new technology, understand what it is and what it can do for you, and then get to work with the most important features and tasks. This short, focused guide is a great way to get stated with writing MDX queries. New developers can use this book as a reference for how to use functions and the syntax of a query as well as how to use Calculated Members and Named Sets.This book is great for new developers who want to learn the MDX query language from scratch and install SQL Server 2012 with Analysis Services

  9. Responsive web design with jQuery

    CERN Document Server

    Carlos, Gilberto

    2013-01-01

    Responsive Web Design with jQuery follows a standard tutorial-based approach, covering various aspects of responsive web design by building a comprehensive website.""Responsive Web Design with jQuery"" is aimed at web designers who are interested in building device-agnostic websites. You should have a grasp of standard HTML, CSS, and JavaScript development, and have a familiarity with graphic design. Some exposure to jQuery and HTML5 will be beneficial but isn't essential.

  10. Drug-related problems identified in medication reviews by Australian pharmacists

    DEFF Research Database (Denmark)

    Stafford, Andrew C; Tenni, Peter C; Peterson, Gregory M

    2009-01-01

    OBJECTIVE: In Australia, accredited pharmacists perform medication reviews for patients to identify and resolve drug-related problems. We analysed the drug-related problems identified in reviews for both home-dwelling and residential care-facility patients. The objective of this study was to exam......OBJECTIVE: In Australia, accredited pharmacists perform medication reviews for patients to identify and resolve drug-related problems. We analysed the drug-related problems identified in reviews for both home-dwelling and residential care-facility patients. The objective of this study....... These reviews had been self-selected by pharmacists and submitted as part of the reaccreditation process to the primary body responsible for accrediting Australian pharmacists to perform medication reviews. The drug-related problems identified in each review were classified by type and drugs involved. MAIN...... OUTCOME MEASURE: The number and nature of drug-related problems identified in pharmacist-conducted medication reviews. RESULTS: There were 1,038 drug-related problems identified in 234 medication reviews (mean 4.6 (+/-2.2) problems per review). The number of problems was higher (4.9 +/- 2.0 vs. 3.9 +/- 2...

  11. Vectorization vs. compilation in query execution

    NARCIS (Netherlands)

    J. Sompolski (Juliusz); M. Zukowski (Marcin); P.A. Boncz (Peter)

    2011-01-01

    textabstractCompiling database queries into executable (sub-) programs provides substantial benefits comparing to traditional interpreted execution. Many of these benefits, such as reduced interpretation overhead, better instruction code locality, and providing opportunities to use SIMD

  12. Pro PHP and jQuery

    CERN Document Server

    Lengstorf, Jason

    2010-01-01

    This book is for intermediate programmers interested in building AJAX web applications using jQuery and PHP. Along with teaching some advanced PHP techniques, it will teach you how to take your dynamic applications to the next level by adding a JavaScript layer with jQuery. * Learn to utilize built-in PHP functions to build calendar tools.* Learn how jQuery can be used for AJAX, animation, client-side validation, and more.What you'll learn* Use PHP to build a calendar application that allows users to post, view, edit, and delete events.* Use jQuery to allow the calendar app to be viewed and ed

  13. Schedule Sales Query Report Generation System

    Data.gov (United States)

    General Services Administration — Schedule Sales Query presents sales volume figures as reported to GSA by contractors. The reports are generated as quarterly reports for the current year and the...

  14. A Query System Implementation Case Study.

    Science.gov (United States)

    Hiser, Judith N.; Neil, M. Elizabeth

    1985-01-01

    The Department of Administrative Programming Services of Clemson University investigated products available in user-friendly retrieval systems. The test of INTELLECT, a natural language query system written by Artifical Intelligence Corporation, is described. (Author/MLW)

  15. Path-based Queries on Trajectory Data

    DEFF Research Database (Denmark)

    Krogh, Benjamin Bjerre; Pelekis, Nikos; Theodoridis, Yannis

    2014-01-01

    In traffic research, management, and planning a number of path-based analyses are heavily used, e.g., for computing turn-times, evaluating green waves, or studying traffic flow. These analyses require retrieving the trajectories that follow the full path being analyzed. Existing path queries cannot...... sufficiently support such path-based analyses because they retrieve all trajectories that touch any edge in the path. In this paper, we define and formalize the strict path query. This is a novel query type tailored to support path-based analysis, where trajectories must follow all edges in the path...... a specific path by only retrieving data from the first and last edge in the path. To correctly answer strict path queries existing network-constrained trajectory indexes must retrieve data from all edges in the path. An extensive performance study of NETTRA using a very large real-world trajectory data set...

  16. Menangkal Serangan SQL Injection Dengan Parameterized Query

    Directory of Open Access Journals (Sweden)

    Yulianingsih Yulianingsih

    2016-06-01

    Full Text Available Semakin meningkat pertumbuhan layanan informasi maka semakin tinggi pula tingkat kerentanan keamanan dari suatu sumber informasi. Melalui tulisan ini disajikan penelitian yang dilakukan secara eksperimen yang membahas tentang kejahatan penyerangan database secara SQL Injection. Penyerangan dilakukan melalui halaman autentikasi dikarenakan halaman ini merupakan pintu pertama akses yang seharusnya memiliki pertahanan yang cukup. Kemudian dilakukan eksperimen terhadap metode Parameterized Query untuk mendapatkan solusi terhadap permasalahan tersebut.   Kata kunci— Layanan Informasi, Serangan, eksperimen, SQL Injection, Parameterized Query.

  17. Evaluating SPARQL queries on massive RDF datasets

    KAUST Repository

    Al-Harbi, Razen

    2015-08-01

    Distributed RDF systems partition data across multiple computer nodes. Partitioning is typically based on heuristics that minimize inter-node communication and it is performed in an initial, data pre-processing phase. Therefore, the resulting partitions are static and do not adapt to changes in the query workload; as a result, existing systems are unable to consistently avoid communication for queries that are not favored by the initial data partitioning. Furthermore, for very large RDF knowledge bases, the partitioning phase becomes prohibitively expensive, leading to high startup costs. In this paper, we propose AdHash, a distributed RDF system which addresses the shortcomings of previous work. First, AdHash initially applies lightweight hash partitioning, which drastically minimizes the startup cost, while favoring the parallel processing of join patterns on subjects, without any data communication. Using a locality-aware planner, queries that cannot be processed in parallel are evaluated with minimal communication. Second, AdHash monitors the data access patterns and adapts dynamically to the query load by incrementally redistributing and replicating frequently accessed data. As a result, the communication cost for future queries is drastically reduced or even eliminated. Our experiments with synthetic and real data verify that AdHash (i) starts faster than all existing systems, (ii) processes thousands of queries before other systems become online, and (iii) gracefully adapts to the query load, being able to evaluate queries on billion-scale RDF data in sub-seconds. In this demonstration, audience can use a graphical interface of AdHash to verify its performance superiority compared to state-of-the-art distributed RDF systems.

  18. Evaluating SPARQL queries on massive RDF datasets

    KAUST Repository

    Al-Harbi, Razen; Abdelaziz, Ibrahim; Kalnis, Panos; Mamoulis, Nikos

    2015-01-01

    In this paper, we propose AdHash, a distributed RDF system which addresses the shortcomings of previous work. First, AdHash initially applies lightweight hash partitioning, which drastically minimizes the startup cost, while favoring the parallel processing of join patterns on subjects, without any data communication. Using a locality-aware planner, queries that cannot be processed in parallel are evaluated with minimal communication. Second, AdHash monitors the data access patterns and adapts dynamically to the query load by incrementally redistributing and replicating frequently accessed data. As a result, the communication cost for future queries is drastically reduced or even eliminated. Our experiments with synthetic and real data verify that AdHash (i) starts faster than all existing systems, (ii) processes thousands of queries before other systems become online, and (iii) gracefully adapts to the query load, being able to evaluate queries on billion-scale RDF data in sub-seconds. In this demonstration, audience can use a graphical interface of AdHash to verify its performance superiority compared to state-of-the-art distributed RDF systems.

  19. A Query Cache Tool for Optimizing Repeatable and Parallel OLAP Queries

    Science.gov (United States)

    Santos, Ricardo Jorge; Bernardino, Jorge

    On-line analytical processing against data warehouse databases is a common form of getting decision making information for almost every business field. Decision support information oftenly concerns periodic values based on regular attributes, such as sales amounts, percentages, most transactioned items, etc. This means that many similar OLAP instructions are periodically repeated, and simultaneously, between the several decision makers. Our Query Cache Tool takes advantage of previously executed queries, storing their results and the current state of the data which was accessed. Future queries only need to execute against the new data, inserted since the queries were last executed, and join these results with the previous ones. This makes query execution much faster, because we only need to process the most recent data. Our tool also minimizes the execution time and resource consumption for similar queries simultaneously executed by different users, putting the most recent ones on hold until the first finish and returns the results for all of them. The stored query results are held until they are considered outdated, then automatically erased. We present an experimental evaluation of our tool using a data warehouse based on a real-world business dataset and use a set of typical decision support queries to discuss the results, showing a very high gain in query execution time.

  20. Application of Clustering Algorithm CLOPE to the Query Grouping Problem in the Field of Materialized View Maintenance

    Directory of Open Access Journals (Sweden)

    Kateryna Novokhatska

    2016-03-01

    Full Text Available In recent years, materialized views (MVs are widely used to enhance the database performance by storing pre-calculated results of resource-intensive queries in the physical memory. In order to identify which queries may be potentially materialized, database transaction log for a long period of time should be analyzed. The goal of analysis is to distinguish resource-intensive and frequently used queries collected from database log, and optimize these queries by implementation of MVs. In order to achieve greater efficiency of MVs, they were used not only for the optimization of single queries, but also for entire groups of queries that are similar in syntax and execution results. Thus, the problem stated in this article is the development of approach that will allow forming groups of queries with similar syntax around the most resource-intensive queries in order to identify the list of potential candidates for materialization. For solving this problem, we have applied the algorithm of categorical data clustering to the query grouping problem on the step of database log analysis and searching candidates for materialization. In the current work CLOPE algorithm was modified to cover the introduced problem. Statistical and timing indicators were taken into account in order to form the clusters around the most resource intensive queries. Application of modified algorithm CLOPE allowed to decrease calculable complexity of clustering and to enhance the quality of formed groups.

  1. Identifying heat-related deaths by using medical examiner and vital statistics data: Surveillance analysis and descriptive epidemiology - Oklahoma, 1990-2011.

    Science.gov (United States)

    Johnson, Matthew G; Brown, Sheryll; Archer, Pam; Wendelboe, Aaron; Magzamen, Sheryl; Bradley, Kristy K

    2016-10-01

    Approximately 660 deaths occur annually in the United States associated with excess natural heat. A record heat wave in Oklahoma during 2011 generated increased interest concerning heat-related mortality among public health preparedness partners. We aimed to improve surveillance for heat-related mortality and better characterize heat-related deaths in Oklahoma during 1990-2011, and to enhance public health messaging during future heat emergencies. Heat-related deaths were identified by querying vital statistics (VS) and medical examiner (ME) data during 1990-2011. Case inclusion criteria were developed by using heat-related International Classification of Diseases codes, cause-of-death nomenclature, and ME investigation narrative. We calculated sensitivity and predictive value positive (PVP) for heat-related mortality surveillance by using VS and ME data and performed a descriptive analysis. During the study period, 364 confirmed and probable heat-related deaths were identified when utilizing both data sets. ME reports had 87% sensitivity and 74% PVP; VS reports had 80% sensitivity and 52% PVP. Compared to Oklahoma's general population, decedents were disproportionately male (67% vs. 49%), aged ≥65 years (46% vs. 14%), and unmarried (78% vs. 47%). Higher rates of heat-related mortality were observed among Blacks. Of 95 decedents with available information, 91 (96%) did not use air conditioning. Linking ME and VS data sources together and using narrative description for case classification allows for improved case ascertainment and surveillance data quality. Males, Blacks, persons aged ≥65 years, unmarried persons, and those without air conditioning carry a disproportionate burden of the heat-related deaths in Oklahoma. Published by Elsevier Inc.

  2. Spatiotemporal conceptual platform for querying archaeological information systems

    Science.gov (United States)

    Partsinevelos, Panagiotis; Sartzetaki, Mary; Sarris, Apostolos

    2015-04-01

    Spatial and temporal distribution of archaeological sites has been shown to associate with several attributes including marine, water, mineral and food resources, climate conditions, geomorphological features, etc. In this study, archeological settlement attributes are evaluated under various associations in order to provide a specialized query platform in a geographic information system (GIS). Towards this end, a spatial database is designed to include a series of archaeological findings for a secluded geographic area of Crete in Greece. The key categories of the geodatabase include the archaeological type (palace, burial site, village, etc.), temporal information of the habitation/usage period (pre Minoan, Minoan, Byzantine, etc.), and the extracted geographical attributes of the sites (distance to sea, altitude, resources, etc.). Most of the related spatial attributes are extracted with readily available GIS tools. Additionally, a series of conceptual data attributes are estimated, including: Temporal relation of an era to a future one in terms of alteration of the archaeological type, topologic relations of various types and attributes, spatial proximity relations between various types. These complex spatiotemporal relational measures reveal new attributes towards better understanding of site selection for prehistoric and/or historic cultures, yet their potential combinations can become numerous. Therefore, after the quantification of the above mentioned attributes, they are classified as of their importance for archaeological site location modeling. Under this new classification scheme, the user may select a geographic area of interest and extract only the important attributes for a specific archaeological type. These extracted attributes may then be queried against the entire spatial database and provide a location map of possible new archaeological sites. This novel type of querying is robust since the user does not have to type a standard SQL query but

  3. Parallel main-memory indexing for moving-object query and update workloads

    DEFF Research Database (Denmark)

    Sidlauskas, Darius; Saltenis, Simonas; Jensen, Christian Søndergaard

    2012-01-01

    of supporting the location-related query and update workloads generated by very large populations of such moving objects. This paper presents a main-memory indexing technique that aims to support such workloads. The technique, called PGrid, uses a grid structure that is capable of exploiting the parallelism...... offered by modern processors. Unlike earlier proposals that maintain separate structures for updates and queries, PGrid allows both long-running queries and rapid updates to operate on a single data structure and thus offers up-to-date query results. Because PGrid does not rely on creating snapshots...... on the same current data-store state, PGrid outperforms snapshot-based techniques in terms of both query freshness and CPU cycle-wise efficiency....

  4. References and arrow notation instead of join operation in query languages

    Directory of Open Access Journals (Sweden)

    Alexandr Savinov

    2012-10-01

    Full Text Available We study properties of the join operation in query languages and describe some of its major drawbacks. We provide strong arguments against using joins as a main construct for retrieving related data elements in general purpose query languages and argue for using references instead. Since conventional references are quite restrictive when applied to data modeling and query languages, we propose to use generalized references as they are defined in the concept-oriented model (COM. These references are used by two new operations, called projection and de-projection, which are denoted by right and left arrows and therefore this access method is referred to as arrow notation. We demonstrate advantages of the arrow notation in comparison to joins and argue that it makes queries simpler, more natural, easier to understand, and the whole query writing process more productive and less error-prone.

  5. Using Active Learning to Identify Health Information Technology Related Patient Safety Events.

    Science.gov (United States)

    Fong, Allan; Howe, Jessica L; Adams, Katharine T; Ratwani, Raj M

    2017-01-18

    The widespread adoption of health information technology (HIT) has led to new patient safety hazards that are often difficult to identify. Patient safety event reports, which are self-reported descriptions of safety hazards, provide one view of potential HIT-related safety events. However, identifying HIT-related reports can be challenging as they are often categorized under other more predominate clinical categories. This challenge of identifying HIT-related reports is exacerbated by the increasing number and complexity of reports which pose challenges to human annotators that must manually review reports. In this paper, we apply active learning techniques to support classification of patient safety event reports as HIT-related. We evaluated different strategies and demonstrated a 30% increase in average precision of a confirmatory sampling strategy over a baseline no active learning approach after 10 learning iterations.

  6. NEOview: Near Earth Object Data Discovery and Query

    Science.gov (United States)

    Tibbetts, M.; Elvis, M.; Galache, J. L.; Harbo, P.; McDowell, J. C.; Rudenko, M.; Van Stone, D.; Zografou, P.

    2013-10-01

    Missions to Near Earth Objects (NEOs) figure prominently in NASA's Flexible Path approach to human space exploration. NEOs offer insight into both the origins of the Solar System and of life, as well as a source of materials for future missions. With NEOview scientists can locate NEO datasets, explore metadata provided by the archives, and query or combine disparate NEO datasets in the search for NEO candidates for exploration. NEOview is a software system that illustrates how standards-based interfaces facilitate NEO data discovery and research. NEOview software follows a client-server architecture. The server is a configurable implementation of the International Virtual Observatory Alliance (IVOA) Table Access Protocol (TAP), a general interface for tabular data access, that can be deployed as a front end to existing NEO datasets. The TAP client, seleste, is a graphical interface that provides intuitive means of discovering NEO providers, exploring dataset metadata to identify fields of interest, and constructing queries to retrieve or combine data. It features a powerful, graphical query builder capable of easing the user's introduction to table searches. Through science use cases, NEOview demonstrates how potential targets for NEO rendezvous could be identified by combining data from complementary sources. Through deployment and operations, it has been shown that the software components are data independent and configurable to many different data servers. As such, NEOview's TAP server and seleste TAP client can be used to create a seamless environment for data discovery and exploration for tabular data in any astronomical archive.

  7. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data.

    Science.gov (United States)

    Putri, Fadhilah Kurnia; Song, Giltae; Kwon, Joonho; Rao, Praveen

    2017-09-25

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query ( DISPAQ ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation's Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data.

  8. DISPAQ: Distributed Profitable-Area Query from Big Taxi Trip Data †

    Science.gov (United States)

    Putri, Fadhilah Kurnia; Song, Giltae; Rao, Praveen

    2017-01-01

    One of the crucial problems for taxi drivers is to efficiently locate passengers in order to increase profits. The rapid advancement and ubiquitous penetration of Internet of Things (IoT) technology into transportation industries enables us to provide taxi drivers with locations that have more potential passengers (more profitable areas) by analyzing and querying taxi trip data. In this paper, we propose a query processing system, called Distributed Profitable-Area Query (DISPAQ) which efficiently identifies profitable areas by exploiting the Apache Software Foundation’s Spark framework and a MongoDB database. DISPAQ first maintains a profitable-area query index (PQ-index) by extracting area summaries and route summaries from raw taxi trip data. It then identifies candidate profitable areas by searching the PQ-index during query processing. Then, it exploits a Z-Skyline algorithm, which is an extension of skyline processing with a Z-order space filling curve, to quickly refine the candidate profitable areas. To improve the performance of distributed query processing, we also propose local Z-Skyline optimization, which reduces the number of dominant tests by distributing killer profitable areas to each cluster node. Through extensive evaluation with real datasets, we demonstrate that our DISPAQ system provides a scalable and efficient solution for processing profitable-area queries from huge amounts of big taxi trip data. PMID:28946679

  9. Generic multiset programming for language-integrated querying

    DEFF Research Database (Denmark)

    Henglein, Fritz; Larsen, Ken Friis

    2010-01-01

    This paper demonstrates how relational algebraic programming based on efficient symbolic representations of multisets and operations on them can be applied to the query sublanguage of SQL in a type-safe fashion. In essence, it provides a library for naïve programming with multisets in a generalized...... SQL-style fashion, but avoids many cases of asymptotically inefficient nested iteration through cross-products....

  10. Using Search Engine Query Data to Explore the Epidemiology of Common Gastrointestinal Symptoms.

    Science.gov (United States)

    Hassid, Benjamin G; Day, Lukejohn W; Awad, Mohannad A; Sewell, Justin L; Osterberg, E Charles; Breyer, Benjamin N

    2017-03-01

    Internet searches are an increasingly used tool in medical research. To date, no studies have examined Google search data in relation to common gastrointestinal symptoms. The aim of this study was to compare trends in Internet search volume with clinical datasets for common gastrointestinal symptoms. Using Google Trends, we recorded relative changes in volume of searches related to dysphagia, vomiting, and diarrhea in the USA between January 2008 and January 2011. We queried the National Inpatient Sample (NIS) and the National Hospital Ambulatory Medical Care Survey (NHAMCS) during this time period and identified cases related to these symptoms. We assessed the correlation between Google Trends and these two clinical datasets, as well as examined seasonal variation trends. Changes to Google search volume for all three symptoms correlated significantly with changes to NIS output (dysphagia: r = 0.5, P = 0.002; diarrhea: r = 0.79, P search engine query volume over time. These data demonstrate that the prevalence of common GI symptoms is rising over time.

  11. jQuery Mobile Up and Running

    CERN Document Server

    Firtman, Maximiliano

    2012-01-01

    Would you like to build one mobile web application that works on iPad and Kindle Fire as well as iPhone and Android smartphones? This introductory guide to jQuery Mobile shows you how. Through a series of hands-on exercises, you'll learn the best ways to use this framework's many interface components to build customizable, multiplatform apps. You don't need any programming skills or previous experience with jQuery to get started. By the time you finish this book, you'll know how to create responsive, Ajax-based interfaces that work on a variety of smartphones and tablets, using jQuery Mobile

  12. jQuery for designers beginner's guide

    CERN Document Server

    MacLees, Natalie

    2014-01-01

    A step-by-step guide that spices up your web pages and designs them in the way you want using the most widely used JavaScript library, jQuery. The beginner-friendly and easy-to-understand approach of the book will help get to grips with jQuery in no time. If you know the fundamentals of HTML and CSS, and want to extend your knowledge by learning to use JavaScript, then this is just the book for you. jQuery makes JavaScript straightforward and approachable - you'll be surprised at how easy it can be to add animations and special effects to your beautifully designed pages.

  13. Implementation of Quantum Private Queries Using Nuclear Magnetic Resonance

    International Nuclear Information System (INIS)

    Wang Chuan; Hao Liang; Zhao Lian-Jie

    2011-01-01

    We present a modified protocol for the realization of a quantum private query process on a classical database. Using one-qubit query and CNOT operation, the query process can be realized in a two-mode database. In the query process, the data privacy is preserved as the sender would not reveal any information about the database besides her query information, and the database provider cannot retain any information about the query. We implement the quantum private query protocol in a nuclear magnetic resonance system. The density matrix of the memory registers are constructed. (general)

  14. Templates and Queries in Contextual Hypermedia

    DEFF Research Database (Denmark)

    Anderson, Kenneth Mark; Hansen, Frank Allan; Bouvin, Niels Olof

    2006-01-01

    discuss a framework, HyConSC, that implements this model and describe how it can be used to build new contextual hypermedia systems. Our framework aids the developer in the iterative development of contextual queries (via a dynamic query browser) and offers support for con-text matching, a key feature...... of contextual hypermedia. We have tested the framework with data and sensors taken from the HyCon contextual hypermedia system and are now migrating HyCon to this new framework....

  15. The node-weighted Steiner tree approach to identify elements of cancer-related signaling pathways.

    Science.gov (United States)

    Sun, Yahui; Ma, Chenkai; Halgamuge, Saman

    2017-12-28

    Cancer constitutes a momentous health burden in our society. Critical information on cancer may be hidden in its signaling pathways. However, even though a large amount of money has been spent on cancer research, some critical information on cancer-related signaling pathways still remains elusive. Hence, new works towards a complete understanding of cancer-related signaling pathways will greatly benefit the prevention, diagnosis, and treatment of cancer. We propose the node-weighted Steiner tree approach to identify important elements of cancer-related signaling pathways at the level of proteins. This new approach has advantages over previous approaches since it is fast in processing large protein-protein interaction networks. We apply this new approach to identify important elements of two well-known cancer-related signaling pathways: PI3K/Akt and MAPK. First, we generate a node-weighted protein-protein interaction network using protein and signaling pathway data. Second, we modify and use two preprocessing techniques and a state-of-the-art Steiner tree algorithm to identify a subnetwork in the generated network. Third, we propose two new metrics to select important elements from this subnetwork. On a commonly used personal computer, this new approach takes less than 2 s to identify the important elements of PI3K/Akt and MAPK signaling pathways in a large node-weighted protein-protein interaction network with 16,843 vertices and 1,736,922 edges. We further analyze and demonstrate the significance of these identified elements to cancer signal transduction by exploring previously reported experimental evidences. Our node-weighted Steiner tree approach is shown to be both fast and effective to identify important elements of cancer-related signaling pathways. Furthermore, it may provide new perspectives into the identification of signaling pathways for other human diseases.

  16. A Moving-Object Index for Efficient Query Processing with PeerWise Location Privacy

    DEFF Research Database (Denmark)

    Lin, Dan; Jensen, Christian S.; Zhang, Rui

    2011-01-01

    attention has been paid to enabling so-called peer-wise privacy—the protection of a user’s location from unauthorized peer users. This paper identifies an important efficiency problem in existing peer-privacy approaches that simply apply a filtering step to identify users that are located in a query range......, but that do not want to disclose their location to the querying peer. To solve this problem, we propose a novel, privacy-policy enabled index called the PEB-tree that seamlessly integrates location proximity and policy compatibility. We propose efficient algorithms that use the PEB-tree for processing privacy......-aware range and kNN queries. Extensive experiments suggest that the PEB-tree enables efficient query processing....

  17. Towards Automatic Improvement of Patient Queries in Health Retrieval Systems

    Directory of Open Access Journals (Sweden)

    Nesrine KSENTINI

    2016-07-01

    Full Text Available With the adoption of health information technology for clinical health, e-health is becoming usual practice today. Users of this technology find it difficult to seek information relevant to their needs due to the increasing amount of the clinical and medical data on the web, and the lack of knowledge of medical jargon. In this regards, a method is described to improve user's needs by automatically adding new related terms to their queries which appear in the same context of the original query in order to improve final search results. This method is based on the assessment of semantic relationships defined by a proposed statistical method between a set of terms or keywords. Experiments were performed on CLEF-eHealth-2015 database and the obtained results show the effectiveness of our proposed method.

  18. Integrating Non-Spatial Preferences into Spatial Location Queries

    DEFF Research Database (Denmark)

    Qu, Qiang; Liu, Siyuan; Yang, Bin

    2014-01-01

    Increasing volumes of geo-referenced data are becoming available. This data includes so-called points of interest that describe businesses, tourist attractions, etc. by means of a geo-location and properties such as a textual description or ratings. We propose and study the efficient implementation...... of a new kind of query on points of interest that takes into account both the locations and properties of the points of interest. The query takes a result cardinality, a spatial range, and property-related preferences as parameters, and it returns a compact set of points of interest with the given...... cardinality and in the given range that satisfies the preferences. Specifically, the points of interest in the result set cover so-called allying preferences and are located far from points of interest that possess so-called alienating preferences. A unified result rating function integrates the two kinds...

  19. Answering SPARQL queries modulo RDF Schema with paths

    OpenAIRE

    Alkhateeb, Faisal; Euzenat, Jérôme

    2013-01-01

    alkhateeb2013a; SPARQL is the standard query language for RDF graphs. In its strict instantiation, it only offers querying according to the RDF semantics and would thus ignore the semantics of data expressed with respect to (RDF) schemas or (OWL) ontologies. Several extensions to SPARQL have been proposed to query RDF data modulo RDFS, i.e., interpreting the query with RDFS semantics and/or considering external ontologies. We introduce a general framework which allows for expressing query ans...

  20. Prospective Mathematics Teachers' Ability to Identify Mistakes Related to Angle Concept of Sixth Grade Students

    Science.gov (United States)

    Arslan, Cigdem; Erbay, Hatice Nur; Guner, Pinar

    2017-01-01

    In the present study we try to highlight prospective mathematics teachers' ability to identify mistakes of sixth grade students related to angle concept. And also we examined prospective mathematics teachers' knowledge of angle concept. Study was carried out with 30 sixth-grade students and 38 prospective mathematics teachers. Sixth grade students…

  1. Meta-Analysis of Placental Transcriptome Data Identifies a Novel Molecular Pathway Related to Preeclampsia

    NARCIS (Netherlands)

    van Uitert, Miranda; Moerland, Perry D.; Enquobahrie, Daniel A.; Laivuori, Hannele; van der Post, Joris A. M.; Ris-Stalpers, Carrie; Afink, Gijs B.

    2015-01-01

    Studies using the placental transcriptome to identify key molecules relevant for preeclampsia are hampered by a relatively small sample size. In addition, they use a variety of bioinformatics and statistical methods, making comparison of findings challenging. To generate a more robust preeclampsia

  2. Sex-related hearing impairment in Wolfram syndrome patients identified by inactivating WFS1 mutations

    NARCIS (Netherlands)

    Pennings, RJE; Huygen, PLM; van den Ouweland, JMW; Cryns, K; Dikkeschei, LD; Van Camp, G; Cremers, CWRJ

    2004-01-01

    This study examined the audiovestibular profile of 11 Wolfram syndrome patients (4 males, 7 females) from 7 families, with identified WFS1 mutations, and the audiometric profile of 17 related heterozygous carriers of WFS1 mutations. Patients with Wolfram syndrome showed a downsloping audiogram and

  3. Sex-related hearing impairment in Wolfram syndrome patients identified by inactivating WFS1 mutations.

    NARCIS (Netherlands)

    Pennings, R.J.E.; Huygen, P.L.M.; Ouweland, J.M.W. van den; Cryns, K.; Dikkeschei, L.D.; Camp, G. van; Cremers, C.W.R.J.

    2004-01-01

    This study examined the audiovestibular profile of 11 Wolfram syndrome patients (4 males, 7 females) from 7 families, with identified WFS1 mutations, and the audiometric profile of 17 related heterozygous carriers of WFS1 mutations. Patients with Wolfram syndrome showed a downsloping audiogram and

  4. Federated queries of clinical data repositories: the sum of the parts does not equal the whole.

    Science.gov (United States)

    Weber, Griffin M

    2013-06-01

    In 2008 we developed a shared health research information network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, where each hospital returns only the aggregate count of the number of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query might differ from what the result would be if the query were run against a single central repository. This paper describes the situations when this happens and presents a technique for correcting these errors. We use a one-time process of identifying which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run separately on each partition independently, and the combined results are presented to the user. Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query. Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture.

  5. Identifying and Analyzing Novel Epilepsy-Related Genes Using Random Walk with Restart Algorithm

    Directory of Open Access Journals (Sweden)

    Wei Guo

    2017-01-01

    Full Text Available As a pathological condition, epilepsy is caused by abnormal neuronal discharge in brain which will temporarily disrupt the cerebral functions. Epilepsy is a chronic disease which occurs in all ages and would seriously affect patients’ personal lives. Thus, it is highly required to develop effective medicines or instruments to treat the disease. Identifying epilepsy-related genes is essential in order to understand and treat the disease because the corresponding proteins encoded by the epilepsy-related genes are candidates of the potential drug targets. In this study, a pioneering computational workflow was proposed to predict novel epilepsy-related genes using the random walk with restart (RWR algorithm. As reported in the literature RWR algorithm often produces a number of false positive genes, and in this study a permutation test and functional association tests were implemented to filter the genes identified by RWR algorithm, which greatly reduce the number of suspected genes and result in only thirty-three novel epilepsy genes. Finally, these novel genes were analyzed based upon some recently published literatures. Our findings implicate that all novel genes were closely related to epilepsy. It is believed that the proposed workflow can also be applied to identify genes related to other diseases and deepen our understanding of the mechanisms of these diseases.

  6. Evolutionary Algorithms for Boolean Queries Optimization

    Czech Academy of Sciences Publication Activity Database

    Húsek, Dušan; Snášel, Václav; Neruda, Roman; Owais, S.S.J.; Krömer, P.

    2006-01-01

    Roč. 3, č. 1 (2006), s. 15-20 ISSN 1790-0832 R&D Projects: GA AV ČR 1ET100300414 Institutional research plan: CEZ:AV0Z10300504 Keywords : evolutionary algorithms * genetic algorithms * information retrieval * Boolean query Subject RIV: BA - General Mathematics

  7. Boolean Queries Optimization by Genetic Algorithms

    Czech Academy of Sciences Publication Activity Database

    Húsek, Dušan; Owais, S.S.J.; Krömer, P.; Snášel, Václav

    2005-01-01

    Roč. 15, - (2005), s. 395-409 ISSN 1210-0552 R&D Projects: GA AV ČR 1ET100300414 Institutional research plan: CEZ:AV0Z10300504 Keywords : evolutionary algorithms * genetic algorithms * genetic programming * information retrieval * Boolean query Subject RIV: BB - Applied Statistics, Operational Research

  8. External query expansion in the blogosphere

    NARCIS (Netherlands)

    Weerkamp, W.; de Rijke, M.; Voorhees, E.M.; Buckland, L.P.

    2009-01-01

    We describe the participation of the University of Amsterdam’s ILPS group in the blog track at TREC 2008. We mainly explored different ways of using external corpora to expand the original query. In the blog post retrieval task we did not succeed in improving over a simple baseline (equal weights

  9. C-SPARQL : SPARQL for continuous querying

    OpenAIRE

    Barbieri, Davide Francesco; Braga, Daniele; Ceri, Stefano; Valle, Emanuele Della; Grossniklaus, Michael

    2009-01-01

    C-SPARQL is an extension of SPARQL to support continuous queries, registered and continuously executed over RDF data streams, considering windows of such streams. Supporting streams in RDF format guarantees interoperability and opens up important applications, in which reasoners can deal with knowledge that evolves over time. We present C-SPARQL by means of examples in Urban Computing.

  10. The data cyclotron query processing scheme

    NARCIS (Netherlands)

    R.A. Goncalves (Romulo); M.L. Kersten (Martin)

    2010-01-01

    htmlabstractDistributed database systems exploit static workload characteristics to steer data fragmentation and data allocation schemes. However, the grand challenge of distributed query processing is to come up with a self-organizing architecture, which exploits all resources to manage the hot

  11. Enabling Incremental Query Re-Optimization.

    Science.gov (United States)

    Liu, Mengmeng; Ives, Zachary G; Loo, Boon Thau

    2016-01-01

    As declarative query processing techniques expand to the Web, data streams, network routers, and cloud platforms, there is an increasing need to re-plan execution in the presence of unanticipated performance changes. New runtime information may affect which query plan we prefer to run. Adaptive techniques require innovation both in terms of the algorithms used to estimate costs , and in terms of the search algorithm that finds the best plan. We investigate how to build a cost-based optimizer that recomputes the optimal plan incrementally given new cost information, much as a stream engine constantly updates its outputs given new data. Our implementation especially shows benefits for stream processing workloads. It lays the foundations upon which a variety of novel adaptive optimization algorithms can be built. We start by leveraging the recently proposed approach of formulating query plan enumeration as a set of recursive datalog queries ; we develop a variety of novel optimization approaches to ensure effective pruning in both static and incremental cases. We further show that the lessons learned in the declarative implementation can be equally applied to more traditional optimizer implementations.

  12. XAL: An algebra for XML query optimization

    NARCIS (Netherlands)

    Frasincar, F.; Houben, G.J.P.M.; Pau, C.D.; Zhou, Xiaofang

    2002-01-01

    This paper proposes XAL, an XML ALgebra. Its novelty is based on the simplicity of its data model and its well-defined logical operators, which makes it suitable for composability, optimizability, and semantics definition of a query language for XML data. At the heart of the algebra resides the

  13. Combining the power of searching and querying

    NARCIS (Netherlands)

    Cohen, S.; Kanza, Y.; Kogan, Y.A.; Nutt, W.; Sagiv, Y.; Serebrenik, A.; Etzion, O.; Scheuermann, P.

    2000-01-01

    EquiX is a search language for XML that combines the power of querying with the simplicity of searching. Requirements for search languages are discussed and it is shown that EquiX meets the necessary criteria. Both a graphical abstract syntax and a formal concrete syntax are presented for EquiX

  14. Beginning SQL queries from novice to professional

    CERN Document Server

    Churcher, Clare

    2016-01-01

    Anyone who does any work at all with databases needs to know something of SQL. This is a friendly and easy-to-read guide to writing queries with the all-important - in the database world - SQL language. The author writes with exceptional clarity.

  15. Web-Based Distributed XML Query Processing

    NARCIS (Netherlands)

    Smiljanic, M.; Feng, L.; Jonker, Willem; Blanken, Henk; Grabs, T.; Schek, H-J.; Schenkel, R.; Weikum, G.

    2003-01-01

    Web-based distributed XML query processing has gained in importance in recent years due to the widespread popularity of XML on the Web. Unlike centralized and tightly coupled distributed systems, Web-based distributed database systems are highly unpredictable and uncontrollable, with a rather

  16. Flattening Queries over Nested Data Types

    NARCIS (Netherlands)

    van Ruth, J.

    2006-01-01

    The theory developed in this thesis provides a method to improve the efficiency of querying nested data. The roots of this research lie in the tension between data model expressiveness and performance. Obviously, more expressive data models are more convenient for application programmers. For many

  17. Sonata: Query-Driven Network Telemetry

    KAUST Repository

    Gupta, Arpit; Harrison, Rob; Pawar, Ankita; Birkner, Rü diger; Canini, Marco; Feamster, Nick; Rexford, Jennifer; Willinger, Walter

    2017-01-01

    Operating networks depends on collecting and analyzing measurement data. Current technologies do not make it easy to do so, typically because they separate data collection (e.g., packet capture or flow monitoring) from analysis, producing either too much data to answer a general question or too little data to answer a detailed question. In this paper, we present Sonata, a network telemetry system that uses a uniform query interface to drive the joint collection and analysis of network traffic. Sonata takes the advantage of two emerging technologies---streaming analytics platforms and programmable network devices---to facilitate joint collection and analysis. Sonata allows operators to more directly express network traffic analysis tasks in terms of a high-level language. The underlying runtime partitions each query into a portion that runs on the switch and another that runs on the streaming analytics platform iteratively refines the query to efficiently capture only the traffic that pertains to the operator's query, and exploits sketches to reduce state in switches in exchange for more approximate results. Through an evaluation of a prototype implementation, we demonstrate that Sonata can support a wide range of network telemetry tasks with less state in the network, and lower data rates to streaming analytics systems, than current approaches can achieve.

  18. Sonata: Query-Driven Network Telemetry

    KAUST Repository

    Gupta, Arpit

    2017-05-02

    Operating networks depends on collecting and analyzing measurement data. Current technologies do not make it easy to do so, typically because they separate data collection (e.g., packet capture or flow monitoring) from analysis, producing either too much data to answer a general question or too little data to answer a detailed question. In this paper, we present Sonata, a network telemetry system that uses a uniform query interface to drive the joint collection and analysis of network traffic. Sonata takes the advantage of two emerging technologies---streaming analytics platforms and programmable network devices---to facilitate joint collection and analysis. Sonata allows operators to more directly express network traffic analysis tasks in terms of a high-level language. The underlying runtime partitions each query into a portion that runs on the switch and another that runs on the streaming analytics platform iteratively refines the query to efficiently capture only the traffic that pertains to the operator\\'s query, and exploits sketches to reduce state in switches in exchange for more approximate results. Through an evaluation of a prototype implementation, we demonstrate that Sonata can support a wide range of network telemetry tasks with less state in the network, and lower data rates to streaming analytics systems, than current approaches can achieve.

  19. TreeQ-VISTA: An Interactive Tree Visualization Tool withFunctional Annotation Query Capabilities

    Energy Technology Data Exchange (ETDEWEB)

    Gu, Shengyin; Anderson, Iain; Kunin, Victor; Cipriano, Michael; Minovitsky, Simon; Weber, Gunther; Amenta, Nina; Hamann, Bernd; Dubchak,Inna

    2007-05-07

    Summary: We describe a general multiplatform exploratorytool called TreeQ-Vista, designed for presenting functional annotationsin a phylogenetic context. Traits, such as phenotypic and genomicproperties, are interactively queried from a relational database with auser-friendly interface which provides a set of tools for users with orwithout SQL knowledge. The query results are projected onto aphylogenetic tree and can be displayed in multiple color groups. A richset of browsing, grouping and query tools are provided to facilitatetrait exploration, comparison and analysis.Availability: The program,detailed tutorial and examples are available online athttp://genome-test.lbl.gov/vista/TreeQVista.

  20. Sharing-Aware Horizontal Partitioning for Exploiting Correlations during Query Processing

    DEFF Research Database (Denmark)

    Tzoumas, Kostas; Deshpande, Amol; Jensen, Christian Søndergaard

    2010-01-01

    Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data...... partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting...

  1. PERANGKAT BANTU UNTUK OPTIMASI QUERY PADA ORACLE DENGAN RESTRUKTURISASI SQL

    Directory of Open Access Journals (Sweden)

    Darlis Heru Murti

    2006-07-01

    Full Text Available Query merupakan bagian dari bahasa pemrograman SQL (Structured Query Language yang berfungsi untuk mengambil data (read dalam DBMS (Database Management System, termasuk Oracle [3]. Pada Oracle, ada tiga tahap proses yang dilakukan dalam pengeksekusian query, yaitu Parsing, Execute dan Fetch. Sebelum proses execute dijalankan, Oracle terlebih dahulu membuat execution plan yang akan menjadi skenario dalam proses excute.Dalam proses pengeksekusian query, terdapat faktor-faktor yang mempengaruhi kinerja query, di antaranya access path (cara pengambilan data dari sebuah tabel dan operasi join (cara menggabungkan data dari dua tabel. Untuk mendapatkan query dengan kinerja optimal, maka diperlukan pertimbangan-pertimbangan dalam menyikapi faktor-faktor tersebut.  Optimasi query merupakan suatu cara untuk mendapatkan query dengan kinerja seoptimal mungkin, terutama dilihat dari sudut pandang waktu. Ada banyak metode untuk mengoptimasi query, tapi pada Penelitian ini, penulis membuat sebuah aplikasi untuk mengoptimasi query dengan metode restrukturisasi SQL statement. Pada metode ini, objek yang dianalisa adalah struktur klausa yang membangun sebuah query. Aplikasi ini memiliki satu input dan lima jenis output. Input dari aplikasi ini adalah sebuah query sedangkan kelima jenis output aplikasi ini adalah berupa query hasil optimasi, saran perbaikan, saran pembuatan indeks baru, execution plan dan data statistik. Cara kerja aplikasi ini dibagi menjadi empat tahap yaitu mengurai query menjadi sub query, mengurai query per-klausa, menentukan access path dan operasi join dan restrukturisasi query.Dari serangkaian ujicoba yang dilakukan penulis, aplikasi telah dapat berjalan sesuai dengan tujuan pembuatan Penelitian ini, yaitu mendapatkan query dengan kinerja optimal.Kata Kunci : Query, SQL, DBMS, Oracle, Parsing, Execute, Fetch, Execution Plan, Access Path, Operasi Join, Restrukturisasi SQL statement.

  2. Query-by-example surgical activity detection.

    Science.gov (United States)

    Gao, Yixin; Vedula, S Swaroop; Lee, Gyusung I; Lee, Mija R; Khudanpur, Sanjeev; Hager, Gregory D

    2016-06-01

    Easy acquisition of surgical data opens many opportunities to automate skill evaluation and teaching. Current technology to search tool motion data for surgical activity segments of interest is limited by the need for manual pre-processing, which can be prohibitive at scale. We developed a content-based information retrieval method, query-by-example (QBE), to automatically detect activity segments within surgical data recordings of long duration that match a query. The example segment of interest (query) and the surgical data recording (target trial) are time series of kinematics. Our approach includes an unsupervised feature learning module using a stacked denoising autoencoder (SDAE), two scoring modules based on asymmetric subsequence dynamic time warping (AS-DTW) and template matching, respectively, and a detection module. A distance matrix of the query against the trial is computed using the SDAE features, followed by AS-DTW combined with template scoring, to generate a ranked list of candidate subsequences (substrings). To evaluate the quality of the ranked list against the ground-truth, thresholding conventional DTW distances and bipartite matching are applied. We computed the recall, precision, F1-score, and a Jaccard index-based score on three experimental setups. We evaluated our QBE method using a suture throw maneuver as the query, on two tool motion datasets (JIGSAWS and MISTIC-SL) captured in a training laboratory. We observed a recall of 93, 90 and 87 % and a precision of 93, 91, and 88 % with same surgeon same trial (SSST), same surgeon different trial (SSDT) and different surgeon (DS) experiment setups on JIGSAWS, and a recall of 87, 81 and 75 % and a precision of 72, 61, and 53 % with SSST, SSDT and DS experiment setups on MISTIC-SL, respectively. We developed a novel, content-based information retrieval method to automatically detect multiple instances of an activity within long surgical recordings. Our method demonstrated adequate recall

  3. Identifying content-based and relational techniques to change behaviour in motivational interviewing.

    Science.gov (United States)

    Hardcastle, Sarah J; Fortier, Michelle; Blake, Nicola; Hagger, Martin S

    2017-03-01

    Motivational interviewing (MI) is a complex intervention comprising multiple techniques aimed at changing health-related motivation and behaviour. However, MI techniques have not been systematically isolated and classified. This study aimed to identify the techniques unique to MI, classify them as content-related or relational, and evaluate the extent to which they overlap with techniques from the behaviour change technique taxonomy version 1 [BCTTv1; Michie, S., Richardson, M., Johnston, M., Abraham, C., Francis, J., Hardeman, W., … Wood, C. E. (2013). The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: Building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine, 46, 81-95]. Behaviour change experts (n = 3) content-analysed MI techniques based on Miller and Rollnick's [(2013). Motivational interviewing: Preparing people for change (3rd ed.). New York: Guildford Press] conceptualisation. Each technique was then coded for independence and uniqueness by independent experts (n = 10). The experts also compared each MI technique to those from the BCTTv1. Experts identified 38 distinct MI techniques with high agreement on clarity, uniqueness, preciseness, and distinctiveness ratings. Of the identified techniques, 16 were classified as relational techniques. The remaining 22 techniques were classified as content based. Sixteen of the MI techniques were identified as having substantial overlap with techniques from the BCTTv1. The isolation and classification of MI techniques will provide researchers with the necessary tools to clearly specify MI interventions and test the main and interactive effects of the techniques on health behaviour. The distinction between relational and content-based techniques within MI is also an important advance, recognising that changes in motivation and behaviour in MI is a function of both intervention content and the interpersonal style

  4. Web search queries can predict stock market volumes.

    Science.gov (United States)

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has lead to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enable us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.

  5. Web search queries can predict stock market volumes.

    Directory of Open Access Journals (Sweden)

    Ilaria Bordino

    Full Text Available We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has lead to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enable us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.

  6. Fragger: a protein fragment picker for structural queries [version 2; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Francois Berenger

    2018-04-01

    Full Text Available Protein modeling and design activities often require querying the Protein Data Bank (PDB with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.

  7. The Effects of Normalisation of the Satisfaction of Novice End-User Querying Databases

    Directory of Open Access Journals (Sweden)

    Conrad Benedict

    1997-05-01

    Full Text Available This paper reports the results of an experiment that investigated the effects different structural characteristics of relational databases have on information satisfaction of end-users querying databases. The results show that unnormalised tables adversely affect end-user satisfaction. The adverse affect on end-user satisfaction is attributable primarily to the use of non atomic data. In this study, the affect on end user satisfaction of repeating fields was not significant. The study contributes to the further development of theories of individual adjustment to information technology in the workplace by alerting organisations and, in particular, database designers to the ways in which the structural characteristics of relational databases may affect end-user satisfaction. More importantly, the results suggest that database designers need to clearly identify the domains for each item appearing in their databases. These issues are of increasing importance because of the growth in the amount of data available to end-users in relational databases.

  8. ICan: an integrated co-alteration network to identify ovarian cancer-related genes.

    Science.gov (United States)

    Zhou, Yuanshuai; Liu, Yongjing; Li, Kening; Zhang, Rui; Qiu, Fujun; Zhao, Ning; Xu, Yan

    2015-01-01

    Over the last decade, an increasing number of integrative studies on cancer-related genes have been published. Integrative analyses aim to overcome the limitation of a single data type, and provide a more complete view of carcinogenesis. The vast majority of these studies used sample-matched data of gene expression and copy number to investigate the impact of copy number alteration on gene expression, and to predict and prioritize candidate oncogenes and tumor suppressor genes. However, correlations between genes were neglected in these studies. Our work aimed to evaluate the co-alteration of copy number, methylation and expression, allowing us to identify cancer-related genes and essential functional modules in cancer. We built the Integrated Co-alteration network (ICan) based on multi-omics data, and analyzed the network to uncover cancer-related genes. After comparison with random networks, we identified 155 ovarian cancer-related genes, including well-known (TP53, BRCA1, RB1 and PTEN) and also novel cancer-related genes, such as PDPN and EphA2. We compared the results with a conventional method: CNAmet, and obtained a significantly better area under the curve value (ICan: 0.8179, CNAmet: 0.5183). In this paper, we describe a framework to find cancer-related genes based on an Integrated Co-alteration network. Our results proved that ICan could precisely identify candidate cancer genes and provide increased mechanistic understanding of carcinogenesis. This work suggested a new research direction for biological network analyses involving multi-omics data.

  9. Identifying the Micro-relations Underpinning Familiarity Detection in Dynamic Displays Containing Multiple Objects

    Directory of Open Access Journals (Sweden)

    Jamie S. North

    2017-06-01

    Full Text Available We identified the important micro-relations that are perceived when attempting to recognize patterns in stimuli consisting of multiple dynamic objects. Skilled and less-skilled participants were presented with point light display sequences representing dynamic patterns in an invasion sport and were subsequently required to make familiarity based recognition judgments in three different conditions, each of which contained only a select number of features that were present at initial viewing. No differences in recognition accuracy were observed between skilled and less-skilled participants when just objects located in the periphery were presented. Yet, when presented with the relative motions of two centrally located attacking objects only, skilled participants were significantly more accurate than less-skilled participants and their recognition accuracy improved further when a target object was included against which these relative motions could be judged. Skilled participants can perceive and recognize global patterns on the basis of centrally located relational information.

  10. From Ambiguities to Insights: Query-based Comparisons of High-Dimensional Data

    Science.gov (United States)

    Kowalski, Jeanne; Talbot, Conover; Tsai, Hua L.; Prasad, Nijaguna; Umbricht, Christopher; Zeiger, Martha A.

    2007-11-01

    Genomic technologies will revolutionize drag discovery and development; that much is universally agreed upon. The high dimension of data from such technologies has challenged available data analytic methods; that much is apparent. To date, large-scale data repositories have not been utilized in ways that permit their wealth of information to be efficiently processed for knowledge, presumably due in large part to inadequate analytical tools to address numerous comparisons of high-dimensional data. In candidate gene discovery, expression comparisons are often made between two features (e.g., cancerous versus normal), such that the enumeration of outcomes is manageable. With multiple features, the setting becomes more complex, in terms of comparing expression levels of tens of thousands transcripts across hundreds of features. In this case, the number of outcomes, while enumerable, become rapidly large and unmanageable, and scientific inquiries become more abstract, such as "which one of these (compounds, stimuli, etc.) is not like the others?" We develop analytical tools that promote more extensive, efficient, and rigorous utilization of the public data resources generated by the massive support of genomic studies. Our work innovates by enabling access to such metadata with logically formulated scientific inquires that define, compare and integrate query-comparison pair relations for analysis. We demonstrate our computational tool's potential to address an outstanding biomedical informatics issue of identifying reliable molecular markers in thyroid cancer. Our proposed query-based comparison (QBC) facilitates access to and efficient utilization of metadata through logically formed inquires expressed as query-based comparisons by organizing and comparing results from biotechnologies to address applications in biomedicine.

  11. Identifying the Structure and Effect of Drinking-Related Self-Schemas.

    Science.gov (United States)

    Domenico, Lisa H; Strobbe, Stephen; Stein, Karen Farchaus; Giordani, Bruno J; Hagerty, Bonnie M; Pressler, Susan J

    2017-07-01

    Self-schemas have received increased attention as favorable targets for therapeutic intervention because of their central role in self-perception and behavior. The purpose of this integrative review was to identify, evaluate, and synthesize existing research pertaining to drinking-related self-schemas. Russell's integrative review strategy guided the search. Sixteen published works were identified, meeting criteria for evaluation ( n = 12 data-based publications and n = 4 models). The retrieved data-based publications rated fair-good using Polit and Beck's criteria; the overall body of literature rated "B" using Grimes and Schulz criteria. Retrieved models rated 4 to 7 using Fitzpatrick and Whall's criteria. The existing literature strongly supports the availability of a drinking-related self-schema among moderate-to-heavy drinking samples, and suggests a positive relationship between elaboration and drinking behavior. The relationship between valenced content of the schema and drinking behavior remains unexplored. Identifying variation in the structural properties of drinking-related self-schemas could lay the foundation for future interventions.

  12. Alexithymia Is Related to the Need for More Emotional Intensity to Identify Static Fearful Facial Expressions

    Directory of Open Access Journals (Sweden)

    Francesca Starita

    2018-06-01

    Full Text Available Individuals with high levels of alexithymia, a personality trait marked by difficulties in identifying and describing feelings and an externally oriented style of thinking, appear to require more time to accurately recognize intense emotional facial expressions (EFEs. However, in everyday life, EFEs are displayed at different levels of intensity and individuals with high alexithymia may also need more emotional intensity to identify EFEs. Nevertheless, the impact of alexithymia on the identification of EFEs, which vary in emotional intensity, has largely been neglected. To address this, two experiments were conducted in which participants with low (LA and high (HA levels of alexithymia were assessed in their ability to identify static (Experiment 1 and dynamic (Experiment 2 morphed faces ranging from neutral to intense EFEs. Results showed that HA needed more emotional intensity than LA to identify static fearful – but not happy or disgusted – faces. On the contrary, no evidence was found that alexithymia affected the identification of dynamic EFEs. These results extend current literature suggesting that alexithymia is related to the need for more perceptual information to identify static fearful EFEs.

  13. BioFed: federated query processing over life sciences linked open data.

    Science.gov (United States)

    Hasnain, Ali; Mehmood, Qaiser; Sana E Zainab, Syeda; Saleem, Muhammad; Warren, Claude; Zehra, Durre; Decker, Stefan; Rebholz-Schuhmann, Dietrich

    2017-03-15

    Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but this still requires solutions to query multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Last but not least, over time, the reliability of data resources in terms of access and quality have to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state of the art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system 'BioFed', the triple pattern wise source selection and the semantic source normalisation forms the core to our solution. It gathers and integrates data from newly identified public endpoints for federated access. Basic provenance information is linked to the retrieved data. Last but not least, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits for query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., Dugbank, Sider). BioFed is a solution for a single-point-of-access for a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to the

  14. Identification and Analysis of Multi-tasking Product Information Search Sessions with Query Logs

    Directory of Open Access Journals (Sweden)

    Xiang Zhou

    2016-09-01

    Full Text Available Purpose: This research aims to identify product search tasks in online shopping and analyze the characteristics of consumer multi-tasking search sessions. Design/methodology/approach: The experimental dataset contains 8,949 queries of 582 users from 3,483 search sessions. A sequential comparison of the Jaccard similarity coefficient between two adjacent search queries and hierarchical clustering of queries is used to identify search tasks. Findings: (1 Users issued a similar number of queries (1.43 to 1.47 with similar lengths (7.3-7.6 characters per task in mono-tasking and multi-tasking sessions, and (2 Users spent more time on average in sessions with more tasks, but spent less time for each task when the number of tasks increased in a session. Research limitations: The task identification method that relies only on query terms does not completely reflect the complex nature of consumer shopping behavior. Practical implications: These results provide an exploratory understanding of the relationships among multiple shopping tasks, and can be useful for product recommendation and shopping task prediction. Originality/value: The originality of this research is its use of query clustering with online shopping task identification and analysis, and the analysis of product search session characteristics.

  15. jQuery UI 1.10 the user interface library for jQuery

    CERN Document Server

    Libby, Alex

    2013-01-01

    This book consists of an easy-to-follow, example-based approach that leads you step-by-step through the implementation and customization of each library component.This book is for frontend designers and developers who need to learn how to use jQuery UI quickly. To get the most out of this book, you should have a good working knowledge of HTML, CSS, and JavaScript, and should ideally be comfortable using jQuery.

  16. Graphical modeling and query language for hospitals.

    Science.gov (United States)

    Barzdins, Janis; Barzdins, Juris; Rencis, Edgars; Sostaks, Agris

    2013-01-01

    So far there has been little evidence that implementation of the health information technologies (HIT) is leading to health care cost savings. One of the reasons for this lack of impact by the HIT likely lies in the complexity of the business process ownership in the hospitals. The goal of our research is to develop a business model-based method for hospital use which would allow doctors to retrieve directly the ad-hoc information from various hospital databases. We have developed a special domain-specific process modelling language called the MedMod. Formally, we define the MedMod language as a profile on UML Class diagrams, but we also demonstrate it on examples, where we explain the semantics of all its elements informally. Moreover, we have developed the Process Query Language (PQL) that is based on MedMod process definition language. The purpose of PQL is to allow a doctor querying (filtering) runtime data of hospital's processes described using MedMod. The MedMod language tries to overcome deficiencies in existing process modeling languages, allowing to specify the loosely-defined sequence of the steps to be performed in the clinical process. The main advantages of PQL are in two main areas - usability and efficiency. They are: 1) the view on data through "glasses" of familiar process, 2) the simple and easy-to-perceive means of setting filtering conditions require no more expertise than using spreadsheet applications, 3) the dynamic response to each step in construction of the complete query that shortens the learning curve greatly and reduces the error rate, and 4) the selected means of filtering and data retrieving allows to execute queries in O(n) time regarding the size of the dataset. We are about to continue developing this project with three further steps. First, we are planning to develop user-friendly graphical editors for the MedMod process modeling and query languages. The second step is to do evaluation of usability the proposed language and tool

  17. Cross-species multiple environmental stress responses: An integrated approach to identify candidate genes for multiple stress tolerance in sorghum (Sorghum bicolor (L. Moench and related model species.

    Directory of Open Access Journals (Sweden)

    Adugna Abdi Woldesemayat

    Full Text Available Crop response to the changing climate and unpredictable effects of global warming with adverse conditions such as drought stress has brought concerns about food security to the fore; crop yield loss is a major cause of concern in this regard. Identification of genes with multiple responses across environmental stresses is the genetic foundation that leads to crop adaptation to environmental perturbations.In this paper, we introduce an integrated approach to assess candidate genes for multiple stress responses across-species. The approach combines ontology based semantic data integration with expression profiling, comparative genomics, phylogenomics, functional gene enrichment and gene enrichment network analysis to identify genes associated with plant stress phenotypes. Five different ontologies, viz., Gene Ontology (GO, Trait Ontology (TO, Plant Ontology (PO, Growth Ontology (GRO and Environment Ontology (EO were used to semantically integrate drought related information.Target genes linked to Quantitative Trait Loci (QTLs controlling yield and stress tolerance in sorghum (Sorghum bicolor (L. Moench and closely related species were identified. Based on the enriched GO terms of the biological processes, 1116 sorghum genes with potential responses to 5 different stresses, such as drought (18%, salt (32%, cold (20%, heat (8% and oxidative stress (25% were identified to be over-expressed. Out of 169 sorghum drought responsive QTLs associated genes that were identified based on expression datasets, 56% were shown to have multiple stress responses. On the other hand, out of 168 additional genes that have been evaluated for orthologous pairs, 90% were conserved across species for drought tolerance. Over 50% of identified maize and rice genes were responsive to drought and salt stresses and were co-located within multifunctional QTLs. Among the total identified multi-stress responsive genes, 272 targets were shown to be co-localized within QTLs

  18. Identifying Novel Candidate Genes Related to Apoptosis from a Protein-Protein Interaction Network

    Directory of Open Access Journals (Sweden)

    Baoman Wang

    2015-01-01

    Full Text Available Apoptosis is the process of programmed cell death (PCD that occurs in multicellular organisms. This process of normal cell death is required to maintain the balance of homeostasis. In addition, some diseases, such as obesity, cancer, and neurodegenerative diseases, can be cured through apoptosis, which produces few side effects. An effective comprehension of the mechanisms underlying apoptosis will be helpful to prevent and treat some diseases. The identification of genes related to apoptosis is essential to uncover its underlying mechanisms. In this study, a computational method was proposed to identify novel candidate genes related to apoptosis. First, protein-protein interaction information was used to construct a weighted graph. Second, a shortest path algorithm was applied to the graph to search for new candidate genes. Finally, the obtained genes were filtered by a permutation test. As a result, 26 genes were obtained, and we discuss their likelihood of being novel apoptosis-related genes by collecting evidence from published literature.

  19. ODG: Omics database generator - a tool for generating, querying, and analyzing multi-omics comparative databases to facilitate biological understanding.

    Science.gov (United States)

    Guhlin, Joseph; Silverstein, Kevin A T; Zhou, Peng; Tiffin, Peter; Young, Nevin D

    2017-08-10

    Rapid generation of omics data in recent years have resulted in vast amounts of disconnected datasets without systemic integration and knowledge building, while individual groups have made customized, annotated datasets available on the web with few ways to link them to in-lab datasets. With so many research groups generating their own data, the ability to relate it to the larger genomic and comparative genomic context is becoming increasingly crucial to make full use of the data. The Omics Database Generator (ODG) allows users to create customized databases that utilize published genomics data integrated with experimental data which can be queried using a flexible graph database. When provided with omics and experimental data, ODG will create a comparative, multi-dimensional graph database. ODG can import definitions and annotations from other sources such as InterProScan, the Gene Ontology, ENZYME, UniPathway, and others. This annotation data can be especially useful for studying new or understudied species for which transcripts have only been predicted, and rapidly give additional layers of annotation to predicted genes. In better studied species, ODG can perform syntenic annotation translations or rapidly identify characteristics of a set of genes or nucleotide locations, such as hits from an association study. ODG provides a web-based user-interface for configuring the data import and for querying the database. Queries can also be run from the command-line and the database can be queried directly through programming language hooks available for most languages. ODG supports most common genomic formats as well as generic, easy to use tab-separated value format for user-provided annotations. ODG is a user-friendly database generation and query tool that adapts to the supplied data to produce a comparative genomic database or multi-layered annotation database. ODG provides rapid comparative genomic annotation and is therefore particularly useful for non-model or

  20. Mathematical Formula Search using Natural Language Queries

    Directory of Open Access Journals (Sweden)

    YANG, S.

    2014-11-01

    Full Text Available This paper presents how to search mathematical formulae written in MathML when given plain words as a query. Since the proposed method allows natural language queries like the traditional Information Retrieval for the mathematical formula search, users do not need to enter any complicated math symbols and to use any formula input tool. For this, formula data is converted into plain texts, and features are extracted from the converted texts. In our experiments, we achieve an outstanding performance, a MRR of 0.659. In addition, we introduce how to utilize formula classification for formula search. By using class information, we finally achieve an improved performance, a MRR of 0.690.

  1. Identifying biological concepts from a protein-related corpus with a probabilistic topic model

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2006-02-01

    Full Text Available Abstract Background Biomedical literature, e.g., MEDLINE, contains a wealth of knowledge regarding functions of proteins. Major recurring biological concepts within such text corpora represent the domains of this body of knowledge. The goal of this research is to identify the major biological topics/concepts from a corpus of protein-related MEDLINE© titles and abstracts by applying a probabilistic topic model. Results The latent Dirichlet allocation (LDA model was applied to the corpus. Based on the Bayesian model selection, 300 major topics were extracted from the corpus. The majority of identified topics/concepts was found to be semantically coherent and most represented biological objects or concepts. The identified topics/concepts were further mapped to the controlled vocabulary of the Gene Ontology (GO terms based on mutual information. Conclusion The major and recurring biological concepts within a collection of MEDLINE documents can be extracted by the LDA model. The identified topics/concepts provide parsimonious and semantically-enriched representation of the texts in a semantic space with reduced dimensionality and can be used to index text.

  2. TEMPORAL QUERY PROCESSIG USING SQL SERVER

    OpenAIRE

    Vali Shaik, Mastan; Sujatha, P

    2017-01-01

    Most data sources in real-life are not static but change their information in time. This evolution of data in time can give valuable insights to business analysts. Temporal data refers to data, where changes over time or temporal aspects play a central role. Temporal data denotes the evaluation of object characteristics over time. One of the main unresolved problems that arise during the data mining process is treating data that contains temporal information. Temporal queries on time evolving...

  3. Advanced SPARQL querying in small molecule databases

    Czech Academy of Sciences Publication Activity Database

    Galgonek, Jakub; Hurt, T.; Michlíková, V.; Onderka, P.; Schwarz, J.; Vondrášek, Jiří

    2016-01-01

    Roč. 8, Jun 6 (2016), č. článku 31. ISSN 1758-2946 R&D Projects: GA MŠk(CZ) LM2015047 Institutional support: RVO:61388963 Keywords : Resource Description Framework * SPARQL query language * Database of small molecules Subject RIV: CF - Physical ; Theoretical Chemistry Impact factor: 4.220, year: 2016 http://jcheminf.springeropen.com/articles/10.1186/s13321-016-0144-4

  4. LGscore: A method to identify disease-related genes using biological literature and Google data.

    Science.gov (United States)

    Kim, Jeongwoo; Kim, Hyunjin; Yoon, Youngmi; Park, Sanghyun

    2015-04-01

    Since the genome project in 1990s, a number of studies associated with genes have been conducted and researchers have confirmed that genes are involved in disease. For this reason, the identification of the relationships between diseases and genes is important in biology. We propose a method called LGscore, which identifies disease-related genes using Google data and literature data. To implement this method, first, we construct a disease-related gene network using text-mining results. We then extract gene-gene interactions based on co-occurrences in abstract data obtained from PubMed, and calculate the weights of edges in the gene network by means of Z-scoring. The weights contain two values: the frequency and the Google search results. The frequency value is extracted from literature data, and the Google search result is obtained using Google. We assign a score to each gene through a network analysis. We assume that genes with a large number of links and numerous Google search results and frequency values are more likely to be involved in disease. For validation, we investigated the top 20 inferred genes for five different diseases using answer sets. The answer sets comprised six databases that contain information on disease-gene relationships. We identified a significant number of disease-related genes as well as candidate genes for Alzheimer's disease, diabetes, colon cancer, lung cancer, and prostate cancer. Our method was up to 40% more accurate than existing methods. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. PhyloExplorer: a web server to validate, explore and query phylogenetic trees

    Directory of Open Access Journals (Sweden)

    Auberval Nicolas

    2009-05-01

    Full Text Available Abstract Background Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: http://www.ncbi.orthomam.univ-montp2.fr/phyloexplorer/ and the source code can be downloaded from: http://code.google.com/p/taxomanie/.

  6. SM4MQ: A Semantic Model for Multidimensional Queries

    DEFF Research Database (Denmark)

    Varga, Jovan; Dobrokhotova, Ekaterina; Romero, Oscar

    2017-01-01

    metadata artifacts (e.g., queries) to assist users with the analysis. However, modeling and sharing of most of these artifacts are typically overlooked. Thus, in this paper we focus on the query metadata artifact in the Exploratory OLAP context and propose an RDF-based vocabulary for its representation......, sharing, and reuse on the SW. As OLAP is based on the underlying multidimensional (MD) data model we denote such queries as MD queries and define SM4MQ: A Semantic Model for Multidimensional Queries. Furthermore, we propose a method to automate the exploitation of queries by means of SPARQL. We apply...... the method to a use case of transforming queries from SM4MQ to a vector representation. For the use case, we developed the prototype and performed an evaluation that shows how our approach can significantly ease and support user assistance such as query recommendation....

  7. Accelerating SPARQL Queries and Analytics on RDF Data

    KAUST Repository

    Al-Harbi, Razen

    2016-01-01

    The complexity of SPARQL queries and RDF applications poses great challenges on distributed RDF management systems. SPARQL workloads are dynamic and con- sist of queries with variable complexities. Hence, systems that use static partitioning su

  8. A Revisit of Query Expansion with Different Semantic Levels

    DEFF Research Database (Denmark)

    Zhang, Ce; Cui, Bin; Cong, Gao

    2009-01-01

    Query expansion has received extensive attention in information retrieval community. Although semantic based query expansion appears to be promising in improving retrieval performance, previous research has shown that it cannot consistently improve retrieval performance. It is a tricky problem to...

  9. Consistent Differential Expression Pattern (CDEP) on microarray to identify genes related to metastatic behavior.

    Science.gov (United States)

    Tsoi, Lam C; Qin, Tingting; Slate, Elizabeth H; Zheng, W Jim

    2011-11-11

    To utilize the large volume of gene expression information generated from different microarray experiments, several meta-analysis techniques have been developed. Despite these efforts, there remain significant challenges to effectively increasing the statistical power and decreasing the Type I error rate while pooling the heterogeneous datasets from public resources. The objective of this study is to develop a novel meta-analysis approach, Consistent Differential Expression Pattern (CDEP), to identify genes with common differential expression patterns across different datasets. We combined False Discovery Rate (FDR) estimation and the non-parametric RankProd approach to estimate the Type I error rate in each microarray dataset of the meta-analysis. These Type I error rates from all datasets were then used to identify genes with common differential expression patterns. Our simulation study showed that CDEP achieved higher statistical power and maintained low Type I error rate when compared with two recently proposed meta-analysis approaches. We applied CDEP to analyze microarray data from different laboratories that compared transcription profiles between metastatic and primary cancer of different types. Many genes identified as differentially expressed consistently across different cancer types are in pathways related to metastatic behavior, such as ECM-receptor interaction, focal adhesion, and blood vessel development. We also identified novel genes such as AMIGO2, Gem, and CXCL11 that have not been shown to associate with, but may play roles in, metastasis. CDEP is a flexible approach that borrows information from each dataset in a meta-analysis in order to identify genes being differentially expressed consistently. We have shown that CDEP can gain higher statistical power than other existing approaches under a variety of settings considered in the simulation study, suggesting its robustness and insensitivity to data variation commonly associated with microarray

  10. PropBase Query Layer: a single portal to UK subsurface physical property databases

    Science.gov (United States)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2013-04-01

    Until recently, the delivery of geological information for industry and public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical properties data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from their own research data collection and also other, often commercially derived data sources. This can be voxelated to incorporate this data into the models to demonstrate property variation within the subsurface geometry. All property data held by BGS has for many years been stored in relational databases to ensure their long-term continuity. However these have, by necessity, complex structures; each database contains positional reference data and model information, and also metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets making extracting physical properties from these databases difficult. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and its presentation of complex data in simple, mostly denormalized, tables which combine information from multiple databases into a single system. The structure from each relational database is denormalized in a generalised structure, so that each dataset can be viewed together in a common format using a simple

  11. Identifying practice-related factors for high-volume prescribers of antibiotics in Danish general practice

    DEFF Research Database (Denmark)

    Aabenhus, Rune; Siersma, Volkert Dirk; Sandholdt, Håkon

    2017-01-01

    practice-related factors driving high antibiotic prescribing rates. Results: We included 98% of general practices in Denmark (n = 1962) and identified a 10% group of high prescribers who accounted for 15% of total antibiotic prescriptions and 18% of critically important antibiotic prescriptions. Once case...... prescriptions issued over the phone compared with all antibiotic prescriptions; and a high number of consultations per 1000 patients. We also found that a low number of consultations per 1000 patients was associated with a reduced likelihood of being a high prescriber of antibiotics. Conclusions: An apparent...

  12. Application of discriminative models for interactive query refinement in video retrieval

    Science.gov (United States)

    Srivastava, Amit; Khanwalkar, Saurabh; Kumar, Anoop

    2013-12-01

    The ability to quickly search for large volumes of videos for specific actions or events can provide a dramatic new capability to intelligence agencies. Example-based queries from video are a form of content-based information retrieval (CBIR) where the objective is to retrieve clips from a video corpus, or stream, using a representative query sample to find more like this. Often, the accuracy of video retrieval is largely limited by the gap between the available video descriptors and the underlying query concept, and such exemplar queries return many irrelevant results with relevant ones. In this paper, we present an Interactive Query Refinement (IQR) system which acts as a powerful tool to leverage human feedback and allow intelligence analyst to iteratively refine search queries for improved precision in the retrieved results. In our approach to IQR, we leverage discriminative models that operate on high dimensional features derived from low-level video descriptors in an iterative framework. Our IQR model solicits relevance feedback on examples selected from the region of uncertainty and updates the discriminating boundary to produce a relevance ranked results list. We achieved 358% relative improvement in Mean Average Precision (MAP) over initial retrieval list at a rank cutoff of 100 over 4 iterations. We compare our discriminative IQR model approach to a naïve IQR and show our model-based approach yields 49% relative improvement over the no model naïve system.

  13. Semantic querying of data guided by Formal Concept Analysis

    OpenAIRE

    Codocedo , Victor; Lykourentzou , Ioanna; Napoli , Amedeo

    2012-01-01

    International audience; In this paper we present a novel approach to handle querying over a concept lattice of documents and annotations. We focus on the problem of "non-matching documents", which are those that, despite being semantically relevant to the user query, do not contain the query's elements and hence cannot be retrieved by typical string matching approaches. In order to find these documents, we modify the initial user query using the concept lattice as a guide. We achieve this by ...

  14. Visual Querying in Chemical Databases using SMARTS Patterns

    OpenAIRE

    Šípek, Vojtěch

    2014-01-01

    The purpose of this thesis is to create framework for visual querying in chemical databases which will be implemented as a web application. By using graphical editor, which is a part of client side, the user creates queries which are translated into chemical query language SMARTS. This query is parsed on the application server which is connected to the chemical database. This framework also contains tooling for creating the database and index structure above it. 1

  15. Fast Inbound Top-K Query for Random Walk with Restart.

    Science.gov (United States)

    Zhang, Chao; Jiang, Shan; Chen, Yucheng; Sun, Yidan; Han, Jiawei

    2015-09-01

    Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k , the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q . Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising. Nevertheless, none of the existing RWR computation techniques can accurately and efficiently process the Ink query in large graphs. We propose two algorithms, namely Squeeze and Ripple, both of which can accurately answer the Ink query in a fast and incremental manner. To identify the top- k nodes, Squeeze iteratively performs matrix-vector multiplication and estimates the lower and upper bounds for all the nodes in the graph. Ripple employs a more aggressive strategy by only estimating the RWR scores for the nodes falling in the vicinity of q , the nodes outside the vicinity do not need to be evaluated because their RWR scores are propagated from the boundary of the vicinity and thus upper bounded. Ripple incrementally expands the vicinity until the top- k result set can be obtained. Our extensive experiments on real-life graph data sets show that Ink queries can retrieve interesting results, and the proposed algorithms are orders of magnitude faster than state-of-the-art method.

  16. Query Processing and Interlinking of Fuzzy Object-Oriented Database

    OpenAIRE

    Shweta Dwivedi; Santosh Kumar

    2017-01-01

    Due to the many limitation and poor data handling in the existing relational database, the software professional and researchers moves towards the object-oriented database which has much better capability to handling the real and complex real world data i.e. clear and crisp data and also have the capability to perform some huge and complex queries in an effective manner. On the other hand, a new approach in database is introduced named as Fuzzy Object-Oriented Database (FOOD); it has all the ...

  17. Parallelizing Federated SPARQL Queries in Presence of Replicated Data

    DEFF Research Database (Denmark)

    Minier, Thomas; Montoya, Gabriela; Skaf-Molli, Hala

    2017-01-01

    Federated query engines have been enhanced to exploit new data localities created by replicated data, e.g., Fedra. However, existing replication aware federated query engines mainly focus on pruning sources during the source selection and query decomposition in order to reduce intermediate result...

  18. Result diversification based on query-specific cluster ranking

    NARCIS (Netherlands)

    He, J.; Meij, E.; de Rijke, M.

    2011-01-01

    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification

  19. Result Diversification Based on Query-Specific Cluster Ranking

    NARCIS (Netherlands)

    J. He (Jiyin); E. Meij; M. de Rijke (Maarten)

    2011-01-01

    htmlabstractResult diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking,

  20. Efficient Processing of Multiple DTW Queries in Time Series Databases

    DEFF Research Database (Denmark)

    Kremer, Hardy; Günnemann, Stephan; Ivanescu, Anca-Maria

    2011-01-01

    . In many of today’s applications, however, large numbers of queries arise at any given time. Existing DTW techniques do not process multiple DTW queries simultaneously, a serious limitation which slows down overall processing. In this paper, we propose an efficient processing approach for multiple DTW...... for multiple DTW queries....

  1. Modeling Large Time Series for Efficient Approximate Query Processing

    DEFF Research Database (Denmark)

    Perera, Kasun S; Hahmann, Martin; Lehner, Wolfgang

    2015-01-01

    query statistics derived from experiments and when running the system. Our approach can also reduce communication load by exchanging models instead of data. To allow seamless integration of model-based querying into traditional data warehouses, we introduce a SQL compatible query terminology. Our...

  2. Extracting Rankings for Spatial Keyword Queries from GPS Data

    DEFF Research Database (Denmark)

    Keles, Ilkcan; Jensen, Christian Søndergaard; Saltenis, Simonas

    2018-01-01

    Studies suggest that many search engine queries have local intent. We consider the evaluation of ranking functions important for such queries. The key challenge is to be able to determine the “best” ranking for a query, as this enables evaluation of the results of ranking functions. We propose...

  3. An Adaptive Directed Query Dissemination Scheme for Wireless Sensor Networks

    NARCIS (Netherlands)

    Chatterjea, Supriyo; De Luigi, Simone; Havinga, Paul J.M.; Sun, M.T.

    This paper describes a directed query dissemination scheme, DirQ that routes queries to the appropriate source nodes based on both constant and dynamicvalued attributes such as sensor types and sensor values. Unlike certain other query dissemination schemes, location information is not essential for

  4. Using relative handgrip strength to identify children at risk of sarcopenic obesity.

    Directory of Open Access Journals (Sweden)

    Michal Steffl

    Full Text Available Identifying children at risk of developing childhood sarcopenic obesity often requires specialized equipment and costly testing procedures, so cheaper and quicker methods would be advantageous, especially in field-based settings. The purpose of this study was to determine the relationships between the muscle-to-fat ratio (MFR and relative handgrip strength, and to determine the ability of handgrip strength relative to body mass index (grip-to-BMI to identify children who are at risk of developing sarcopenic obesity. Grip-to-BMI was measured in 730 Czech children (4 to 14 yrs. Bioelectrical impedance was used to estimate body fat mass and skeletal muscle mass, from which the MFR was calculated. The area under the curve (AUC was 0.791 (95% CI 0.692-0.890, p ˂ 0.001 in girls 4-9; 0.789 (95% CI 0.688-0.890, p ˂ 0.001 in girls 10-14 years old; 0.719 (95% CI 0.607-0.831, p = 0.001 in boys 4-9; and 0.896 (95% CI 0.823-0.969, p ˂ 0.001 in boys 10-14 years old. Calculated using the grip-to-BMI ratio, the OR (95% CI for girls to be at risk of sarcopenic obesity identified by MFR was 9.918 (4.243-23.186, p ˂ 0.001 and was 11.515 (4.280-30.982, p ˂ 0.001 for boys. The grip-to-BMI ratio can be used to predict the presence of sarcopenic obesity in children, which can play a role in pediatric health interventions.

  5. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Suzuki Motoyuki

    2009-01-01

    Full Text Available Abstract We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that the misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the "query relevance." Combining these two ideas, we can alleviate bad effect of misrecognized keywords by decreasing the number of downloaded Web documents from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29% was improved to 60.38%, which was 1.13 point better than the result of the conventional unsupervised adaptation method (59.25%.

  6. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition

    Directory of Open Access Journals (Sweden)

    Akinori Ito

    2009-01-01

    Full Text Available We are developing a method of Web-based unsupervised language model adaptation for recognition of spoken documents. The proposed method chooses keywords from the preliminary recognition result and retrieves Web documents using the chosen keywords. A problem is that the selected keywords tend to contain misrecognized words. The proposed method introduces two new ideas for avoiding the effects of keywords derived from misrecognized words. The first idea is to compose multiple queries from selected keyword candidates so that the misrecognized words and correct words do not fall into one query. The second idea is that the number of Web documents downloaded for each query is determined according to the “query relevance.” Combining these two ideas, we can alleviate bad effect of misrecognized keywords by decreasing the number of downloaded Web documents from queries that contain misrecognized keywords. Finally, we examine a method of determining the number of iterative adaptations based on the recognition likelihood. Experiments have shown that the proposed stopping criterion can determine almost the optimum number of iterations. In the final experiment, the word accuracy without adaptation (55.29% was improved to 60.38%, which was 1.13 point better than the result of the conventional unsupervised adaptation method (59.25%.

  7. An Application of Multivariate Statistical Analysis for Query-Driven Visualization

    Energy Technology Data Exchange (ETDEWEB)

    Gosink, Luke J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Garth, Christoph [Univ. of California, Davis, CA (United States); Anderson, John C. [Univ. of California, Davis, CA (United States); Bethel, E. Wes [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Joy, Kenneth I. [Univ. of California, Davis, CA (United States)

    2011-03-01

    Driven by the ability to generate ever-larger, increasingly complex data, there is an urgent need in the scientific community for scalable analysis methods that can rapidly identify salient trends in scientific data. Query-Driven Visualization (QDV) strategies are among the small subset of techniques that can address both large and highly complex datasets. This paper extends the utility of QDV strategies with a statistics-based framework that integrates non-parametric distribution estimation techniques with a new segmentation strategy to visually identify statistically significant trends and features within the solution space of a query. In this framework, query distribution estimates help users to interactively explore their query's solution and visually identify the regions where the combined behavior of constrained variables is most important, statistically, to their inquiry. Our new segmentation strategy extends the distribution estimation analysis by visually conveying the individual importance of each variable to these regions of high statistical significance. We demonstrate the analysis benefits these two strategies provide and show how they may be used to facilitate the refinement of constraints over variables expressed in a user's query. We apply our method to datasets from two different scientific domains to demonstrate its broad applicability.

  8. Improving biomedical information retrieval by linear combinations of different query expansion techniques.

    Science.gov (United States)

    Abdulla, Ahmed AbdoAziz Ahmed; Lin, Hongfei; Xu, Bo; Banbhrani, Santosh Kumar

    2016-07-25

    Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information Retrieval (IR) programs scour unstructured materials such as text documents in large reserves of data that are usually stored on computers. IR is related to the representation, storage, and organization of information items, as well as to access. In IR one of the main problems is to determine which documents are relevant and which are not to the user's needs. Under the current regime, users cannot precisely construct queries in an accurate way to retrieve particular pieces of data from large reserves of data. Basic information retrieval systems are producing low-quality search results. In our proposed system for this paper we present a new technique to refine Information Retrieval searches to better represent the user's information need in order to enhance the performance of information retrieval by using different query expansion techniques and apply a linear combinations between them, where the combinations was linearly between two expansion results at one time. Query expansions expand the search query, for example, by finding synonyms and reweighting original terms. They provide significantly more focused, particularized search results than do basic search queries. The retrieval performance is measured by some variants of MAP (Mean Average Precision) and according to our experimental results, the combination of best results of query expansion is enhanced the retrieved documents and outperforms our baseline by 21.06 %, even it outperforms a previous study by 7.12 %. We propose several query expansion techniques and their combinations (linearly) to make user queries more cognizable to search engines and to produce higher-quality search results.

  9. Use of a twin dataset to identify AMD-related visual patterns controlled by genetic factors

    Science.gov (United States)

    Quellec, Gwénolé; Abràmoff, Michael D.; Russell, Stephen R.

    2010-03-01

    The mapping of genotype to the phenotype of age-related macular degeneration (AMD) is expected to improve the diagnosis and treatment of the disease in a near future. In this study, we focused on the first step to discover this mapping: we identified visual patterns related to AMD which seem to be controlled by genetic factors, without explicitly relating them to the genes. For this purpose, we used a dataset of eye fundus photographs from 74 twin pairs, either monozygotic twins, who have the same genotype, or dizygotic twins, whose genes responsible for AMD are less likely to be identical. If we are able to differentiate monozygotic twins from dizygotic twins, based on a given visual pattern, then this pattern is likely to be controlled by genetic factors. The main visible consequence of AMD is the apparition of drusen between the retinal pigment epithelium and Bruch's membrane. We developed two automated drusen detectors based on the wavelet transform: a shape-based detector for hard drusen, and a texture- and color- based detector for soft drusen. Forty visual features were evaluated at the location of the automatically detected drusen. These features characterize the texture, the shape, the color, the spatial distribution, or the amount of drusen. A distance measure between twin pairs was defined for each visual feature; a smaller distance should be measured between monozygotic twins for visual features controlled by genetic factors. The predictions of several visual features (75.7% accuracy) are comparable or better than the predictions of human experts.

  10. Lost in translation? A multilingual Query Builder improves the quality of PubMed queries: a randomised controlled trial.

    Science.gov (United States)

    Schuers, Matthieu; Joulakian, Mher; Kerdelhué, Gaetan; Segas, Léa; Grosjean, Julien; Darmoni, Stéfan J; Griffon, Nicolas

    2017-07-03

    MEDLINE is the most widely used medical bibliographic database in the world. Most of its citations are in English and this can be an obstacle for some researchers to access the information the database contains. We created a multilingual query builder to facilitate access to the PubMed subset using a language other than English. The aim of our study was to assess the impact of this multilingual query builder on the quality of PubMed queries for non-native English speaking physicians and medical researchers. A randomised controlled study was conducted among French speaking general practice residents. We designed a multi-lingual query builder to facilitate information retrieval, based on available MeSH translations and providing users with both an interface and a controlled vocabulary in their own language. Participating residents were randomly allocated either the French or the English version of the query builder. They were asked to translate 12 short medical questions into MeSH queries. The main outcome was the quality of the query. Two librarians blind to the arm independently evaluated each query, using a modified published classification that differentiated eight types of errors. Twenty residents used the French version of the query builder and 22 used the English version. 492 queries were analysed. There were significantly more perfect queries in the French group vs. the English group (respectively 37.9% vs. 17.9%; p PubMed queries in particular for researchers whose first language is not English.

  11. QTLs for seed vigor-related traits identified in maize seeds germinated under artificial aging conditions.

    Science.gov (United States)

    Han, Zanping; Ku, Lixia; Zhang, Zhenzhen; Zhang, Jun; Guo, Shulei; Liu, Haiying; Zhao, Ruifang; Ren, Zhenzhen; Zhang, Liangkun; Su, Huihui; Dong, Lei; Chen, Yanhui

    2014-01-01

    High seed vigor is important for agricultural production due to the associated potential for increased growth and productivity. However, a better understanding of the underlying molecular mechanisms is required because the genetic basis for seed vigor remains unknown. We used single-nucleotide polymorphism (SNP) markers to map quantitative trait loci (QTLs) for four seed vigor traits in two connected recombinant inbred line (RIL) maize populations under four treatment conditions during seed germination. Sixty-five QTLs distributed between the two populations were identified and a meta-analysis was used to integrate genetic maps. Sixty-one initially identified QTLs were integrated into 18 meta-QTLs (mQTLs). Initial QTLs with contribution to phenotypic variation values of R(2)>10% were integrated into mQTLs. Twenty-three candidate genes for association with seed vigor traits coincided with 13 mQTLs. The candidate genes had functions in the glycolytic pathway and in protein metabolism. QTLs with major effects (R(2)>10%) were identified under at least one treatment condition for mQTL2, mQTL3-2, and mQTL3-4. Candidate genes included a calcium-dependent protein kinase gene (302810918) involved in signal transduction that mapped in the mQTL3-2 interval associated with germination energy (GE) and germination percentage (GP), and an hsp20/alpha crystallin family protein gene (At5g51440) that mapped in the mQTL3-4 interval associated with GE and GP. Two initial QTLs with a major effect under at least two treatment conditions were identified for mQTL5-2. A cucumisin-like Ser protease gene (At5g67360) mapped in the mQTL5-2 interval associated with GP. The chromosome regions for mQTL2, mQTL3-2, mQTL3-4, and mQTL5-2 may be hot spots for QTLs related to seed vigor traits. The mQTLs and candidate genes identified in this study provide valuable information for the identification of additional quantitative trait genes.

  12. QTLs for seed vigor-related traits identified in maize seeds germinated under artificial aging conditions.

    Directory of Open Access Journals (Sweden)

    Zanping Han

    Full Text Available High seed vigor is important for agricultural production due to the associated potential for increased growth and productivity. However, a better understanding of the underlying molecular mechanisms is required because the genetic basis for seed vigor remains unknown. We used single-nucleotide polymorphism (SNP markers to map quantitative trait loci (QTLs for four seed vigor traits in two connected recombinant inbred line (RIL maize populations under four treatment conditions during seed germination. Sixty-five QTLs distributed between the two populations were identified and a meta-analysis was used to integrate genetic maps. Sixty-one initially identified QTLs were integrated into 18 meta-QTLs (mQTLs. Initial QTLs with contribution to phenotypic variation values of R(2>10% were integrated into mQTLs. Twenty-three candidate genes for association with seed vigor traits coincided with 13 mQTLs. The candidate genes had functions in the glycolytic pathway and in protein metabolism. QTLs with major effects (R(2>10% were identified under at least one treatment condition for mQTL2, mQTL3-2, and mQTL3-4. Candidate genes included a calcium-dependent protein kinase gene (302810918 involved in signal transduction that mapped in the mQTL3-2 interval associated with germination energy (GE and germination percentage (GP, and an hsp20/alpha crystallin family protein gene (At5g51440 that mapped in the mQTL3-4 interval associated with GE and GP. Two initial QTLs with a major effect under at least two treatment conditions were identified for mQTL5-2. A cucumisin-like Ser protease gene (At5g67360 mapped in the mQTL5-2 interval associated with GP. The chromosome regions for mQTL2, mQTL3-2, mQTL3-4, and mQTL5-2 may be hot spots for QTLs related to seed vigor traits. The mQTLs and candidate genes identified in this study provide valuable information for the identification of additional quantitative trait genes.

  13. Reformulating XQuery queries using GLAV mapping and complex unification

    Directory of Open Access Journals (Sweden)

    Saber Benharzallah

    2016-01-01

    Full Text Available This paper describes an algorithm for reformulation of XQuery queries. The mediation is based on an essential component called mediator. Its main role is to reformulate a user query, written in terms of global schema, into queries written in terms of source schemas. Our algorithm is based on the principle of logical equivalence, simple and complex unification, to obtain a better reformulation. It takes XQuery query, global schema (written in XMLSchema, and mappings GLAV as input parameters and provides resultant query written in terms of source schemas. The results of implementation show the proper functioning of the algorithm.

  14. Approximate furthest neighbor with application to annulus query

    DEFF Research Database (Denmark)

    Pagh, Rasmus; Silvestri, Francesco; Sivertsen, Johan von Tangen

    2016-01-01

    -dimensional Euclidean space. The method builds on the technique of Indyk (SODA 2003), storing random projections to provide sublinear query time for AFN. However, we introduce a different query algorithm, improving on Indyk׳s approximation factor and reducing the running time by a logarithmic factor. We also present......, the query-dependent approach is used for deriving a data structure for the approximate annulus query problem, which is defined as follows: given an input set S and two parameters r>0 and w≥1, construct a data structure that returns for each query point q a point p∈S such that the distance between p and q...

  15. Spatio-temporal databases complex motion pattern queries

    CERN Document Server

    Vieira, Marcos R

    2013-01-01

    This brief presents several new query processing techniques, called complex motion pattern queries, specifically designed for very large spatio-temporal databases of moving objects. The brief begins with the definition of flexible pattern queries, which are powerful because of the integration of variables and motion patterns. This is followed by a summary of the expressive power of patterns and flexibility of pattern queries. The brief then present the Spatio-Temporal Pattern System (STPS) and density-based pattern queries. STPS databases contain millions of records with information about mobi

  16. STARS 2.0: 2nd-generation open-source archiving and query software

    Science.gov (United States)

    Winegar, Tom

    2008-07-01

    The Subaru Telescope is in process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in 1st-generation STARS archiving software: a complex and inflexible table structure and uncoordinated system administration for our business model: taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that can only transfer once in-house right now, with little status and error reporting and no error recovery - to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions and permitting recovery from errors.

  17. Suppression subtractive hybridization as a tool to identify anthocyanin metabolism-related genes in apple skin.

    Science.gov (United States)

    Ban, Yusuke; Moriguchi, Takaya

    2010-01-01

    The pigmentation of anthocyanins is one of the important determinants for consumer preference and marketability in horticultural crops such as fruits and flowers. To elucidate the mechanisms underlying the physiological process leading to the pigmentation of anthocyanins, identification of the genes differentially expressed in response to anthocyanin accumulation is a useful strategy. Currently, microarrays have been widely used to isolate differentially expressed genes. However, the use of microarrays is limited by its high cost of special apparatus and materials. Therefore, availability of microarrays is limited and does not come into common use at present. Suppression subtractive hybridization (SSH) is an alternative tool that has been widely used to identify differentially expressed genes due to its easy handling and relatively low cost. This chapter describes the procedures for SSH, including RNA extraction from polysaccharides and polyphenol-rich samples, poly(A)+ RNA purification, evaluation of subtraction efficiency, and differential screening using reverse northern in apple skin.

  18. Characterization of drug-related problems identified by clinical pharmacy staff at Danish hospitals

    DEFF Research Database (Denmark)

    Kjeldsen, Lene Juel; Birkholm, Trine; Fischer, Hanne

    2014-01-01

    Background In 2010, a database of drug related problems (DRPs) was implemented to assist clinical pharmacy staff in documenting clinical pharmacy activities locally. A study of quality, reliability and generalisability showed that national analyses of the data could be conducted. Analyses...... at the national level may help identify and prevent DRPs by performing national interventions. Objective The aim of the study was to explore the DRP characteristics as documented by clinical pharmacy staff at hospital pharmacies in the Danish DRP-database during a 3-year period. Setting Danish hospital pharmacies....... Method Data documented in the DRP-database during the initial 3 years after implementation were analyzed retrospectively. The DRP-database contains DRPs reported at hospitals by clinical pharmacy staff. The analyses focused on DRP categories, implementation rates and drugs associated with the DRPs. Main...

  19. Meta-Analysis of Placental Transcriptome Data Identifies a Novel Molecular Pathway Related to Preeclampsia.

    Science.gov (United States)

    van Uitert, Miranda; Moerland, Perry D; Enquobahrie, Daniel A; Laivuori, Hannele; van der Post, Joris A M; Ris-Stalpers, Carrie; Afink, Gijs B

    2015-01-01

    Studies using the placental transcriptome to identify key molecules relevant for preeclampsia are hampered by a relatively small sample size. In addition, they use a variety of bioinformatics and statistical methods, making comparison of findings challenging. To generate a more robust preeclampsia gene expression signature, we performed a meta-analysis on the original data of 11 placenta RNA microarray experiments, representing 139 normotensive and 116 preeclamptic pregnancies. Microarray data were pre-processed and analyzed using standardized bioinformatics and statistical procedures and the effect sizes were combined using an inverse-variance random-effects model. Interactions between genes in the resulting gene expression signature were identified by pathway analysis (Ingenuity Pathway Analysis, Gene Set Enrichment Analysis, Graphite) and protein-protein associations (STRING). This approach has resulted in a comprehensive list of differentially expressed genes that led to a 388-gene meta-signature of preeclamptic placenta. Pathway analysis highlights the involvement of the previously identified hypoxia/HIF1A pathway in the establishment of the preeclamptic gene expression profile, while analysis of protein interaction networks indicates CREBBP/EP300 as a novel element central to the preeclamptic placental transcriptome. In addition, there is an apparent high incidence of preeclampsia in women carrying a child with a mutation in CREBBP/EP300 (Rubinstein-Taybi Syndrome). The 388-gene preeclampsia meta-signature offers a vital starting point for further studies into the relevance of these genes (in particular CREBBP/EP300) and their concomitant pathways as biomarkers or functional molecules in preeclampsia. This will result in a better understanding of the molecular basis of this disease and opens up the opportunity to develop rational therapies targeting the placental dysfunction causal to preeclampsia.

  20. Meta-Analysis of Placental Transcriptome Data Identifies a Novel Molecular Pathway Related to Preeclampsia.

    Directory of Open Access Journals (Sweden)

    Miranda van Uitert

    Full Text Available Studies using the placental transcriptome to identify key molecules relevant for preeclampsia are hampered by a relatively small sample size. In addition, they use a variety of bioinformatics and statistical methods, making comparison of findings challenging. To generate a more robust preeclampsia gene expression signature, we performed a meta-analysis on the original data of 11 placenta RNA microarray experiments, representing 139 normotensive and 116 preeclamptic pregnancies. Microarray data were pre-processed and analyzed using standardized bioinformatics and statistical procedures and the effect sizes were combined using an inverse-variance random-effects model. Interactions between genes in the resulting gene expression signature were identified by pathway analysis (Ingenuity Pathway Analysis, Gene Set Enrichment Analysis, Graphite and protein-protein associations (STRING. This approach has resulted in a comprehensive list of differentially expressed genes that led to a 388-gene meta-signature of preeclamptic placenta. Pathway analysis highlights the involvement of the previously identified hypoxia/HIF1A pathway in the establishment of the preeclamptic gene expression profile, while analysis of protein interaction networks indicates CREBBP/EP300 as a novel element central to the preeclamptic placental transcriptome. In addition, there is an apparent high incidence of preeclampsia in women carrying a child with a mutation in CREBBP/EP300 (Rubinstein-Taybi Syndrome. The 388-gene preeclampsia meta-signature offers a vital starting point for further studies into the relevance of these genes (in particular CREBBP/EP300 and their concomitant pathways as biomarkers or functional molecules in preeclampsia. This will result in a better understanding of the molecular basis of this disease and opens up the opportunity to develop rational therapies targeting the placental dysfunction causal to preeclampsia.

  1. Optimizing queries in SQL Server 2008

    Directory of Open Access Journals (Sweden)

    Ion LUNGU

    2010-05-01

    Full Text Available Starting from the need to develop efficient IT systems, we intend to review theoptimization methods and tools that can be used by SQL Server database administratorsand developers of applications based on Microsoft technology, focusing on the latestversion of the proprietary DBMS, SQL Server 2008. We’ll reflect on the objectives tobe considered in improving the performance of SQL Server instances, we will tackle themostly used techniques for analyzing and optimizing queries and we will describe the“Optimize for ad hoc workloads”, “Plan Freezing” and “Optimize for unknown" newoptions, accompanied by relevant code examples.

  2. Deep web query interface understanding and integration

    CERN Document Server

    Dragut, Eduard C; Yu, Clement T

    2012-01-01

    There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art tech

  3. Downloading Multiple Records Using Query Strings

    Directory of Open Access Journals (Sweden)

    Adam Crymble

    2012-11-01

    Full Text Available Downloading a single record from a website is easy, but downloading many records at a time – an increasingly frequent need for a historian – is much more efficient using a programming language such as Python. In this lesson, we will write a program that will download a series of records from the Old Bailey Online using custom search criteria, and save them to a directory on our computer. This process involves interpreting and manipulating URL Query Strings. In this case, the tutorial will seek to download sources that contain references to people of African descent that were published in the Old Bailey Proceedings between 1700 and 1750.

  4. Insights from an international stakeholder consultation to identify informational needs related to seafood safety

    Energy Technology Data Exchange (ETDEWEB)

    Tediosi, Alice, E-mail: alice.tediosi@aeiforia.eu [Aeiforia Srl, 29027 Gariga di Podenzano (PC) (Italy); Fait, Gabriella [Aeiforia Srl, 29027 Gariga di Podenzano (PC) (Italy); Jacobs, Silke [Department of Public Health, Ghent University, 9000 Ghent (Belgium); Department of Agricultural Economics, Ghent University, 9000 Ghent (Belgium); Verbeke, Wim [Department of Agricultural Economics, Ghent University, 9000 Ghent (Belgium); Álvarez-Muñoz, Diana [Catalan Institute for Water Research (ICRA), Parc Científic i Tecnològic de la Universitat de Girona, 17003 Girona (Spain); Diogene, Jorge [IRTA, 43540 Sant Carles de la Ràpita (Spain); Reuver, Marieke [AquaTT, Dublin 2 (Ireland); Marques, António [Division of Aquaculture and Upgrading (DivAV), Portuguese Institute for the Sea and Atmosphere (IPMA), 1449-006 Lisbon (Portugal); Capri, Ettore [Università Cattolica del Sacro Cuore, 29122 Piacenza (Italy)

    2015-11-15

    Food safety assessment and communication have a strong importance in reducing human health risks related to food consumption. The research carried out within the ECsafeSEAFOOD project aims to assess seafood safety issues, mainly related to non-regulated priority environmental contaminants, and to evaluate their impact on public health. In order to make the research results accessible and exploitable, and to respond to actual stakeholders' demands, a consultation with international stakeholders was performed by means of a survey. The focus was on policy and decision makers, food producers and processors, and agencies (i.e. EU and National or Regional agencies related to Food Safety or Public Health) and consumer organisations. The survey considered questions related to: seafood safety assessment and mitigation strategies, availability of data, such as the level of information on different contaminants, and communication among different stakeholder groups. Furthermore, stakeholders were asked to give their opinion on how they believe consumers perceive risks associated with environmental contaminants. The survey was distributed to 531 key stakeholders and 91 responses were received from stakeholders from 30 EU and non-EU countries. The main results show that communication between different groups of stakeholders needs to be improved and that there is a deficit of information and data in the field of seafood safety. This pertains mainly to the transfer of contaminants between the environment and seafood, and to the diversity of environmental contaminants such as plastic additives, algal toxins and hormones. On-line tools were perceived to be the most useful communication channel. - Highlights: • We consulted stakeholders to identify their needs about seafood safety. • An on-line survey was prepared and sent to gather stakeholders' opinions. • Communication among stakeholders needs to be improved. • There is a deficit of information and data in the

  5. Insights from an international stakeholder consultation to identify informational needs related to seafood safety

    International Nuclear Information System (INIS)

    Tediosi, Alice; Fait, Gabriella; Jacobs, Silke; Verbeke, Wim; Álvarez-Muñoz, Diana; Diogene, Jorge; Reuver, Marieke; Marques, António; Capri, Ettore

    2015-01-01

    Food safety assessment and communication have a strong importance in reducing human health risks related to food consumption. The research carried out within the ECsafeSEAFOOD project aims to assess seafood safety issues, mainly related to non-regulated priority environmental contaminants, and to evaluate their impact on public health. In order to make the research results accessible and exploitable, and to respond to actual stakeholders' demands, a consultation with international stakeholders was performed by means of a survey. The focus was on policy and decision makers, food producers and processors, and agencies (i.e. EU and National or Regional agencies related to Food Safety or Public Health) and consumer organisations. The survey considered questions related to: seafood safety assessment and mitigation strategies, availability of data, such as the level of information on different contaminants, and communication among different stakeholder groups. Furthermore, stakeholders were asked to give their opinion on how they believe consumers perceive risks associated with environmental contaminants. The survey was distributed to 531 key stakeholders and 91 responses were received from stakeholders from 30 EU and non-EU countries. The main results show that communication between different groups of stakeholders needs to be improved and that there is a deficit of information and data in the field of seafood safety. This pertains mainly to the transfer of contaminants between the environment and seafood, and to the diversity of environmental contaminants such as plastic additives, algal toxins and hormones. On-line tools were perceived to be the most useful communication channel. - Highlights: • We consulted stakeholders to identify their needs about seafood safety. • An on-line survey was prepared and sent to gather stakeholders' opinions. • Communication among stakeholders needs to be improved. • There is a deficit of information and data in the field of

  6. Identifying areas of need relative to liver disease: geographic clustering within a health service district.

    Science.gov (United States)

    El-Atem, Nathan; Irvine, Katharine M; Valery, Patricia C; Wojcik, Kyle; Horsfall, Leigh; Johnson, Tracey; Janda, Monika; McPhail, Steven M; Powell, Elizabeth E

    2017-08-01

    accessing tertiary hospital liver services are clustered within specific geographic areas. The most striking geographic clustering was seen for people living with chronic hepatitis B, in regions with a relatively high proportion of people born in Vietnam and China. In addition to ethnicity, the data show an apparent ecological association between liver disease and both socioeconomic and educational and/or occupational disadvantage. What are the implications for practitioners? Identifying where demand for clinical services arises is an important step for service planning and preparing for potential outreach programs to optimise community-based care. It is likely that outreach programs to engage and enhance primary care services in geographic areas from which the greatest demand for tertiary liver disease speciality care arises would yield greater relative return on investment than non-targeted outreach programs.

  7. A methodological survey identified eight proposed frameworks for the adaptation of health related guidelines.

    Science.gov (United States)

    Darzi, Andrea; Abou-Jaoude, Elias A; Agarwal, Arnav; Lakis, Chantal; Wiercioch, Wojtek; Santesso, Nancy; Brax, Hneine; El-Jardali, Fadi; Schünemann, Holger J; Akl, Elie A

    2017-06-01

    five key steps strategy for adaptation of guidelines to the local context. The SGR method consists of nine steps and takes into consideration both methodological gaps and context-specific normative issues in source guidelines. We identified through searching personal files two abandoned methods. We identified and described eight proposed frameworks for the adaptation of health-related guidelines. There is a need to evaluate these different frameworks to assess rigor, efficiency, and transparency of their proposed processes. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Protecting count queries in study design.

    Science.gov (United States)

    Vinterbo, Staal A; Sarwate, Anand D; Boxwala, Aziz A

    2012-01-01

    Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this compromises their utility for increased privacy. The goal of this study is to extend current query answer systems to guarantee a quantifiable level of privacy and allow users to tailor perturbations to maximize the usefulness according to their needs. A perturbation mechanism was designed in which users are given options with respect to scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including required privacy level and user preference settings. Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems.

  9. Transcriptome Sequencing of Chemically Induced Aquilaria sinensis to Identify Genes Related to Agarwood Formation.

    Science.gov (United States)

    Ye, Wei; Wu, Hongqing; He, Xin; Wang, Lei; Zhang, Weimin; Li, Haohua; Fan, Yunfei; Tan, Guohui; Liu, Taomei; Gao, Xiaoxia

    2016-01-01

    Agarwood is a traditional Chinese medicine used as a clinical sedative, carminative, and antiemetic drug. Agarwood is formed in Aquilaria sinensis when A. sinensis trees are threatened by external physical, chemical injury or endophytic fungal irritation. However, the mechanism of agarwood formation via chemical induction remains unclear. In this study, we characterized the transcriptome of different parts of a chemically induced A. sinensis trunk sample with agarwood. The Illumina sequencing platform was used to identify the genes involved in agarwood formation. A five-year-old Aquilaria sinensis treated by formic acid was selected. The white wood part (B1 sample), the transition part between agarwood and white wood (W2 sample), the agarwood part (J3 sample), and the rotten wood part (F5 sample) were collected for transcriptome sequencing. Accordingly, 54,685,634 clean reads, which were assembled into 83,467 unigenes, were obtained with a Q20 value of 97.5%. A total of 50,565 unigenes were annotated using the Nr, Nt, SWISS-PROT, KEGG, COG, and GO databases. In particular, 171,331,352 unigenes were annotated by various pathways, including the sesquiterpenoid (ko00909) and plant-pathogen interaction (ko03040) pathways. These pathways were related to sesquiterpenoid biosynthesis and defensive responses to chemical stimulation. The transcriptome data of the different parts of the chemically induced A. sinensis trunk provide a rich source of materials for discovering and identifying the genes involved in sesquiterpenoid production and in defensive responses to chemical stimulation. This study is the first to use de novo sequencing and transcriptome assembly for different parts of chemically induced A. sinensis. Results demonstrate that the sesquiterpenoid biosynthesis pathway and WRKY transcription factor play important roles in agarwood formation via chemical induction. The comparative analysis of the transcriptome data of agarwood and A. sinensis lays the foundation

  10. A few examples go a long way: Constructing query models from elaborate query formulations

    NARCIS (Netherlands)

    Balog, K.; Weerkamp, W.; de Rijke, M.; Myaeng, S.-H.; Oard, D.W.; Sebastiani, F.; Chua, T.-S.; Leong, M.-K.

    2008-01-01

    We address a specific enterprise document search scenario, where the information need is expressed in an elaborate manner. In our scenario, information needs are expressed using a short query (of a few keywords) together with examples of key reference pages. Given this setup, we investigate how the

  11. Query-Time Optimization Techniques for Structured Queries in Information Retrieval

    Science.gov (United States)

    Cartright, Marc-Allen

    2013-01-01

    The use of information retrieval (IR) systems is evolving towards larger, more complicated queries. Both the IR industrial and research communities have generated significant evidence indicating that in order to continue improving retrieval effectiveness, increases in retrieval model complexity may be unavoidable. From an operational perspective,…

  12. Infodemiology of status epilepticus: A systematic validation of the Google Trends-based search queries.

    Science.gov (United States)

    Bragazzi, Nicola Luigi; Bacigaluppi, Susanna; Robba, Chiara; Nardone, Raffaele; Trinka, Eugen; Brigo, Francesco

    2016-02-01

    People increasingly use Google looking for health-related information. We previously demonstrated that in English-speaking countries most people use this search engine to obtain information on status epilepticus (SE) definition, types/subtypes, and treatment. Now, we aimed at providing a quantitative analysis of SE-related web queries. This analysis represents an advancement, with respect to what was already previously discussed, in that the Google Trends (GT) algorithm has been further refined and correlational analyses have been carried out to validate the GT-based query volumes. Google Trends-based SE-related query volumes were well correlated with information concerning causes and pharmacological and nonpharmacological treatments. Google Trends can provide both researchers and clinicians with data on realities and contexts that are generally overlooked and underexplored by classic epidemiology. In this way, GT can foster new epidemiological studies in the field and can complement traditional epidemiological tools. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Identifying medication-related needs of HIV patients: foundation for community pharmacist-based services

    Directory of Open Access Journals (Sweden)

    Yardlee Kauffman

    2014-01-01

    Full Text Available Background: Patients living with HIV/AIDS have complex medication regimens. Pharmacists within community pharmacy settings can have a role managing patients living with HIV/AIDS. Patients' perspectives surrounding implementation about community pharmacist-based services is needed as limited information is available. Objective: To identify medication-related needs of HIV-infected patients who receive prescriptions from a community pharmacy. To determine patient perspectives and knowledge of community pharmacist-based services. Methods: A qualitative research study involving in-depth, semi-structured interviews with patients was conducted. Inclusion criteria included: HIV positive men and women at least 18 years of age who receive care at a HIV clinic, currently take medication(s and use a community pharmacy for all prescription fills. Patients were recruited from one urban and one rural health center. Patients answered questions about their perceptions and knowledge about the role and value of pharmacy services and completed a demographic survey. The recordings of the interviews were transcribed verbatim and were analyzed using principles of Grounded Theory. Results: Twenty-nine interviews were conducted: 15 participants from the urban site and 14 from the rural site. Five main themes emerged including: patients experience ongoing and varying medication-related needs; patients desire a pharmacist who is caring, knowledgeable and integrated with health care providers; patients expect ready access to drug therapy; patients value an individualized patient encounter, and patients need to be informed that a pharmacist-service exists. Conclusion: Patients with HIV value individualized and personal encounters with pharmacists at time intervals that are convenient for the patient. Patients felt that a one-on-one encounter with a pharmacist would be most valuable when initiating or modifying medication therapy. These patient perspectives can be useful for

  14. CrossQuery: a web tool for easy associative querying of transcriptome data.

    Directory of Open Access Journals (Sweden)

    Toni U Wagner

    Full Text Available Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straight forward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.

  15. CrossQuery: a web tool for easy associative querying of transcriptome data.

    Science.gov (United States)

    Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

    2011-01-01

    Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straight forward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.

  16. Proteomic profiling identifies markers for inflammation-related tumor-fibroblast interaction.

    Science.gov (United States)

    Drev, Daniel; Bileck, Andrea; Erdem, Zeynep N; Mohr, Thomas; Timelthaler, Gerald; Beer, Andrea; Gerner, Christopher; Marian, Brigitte

    2017-01-01

    Cancer associated fibroblasts are activated in the tumor microenvironment and contribute to tumor progression, angiogenesis, extracellular matrix remodeling, and inflammation. To identify proteins characteristic for fibroblasts in colorectal cancer we used liquid chromatography-tandem mass spectrometry to derive protein abundance from whole-tissue homogenates of human colorectal cancer/normal mucosa pairs. Alterations of protein levels were determined by two-sided t test with greater than threefold difference and an FDR of matrix organization, TGFβ receptor signaling and angiogenesis mainly originating from the stroma. Most prominent were increased abundance of SerpinB5 in the parenchyme and latent transforming growth factor β-binding protein, thrombospondin-B2, and secreted protein acidic-and-cysteine-rich in the stroma. Extracellular matrix remodeling involved collagens type VIII, XII, XIV, and VI as well as lysyl-oxidase-2. In silico analysis of mRNA levels demonstrated altered expression in the tumor and the adjacent normal tissue as compared to mucosa of healthy individuals indicating that inflammatory activation affected the surrounding tissue. Immunohistochemistry of 26 tumor specimen confirmed upregulation of SerpinB5, thrombospondin B2 and secreted protein acidic-and-cysteine-rich. This study demonstrates the feasibility of detecting tumor- and compartment-specific protein-signatures that are functionally meaningful by proteomic profiling of whole-tissue extracts together with mining of RNA expression datasets. The results provide the basis for further exploration of inflammation-related stromal markers in larger patient cohorts and experimental models.

  17. Comparative Transcriptome Analysis Identifies Candidate Genes Related to Skin Color Differentiation in Red Tilapia.

    Science.gov (United States)

    Zhu, Wenbin; Wang, Lanmei; Dong, Zaijie; Chen, Xingting; Song, Feibiao; Liu, Nian; Yang, Hui; Fu, Jianjun

    2016-08-11

    Red tilapia is becoming more popular for aquaculture production in China in recent years. However, the pigmentation differentiation in genetic breeding is the main problem limiting its development of commercial red tilapia culture and the genetic basis of skin color variation is still unknown. In this study, we conducted Illumina sequencing of transcriptome on three color variety red tilapia. A total of 224,895,758 reads were generated, resulting in 160,762 assembled contigs that were used as reference contigs. The contigs of red tilapia transcriptome had hits in the range of 53.4% to 86.7% of the unique proteins of zebrafish, fugu, medaka, three-spined stickleback and tilapia. And 44,723 contigs containing 77,423 simple sequence repeats (SSRs) were identified, with 16,646 contigs containing more than one SSR. Three skin transcriptomes were compared pairwise and the results revealed that there were 148 common significantly differentially expressed unigenes and several key genes related to pigment synthesis, i.e. tyr, tyrp1, silv, sox10, slc24a5, cbs and slc7a11, were included. The results will facilitate understanding the molecular mechanisms of skin pigmentation differentiation in red tilapia and accelerate the molecular selection of the specific strain with consistent skin colors.

  18. WANO Activities Related to Identifying and Reducing the Likelihood for Recurring Events

    International Nuclear Information System (INIS)

    Llewellyn, Michael

    2003-01-01

    Since its inception, WANO has encouraged members to share operating experience and events information through the WANO Operating Experience Programme. Preventing recurring events is a prime reason for sharing events information. Over 2500 events have been shared through WANO since 1989. However, a review of WANO activities in 1997 identified that this information was not being used very well by WANO members, and that WANO was not adding much 'value' to the events sharing process. At the time, WANO only provided the 'postal exchange' function for events sharing, and was not reviewing events across WANO to help members focus on the really important issues. Often, these very important issues involve recurring events. As a result of the 1997 review of the WANO operating experience process, WANO re-sourced and developed new analysis capabilities and began producing new types of reports for its members. The resource commitment includes four seconded engineers (one from each WANO Regional Centre) brought to Paris to staff a WANO Operating Experience Central Team. This team analyses events across WANO, writes WANO event reports, provides operating experience-related training, and provides technical support to members to improve their use of operating experience information. One focus area for events analysis by WANO is the identification and subsequent industry notification of recurring events. As a result of the 1997 review of WANO activities, in 1998 WANO began production of several new types of event-based reports to communicate significant industry events to our members. A key focus of analysis of these significant events is whether they are recurring events (that is, very similar to previous events either at that NPP or another NPP in the industry). The reports are called Significant Operating Experience Reports (SOERs) and Significant Event Reports (SERs). Significant Operating Experience Reports (SOERs) are written by WANO when several event reports indicate that

  19. Tissue plasminogen activator; identifying major barriers related to intravenous injection in ischemic acute cerebral infraction

    Directory of Open Access Journals (Sweden)

    Fariborz Khorvash

    2017-01-01

    Full Text Available Background: According to previous publications, in patients with acute ischemic cerebral infarction, thrombolytic therapy using intravenous tissue plasminogen activator (IV-tPA necessitates precise documentation of symptoms' onset. The aim of this study was to identify major barriers related to the IV-tPA injection in such patients. Materials and Methods: Between the year 2014-2015, patients with definitive diagnosis of acute cerebral infarction (n = 180 who attended the neurology ward located at the Isfahan Alzahra Hospital were studied. To investigate barriers related to door to IV-tPA needle time, personal reasons, and criteria for inclusion or exclusion of patients, three questionnaire forms were designed based on the Food and Drug Administration-approved indications or contraindications. Results: The mean age of males versus females was 60 versus 77.5 years (ranged 23–93 vs. 29–70 years, respectively. Out of total population, only 10.7% transferred to hospital in <4.5 h after the onset of symptoms. Regarding to eligibility for IV-tPA, 68.9% of total population have had criteria for such treatment. Concerning to both items such as transferring to hospital in <4.5 h after the onset of symptoms and eligibility for IV-tPA, only 6.6% of total population met the criteria for such management. There was ignorance or inattention to symptoms in 75% of population studied. There was a mean of 195.92 ± 6.65 min (182.8–209.04 min for door to IV-tPA needle time. Conclusion: Despite the international guidelines for IV-tPA injection within 3–4.5 h of ischemic stroke symptoms' onset, the results of this study revealed that falling time due to ignorance of symptoms, literacy, and living alone might need further attention. As a result, to decrease death and disability, educational programs related to the symptoms' onset by consultant neurologist in Isfahan/Iran seem to be advantageous.

  20. Scoping review on search queries and social media for disease surveillance: a chronology of innovation.

    Science.gov (United States)

    Bernardo, Theresa Marie; Rajic, Andrijana; Young, Ian; Robiadek, Katie; Pham, Mai T; Funk, Julie A

    2013-07-18

    The threat of a global pandemic posed by outbreaks of influenza H5N1 (1997) and Severe Acute Respiratory Syndrome (SARS, 2002), both diseases of zoonotic origin, provoked interest in improving early warning systems and reinforced the need for combining data from different sources. It led to the use of search query data from search engines such as Google and Yahoo! as an indicator of when and where influenza was occurring. This methodology has subsequently been extended to other diseases and has led to experimentation with new types of social media for disease surveillance. The objective of this scoping review was to formally assess the current state of knowledge regarding the use of search queries and social media for disease surveillance in order to inform future work on early detection and more effective mitigation of the effects of foodborne illness. Structured scoping review methods were used to identify, characterize, and evaluate all published primary research, expert review, and commentary articles regarding the use of social media in surveillance of infectious diseases from 2002-2011. Thirty-two primary research articles and 19 reviews and case studies were identified as relevant. Most relevant citations were peer-reviewed journal articles (29/32, 91%) published in 2010-11 (28/32, 88%) and reported use of a Google program for surveillance of influenza. Only four primary research articles investigated social media in the context of foodborne disease or gastroenteritis. Most authors (21/32 articles, 66%) reported that social media-based surveillance had comparable performance when compared to an existing surveillance program. The most commonly reported strengths of social media surveillance programs included their effectiveness (21/32, 66%) and rapid detection of disease (21/32, 66%). The most commonly reported weaknesses were the potential for false positive (16/32, 50%) and false negative (11/32, 34%) results. Most authors (24/32, 75%) recommended that

  1. Extending OLAP Querying to External Object

    DEFF Research Database (Denmark)

    Pedersen, Torben Bach; Shoshani, Arie; Gu, Junmin

    On-Line Analytical Processing (OLAP) systems based on a dimensional view of data have found widespread use in business applications and are being used increasingly in non-standard applications. These systems provide good performance and ease-of-use. However, the complex structures and relationships...... inherent in data in nonstandard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, multi-model federated system...... that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. Additionally, physical data...

  2. Privacy-Aware Relevant Data Access with Semantically Enriched Search Queries for Untrusted Cloud Storage Services.

    Science.gov (United States)

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Lee, Sungyoung; Chung, Tae Choong

    2016-01-01

    Privacy-aware search of outsourced data ensures relevant data access in the untrusted domain of a public cloud service provider. Subscriber of a public cloud storage service can determine the presence or absence of a particular keyword by submitting search query in the form of a trapdoor. However, these trapdoor-based search queries are limited in functionality and cannot be used to identify secure outsourced data which contains semantically equivalent information. In addition, trapdoor-based methodologies are confined to pre-defined trapdoors and prevent subscribers from searching outsourced data with arbitrarily defined search criteria. To solve the problem of relevant data access, we have proposed an index-based privacy-aware search methodology that ensures semantic retrieval of data from an untrusted domain. This method ensures oblivious execution of a search query and leverages authorized subscribers to model conjunctive search queries without relying on predefined trapdoors. A security analysis of our proposed methodology shows that, in a conspired attack, unauthorized subscribers and untrusted cloud service providers cannot deduce any information that can lead to the potential loss of data privacy. A computational time analysis on commodity hardware demonstrates that our proposed methodology requires moderate computational resources to model a privacy-aware search query and for its oblivious evaluation on a cloud service provider.

  3. RCQ-GA: RDF Chain Query Optimization Using Genetic Algorithms

    Science.gov (United States)

    Hogenboom, Alexander; Milea, Viorel; Frasincar, Flavius; Kaymak, Uzay

    The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools. Fast query engines are needed for efficient querying of large amounts of data, usually represented using RDF. We focus on optimizing a special class of SPARQL queries, the so-called RDF chain queries. For this purpose, we devise a genetic algorithm called RCQ-GA that determines the order in which joins need to be performed for an efficient evaluation of RDF chain queries. The approach is benchmarked against a two-phase optimization algorithm, previously proposed in literature. The more complex a query is, the more RCQ-GA outperforms the benchmark in solution quality, execution time needed, and consistency of solution quality. When the algorithms are constrained by a time limit, the overall performance of RCQ-GA compared to the benchmark further improves.

  4. GMB: An Efficient Query Processor for Biological Data

    Directory of Open Access Journals (Sweden)

    Taha Kamal

    2011-06-01

    Full Text Available Bioinformatics applications manage complex biological data stored into distributed and often heterogeneous databases and require large computing power. These databases are too big and complicated to be rapidly queried every time a user submits a query, due to the overhead involved in decomposing the queries, sending the decomposed queries to remote databases, and composing the results. There is also considerable communication costs involved. This study addresses the mentioned problems in Grid-based environment for bioinformatics. We propose a Grid middleware called GMB that alleviates these problems by caching the results of Frequently Used Queries (FUQ. Queries are classified based on their types and frequencies. FUQ are answered from the middleware, which improves their response time. GMB acts as a gateway to TeraGrid Grid: it resides between users’ applications and TeraGrid Grid. We evaluate GMB experimentally.

  5. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    Science.gov (United States)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.

  6. Identifying seasonal and temporal trends in the pressures experienced by hospitals related to unscheduled care.

    Science.gov (United States)

    Walker, N J; Van Woerden, H C; Kiparoglou, V; Yang, Y

    2016-07-26

    As part of an electronic dashboard operated by Public Health Wales, senior managers at hospitals in Wales report daily "escalation" scores which reflect management opinion on the pressure a hospital is experiencing and ability to meet ongoing demand with respect to unscheduled care. An analysis was undertaken of escalation scores returned for 18 hospitals in Wales between the years 2006 and 2014 inclusive, with a view to identifying systematic temporal patterns in pressure experienced by hospitals in relation to unscheduled care. Exploratory data analysis indicated the presence of within-year cyclicity in average daily scores over all hospitals. In order to quantify this cyclicity, a Generalised Linear Mixed Model was fitted which incorporated a trigonometric function (sine and cosine) to capture within-year change in escalation. In addition, a 7-level categorical day of the week effect was fitted as well as a 3-level categorical Christmas holiday variable based on patterns observed in exploration of the raw data. All of the main effects investigated were found to be statistically significant. Firstly, significant differences emerged in terms of overall pressure reported by individual hospitals. Furthermore, escalation scores were found to vary systematically within-year in a wave-like fashion for all hospitals (but not between hospitals) with the period of highest pressure consistently observed to occur in winter and lowest pressure in summer. In addition to this annual variation, pressure reported by hospitals was also found to be influenced by day of the week (low at weekends, high early in the working week) and especially low over the Christmas period but high immediately afterwards. Whilst unpredictable to a degree, quantifiable pressure experienced by hospitals can be anticipated according to models incorporating systematic temporal patterns. In the context of finite resources for healthcare services, these findings could optimise staffing schedules and

  7. Association of eGFR-Related Loci Identified by GWAS with Incident CKD and ESRD.

    Directory of Open Access Journals (Sweden)

    Carsten A Böger

    2011-09-01

    Full Text Available Family studies suggest a genetic component to the etiology of chronic kidney disease (CKD and end stage renal disease (ESRD. Previously, we identified 16 loci for eGFR in genome-wide association studies, but the associations of these single nucleotide polymorphisms (SNPs for incident CKD or ESRD are unknown. We thus investigated the association of these loci with incident CKD in 26,308 individuals of European ancestry free of CKD at baseline drawn from eight population-based cohorts followed for a median of 7.2 years (including 2,122 incident CKD cases defined as eGFR <60ml/min/1.73m(2 at follow-up and with ESRD in four case-control studies in subjects of European ancestry (3,775 cases, 4,577 controls. SNPs at 11 of the 16 loci (UMOD, PRKAG2, ANXA9, DAB2, SHROOM3, DACH1, STC1, SLC34A1, ALMS1/NAT8, UBE2Q2, and GCKR were associated with incident CKD; p-values ranged from p = 4.1e-9 in UMOD to p = 0.03 in GCKR. After adjusting for baseline eGFR, six of these loci remained significantly associated with incident CKD (UMOD, PRKAG2, ANXA9, DAB2, DACH1, and STC1. SNPs in UMOD (OR = 0.92, p = 0.04 and GCKR (OR = 0.93, p = 0.03 were nominally associated with ESRD. In summary, the majority of eGFR-related loci are either associated or show a strong trend towards association with incident CKD, but have modest associations with ESRD in individuals of European descent. Additional work is required to characterize the association of genetic determinants of CKD and ESRD at different stages of disease progression.

  8. PiCO QL: A software library for runtime interactive queries on program data

    Science.gov (United States)

    Fragkoulis, Marios; Spinellis, Diomidis; Louridas, Panos

    PiCO QL is an open source C/C++ software whose scientific scope is real-time interactive analysis of in-memory data through SQL queries. It exposes a relational view of a system's or application's data structures, which is queryable through SQL. While the application or system is executing, users can input queries through a web-based interface or issue web service requests. Queries execute on the live data structures through the respective relational views. PiCO QL makes a good candidate for ad-hoc data analysis in applications and for diagnostics in systems settings. Applications of PiCO QL include the Linux kernel, the Valgrind instrumentation framework, a GIS application, a virtual real-time observatory of stellar objects, and a source code analyser.

  9. PiCO QL: A software library for runtime interactive queries on program data

    Directory of Open Access Journals (Sweden)

    Marios Fragkoulis

    2016-01-01

    Full Text Available Pico ql is an open source c/c++ software whose scientific scope is real-time interactive analysis of in-memory data through sql queries. It exposes a relational view of a system’s or application’s data structures, which is queryable through sql. While the application or system is executing, users can input queries through a web-based interface or issue web service requests. Queries execute on the live data structures through the respective relational views. pico ql makes a good candidate for ad-hoc data analysis in applications and for diagnostics in systems settings. Applications of pico ql include the Linux kernel, the Valgrind instrumentation framework, a gis application, a virtual real-time observatory of stellar objects, and a source code analyser.

  10. Seasonal trends in tinnitus symptomatology: evidence from Internet search engine query data.

    Science.gov (United States)

    Plante, David T; Ingram, David G

    2015-10-01

    The primary aim of this study was to test the hypothesis that the symptom of tinnitus demonstrates a seasonal pattern with worsening in the winter relative to the summer using Internet search engine query data. Normalized search volume for the term 'tinnitus' from January 2004 through December 2013 was retrieved from Google Trends. Seasonal effects were evaluated using cosinor regression models. Primary countries of interest were the United States and Australia. Secondary exploratory analyses were also performed using data from Germany, the United Kingdom, Canada, Sweden, and Switzerland. Significant seasonal effects for 'tinnitus' search queries were found in the United States and Australia (p search volume in the winter relative to the summer. Our findings indicate that there are significant seasonal trends for Internet search queries for tinnitus, with a zenith in winter months. Further research is indicated to determine the biological mechanisms underlying these findings, as they may provide insights into the pathophysiology of this common and debilitating medical symptom.

  11. Using Assertion Capabilities of an OWL-Based Ontology for Query Formulation

    CERN Document Server

    Munir, Kamran; Bloodsworth, Peter; McClatchey, Richard

    2008-01-01

    This paper reports on the development of a framework to assist users in formulating relational queries without requiring a complete knowledge of the information structure and access mechanisms to underlying data sources. The emphasis here is on exploiting the semantic relationships and assertion capabilities of OWL ontologies to assist clinicians in writing complex queries. This has been achieved using both a bottom-up and top-down approaches to build an ontology model to be the repository for complex end user queries. Relational database schemas are mapped into the newly generated ontology schema to reinforce the current domain ontology being developed. One of the key merits of this approach is that it does not require storing data interpretation, with the added advantage of even not storing database instances as part of the domain ontology, especially for systems with huge volume of data.

  12. Searching Databases without Query-Building Aids: Implications for Dyslexic Users

    Science.gov (United States)

    Berget, Gerd; Sandnes, Frode Eika

    2015-01-01

    Introduction: Few studies document the information searching behaviour of users with cognitive impairments. This paper therefore addresses the effect of dyslexia on information searching in a database with no tolerance for spelling errors and no query-building aids. The purpose was to identify effective search interface design guidelines that…

  13. Hierarchical data security in a Query-By-Example interface for a shared database.

    Science.gov (United States)

    Taylor, Merwyn

    2002-06-01

    Whenever a shared database resource, containing critical patient data, is created, protecting the contents of the database is a high priority goal. This goal can be achieved by developing a Query-By-Example (QBE) interface, designed to access a shared database, and embedding within the QBE a hierarchical security module that limits access to the data. The security module ensures that researchers working in one clinic do not get access to data from another clinic. The security can be based on a flexible taxonomy structure that allows ordinary users to access data from individual clinics and super users to access data from all clinics. All researchers submit queries through the same interface and the security module processes the taxonomy and user identifiers to limit access. Using this system, two different users with different access rights can submit the same query and get different results thus reducing the need to create different interfaces for different clinics and access rights.

  14. The SQL++ Query Language: Configurable, Unifying and Semi-structured

    OpenAIRE

    Ong, Kian Win; Papakonstantinou, Yannis; Vernoux, Romain

    2014-01-01

    NoSQL databases support semi-structured data, typically modeled as JSON. They also provide limited (but expanding) query languages. Their idiomatic, non-SQL language constructs, the many variations, and the lack of formal semantics inhibit deep understanding of the query languages, and also impede progress towards clean, powerful, declarative query languages. This paper specifies the syntax and semantics of SQL++, which is applicable to both JSON native stores and SQL databases. The SQL++ sem...

  15. Adaptive and Optimized RDF Query Interface for Distributed WFS Data

    Directory of Open Access Journals (Sweden)

    Tian Zhao

    2017-04-01

    Full Text Available Web Feature Service (WFS is a protocol for accessing geospatial data stores such as databases and Shapefiles over the Web. However, WFS does not provide direct access to data distributed in multiple servers. In addition, WFS features extracted from their original sources are not convenient for user access due to the lack of connection to high-level concepts. Users are facing the choices of either querying each WFS server first and then integrating the results, or converting the data from all WFS servers to a more expressive format such as RDF (Resource Description Framework and then querying the integrated data. The first choice requires additional programming while the second choice is not practical for large or frequently updated datasets. The new contribution of this paper is that we propose a novel adaptive and optimized RDF query interface to overcome the aforementioned limitation. Specifically, in this paper, we propose a novel algorithm to query and synthesize distributed WFS data through an RDF query interface, where users can specify data requests to multiple WFS servers using a single RDF query. Users can also define a simple configuration to associate WFS feature types, attributes, and values with RDF classes, properties, and values so that user queries can be written using a more uniform and informative vocabulary. The algorithm translates each RDF query written in SPARQL-like syntax to multiple WFS GetFeature requests, and then converts and integrates the multiple WFS results to get the answers to the original query. The generated GetFeature requests are sent asynchronously and simultaneously to WFS servers to take advantage of the server parallelism. The results of each GetFeature request are cached to improve query response time for subsequent queries that involve one or more of the cached requests. A JavaScript-based prototype is implemented and experimental results show that the query response time can be greatly reduced through

  16. Can Internet search queries help to predict stock market volatility?

    OpenAIRE

    Dimpfl, Thomas; Jank, Stephan

    2011-01-01

    This paper studies the dynamics of stock market volatility and retail investor attention measured by internet search queries. We find a strong co-movement of stock market indices’ realized volatility and the search queries for their names. Furthermore, Granger causality is bi-directional: high searches follow high volatility, and high volatility follows high searches. Using the latter feedback effect to predict volatility we find that search queries contain additional information about market...

  17. An informatics supported web-based data annotation and query tool to expedite translational research for head and neck malignancies

    International Nuclear Information System (INIS)

    Amin, Waqas; Kang, Hyunseok P; Egloff, Ann Marie; Singh, Harpreet; Trent, Kerry; Ridge-Hetrick, Jennifer; Seethala, Raja R; Grandis, Jennifer; Parwani, Anil V

    2009-01-01

    The Specialized Program of Research Excellence (SPORE) in Head and Neck Cancer neoplasm virtual biorepository is a bioinformatics-supported system to incorporate data from various clinical, pathological, and molecular systems into a single architecture based on a set of common data elements (CDEs) that provides semantic and syntactic interoperability of data sets. The various components of this annotation tool include the Development of Common Data Elements (CDEs) that are derived from College of American Pathologists (CAP) Checklist and North American Association of Central Cancer Registries (NAACR) standards. The Data Entry Tool is a portable and flexible Oracle-based data entry device, which is an easily mastered web-based tool. The Data Query Tool helps investigators and researchers to search de-identified information within the warehouse/resource through a 'point and click' interface, thus enabling only the selected data elements to be essentially copied into a data mart using a multi dimensional model from the warehouse's relational structure. The SPORE Head and Neck Neoplasm Database contains multimodal datasets that are accessible to investigators via an easy to use query tool. The database currently holds 6553 cases and 10607 tumor accessions. Among these, there are 965 metastatic, 4227 primary, 1369 recurrent, and 483 new primary cases. The data disclosure is strictly regulated by user's authorization. The SPORE Head and Neck Neoplasm Virtual Biorepository is a robust translational biomedical informatics tool that can facilitate basic science, clinical, and translational research. The Data Query Tool acts as a central source providing a mechanism for researchers to efficiently find clinically annotated datasets and biospecimens that are relevant to their research areas. The tool protects patient privacy by revealing only de-identified data in accordance with regulations and approvals of the IRB and scientific review committee

  18. AQBE — QBE Style Queries for Archetyped Data

    Science.gov (United States)

    Sachdeva, Shelly; Yaginuma, Daigo; Chu, Wanming; Bhalla, Subhash

    Large-scale adoption of electronic healthcare applications requires semantic interoperability. The new proposals propose an advanced (multi-level) DBMS architecture for repository services for health records of patients. These also require query interfaces at multiple levels and at the level of semi-skilled users. In this regard, a high-level user interface for querying the new form of standardized Electronic Health Records system has been examined in this study. It proposes a step-by-step graphical query interface to allow semi-skilled users to write queries. Its aim is to decrease user effort and communication ambiguities, and increase user friendliness.

  19. SM4MQ: A Semantic Model for Multidimensional Queries

    DEFF Research Database (Denmark)

    Varga, Jovan; Dobrokhotova, Ekaterina; Romero, Oscar

    2017-01-01

    On-Line Analytical Processing (OLAP) is a data analysis approach to support decision-making. On top of that, Exploratory OLAP is a novel initiative for the convergence of OLAP and the Semantic Web (SW) that enables the use of OLAP techniques on SW data. Moreover, OLAP approaches exploit different......, sharing, and reuse on the SW. As OLAP is based on the underlying multidimensional (MD) data model we denote such queries as MD queries and define SM4MQ: A Semantic Model for Multidimensional Queries. Furthermore, we propose a method to automate the exploitation of queries by means of SPARQL. We apply...

  20. VPipe: Virtual Pipelining for Scheduling of DAG Stream Query Plans

    Science.gov (United States)

    Wang, Song; Gupta, Chetan; Mehta, Abhay

    There are data streams all around us that can be harnessed for tremendous business and personal advantage. For an enterprise-level stream processing system such as CHAOS [1] (Continuous, Heterogeneous Analytic Over Streams), handling of complex query plans with resource constraints is challenging. While several scheduling strategies exist for stream processing, efficient scheduling of complex DAG query plans is still largely unsolved. In this paper, we propose a novel execution scheme for scheduling complex directed acyclic graph (DAG) query plans with meta-data enriched stream tuples. Our solution, called Virtual Pipelined Chain (or VPipe Chain for short), effectively extends the "Chain" pipelining scheduling approach to complex DAG query plans.

  1. Error Checking for Chinese Query by Mining Web Log

    Directory of Open Access Journals (Sweden)

    Jianyong Duan

    2015-01-01

    Full Text Available For the search engine, error-input query is a common phenomenon. This paper uses web log as the training set for the query error checking. Through the n-gram language model that is trained by web log, the queries are analyzed and checked. Some features including query words and their number are introduced into the model. At the same time data smoothing algorithm is used to solve data sparseness problem. It will improve the overall accuracy of the n-gram model. The experimental results show that it is effective.

  2. Multi-Dimensional Top-k Dominating Queries

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Mamoulis, Nikos

    2009-01-01

    The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top......-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate...

  3. The effect of query complexity on Web searching results

    Directory of Open Access Journals (Sweden)

    B.J. Jansen

    2000-01-01

    Full Text Available This paper presents findings from a study of the effects of query structure on retrieval by Web search services. Fifteen queries were selected from the transaction log of a major Web search service in simple query form with no advanced operators (e.g., Boolean operators, phrase operators, etc. and submitted to 5 major search engines - Alta Vista, Excite, FAST Search, Infoseek, and Northern Light. The results from these queries became the baseline data. The original 15 queries were then modified using the various search operators supported by each of the 5 search engines for a total of 210 queries. Each of these 210 queries was also submitted to the applicable search service. The results obtained were then compared to the baseline results. A total of 2,768 search results were returned by the set of all queries. In general, increasing the complexity of the queries had little effect on the results with a greater than 70% overlap in results, on average. Implications for the design of Web search services and directions for future research are discussed.

  4. PAQ: Persistent Adaptive Query Middleware for Dynamic Environments

    Science.gov (United States)

    Rajamani, Vasanth; Julien, Christine; Payton, Jamie; Roman, Gruia-Catalin

    Pervasive computing applications often entail continuous monitoring tasks, issuing persistent queries that return continuously updated views of the operational environment. We present PAQ, a middleware that supports applications' needs by approximating a persistent query as a sequence of one-time queries. PAQ introduces an integration strategy abstraction that allows composition of one-time query responses into streams representing sophisticated spatio-temporal phenomena of interest. A distinguishing feature of our middleware is the realization that the suitability of a persistent query's result is a function of the application's tolerance for accuracy weighed against the associated overhead costs. In PAQ, programmers can specify an inquiry strategy that dictates how information is gathered. Since network dynamics impact the suitability of a particular inquiry strategy, PAQ associates an introspection strategy with a persistent query, that evaluates the quality of the query's results. The result of introspection can trigger application-defined adaptation strategies that alter the nature of the query. PAQ's simple API makes developing adaptive querying systems easily realizable. We present the key abstractions, describe their implementations, and demonstrate the middleware's usefulness through application examples and evaluation.

  5. Joint Top-K Spatial Keyword Query Processing

    DEFF Research Database (Denmark)

    Wu, Dingming; Yiu, Man Lung; Cong, Gao

    2012-01-01

    Web users and content are increasingly being geopositioned, and increased focus is being given to serving local content in response to web queries. This development calls for spatial keyword queries that take into account both the locations and textual descriptions of content. We study the effici......Web users and content are increasingly being geopositioned, and increased focus is being given to serving local content in response to web queries. This development calls for spatial keyword queries that take into account both the locations and textual descriptions of content. We study...... the efficient, joint processing of multiple top-k spatial keyword queries. Such joint processing is attractive during high query loads and also occurs when multiple queries are used to obfuscate a user's true query. We propose a novel algorithm and index structure for the joint processing of top-k spatial...... keyword queries. Empirical studies show that the proposed solution is efficient on real data sets. We also offer analytical studies on synthetic data sets to demonstrate the efficiency of the proposed solution. Index Terms IEEE Terms Electronic mail , Google , Indexes , Joints , Mobile communication...

  6. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories

    Science.gov (United States)

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution. PMID:29854239

  7. Facilitating Cohort Discovery by Enhancing Ontology Exploration, Query Management and Query Sharing for Large Clinical Data Repositories.

    Science.gov (United States)

    Tao, Shiqiang; Cui, Licong; Wu, Xi; Zhang, Guo-Qiang

    2017-01-01

    To help researchers better access clinical data, we developed a prototype query engine called DataSphere for exploring large-scale integrated clinical data repositories. DataSphere expedites data importing using a NoSQL data management system and dynamically renders its user interface for concept-based querying tasks. DataSphere provides an interactive query-building interface together with query translation and optimization strategies, which enable users to build and execute queries effectively and efficiently. We successfully loaded a dataset of one million patients for University of Kentucky (UK) Healthcare into DataSphere with more than 300 million clinical data records. We evaluated DataSphere by comparing it with an instance of i2b2 deployed at UK Healthcare, demonstrating that DataSphere provides enhanced user experience for both query building and execution.

  8. Design and evaluation of a NoSQL database for storing and querying RDF data

    Directory of Open Access Journals (Sweden)

    Kanda Runapongsa Saikaew

    2014-12-01

    Full Text Available Currently the amount of web data has increased excessively. Its metadata is widely used in order to fully exploit web information resources. This causes the need for Semantic Web technology to quickly analyze such big data. Resource Description Framework (RDF is a standard for describing web resources. In this paper, we propose a method to exploit a NoSQL database, specifically MongoDB, to store and query RDF data. We choose MongoDB to represent a NoSQL database because it is one of the most popular high-performance NoSQL databases. We evaluate the proposed design and implementation by using the Berlin SPARQL Benchmark, which is one of the most widely accepted benchmarks for comparing the performance of RDF storage systems. We compare three database systems, which are Apache Jena TDB (native RDF store, MySQL (relational database, and our proposed system with MongoDB (NoSQL database. Based on the experimental results analysis, our proposed system outperforms other database systems for most queries when the data set size is small. However, for a larger data set, MongoDB performs well for queries with simple operators while MySQL offers an efficient solution for complex queries. The result of this work can provide some guideline for choosing an appropriate RDF database system and applying a NoSQL database in storing and querying RDF data.

  9. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis

    DEFF Research Database (Denmark)

    Beecham, Ashley H; Patsopoulos, Nikolaos A; Xifara, Dionysia K

    2013-01-01

    Using the ImmunoChip custom genotyping array, we analyzed 14,498 subjects with multiple sclerosis and 24,091 healthy controls for 161,311 autosomal variants and identified 135 potentially associated regions (P...

  10. Ontology based heterogeneous materials database integration and semantic query

    Science.gov (United States)

    Zhao, Shuai; Qian, Quan

    2017-10-01

    Materials digital data, high throughput experiments and high throughput computations are regarded as three key pillars of materials genome initiatives. With the fast growth of materials data, the integration and sharing of data is very urgent, that has gradually become a hot topic of materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply in semantic level when adopting the conventional heterogeneous database integration approaches such as federal database or data warehouse. In this paper, a semantic integration method is proposed to create the semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated to the ontology by means of relational algebra and the rooted graph. Based on integrated ontology, semantic query can be done using SPARQL. During the experiments, two world famous First Principle Computational databases, OQMD and Materials Project are used as the integration targets, which show the availability and effectiveness of our method.

  11. QuerySpaces on Hadoop for the ATLAS EventIndex

    CERN Document Server

    Hrivnac, Julius; The ATLAS collaboration; Cranshaw, Jack; Favareto, Andrea; Prokoshin, Fedor; Glasman, Claudia; Toebbicke, Rainer

    2015-01-01

    A Hadoop-based implementation of the adaptive query engine serving as the back-end for the ATLAS EventIndex. The QuerySpaces implementation handles both original data and search results providing fast and efficient mechanisms for new user queries using already accumulated knowledge for optimization. Detailed descriptions and statistics about user requests are collected in HBase tables and HDFS files. Requests are associated to their results and a graph of relations between them is created to be used to find the most efficient way of providing answers to new requests The environment is completely transparent to users and is accessible over several command-line interfaces, a Web Service and a programming API.

  12. Cyclone: java-based querying and computing with Pathway/Genome databases.

    Science.gov (United States)

    Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

    2007-05-15

    Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.

  13. A SQL-Database Based Meta-CASE System and its Query Subsystem

    Science.gov (United States)

    Eessaar, Erki; Sgirka, Rünno

    Meta-CASE systems simplify the creation of CASE (Computer Aided System Engineering) systems. In this paper, we present a meta-CASE system that provides a web-based user interface and uses an object-relational database system (ORDBMS) as its basis. The use of ORDBMSs allows us to integrate different parts of the system and simplify the creation of meta-CASE and CASE systems. ORDBMSs provide powerful query mechanism. The proposed system allows developers to use queries to evaluate and gradually improve artifacts and calculate values of software measures. We illustrate the use of the systems by using SimpleM modeling language and discuss the use of SQL in the context of queries about artifacts. We have created a prototype of the meta-CASE system by using PostgreSQL™ ORDBMS and PHP scripting language.

  14. EFFECTIVELY SEARCHING SPECIMEN AND OBSERVATION DATA WITH TOQE, THE THESAURUS OPTIMIZED QUERY EXPANDER

    Directory of Open Access Journals (Sweden)

    Anton Güntsch

    2009-09-01

    Full Text Available Today’s specimen and observation data portals lack a flexible mechanism, able to link up thesaurus-enabled data sources such as taxonomic checklist databases and expand user queries to related terms, significantly enhancing result sets. The TOQE system (Thesaurus Optimized Query Expander is a REST-like XML web-service implemented in Python and designed for this purpose. Acting as an interface between portals and thesauri, TOQE allows the implementation of specialized portal systems with a set of thesauri supporting its specific focus. It is both easy to use for portal programmers and easy to configure for thesaurus database holders who want to expose their system as a service for query expansions. Currently, TOQE is used in four specimen and observation data portals. The documentation is available from http://search.biocase.org/toqe/.

  15. Relational Algebra and SQL: Better Together

    Science.gov (United States)

    McMaster, Kirby; Sambasivam, Samuel; Hadfield, Steven; Wolthuis, Stuart

    2013-01-01

    In this paper, we describe how database instructors can teach Relational Algebra and Structured Query Language together through programming. Students write query programs consisting of sequences of Relational Algebra operations vs. Structured Query Language SELECT statements. The query programs can then be run interactively, allowing students to…

  16. Development and empirical user-centered evaluation of semantically-based query recommendation for an electronic health record search engine.

    Science.gov (United States)

    Hanauer, David A; Wu, Danny T Y; Yang, Lei; Mei, Qiaozhu; Murkowski-Steffy, Katherine B; Vydiswaran, V G Vinod; Zheng, Kai

    2017-03-01

    The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). The query recommendation algorithm utilizes MetaMap to identify medical concepts from search queries and indexed EHR documents. Synonym variants from UMLS are used to expand the concepts along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. The search engine's performance was rated consistently higher with the query recommendation feature turned on vs. off. The relevance of computer-recommended search terms was also rated high, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. Challenges persist for users to construct effective search queries when retrieving information from biomedical documents including those from EHRs. This study demonstrates that semantically-based query recommendation is a viable solution to addressing this challenge. Published by Elsevier Inc.

  17. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis

    Science.gov (United States)

    Beecham, Ashley H; Patsopoulos, Nikolaos A; Xifara, Dionysia K; Davis, Mary F; Kemppinen, Anu; Cotsapas, Chris; Shahi, Tejas S; Spencer, Chris; Booth, David; Goris, An; Oturai, Annette; Saarela, Janna; Fontaine, Bertrand; Hemmer, Bernhard; Martin, Claes; Zipp, Frauke; D’alfonso, Sandra; Martinelli-Boneschi, Filippo; Taylor, Bruce; Harbo, Hanne F; Kockum, Ingrid; Hillert, Jan; Olsson, Tomas; Ban, Maria; Oksenberg, Jorge R; Hintzen, Rogier; Barcellos, Lisa F; Agliardi, Cristina; Alfredsson, Lars; Alizadeh, Mehdi; Anderson, Carl; Andrews, Robert; Søndergaard, Helle Bach; Baker, Amie; Band, Gavin; Baranzini, Sergio E; Barizzone, Nadia; Barrett, Jeffrey; Bellenguez, Céline; Bergamaschi, Laura; Bernardinelli, Luisa; Berthele, Achim; Biberacher, Viola; Binder, Thomas M C; Blackburn, Hannah; Bomfim, Izaura L; Brambilla, Paola; Broadley, Simon; Brochet, Bruno; Brundin, Lou; Buck, Dorothea; Butzkueven, Helmut; Caillier, Stacy J; Camu, William; Carpentier, Wassila; Cavalla, Paola; Celius, Elisabeth G; Coman, Irène; Comi, Giancarlo; Corrado, Lucia; Cosemans, Leentje; Cournu-Rebeix, Isabelle; Cree, Bruce A C; Cusi, Daniele; Damotte, Vincent; Defer, Gilles; Delgado, Silvia R; Deloukas, Panos; di Sapio, Alessia; Dilthey, Alexander T; Donnelly, Peter; Dubois, Bénédicte; Duddy, Martin; Edkins, Sarah; Elovaara, Irina; Esposito, Federica; Evangelou, Nikos; Fiddes, Barnaby; Field, Judith; Franke, Andre; Freeman, Colin; Frohlich, Irene Y; Galimberti, Daniela; Gieger, Christian; Gourraud, Pierre-Antoine; Graetz, Christiane; Graham, Andrew; Grummel, Verena; Guaschino, Clara; Hadjixenofontos, Athena; Hakonarson, Hakon; Halfpenny, Christopher; Hall, Gillian; Hall, Per; Hamsten, Anders; Harley, James; Harrower, Timothy; Hawkins, Clive; Hellenthal, Garrett; Hillier, Charles; Hobart, Jeremy; Hoshi, Muni; Hunt, Sarah E; Jagodic, Maja; Jelčić, Ilijas; Jochim, Angela; Kendall, Brian; Kermode, Allan; Kilpatrick, Trevor; Koivisto, Keijo; Konidari, Ioanna; Korn, Thomas; Kronsbein, Helena; Langford, Cordelia; Larsson, Malin; Lathrop, Mark; Lebrun-Frenay, Christine; Lechner-Scott, Jeannette; Lee, Michelle H; Leone, Maurizio A; Leppä, Virpi; Liberatore, Giuseppe; Lie, Benedicte A; Lill, Christina M; Lindén, Magdalena; Link, Jenny; Luessi, Felix; Lycke, Jan; Macciardi, Fabio; Männistö, Satu; Manrique, Clara P; Martin, Roland; Martinelli, Vittorio; Mason, Deborah; Mazibrada, Gordon; McCabe, Cristin; Mero, Inger-Lise; Mescheriakova, Julia; Moutsianas, Loukas; Myhr, Kjell-Morten; Nagels, Guy; Nicholas, Richard; Nilsson, Petra; Piehl, Fredrik; Pirinen, Matti; Price, Siân E; Quach, Hong; Reunanen, Mauri; Robberecht, Wim; Robertson, Neil P; Rodegher, Mariaemma; Rog, David; Salvetti, Marco; Schnetz-Boutaud, Nathalie C; Sellebjerg, Finn; Selter, Rebecca C; Schaefer, Catherine; Shaunak, Sandip; Shen, Ling; Shields, Simon; Siffrin, Volker; Slee, Mark; Sorensen, Per Soelberg; Sorosina, Melissa; Sospedra, Mireia; Spurkland, Anne; Strange, Amy; Sundqvist, Emilie; Thijs, Vincent; Thorpe, John; Ticca, Anna; Tienari, Pentti; van Duijn, Cornelia; Visser, Elizabeth M; Vucic, Steve; Westerlind, Helga; Wiley, James S; Wilkins, Alastair; Wilson, James F; Winkelmann, Juliane; Zajicek, John; Zindler, Eva; Haines, Jonathan L; Pericak-Vance, Margaret A; Ivinson, Adrian J; Stewart, Graeme; Hafler, David; Hauser, Stephen L; Compston, Alastair; McVean, Gil; De Jager, Philip; Sawcer, Stephen; McCauley, Jacob L

    2013-01-01

    Using the ImmunoChip custom genotyping array, we analysed 14,498 multiple sclerosis subjects and 24,091 healthy controls for 161,311 autosomal variants and identified 135 potentially associated regions (p-value multiple sclerosis subjects and 26,703 healthy controls. In these 80,094 individuals of European ancestry we identified 48 new susceptibility variants (p-value multiple sclerosis risk variants in 103 discrete loci outside of the Major Histocompatibility Complex. With high resolution Bayesian fine-mapping, we identified five regions where one variant accounted for more than 50% of the posterior probability of association. This study enhances the catalogue of multiple sclerosis risk variants and illustrates the value of fine-mapping in the resolution of GWAS signals. PMID:24076602

  18. Ontology-Based Querying with Bio2RDF's Linked Open Data.

    Science.gov (United States)

    Callahan, Alison; Cruz-Toledo, José; Dumontier, Michel

    2013-04-15

    A key activity for life scientists in this post "-omics" age involves searching for and integrating biological data from a multitude of independent databases. However, our ability to find relevant data is hampered by non-standard web and database interfaces backed by an enormous variety of data formats. This heterogeneity presents an overwhelming barrier to the discovery and reuse of resources which have been developed at great public expense.To address this issue, the open-source Bio2RDF project promotes a simple convention to integrate diverse biological data using Semantic Web technologies. However, querying Bio2RDF remains difficult due to the lack of uniformity in the representation of Bio2RDF datasets. We describe an update to Bio2RDF that includes tighter integration across 19 new and updated RDF datasets. All available open-source scripts were first consolidated to a single GitHub repository and then redeveloped using a common API that generates normalized IRIs using a centralized dataset registry. We then mapped dataset specific types and relations to the Semanticscience Integrated Ontology (SIO) and demonstrate simplified federated queries across multiple Bio2RDF endpoints. This coordinated release marks an important milestone for the Bio2RDF open source linked data framework. Principally, it improves the quality of linked data in the Bio2RDF network and makes it easier to access or recreate the linked data locally. We hope to continue improving the Bio2RDF network of linked data by identifying priority databases and increasing the vocabulary coverage to additional dataset vocabularies beyond SIO.

  19. Social Networking Privacy Control: Exploring University Variables Related to Young Adults' Sharing of Personally Identifiable Information

    Science.gov (United States)

    Zimmerman, Melisa S.

    2014-01-01

    The growth of the Internet, and specifically social networking sites (SNSs) like Facebook, create opportunities for individuals to share private and identifiable information with a closed or open community. Internet crime has been on the rise and research has shown that criminals are using individuals' personal information pulled from social…

  20. Query-Driven Visualization and Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Ruebel, Oliver; Bethel, E. Wes; Prabhat, Mr.; Wu, Kesheng

    2012-11-01

    This report focuses on an approach to high performance visualization and analysis, termed query-driven visualization and analysis (QDV). QDV aims to reduce the amount of data that needs to be processed by the visualization, analysis, and rendering pipelines. The goal of the data reduction process is to separate out data that is "scientifically interesting'' and to focus visualization, analysis, and rendering on that interesting subset. The premise is that for any given visualization or analysis task, the data subset of interest is much smaller than the larger, complete data set. This strategy---extracting smaller data subsets of interest and focusing of the visualization processing on these subsets---is complementary to the approach of increasing the capacity of the visualization, analysis, and rendering pipelines through parallelism. This report discusses the fundamental concepts in QDV, their relationship to different stages in the visualization and analysis pipelines, and presents QDV's application to problems in diverse areas, ranging from forensic cybersecurity to high energy physics.

  1. Similarity queries for temporal toxicogenomic expression profiles.

    Directory of Open Access Journals (Sweden)

    Adam A Smith

    2008-07-01

    Full Text Available We present an approach for answering similarity queries about gene expression time series that is motivated by the task of characterizing the potential toxicity of various chemicals. Our approach involves two key aspects. First, our method employs a novel alignment algorithm based on time warping. Our time warping algorithm has several advantages over previous approaches. It allows the user to impose fairly strong biases on the form that the alignments can take, and it permits a type of local alignment in which the entirety of only one series has to be aligned. Second, our method employs a relaxed spline interpolation to predict expression responses for unmeasured time points, such that the spline does not necessarily exactly fit every observed point. We evaluate our approach using expression time series from the Edge toxicology database. Our experiments show the value of using spline representations for sparse time series. More significantly, they show that our time warping method provides more accurate alignments and classifications than previous standard alignment methods for time series.

  2. Query by image example: The CANDID approach

    Energy Technology Data Exchange (ETDEWEB)

    Kelly, P.M.; Cannon, M. [Los Alamos National Lab., NM (United States). Computer Research and Applications Group; Hush, D.R. [Univ. of New Mexico, Albuquerque, NM (United States). Dept. of Electrical and Computer Engineering

    1995-02-01

    CANDID (Comparison Algorithm for Navigating Digital Image Databases) was developed to enable content-based retrieval of digital imagery from large databases using a query-by-example methodology. A user provides an example image to the system, and images in the database that are similar to that example are retrieved. The development of CANDID was inspired by the N-gram approach to document fingerprinting, where a ``global signature`` is computed for every document in a database and these signatures are compared to one another to determine the similarity between any two documents. CANDID computes a global signature for every image in a database, where the signature is derived from various image features such as localized texture, shape, or color information. A distance between probability density functions of feature vectors is then used to compare signatures. In this paper, the authors present CANDID and highlight two results from their current research: subtracting a ``background`` signature from every signature in a database in an attempt to improve system performance when using inner-product similarity measures, and visualizing the contribution of individual pixels in the matching process. These ideas are applicable to any histogram-based comparison technique.

  3. Identifying the related compounds using electrospray ionization tandem mass spectrometry: bromotyrosine alkaloids from marine sponge Psammaplysilla purpurea

    Digital Repository Service at National Institute of Oceanography (India)

    Tilvi, S.; DeSouza, L.

    electrospray ionization tandem mass spectrometry (ESI-MS/MS). This sponge has tremendous chemical diversity of bromotyrosine alkaloids. Here we have used the proteomics approach in identifying related bromotyrosine alkaloids based on the predicated mass...

  4. On (dynamic) range minimum queries in external memory

    DEFF Research Database (Denmark)

    Arge, L.; Fischer, Johannes; Sanders, Peter

    2013-01-01

    We study the one-dimensional range minimum query (RMQ) problem in the external memory model. We provide the first space-optimal solution to the batched static version of the problem. On an instance with N elements and Q queries, our solution takes Θ(sort(N + Q)) = Θ( N+QB log M /B N+QB ) I...

  5. Dataflow Query Execution in a Parallel, Main-memory Environment

    NARCIS (Netherlands)

    Wilschut, A.N.; Apers, Peter M.G.

    In this paper, the performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results of this study are a step into the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others,

  6. On the Suitability of Skyline Queries for Data Exploration

    DEFF Research Database (Denmark)

    Chester, Sean; Mortensen, Michael Lind; Assent, Ira

    2014-01-01

    The skyline operator has been studied in database research for multi-criteria decision making. Until now the focus has been on the efficiency or accuracy of single queries. In practice, however, users are increasingly confronted with unknown data collections, where precise query formulation proves...

  7. Efficient processing of 3-sided range queries with probabilistic guarantees

    DEFF Research Database (Denmark)

    Kaporis, Alexis; Papadopoulos, Apostolos; Sioutas, Spyros

    2010-01-01

    This work studies the problem of 2-dimensional searching for the 3-sided range query of the form [a, b] x (-∞, c] in both main and external memory, by considering a variety of input distributions. A dynamic linear main memory solution is proposed, which answers 3-sided queries in O(log n + t) worst...

  8. Efficient external memory structures for range-aggregate queries

    DEFF Research Database (Denmark)

    Agarwal, P.K.; Yang, J.; Arge, L.

    2013-01-01

    We present external memory data structures for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in Rd, compute the aggregate of the weights of the points that lie inside a d-dimensional orthogonal query rectangle. The...

  9. Video Stream Retrieval of Unseen Queries using Semantic Memory

    NARCIS (Netherlands)

    Cappallo, S.; Mensink, T.; Snoek, C.G.M.; Wilson, R.C.; Hancock, E.R.; Smith, W.A.P.

    2016-01-01

    Retrieval of live, user-broadcast video streams is an under-addressed and increasingly relevant challenge. The on-line nature of the problem requires temporal evaluation and the unforeseeable scope of potential queries motivates an approach which can accommodate arbitrary search queries. To account

  10. Efficient processing of containment queries on nested sets

    NARCIS (Netherlands)

    Ibrahim, A.; Fletcher, G.H.L.

    2013-01-01

    We study the problem of computing containment queries on sets which can have both atomic and set-valued objects as elements, i.e., nested sets. Containment is a fundamental query pattern with many basic applications. Our study of nested set containment is motivated by the ubiquity of nested data in

  11. Memory aware query scheduling in a database cluster

    NARCIS (Netherlands)

    F. Waas; M.L. Kersten (Martin)

    2000-01-01

    textabstractQuery throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications:

  12. A Typed Text Retrieval Query Language for XML Documents.

    Science.gov (United States)

    Colazzo, Dario; Sartiani, Carlo; Albano, Antonio; Manghi, Paolo; Ghelli, Giorgio; Lini, Luca; Paoli, Michele

    2002-01-01

    Discussion of XML focuses on a description of Tequyla-TX, a typed text retrieval query language for XML documents that can search on both content and structures. Highlights include motivations; numerous examples; word-based and char-based searches; tag-dependent full-text searches; text normalization; query algebra; data models and term language;…

  13. Real SQL queries 50 challenges : practice for reporting and analysis

    CERN Document Server

    Cohen, Brian; Mishra, Neerja

    2015-01-01

    Queries improve when challenges are authentic. This book sets your learning on the fast track with realistic problems to solve. Topics span sales, marketing, human resources, purchasing, and production. Real SQL Queries: 50 Challenges is perfect for analysts, report writers, or anyone searching for a hands-on approach to learning SQL Server.

  14. Ontology Based Queries - Investigating a Natural Language Interface

    NARCIS (Netherlands)

    van der Sluis, Ielka; Hielkema, F.; Mellish, C.; Doherty, G.

    2010-01-01

    In this paper we look at what may be learned from a comparative study examining non-technical users with a background in social science browsing and querying metadata. Four query tasks were carried out with a natural language interface and with an interface that uses a web paradigm with hyperlinks.

  15. Query Classification and Study of University Students' Search Trends

    Science.gov (United States)

    Maabreh, Majdi A.; Al-Kabi, Mohammed N.; Alsmadi, Izzat M.

    2012-01-01

    Purpose: This study is an attempt to develop an automatic identification method for Arabic web queries and divide them into several query types using data mining. In addition, it seeks to evaluate the impact of the academic environment on using the internet. Design/methodology/approach: The web log files were collected from one of the higher…

  16. A framework for query optimization to support data mining

    NARCIS (Netherlands)

    S.R. Choenni (Sunil); A.P.J.M. Siebes (Arno)

    1996-01-01

    textabstractIn order to extract knowledge from databases, data mining algorithms heavily query the databases. Inefficient processing of these queries will inevitably have its impact on the performance of these algorithms, making them less valuable. In this paper, we describe an optimization

  17. Identifying return-to-work trajectories using sequence analysis in a cohort of workers with work-related musculoskeletal disorders

    NARCIS (Netherlands)

    McLeod, Christopher B.; Reiff, Eline; Maas, Esther; Bultmann, Ute

    2018-01-01

    Objectives This study aimed to identify return-to-work (RTW) trajectories among workers with work-related musculoskeletal disorders (MSD) and examine the associations between different MSD and these RTW trajectories. Methods We used administrative workers' compensation data to identify accepted MSD

  18. Validation of a search strategy to identify nutrition trials in PubMed using the relative recall method.

    Science.gov (United States)

    Durão, Solange; Kredo, Tamara; Volmink, Jimmy

    2015-06-01

    To develop, assess, and maximize the sensitivity of a search strategy to identify diet and nutrition trials in PubMed using relative recall. We developed a search strategy to identify diet and nutrition trials in PubMed. We then constructed a gold standard reference set to validate the identified trials using the relative recall method. Relative recall was calculated by dividing the number of references from the gold standard our search strategy identified by the total number of references in the gold standard. Our gold standard comprised 298 trials, derived from 16 included systematic reviews. The initial search strategy identified 242 of 298 references, with a relative recall of 81.2% [95% confidence interval (CI): 76.3%, 85.5%]. We analyzed titles and abstracts of the 56 missed references for possible additional terms. We then modified the search strategy accordingly. The relative recall of the final search strategy was 88.6% (95% CI: 84.4%, 91.9%). We developed a search strategy to identify diet and nutrition trials in PubMed with a high relative recall (sensitivity). This could be useful for establishing a nutrition trials register to support the conduct of future research, including systematic reviews. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Menopause and big data: Word Adjacency Graph modeling of menopause-related ChaCha data.

    Science.gov (United States)

    Carpenter, Janet S; Groves, Doyle; Chen, Chen X; Otte, Julie L; Miller, Wendy R

    2017-07-01

    To detect and visualize salient queries about menopause using Big Data from ChaCha. We used Word Adjacency Graph (WAG) modeling to detect clusters and visualize the range of menopause-related topics and their mutual proximity. The subset of relevant queries was fully modeled. We split each query into token words (ie, meaningful words and phrases) and removed stopwords (ie, not meaningful functional words). The remaining words were considered in sequence to build summary tables of words and two and three-word phrases. Phrases occurring at least 10 times were used to build a network graph model that was iteratively refined by observing and removing clusters of unrelated content. We identified two menopause-related subsets of queries by searching for questions containing menopause and menopause-related terms (eg, climacteric, hot flashes, night sweats, hormone replacement). The first contained 263,363 queries from individuals aged 13 and older and the second contained 5,892 queries from women aged 40 to 62 years. In the first set, we identified 12 topic clusters: 6 relevant to menopause and 6 less relevant. In the second set, we identified 15 topic clusters: 11 relevant to menopause and 4 less relevant. Queries about hormones were pervasive within both WAG models. Many of the queries reflected low literacy levels and/or feelings of embarrassment. We modeled menopause-related queries posed by ChaCha users between 2009 and 2012. ChaCha data may be used on its own or in combination with other Big Data sources to identify patient-driven educational needs and create patient-centered interventions.

  20. A Fuzzy Query Mechanism for Human Resource Websites

    Science.gov (United States)

    Lai, Lien-Fu; Wu, Chao-Chin; Huang, Liang-Tsung; Kuo, Jung-Chih

    Users' preferences often contain imprecision and uncertainty that are difficult for traditional human resource websites to deal with. In this paper, we apply the fuzzy logic theory to develop a fuzzy query mechanism for human resource websites. First, a storing mechanism is proposed to store fuzzy data into conventional database management systems without modifying DBMS models. Second, a fuzzy query language is proposed for users to make fuzzy queries on fuzzy databases. User's fuzzy requirement can be expressed by a fuzzy query which consists of a set of fuzzy conditions. Third, each fuzzy condition associates with a fuzzy importance to differentiate between fuzzy conditions according to their degrees of importance. Fourth, the fuzzy weighted average is utilized to aggregate all fuzzy conditions based on their degrees of importance and degrees of matching. Through the mutual compensation of all fuzzy conditions, the ordering of query results can be obtained according to user's preference.

  1. Query Log Analysis of an Electronic Health Record Search Engine

    Science.gov (United States)

    Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.

    2011-01-01

    We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150

  2. Multiple k Nearest Neighbor Query Processing in Spatial Network Databases

    DEFF Research Database (Denmark)

    Xuegang, Huang; Jensen, Christian Søndergaard; Saltenis, Simonas

    2006-01-01

    This paper concerns the efficient processing of multiple k nearest neighbor queries in a road-network setting. The assumed setting covers a range of scenarios such as the one where a large population of mobile service users that are constrained to a road network issue nearest-neighbor queries...... for points of interest that are accessible via the road network. Given multiple k nearest neighbor queries, the paper proposes progressive techniques that selectively cache query results in main memory and subsequently reuse these for query processing. The paper initially proposes techniques for the case...... where an upper bound on k is known a priori and then extends the techniques to the case where this is not so. Based on empirical studies with real-world data, the paper offers insight into the circumstances under which the different proposed techniques can be used with advantage for multiple k nearest...

  3. Pareto-depth for multiple-query image retrieval.

    Science.gov (United States)

    Hsiao, Ko-Jen; Calder, Jeff; Hero, Alfred O

    2015-02-01

    Most content-based image retrieval systems consider either one single query, or multiple queries that include the same object or represent the same semantic information. In this paper, we consider the content-based image retrieval problem for multiple query images corresponding to different image semantics. We propose a novel multiple-query information retrieval algorithm that combines the Pareto front method with efficient manifold ranking. We show that our proposed algorithm outperforms state of the art multiple-query retrieval algorithms on real-world image databases. We attribute this performance improvement to concavity properties of the Pareto fronts, and prove a theoretical result that characterizes the asymptotic concavity of the fronts.

  4. Identifying and Assessing Life-Cycle-Related Critical Technology Elements (CTEs) for Technology Readiness Assessments (TRAs)

    National Research Council Canada - National Science Library

    Mandelbaum, Jay

    2006-01-01

    .... Because these technologies are not emphasized in the current Technology Readiness Assessment (TRA) process this document is intended to improve the focus on life-cycle-related technologies in TRAs...

  5. Identifying amyloid pathology?related cerebrospinal fluid biomarkers for Alzheimer's disease in a multicohort study

    OpenAIRE

    Leung, Yuk Yee; Toledo, Jon B.; Nefedov, Alexey; Polikar, Robi; Raghavan, Nandini; Xie, Sharon X.; Farnum, Michael; Schultz, Tim; Baek, Young; Van Deerlin, Vivianna M.; Hu, William T.; Holtzman, David M.; Fagan, Anne M.; Perrin, Richard J.; Grossman, Murray

    2015-01-01

    Introduction The dynamic range of cerebrospinal fluid (CSF) amyloid ? (A?1?42) measurement does not parallel to cognitive changes in Alzheimer's disease (AD) and cognitively normal (CN) subjects across different studies. Therefore, identifying novel proteins to characterize symptomatic AD samples is important. Methods Proteins were profiled using a multianalyte platform by Rules Based Medicine (MAP-RBM). Due to underlying heterogeneity and unbalanced sample size, we combined subjects (344 AD ...

  6. Identifying musical difficulties as they relate to congenital amusia in the pediatric population.

    Science.gov (United States)

    Wilcox, Lyndy J; He, Kaidi; Derkay, Craig S

    2015-12-01

    Approximately 4% of the population fails to develop basic music skills and can be identified as "amusic". Congenital amusia (CA), or "tone deafness", is thought to be a hereditary disordera predominantly affecting the perception and production of music. The gold standard for diagnosis is the Montreal Battery for Evaluation of Amusia (MBEA). This study aims to pinpoint factors in the history that may help identify amusic children and to determine if amusic pediatric patients can be identified using a widely available, shorter test validated in adults. Subjects ages 7-17 years were recruited to take an online test, validated against the MBEA, for CA. The sections tested recognition of "off-beat" (OB), "mistuned" (MT), and "out-of-key" (OOK) conditions. Parents filled out a questionnaire regarding the subject's past medical, educational, musical exposure, and family history. Of 114 subjects recruited, complete data was available on 105 with a mean age of 12.5 years. According to adult criteria, 63/105 (60%) of subjects scored in the "amusic" range. Children >10 years of age scored significantly higher on the off-beat section (p=0.001) and total scores (p=0.025). Subjects who were born prematurely scored significantly lower (p=0.045). Children whose father had difficulties with music scored significantly lower on the off-beat section (p=0.003) and total scores (p=0.008). CA is a disorder that has implications for quality of life. Earlier identification may help elucidate the pathogenesis of the condition and, in the future, the institution of prompt treatment. Further studies are needed to identify the most appropriate and convenient tests, as well as the optimal timing of testing, for reliably diagnosing CA in children. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  7. Duodenoscope-Related Outbreak of a Carbapenem-Resistant Klebsiella pneumoniae Identified Using Advanced Molecular Diagnostics.

    Science.gov (United States)

    Humphries, Romney M; Yang, Shuan; Kim, Stephen; Muthusamy, Venkatara Raman; Russell, Dana; Trout, Alisa M; Zaroda, Teresa; Cheng, Quen J; Aldrovandi, Grace; Uslan, Daniel Zachary; Hemarajata, Peera; Rubin, Zachary Aaron

    2017-10-01

    Carbapenem-resistant Klebsiella pneumoniae infections are increasingly prevalent in North American hospitals. We describe an outbreak of carbapenem-resistant K. pneumoniae containing the blaOXA-232 gene transmitted by contaminated duodenoscopes during endoscopic retrograde cholangiopancreatography (ERCP) procedures. An outbreak investigation was performed when 9 patients with blaOXA-232 carbapenem-resistant K. pneumoniae infections were identified at a tertiary care hospital. The investigation included 2 case-control studies, review of duodenoscope reprocessing procedures, and culture of devices. Carbapenem-resistant Enterobacteriacieae (CRE) isolates were evaluated with polymerase chain reaction analysis for carbapenemase genes, and isolates with the blaOXA-232 gene were subjected to whole-genome sequencing and chromosome single-nucleotide polymorphism analysis. On recognition of ERCP as a key risk factor for infection, targeted patient notification and CRE screening cultures were performed. Molecular testing ultimately identified 17 patients with blaOxa-232 carbapenem-resistant K. pneumoniae isolates, including 9 with infections, 7 asymptomatic carriers who had undergone ERCP, and 1 additional patient who had been hospitalized in India and was probably the initial carrier. Two case-control studies established a point-source outbreak associated with 2 specific duodenoscopes. A field investigation of the use, reprocessing, and storage of deuodenoscopes did not identify deviations from US Food and Drug Administration or manufacturer recommendations for reprocessing. This outbreak demonstrated the previously underappreciated potential for duodenoscopes to transmit disease, even after undergoing high-level disinfection according to manufacturers' guidelines.

  8. Processing SPARQL queries with regular expressions in RDF databases

    Science.gov (United States)

    2011-01-01

    Background As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns. PMID:21489225

  9. Processing SPARQL queries with regular expressions in RDF databases

    Directory of Open Access Journals (Sweden)

    Cho Hune

    2011-03-01

    Full Text Available Abstract Background As the Resource Description Framework (RDF data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf or Bio2RDF (bio2rdf.org, SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users’ requests for extracting information from the RDF data as well as the lack of users’ knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. Results In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1 We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2 We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3 We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.

  10. Processing SPARQL queries with regular expressions in RDF databases.

    Science.gov (United States)

    Lee, Jinsoo; Pham, Minh-Duc; Lee, Jihwan; Han, Wook-Shin; Cho, Hune; Yu, Hwanjo; Lee, Jeong-Hoon

    2011-03-29

    As the Resource Description Framework (RDF) data model is widely used for modeling and sharing a lot of online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL - a W3C recommendation query for RDF databases - has become an important query language for querying the bioinformatics knowledge bases. Moreover, due to the diversity of users' requests for extracting information from the RDF data as well as the lack of users' knowledge about the exact value of each fact in the RDF databases, it is desirable to use the SPARQL query with regular expression patterns for querying the RDF data. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most of the existing techniques for processing regular expressions are designed for querying a text corpus, or only for supporting the matching over the paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL query. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework in the existing query optimizers. 3) We build a prototype for the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.

  11. Long-Term Cardiovascular Risk in Heterozygous Familial Hypercholesterolemia Relatives Identified by Cascade Screening

    DEFF Research Database (Denmark)

    Kjærgaard, Kasper Aalbæk; Christiansen, Morten Krogh; Schmidt, Morten

    2017-01-01

    ischemic attack, peripheral artery disease, and coronary revascularization. We included 220 relatives. Median age was 37 years (interquartile range: 27-52 years) of which 118 (54%) had an LDLR mutation. By 2004, when prescription data became available, 89% of mutation-carrying participants were taking...

  12. Combat-Related Heterotopic Ossification: Development of Animal Models for Identifying Mechanisms and Testing Therapeutics

    Science.gov (United States)

    2015-07-01

    approximately 65% of combat- related amputations. Symptomatic HO is characterized by persistent pain and skin ulceration disrupting prosthetic ...rehabilitation. Surgical excision is the only definitive management option when physical therapy and prosthetic modification fail to provide adequate relief...but did not undergo inoculation. Rats were followed for visual evidence of wound infection requiring irrigation and debridement and euthanized if

  13. Identifying the Risk of Swallowing-Related Pulmonary Complications in Older Patients With Hip Fracture.

    Science.gov (United States)

    Meals, Clifton; Roy, Siddharth; Medvedev, Gleb; Wallace, Matthew; Neviaser, Robert J; O'Brien, Joseph

    2016-01-01

    To identify and potentially modify the risk of pulmonary complications in a group of older patients with hip fracture, the authors obtained speech and language pathology consultations for these patients. Then they performed a retrospective chart review of all patients 65 years and older who were admitted to their institution between June 2011 and July 2013 with acute hip fracture, were treated surgically, and had a speech and language pathology evaluation in the immediate perioperative period. The authors identified 52 patients who met the study criteria. According to the American Society of Anesthesiologists (ASA) classification system, at the time of surgery, 1 patient (2%) was classified as ASA I, 12 patients (23%) were ASA II, 26 (50%) were ASA III, and 12 (23%) were ASA IV. Based on a speech and language pathology evaluation, 22 patients (42%) were diagnosed with dysphagia. Statistical analysis showed that ASA III status and ASA IV status were meaningful predictors of dysphagia and that dysphagia itself was a strong risk factor for pulmonary aspiration, pneumonia, and aspiration pneumonitis. Evaluation by a speech and language pathologist, particularly of patients classified as ASA III or ASA IV, may be an efficient means of averting pulmonary morbidity that is common in older patients with hip fracture. Copyright 2016, SLACK Incorporated.

  14. Relative validity of a food frequency questionnaire to identify dietary patterns in an adult Mexican population.

    Science.gov (United States)

    Denova-Gutiérrez, Edgar; Tucker, Katherine L; Salmerón, Jorge; Flores, Mario; Barquera, Simón

    2016-01-01

    To examine the validity of a semi-quantitative food frequency questionnaire (SFFQ) to identify dietary patterns in an adult Mexican population. A 140-item SFFQ and two 24-hour dietary recalls (24DRs) were administered. Foods were categorized into 29 food groups used to derive dietary patterns via factor analysis. Pearson and intraclass correlations coefficients between dietary pattern scores identified from the SFFQ and 24DRs were assessed. Pattern 1 was high in snacks, fast food, soft drinks, processed meats and refined grains; pattern 2 was high in fresh vegetables, fresh fruits, and dairy products; and pattern 3 was high in legumes, eggs, sweetened foods and sugars. Pearson correlation coefficients between the SFFQ and the 24DRs for these patterns were 0.66 (P<0.001), 0.41 (P<0.001) and 0.29 (P=0.193) respectively. Our data indicate reasonable validity of the SFFQ, using factor analysis, to derive major dietary patterns in comparison with two 24DR.

  15. Relative validity of a food frequency questionnaire to identify dietary patterns in an adult Mexican population

    Directory of Open Access Journals (Sweden)

    Edgar Denova-Gutiérrez

    2016-12-01

    Full Text Available Objective. To examine the validity of a semi-quantitative food frequency questionnaire (SFFQ to identify dietary patterns in an adult Mexican population. Materials and methods. A 140-item SFFQ and two 24-hour dietary recalls (24DRs were administered. Foods were categorized into 29 food groups used to derive dietary patterns via factor analy­sis. Pearson and intraclass correlations coefficients between dietary pattern scores identified from the SFFQ and 24DRs were assessed. Results. Pattern 1 was high in snacks, fast food, soft drinks, processed meats and refined grains; pattern 2 was high in fresh vegetables, fresh fruits, and dairy products; and pattern 3 was high in legumes, eggs, sweetened foods and sugars. Pearson correlation oefficients between the SFFQ and the 24DRs for these patterns were 0.66 (P<0.001, 0.41 (P<0.001 and 0.29 (P=0.193 respectively. Conclusions. Our data indicate reasonable validity of the SFFQ, using fac­tor analysis, to derive major dietary patterns in comparison with two 24DR.

  16. jQuery UI 1.7 the user interface library for jQuery

    CERN Document Server

    Wellman, Dan

    2009-01-01

    An example-based approach leads you step-by-step through the implementation and customization of each library component and its associated resources in turn. To emphasize the way that jQuery UI takes the difficulty out of user interface design and implementation, each chapter ends with a 'fun with' section that puts together what you've learned throughout the chapter to make a usable and fun page. In these sections you'll often get to experiment with the latest associated technologies like AJAX and JSON. This book is for front-end designers and developers who need to quickly learn how to use t

  17. Transcriptomic analysis identifies genes and pathways related to myrmecophagy in the Malayan pangolin (Manis javanica

    Directory of Open Access Journals (Sweden)

    Jing-E Ma

    2017-12-01

    Full Text Available The Malayan pangolin (Manis javanica is an unusual, scale-covered, toothless mammal that specializes in myrmecophagy. Due to their threatened status and continuing decline in the wild, concerted efforts have been made to conserve and rescue this species in captivity in China. Maintaining this species in captivity is a significant challenge, partly because little is known of the molecular mechanisms of its digestive system. Here, the first large-scale sequencing analyses of the salivary gland, liver and small intestine transcriptomes of an adult M. javanica genome were performed, and the results were compared with published liver transcriptome profiles for a pregnant M. javanica female. A total of 24,452 transcripts were obtained, among which 22,538 were annotated on the basis of seven databases. In addition, 3,373 new genes were predicted, of which 1,459 were annotated. Several pathways were found to be involved in myrmecophagy, including olfactory transduction, amino sugar and nucleotide sugar metabolism, lipid metabolism, and terpenoid and polyketide metabolism pathways. Many of the annotated transcripts were involved in digestive functions: 997 transcripts were related to sensory perception, 129 were related to digestive enzyme gene families, and 199 were related to molecular transporters. One transcript for an acidic mammalian chitinase was found in the annotated data, and this might be closely related to the unique digestive function of pangolins. These pathways and transcripts are involved in specialization processes related to myrmecophagy (a form of insectivory and carbohydrate, protein and lipid digestive pathways, probably reflecting adaptations to myrmecophagy. Our study is the first to investigate the molecular mechanisms underlying myrmecophagy in M. javanica, and we hope that our results may play a role in the conservation of this species.

  18. Identifying features of pocket parks that may be related to health promoting use

    DEFF Research Database (Denmark)

    Peschardt, Karin Kragsig; Stigsdotter, Ulrika K.; Schipperijn, Jasper

    2016-01-01

    . The results show that ‘green features’ do not seem to be of crucial importance for ‘socialising’ whereas, as expected, features promoting gathering should be prioritised. For ‘rest and restitution’, the main results show that ‘green ground cover’ and ‘enclosed green niches’ are important, while ‘disturbing......Urban green spaces have been shown to promote health and well-being and recent research indicates that the two primary potentially health promoting uses of pocket parks are ‘rest and restitution’ and ‘socialising’. The aim of this study is to identify features in pocket parks that may support...... features’ (playground, view outside park) should be avoided. The results add knowledge about the features which support the health promoting use of pocket parks to the existing body of research....

  19. Identifying and defining the terms and elements related to a digital health innovation ecosystem

    CSIR Research Space (South Africa)

    Iyawa, G

    2016-12-01

    Full Text Available environment in which digital health systems have to be implemented in South Africa provides for specific challenges relating to environmental, community and physical challenges. By using an in-depth comparative case study within the design science..., Approaches and Experiences: Towards building a South African Digital Health Innovation Ecosystem 2 Strategies, Approaches and Experiences: Towards building a South African Digital Health Innovation Ecosystem First published in December 2016...

  20. Use of multiple correspondence analysis (MCA) to identify interactive meteorological conditions affecting relative throughfall

    Science.gov (United States)

    Van Stan, John T.; Gay, Trent E.; Lewis, Elliott S.

    2016-02-01

    Forest canopies alter rainfall reaching the surface by redistributing it as throughfall. Throughfall supplies water and nutrients to a variety of ecohydrological components (soil microbial communities, stream water discharge/chemistry, and stormflow pathways) and is controlled by canopy structural interactions with meteorological conditions across temporal scales. This work introduces and applies multiple correspondence analyses (MCAs) to a range of meteorological thresholds (median intensity, median absolute deviation (MAD) of intensity, median wind-driven droplet inclination angle, and MAD of wind speed) for an example throughfall problem: identification of interacting storm conditions corresponding to temporal concentration in relative throughfall beyond the median observation (⩾73% of rain). MCA results from the example show that equalling or exceeding rain intensity thresholds (median and MAD) corresponded with temporal concentration of relative throughfall across all storms. Under these intensity conditions, two wind mechanisms produced significant correspondences: (1) high, steady wind-driven droplet inclination angles increased surface wetting; and (2) sporadic winds shook entrained droplets from surfaces. A discussion is provided showing that these example MCA findings agree well with previous work relying on more historically common methods (e.g., multiple regression and analytical models). Meteorological threshold correspondences to temporal concentration of relative throughfall at our site may be a function of heavy Tillandsia usneoides coverage. Applications of MCA within other forests may provide useful insights to how temporal throughfall dynamics are affected for drainage pathways dependent on different structures (leaves, twigs, branches, etc.).

  1. City-scale analysis of water-related energy identifies more cost-effective solutions.

    Science.gov (United States)

    Lam, Ka Leung; Kenway, Steven J; Lant, Paul A

    2017-02-01

    Energy and greenhouse gas management in urban water systems typically focus on optimising within the direct system boundary of water utilities that covers the centralised water supply and wastewater treatment systems, despite a greater energy influence by the water end use. This work develops a cost curve of water-related energy management options from a city perspective for a hypothetical Australian city. It is compared with that from the water utility perspective. The curves are based on 18 water-related energy management options that have been implemented or evaluated in Australia. In the studied scenario, the cost-effective energy saving potential from a city perspective (292 GWh/year) is far more significant than that from a utility perspective (65 GWh/year). In some cases, for similar capital cost, if regional water planners invested in end use options instead of utility options, a greater energy saving potential at a greater cost-effectiveness could be achieved in urban water systems. For example, upgrading a wastewater treatment plant for biogas recovery at a capital cost of $27.2 million would save 31 GWh/year with a marginal cost saving of $63/MWh, while solar hot water system rebates at a cost of $28.6 million would save 67 GWh/year with a marginal cost saving of $111/MWh. Options related to hot water use such as water-efficient shower heads, water-efficient clothes washers and solar hot water system rebates are among the most cost-effective city-scale opportunities. This study demonstrates the use of cost curves to compare both utility and end use options in a consistent framework. It also illustrates that focusing solely on managing the energy use within the utility would miss substantial non-utility water-related energy saving opportunities. There is a need to broaden the conventional scope of cost curve analysis to include water-related energy and greenhouse gas at the water end use, and to value their management from a city perspective. This

  2. Executing SPARQL Queries over the Web of Linked Data

    Science.gov (United States)

    Hartig, Olaf; Bizer, Christian; Freytag, Johann-Christoph

    The Web of Linked Data forms a single, globally distributed dataspace. Due to the openness of this dataspace, it is not possible to know in advance all data sources that might be relevant for query answering. This openness poses a new challenge that is not addressed by traditional research on federated query processing. In this paper we present an approach to execute SPARQL queries over the Web of Linked Data. The main idea of our approach is to discover data that might be relevant for answering a query during the query execution itself. This discovery is driven by following RDF links between data sources based on URIs in the query and in partial results. The URIs are resolved over the HTTP protocol into RDF data which is continuously added to the queried dataset. This paper describes concepts and algorithms to implement our approach using an iterator-based pipeline. We introduce a formalization of the pipelining approach and show that classical iterators may cause blocking due to the latency of HTTP requests. To avoid blocking, we propose an extension of the iterator paradigm. The evaluation of our approach shows its strengths as well as the still existing challenges.

  3. Query construction, entropy, and generalization in neural-network models

    Science.gov (United States)

    Sollich, Peter

    1994-05-01

    We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, nonredundant training sets. We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and ``noninvertible'' versus linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill matched to the learning environment and find that, in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e., on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction.

  4. Queries and Views of Programs Using a Relational Database System

    Science.gov (United States)

    1983-12-01

    look different but you have changed. I’m looking through you, you’re not the same! - from the song I’m looking through you by the Beatles Seeing...Berkeley, CA 94720 December 1983 Submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science...in the Graduate Division of the University of California, Berkeley. Copyright© 1983 by Mark A. Linton Research supported by NSF grant MCS-8010686

  5. Identifying food-related life style segments by a cross-culturally valid scaling device

    DEFF Research Database (Denmark)

    Brunsø, Karen; Grunert, Klaus G.

    1994-01-01

    -related life style in a cross-culturally valid way. To this end, we have col-lected a pool of 202 items, collected data in three countries, and have con-structed scales based on cross-culturally stable patterns. These scales have then been subjected to a number of tests of reliability and vali-dity. We have...... then applied the set of scales to a fourth country, Germany, based on a representative sample of 1000 respondents. The scales had, with a fe exceptions, moderately good reliabilities. A cluster ana-ly-sis led to the identification of 5 segments, which differed on all 23 scales....

  6. What makes astronomical heritage valuable? Identifying potential Outstanding Universal Value in cultural properties relating to astronomy

    Science.gov (United States)

    Cotte, Michel

    2016-10-01

    This communication presents the situation regarding astronomical and archaeoastronomical heritage related to the World Heritage Convention through recent years up until today. Some parallel events and works were promoted strongly within the IAU-UNESCO Initiative during the International Year of Astronomy (2009). This was followed by a joint program by the IAU and ICOMOS-an official advisory body assisting the World Heritage Committee in the evaluation of nomination dossiers. The result of that work is an important publication by around 40 authors from 20 different countries all around the world: Heritage Sites of Astronomy and Archaeoastronomy in the Context of the UNESCO World Heritage Convention (Ruggles & Cotte 2010). A second volume is under preparation (2015). It was also accompanied by some initiatives such as the ``Windows to the Universe" organisation and the parallel constitution of local ``Starlight Reserves''. Some regional meetings studying specific facets or regional heritage in the field giving significant knowledge progresses also accompanied the global trend for astronomical heritage. WH assessment is defined by a relatively strict format and methodology. A key phrase is ``demonstration of Outstanding Universal Value'' to justify the WH Listing by the Committee. This communication first examines the requirements and evaluation practices about of demonstrating OUV for a given place in the context of astronomical or archaeoastronomical heritage. That means the examination of the tangible attributes, an inventory of the property in terms of immoveable and moveable components and an inventory of intangible issues related to the history (history of the place in the context of the history of astronomy and cultural history). This is also related to the application to the site of the concept of integrity and authenticity, as regards the place itself and in comparison with other similar places (WH sites already listed, sites on national WH Tentative Lists

  7. Contribution of WUSCHEL-related homeobox (WOX genes to identify the phylogenetic relationships among Petunia species

    Directory of Open Access Journals (Sweden)

    Ana Lúcia Anversa Segatto

    Full Text Available Abstract Developmental genes are believed to contribute to major changes during plant evolution, from infrageneric to higher levels. Due to their putative high sequence conservation, developmental genes are rarely used as molecular markers, and few studies including these sequences at low taxonomic levels exist. WUSCHEL-related homeobox genes (WOX are transcription factors exclusively present in plants and are involved in developmental processes. In this study, we characterized the infrageneric genetic variation of Petunia WOX genes. We obtained phylogenetic relationships consistent with other phylogenies based on nuclear markers, but with higher statistical support, resolution in terminals, and compatibility with flower morphological changes.

  8. Identifying Flood-Related Infectious Diseases in Anhui Province, China: A Spatial and Temporal Analysis

    Science.gov (United States)

    Gao, Lu; Zhang, Ying; Ding, Guoyong; Liu, Qiyong; Jiang, Baofa

    2016-01-01

    The aim of this study was to explore infectious diseases related to the 2007 Huai River flood in Anhui Province, China. The study was based on the notified incidences of infectious diseases between June 29 and July 25 from 2004 to 2011. Daily incidences of notified diseases in 2007 were compared with the corresponding daily incidences during the same period in the other years (from 2004 to 2011, except 2007) by Poisson regression analysis. Spatial autocorrelation analysis was used to test the distribution pattern of the diseases. Spatial regression models were then performed to examine the association between the incidence of each disease and flood, considering lag effects and other confounders. After controlling the other meteorological and socioeconomic factors, malaria (odds ratio [OR] = 3.67, 95% confidence interval [CI] = 1.77–7.61), diarrhea (OR = 2.16, 95% CI = 1.24–3.78), and hepatitis A virus (HAV) infection (OR = 6.11, 95% CI = 1.04–35.84) were significantly related to the 2007 Huai River flood both from the spatial and temporal analyses. Special attention should be given to develop public health preparation and interventions with a focus on malaria, diarrhea, and HAV infection, in the study region. PMID:26903612

  9. Human-Chromatin-Related Protein Interactions Identify a Demethylase Complex Required for Chromosome Segregation

    Directory of Open Access Journals (Sweden)

    Edyta Marcon

    2014-07-01

    Full Text Available Chromatin regulation is driven by multicomponent protein complexes, which form functional modules. Deciphering the components of these modules and their interactions is central to understanding the molecular pathways these proteins are regulating, their functions, and their relation to both normal development and disease. We describe the use of affinity purifications of tagged human proteins coupled with mass spectrometry to generate a protein-protein interaction map encompassing known and predicted chromatin-related proteins. On the basis of 1,394 successful purifications of 293 proteins, we report a high-confidence (85% precision network involving 11,464 protein-protein interactions among 1,738 different human proteins, grouped into 164 often overlapping protein complexes with a particular focus on the family of JmjC-containing lysine demethylases, their partners, and their roles in chromatin remodeling. We show that RCCD1 is a partner of histone H3K36 demethylase KDM8 and demonstrate that both are important for cell-cycle-regulated transcriptional repression in centromeric regions and accurate mitotic division.

  10. Identifying the null subject: evidence from event-related brain potentials.

    Science.gov (United States)

    Demestre, J; Meltzer, S; García-Albea, J E; Vigil, A

    1999-05-01

    Event-related brain potentials (ERPs) were recorded during spoken language comprehension to study the on-line effects of gender agreement violations in controlled infinitival complements. Spanish sentences were constructed in which the complement clause contained a predicate adjective marked for syntactic gender. By manipulating the gender of the antecedent (i.e., the controller) of the implicit subject while holding constant the gender of the adjective, pairs of grammatical and ungrammatical sentences were created. The detection of such a gender agreement violation would indicate that the parser had established the coreference relation between the null subject and its antecedent. The results showed a complex biphasic ERP (i.e., an early negativity with prominence at anterior and central sites, followed by a centroparietal positivity) in the violating condition as compared to the non-violating conditions. The brain reacts to NP-adjective gender agreement violations within a few hundred milliseconds of their occurrence. The data imply that the parser has properly coindexed the null subject of an infinitive clause with its antecedent.

  11. Identifying attentional bias and emotional response after appearance-related stimuli exposure.

    Science.gov (United States)

    Cho, Ara; Kwak, Soo-Min; Lee, Jang-Han

    2013-01-01

    The effect of media images has been regarded as a significant variable in the construction or in the activation of body images. Individuals who have a negative body image use avoidance coping strategies to minimize damage to their body image. We identified attentional biases and negative emotional responses following exposure to body stimuli. Female university students were divided into two groups based on their use of avoidance coping strategies (high-level group: high avoidance [HA]; low-group: low avoidance [LA]), and were assigned to two different conditions (exposure to thin body pictures, ET, and exposure to oversized body pictures, EO). Results showed that the HA group paid more attention to slim bodies and reported more negative emotions than the LA group, and that the EO had more negative effects than the ET. We suggest that HAs may attend more to slim bodies as a way of avoiding overweight bodies, influenced by social pressure, and in the search for a compensation of a positive emotional balance. However, attentional bias toward slim bodies can cause an upward comparison process, leading to increased body dissatisfaction, which is the main factor in the development of eating disorders (EDs). Therefore, altering avoidance coping strategies should be considered for people at risk of EDs.

  12. Identifying and tracking attacks on networks: C3I displays and related technologies

    Science.gov (United States)

    Manes, Gavin W.; Dawkins, J.; Shenoi, Sujeet; Hale, John C.

    2003-09-01

    Converged network security is extremely challenging for several reasons; expanded system and technology perimeters, unexpected feature interaction, and complex interfaces all conspire to provide hackers with greater opportunities for compromising large networks. Preventive security services and architectures are essential, but in and of themselves do not eliminate all threat of compromise. Attack management systems mitigate this residual risk by facilitating incident detection, analysis and response. There are a wealth of attack detection and response tools for IP networks, but a dearth of such tools for wireless and public telephone networks. Moreover, methodologies and formalisms have yet to be identified that can yield a common model for vulnerabilities and attacks in converged networks. A comprehensive attack management system must coordinate detection tools for converged networks, derive fully-integrated attack and network models, perform vulnerability and multi-stage attack analysis, support large-scale attack visualization, and orchestrate strategic responses to cyber attacks that cross network boundaries. We present an architecture that embodies these principles for attack management. The attack management system described engages a suite of detection tools for various networking domains, feeding real-time attack data to a comprehensive modeling, analysis and visualization subsystem. The resulting early warning system not only provides network administrators with a heads-up cockpit display of their entire network, it also supports guided response and predictive capabilities for multi-stage attacks in converged networks.

  13. Implementing and evaluating a regional strategy to improve testing rates in VA patients at risk for HIV, utilizing the QUERI process as a guiding framework: QUERI Series.

    Science.gov (United States)

    Goetz, Matthew B; Bowman, Candice; Hoang, Tuyen; Anaya, Henry; Osborn, Teresa; Gifford, Allen L; Asch, Steven M

    2008-03-19

    audit/feedback, provider activation, and organizational change can increase HIV testing rates for at-risk patients. We are refining our program prior to extending our work to a small-scale, multi-site evaluation (QUERI step 4/phase 2). We also plan to evaluate the durability/sustainability of the intervention effect, the costs of HIV testing, and the number of newly identified HIV-infected patients. Ultimately, we will evaluate this program in other geographically dispersed stations (QUERI step 4/phases 3 and 4).

  14. Implementing and evaluating a regional strategy to improve testing rates in VA patients at risk for HIV, utilizing the QUERI process as a guiding framework: QUERI Series

    Directory of Open Access Journals (Sweden)

    Osborn Teresa

    2008-03-01

    . Preliminary unadjusted results show that the coordinated use of audit/feedback, provider activation, and organizational change can increase HIV testing rates for at-risk patients. We are refining our program prior to extending our work to a small-scale, multi-site evaluation (QUERI step 4/phase 2. We also plan to evaluate the durability/sustainability of the intervention effect, the costs of HIV testing, and the number of newly identified HIV-infected patients. Ultimately, we will evaluate this program in other geographically dispersed stations (QUERI step 4/phases 3 and 4.

  15. A journey to Semantic Web query federation in the life sciences.

    Science.gov (United States)

    Cheung, Kei-Hoi; Frost, H Robert; Marshall, M Scott; Prud'hommeaux, Eric; Samwald, Matthias; Zhao, Jun; Paschke, Adrian

    2009-10-01

    As interest in adopting the Semantic Web in the biomedical domain continues to grow, Semantic Web technology has been evolving and maturing. A variety of technological approaches including triplestore technologies, SPARQL endpoints, Linked Data, and Vocabulary of Interlinked Datasets have emerged in recent years. In addition to the data warehouse construction, these technological approaches can be used to support dynamic query federation. As a community effort, the BioRDF task force, within the Semantic Web for Health Care and Life Sciences Interest Group, is exploring how these emerging approaches can be utilized to execute distributed queries across different neuroscience data sources. We have created two health care and life science knowledge bases. We have explored a variety of Semantic Web approaches to describe, map, and dynamically query multiple datasets. We have demonstrated several federation approaches that integrate diverse types of information about neurons and receptors that play an important role in basic, clinical, and translational neuroscience research. Particularly, we have created a prototype receptor explorer which uses OWL mappings to provide an integrated list of receptors and executes individual queries against different SPARQL endpoints. We have also employed the AIDA Toolkit, which is directed at groups of knowledge workers who cooperatively search, annotate, interpret, and enrich large collections of heterogeneous documents from diverse locations. We have explored a tool called "FeDeRate", which enables a global SPARQL query to be decomposed into subqueries against the remote databases offering either SPARQL or SQL query interfaces. Finally, we have explored how to use the vocabulary of interlinked Datasets (voiD) to create metadata for describing datasets exposed as Linked Data URIs or SPARQL endpoints. We have demonstrated the use of a set of novel and state-of-the-art Semantic Web technologies in support of a neuroscience query

  16. Algebra-Based Optimization of XML-Extended OLAP Queries

    DEFF Research Database (Denmark)

    Yin, Xuepeng; Pedersen, Torben Bach

    In today’s OLAP systems, integrating fast changing data, e.g., stock quotes, physically into a cube is complex and time-consuming. The widespread use of XML makes it very possible that this data is available in XML format on the WWW; thus, making XML data logically federated with OLAP systems...... is desirable. This report presents a complete foundation for such OLAP-XML federations. This includes a prototypical query engine, a simplified query semantics based on previous work, and a complete physical algebra which enables precise modeling of the execution tasks of an OLAP-XML query. Effective algebra...

  17. In-route skyline querying for location-based services

    DEFF Research Database (Denmark)

    Xuegang, Huang; Jensen, Kristian S.

    2005-01-01

    With the emergence of an infrastructure for location-aware mobile services, the processing of advanced, location-based queries that are expected to underlie such services is gaining in relevance, While much work has assumed that users move in Euclidean space, this paper assumes that movement...... their efficient computation. The queries take into account several spatial preferences. and they intuitively return a set of most interesting results for each result returned by the corresponding non-skyline queries. The paper also covers a performance study of the proposed techniques based on real point...

  18. Intelligent query processing for semantic mediation of information systems

    Directory of Open Access Journals (Sweden)

    Saber Benharzallah

    2011-11-01

    Full Text Available We propose an intelligent and an efficient query processing approach for semantic mediation of information systems. We propose also a generic multi agent architecture that supports our approach. Our approach focuses on the exploitation of intelligent agents for query reformulation and the use of a new technology for the semantic representation. The algorithm is self-adapted to the changes of the environment, offers a wide aptitude and solves the various data conflicts in a dynamic way; it also reformulates the query using the schema mediation method for the discovered systems and the context mediation for the other systems.

  19. A new weighted fuzzy grammar on object oriented database queries

    Directory of Open Access Journals (Sweden)

    Ali Haroonabadi

    2012-08-01

    Full Text Available The fuzzy object oriented database model is often used to handle the existing imprecise and complicated objects for many real-world applications. The main focus of this paper is on fuzzy queries and tries to analyze a complicated and complex query to get more meaningful and closer responses. The method permits the user to provide the possibility of allocating the weight to various parts of the query, which makes it easier to follow better goals and return the target objects.

  20. Relaxing rdf queries based on user and domain preferences

    DEFF Research Database (Denmark)

    Dolog, Peter; Stueckenschmidt, Heiner; Wache, Holger

    2009-01-01

    Research in cooperative query answering is triggered by the observation that users are often not able to correctly formulate queries to databases such that they return the intended result. Due to lacking knowledge about the contents and the structure of a database, users will often only be able t...... application in the context of e-learning systems....... knowledge and user preferences. We describe a framework for information access that combines query refinement and relaxation in order to provide robust, personalized access to heterogeneous resource description framework data as well as an implementation in terms of rewriting rules and explain its...

  1. Blink and it's done: Interactive queries on very large data

    OpenAIRE

    Agarwal, Sameer; Iyer, Anand P.; Panda, Aurojit; Mozafari, Barzan; Stoica, Ion; Madden, Samuel R.

    2012-01-01

    In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical...

  2. jQuery 2.0 animation techniques beginner's guide

    CERN Document Server

    Culpepper, Adam

    2013-01-01

    This book is a guide to help you create attractive web page animations using jQuery. Written in a friendly and engaging approach this book is designed to be placed alongside your computer as a mentor.If you are a web designer or a frontend developer or if you want to learn how to animate the user interface of your web applications with jQuery, this book is for you. Experience with jQuery or Javascript would be helpful but solid knowledge base of HTML and CSS is assumed.

  3. "When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.

    Science.gov (United States)

    Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit

    2016-10-24

    To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1 that used short URLs, and Approach 2 where URLs were expanded and included into the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross validation that calculated average F-scores. One-tailed t test was

  4. Occupational health surveillance: a means to identify work-related risks.

    Science.gov (United States)

    Froines, J R; Dellenbaugh, C A; Wegman, D H

    1986-09-01

    The lack of successful disease surveillance methods has resulted in few reliable estimates of workplace-related disease. Hazard surveillance--the ongoing assessment of chemical use and worker exposure to the chemicals--is presented as a way to supplement occupational disease surveillance. Existing OSHA (Occupational Safety and Health Administration) and NIOSH (National Institute for Occupational Health) data systems are adapted to this function to characterize the distribution and type of hazardous industry in Los Angeles County. A new method is developed for ranking potentially hazardous industries in the county using actual exposure measurements from federal OSHA compliance inspections. The strengths of the different systems are presented along with considerations of industrial employment and types of specific chemical exposures. Applications for information from hazard surveillance are discussed in terms of intervention, monitoring exposure control, planning, research, and as a complement to disease surveillance.

  5. SPARQL-enabled identifier conversion with Identifiers.org

    Science.gov (United States)

    Wimalaratne, Sarala M.; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille

    2015-01-01

    Motivation: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. Results: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. Availability and implementation: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. Contact: sarala@ebi.ac.uk PMID:25638809

  6. SPARQL-enabled identifier conversion with Identifiers.org.

    Science.gov (United States)

    Wimalaratne, Sarala M; Bolleman, Jerven; Juty, Nick; Katayama, Toshiaki; Dumontier, Michel; Redaschi, Nicole; Le Novère, Nicolas; Hermjakob, Henning; Laibe, Camille

    2015-06-01

    On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. © The Author 2015. Published by Oxford University Press.

  7. Identify Normative Values of Balance Tests Toward Neurological Assessment of Sports Related Concussion

    Directory of Open Access Journals (Sweden)

    Samaneh Eimanipure

    2012-04-01

    Full Text Available Objectives: Deterioration in postural control mechanisms is termed postural instability and results increased postural sway and many laboratory techniques and instruments are characterized by a wide range of neurological signs and symptoms to the medical management. Thus the current study designed to assess the reliability of commonly used clinical measures of balance and determined normal values. Also, the second purpose was scrutiny of effect age, length weight and body mass index (BMI on perform clinical balance tests. Methods: One hundred and thirty three participants (18-59 years, that have at least three time sports activity in one week, performed three timed tests: Time- up and Go (TUG, Tandem Gait (TG, and Walking on Balance Beam (WOBB on firm surface. Results: Reliability data were produced for each tests of motor performance. We found that the first performance of three trials was slower, and the relationship between some factors and these battery tests were examined. Means(±SD for each measure were averaged across three trials. Time to complete TG was 13.6±1.1s. TUG value was 6.9±1.03 and WOBB was 6.9±1.03s. Discussion: our results revealed that three clinical balance test batteries-TUG, TG and WOBB tests are the stability measures to assess of sports related concussion. Also, the results of current study appeared that the time to perform these tests was slower than the other studies.

  8. ALGORITMA RC4 DALAM PROTEKSI TRANSMISI DAN HASIL QUERY UNTUK ORDBMS POSTGRESQL

    Directory of Open Access Journals (Sweden)

    Yuri Ariyanto

    2009-01-01

    Full Text Available In this research will be worked through about how cryptography RC4's algorithm implementation in protection to query result and of query, security by encryption and descryption up to both is in network. Implementation of this research which is build software in client that function access databases that is placed by the side of server. Software that building to have facility for encryption and descryption query result and of query that is sent from client goes to server and. transmission query result and of query can secure its security. Well guaranted transmission security him of query result and of query can be told to succeed if success software can encryption query result and of query which transmission so that in the event of scanning to both, scanning will not understand data content. Conclusion of this research that is woke up software succeed encryption query and result of query which transmission between application of client and of server databases. Abstract in Bahasa Indonesia: Pada penelitian ini dibahas mengenai bagaimana mengimplementasikan algoritma kriptografi RC4 dalam proteksi terhadap query dan hasil query, pengamanan dilakukan dengan cara melakukan enkripsi dan dekripsi selama keduanya berada di dalam jaringan. Pengimplementasian dari penelitian ini yaitu membangun sebuah software yang akan diletakkan di sisi client yang berfungsi mengakses database yang diletakkan di sisi server. Software yang dibangun memiliki fasilitas untuk mengenkripsi dan mendektipsi query dan hasil query yang dikirimkan dari client ke server dan juga sebaliknya. Dengan demikian tramsmisi query dan hasil query dapat terjamin keamanannya.Terjaminnya keamanan transmisi query dan hasil query dapat dikatakan berhasil jika software berhasil mengenkripsi query dan hasil query yang ditransmisikan sehingga apabila terjadi penyadapan terhadap keduanya, penyadap tidak akan mengerti isi data tersebut. Kesimpulan dari penelitian ini yaitu software yang dibangun

  9. Identifying best practice in relation to Iodine-131 ablation discharges to sewers in Ireland

    International Nuclear Information System (INIS)

    Ryan, Thomas P.; Fennell, Stepehn; McGarry, Ann; Punt, Adrian

    2008-01-01

    Full text: In line with a commitment in Ireland's strategy on the implementation of the Oslo-Paris Convention (OSPAR) as well as recent developments in the provision of national oncology services, best practice in relation to Iodine-131 (I-131) ablation discharges to sewers in Ireland is under review. Preparatory to this review the Radiological Protection Institute of Ireland (RPII) commissioned a study of current practices in Ireland and associated doses as well as a review of international best practice and advice. Currently, there are three hospitals in Ireland at which thyroid ablation therapy procedures (large therapeutic administrations of I-131 for thyroid cancer treatment) are carried out. Only one facility has a limited capacity for delay and decay storage prior to discharge. Based on current administrations and discharges, the potentially most exposed workers comprises hospital plumbers dealing with specific incidents with estimated doses in the order of 50 to 70 μSv per incident. Doses to sewage workers and fishing communities are estimated at less than 4 μSv y -1 and 0.4 μSv y -1 respectively. Based on anticipated future service requirements in Ireland, a marginal increase in doses to some of these groups is estimated. Iodine-131 discharges to the environment and associated doses to workers and members of the public may be significantly reduced by the introduction of state-of-the-art delay and decay tanks using multi-tank vacuum systems. The justification for the introduction of a regulatory requirement to install such systems is examined taking account of: discharges to the environment, concentrations in the environment, doses to workers and members of the public, best available techniques (BAT), international best practice and advice as well as the financial implications for medical facilities. The case for retrofitting delay and decay tanks to existing medical facilities is also examined as well as the option of including such tanks in the

  10. Using event related potentials to identify a user's behavioural intention aroused by product form design.

    Science.gov (United States)

    Ding, Yi; Guo, Fu; Zhang, Xuefeng; Qu, Qingxing; Liu, Weilin

    2016-07-01

    The capacity of product form to arouse user's behavioural intention plays a decisive role in further user experience, even in purchase decision, while traditional methods rarely give a fully understanding of user experience evoked by product form, especially the feeling of anticipated use of product. Behavioural intention aroused by product form designs has not yet been investigated electrophysiologically. Hence event related potentials (ERPs) were applied to explore the process of behavioural intention when users browsed different smart phone form designs with brand and price not taken into account for mainly studying the brain activity evoked by variety of product forms. Smart phone pictures with different anticipated user experience were displayed with equiprobability randomly. Participants were asked to click the left mouse button when certain picture gave them a feeling of behavioural intention to interact with. The brain signal of each participant was recorded by Curry 7.0. The results show that pictures with an ability to arouse participants' behavioural intention for further experience can evoke enhanced N300 and LPPs (late positive potentials) in central-parietal, parietal and occipital regions. The scalp topography shows that central-parietal, parietal and occipital regions are more activated. The results indicate that the discrepancy of ERPs can reflect the neural activities of behavioural intention formed or not. Moreover, amplitude of ERPs occurred in corresponding brain areas can be used to measure user experience. The exploring of neural correlated with behavioural intention provide an accurate measurement method of user's perception and help marketers to know which product can arouse users' behavioural intention, maybe taken as an evaluating indicator of product design. Copyright © 2016 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  11. Domainwise Web Page Optimization Based On Clustered Query Sessions Using Hybrid Of Trust And ACO For Effective Information Retrieval

    Directory of Open Access Journals (Sweden)

    Dr. Suruchi Chawla

    2015-08-01

    Full Text Available Abstract In this paper hybrid of Ant Colony OptimizationACO and trust has been used for domainwise web page optimization in clustered query sessions for effective Information retrieval. The trust of the web page identifies its degree of relevance in satisfying specific information need of the user. The trusted web pages when optimized using pheromone updates in ACO will identify the trusted colonies of web pages which will be relevant to users information need in a given domain. Hence in this paper the hybrid of Trust and ACO has been used on clustered query sessions for identifying more and more relevant number of documents in a given domain in order to better satisfy the information need of the user. Experiment was conducted on the data set of web query sessions to test the effectiveness of the proposed approach in selected three domains Academics Entertainment and Sports and the results confirm the improvement in the precision of search results.

  12. Technologies for conceptual modelling and intelligent query formulation

    CSIR Research Space (South Africa)

    Alberts, R

    2008-11-01

    Full Text Available The aim of the project is to devise and evaluate algorithms, methodologies, techniques and interaction paradigms to build a tool for conceptual modelling and query management of complex data repositories based on a framework with solid formal...

  13. External Data Structures for Shortest Path Queries on Planar Digraphs

    DEFF Research Database (Denmark)

    Arge, Lars; Toma, Laura

    2005-01-01

    In this paper we present space-query trade-offs for external memory data structures that answer shortest path queries on planar directed graphs. For any S = Ω(N 1 + ε) and S = O(N2/B), our main result is a family of structures that use S space and answer queries in O(N2/ S B) I/Os, thus obtaining...... optimal space-query product O(N2/B). An S space structure can be constructed in O(√S · sort(N)) I/Os, where sort(N) is the number of I/Os needed to sort N elements, B is the disk block size, and N is the size of the graph....

  14. An Approach to Assist Designers With Their Queries and Designs

    DEFF Research Database (Denmark)

    Ahmed, Saeema

    2006-01-01

    Recent research investigating how engineers search for information has concluded that engineering designers acquire assistance when formulating queries. An approach to assist designers with their queries is presented. This approach forms part of a knowledge management system, where indexed...... documents are entered into the system (or are automatically indexed by tools within a system). The method builds up a network based upon indices assigned to documents. The network (or chunk) is presented back to the user once a search for knowledge has been completed. The network is build up as indexed...... documents are entered in to a knowledge-based system and is generated dynamically. The network can be used to assist a designer in searching for information; reformulating a query and; to prompt design tasks. This paper presents an approach to prompt designers with their design queries, along with some...

  15. An introduction to XML query processing and keyword search

    CERN Document Server

    Lu, Jiaheng

    2013-01-01

    This book systematically and comprehensively covers the latest advances in XML data searching. It presents an extensive overview of the current query processing and keyword search techniques on XML data.

  16. Determinacy in Static Analysis of jQuery

    DEFF Research Database (Denmark)

    Andreasen, Esben; Møller, Anders

    2014-01-01

    Static analysis for JavaScript can potentially help programmers find errors early during development. Although much progress has been made on analysis techniques, a major obstacle is the prevalence of libraries, in particular jQuery, which apply programming patterns that have detrimental conseque......Static analysis for JavaScript can potentially help programmers find errors early during development. Although much progress has been made on analysis techniques, a major obstacle is the prevalence of libraries, in particular jQuery, which apply programming patterns that have detrimental...... present a static dataflow analysis for JavaScript that infers and exploits determinacy information on-the-fly, to enable analysis of some of the most complex parts of jQuery. The techniques are implemented in the TAJS analysis tool and evaluated on a collection of small programs that use jQuery. Our...

  17. Parasol: An Architecture for Cross-Cloud Federated Graph Querying

    Energy Technology Data Exchange (ETDEWEB)

    Lieberman, Michael; Choudhury, Sutanay; Hughes, Marisa; Patrone, Dennis; Hider, Sandy; Piatko, Christine; Chapman, Matthew; Marple, JP; Silberberg, David

    2014-06-22

    Large scale data fusion of multiple datasets can often provide in- sights that examining datasets individually cannot. However, when these datasets reside in different data centers and cannot be collocated due to technical, administrative, or policy barriers, a unique set of problems arise that hamper querying and data fusion. To ad- dress these problems, a system and architecture named Parasol is presented that enables federated queries over graph databases residing in multiple clouds. Parasol’s design is flexible and requires only minimal assumptions for participant clouds. Query optimization techniques are also described that are compatible with Parasol’s lightweight architecture. Experiments on a prototype implementation of Parasol indicate its suitability for cross-cloud federated graph queries.

  18. Matching health information seekers' queries to medical terms.

    Science.gov (United States)

    Soualmia, Lina F; Prieur-Gaston, Elise; Moalla, Zied; Lecroq, Thierry; Darmoni, Stéfan J

    2012-01-01

    The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method in order to correct misspellings of queries submitted by health information seekers to a medical online search tool. In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale we tested different combinations of query normalizations before or after misspelling correction with the retained thresholds in the first run. According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed by several terms that may be combination of medical terms. The process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is realized before spelling-correction. Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its combination with the Stoilos algorithm improved

  19. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.

    Science.gov (United States)

    Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman

    2013-01-01

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively.

  20. GBA manager: an online tool for querying low-complexity regions in proteins.

    Science.gov (United States)

    Bandyopadhyay, Nirmalya; Kahveci, Tamer

    2010-01-01

    Abstract We developed GBA Manager, an online software that facilitates the Graph-Based Algorithm (GBA) we proposed in our earlier work. GBA identifies the low-complexity regions (LCR) of protein sequences. GBA exploits a similarity matrix, such as BLOSUM62, to compute the complexity of the subsequences of the input protein sequence. It uses a graph-based algorithm to accurately compute the regions that have low complexities. GBA Manager is a user friendly web-service that enables online querying of protein sequences using GBA. In addition to querying capabilities of the existing GBA algorithm, GBA Manager computes the p-values of the LCR identified. The p-value gives an estimate of the possibility that the region appears by chance. GBA Manager presents the output in three different understandable formats. GBA Manager is freely accessible at http://bioinformatics.cise.ufl.edu/GBA/GBA.htm .

  1. Query-by-Example Music Information Retrieval by Score-Informed Source Separation and Remixing Technologies

    Directory of Open Access Journals (Sweden)

    Goto Masataka

    2010-01-01

    Full Text Available We describe a novel query-by-example (QBE approach in music information retrieval that allows a user to customize query examples by directly modifying the volume of different instrument parts. The underlying hypothesis of this approach is that the musical mood of retrieved results changes in relation to the volume balance of different instruments. On the basis of this hypothesis, we aim to clarify the relationship between the change in the volume balance of a query and the genre of the retrieved pieces, called genre classification shift. Such an understanding would allow us to instruct users in how to generate alternative queries without finding other appropriate pieces. Our QBE system first separates all instrument parts from the audio signal of a piece with the help of its musical score, and then it allows users remix these parts to change the acoustic features that represent the musical mood of the piece. Experimental results showed that the genre classification shift was actually caused by the volume change in the vocal, guitar, and drum parts.

  2. Practical Forward-Secure Range and Sort Queries with Update-Oblivious Linked Lists

    Directory of Open Access Journals (Sweden)

    Blass Erik-Oliver

    2015-06-01

    Full Text Available We revisit the problem of privacy-preserving range search and sort queries on encrypted data in the face of an untrusted data store. Our new protocol RASP has several advantages over existing work. First, RASP strengthens privacy by ensuring forward security: after a query for range [a, b], any new record added to the data store is indistinguishable from random, even if the new record falls within range [a, b]. We are able to accomplish this using only traditional hash and block cipher operations, abstaining from expensive asymmetric cryptography and bilinear pairings. Consequently, RASP is highly practical, even for large database sizes. Additionally, we require only cloud storage and not a computational cloud like related works, which can reduce monetary costs significantly. At the heart of RASP, we develop a new update-oblivious bucket-based data structure. We allow for data to be added to buckets without leaking into which bucket it has been added. As long as a bucket is not explicitly queried, the data store does not learn anything about bucket contents. Furthermore, no information is leaked about data additions following a query. Besides formally proving RASP’s privacy, we also present a practical evaluation of RASP on Amazon Dynamo, demonstrating its efficiency and real world applicability.

  3. Spatial Search Techniques for Mobile 3D Queries in Sensor Web Environments

    Directory of Open Access Journals (Sweden)

    James D. Carswell

    2013-03-01

    Full Text Available Developing mobile geo-information systems for sensor web applications involves technologies that can access linked geographical and semantically related Internet information. Additionally, in tomorrow’s Web 4.0 world, it is envisioned that trillions of inexpensive micro-sensors placed throughout the environment will also become available for discovery based on their unique geo-referenced IP address. Exploring these enormous volumes of disparate heterogeneous data on today’s location and orientation aware smartphones requires context-aware smart applications and services that can deal with “information overload”. 3DQ (Three Dimensional Query is our novel mobile spatial interaction (MSI prototype that acts as a next-generation base for human interaction within such geospatial sensor web environments/urban landscapes. It filters information using “Hidden Query Removal” functionality that intelligently refines the search space by calculating the geometry of a three dimensional visibility shape (Vista space at a user’s current location. This 3D shape then becomes the query “window” in a spatial database for retrieving information on only those objects visible within a user’s actual 3D field-of-view. 3DQ reduces information overload and serves to heighten situation awareness on constrained commercial off-the-shelf devices by providing visibility space searching as a mobile web service. The effects of variations in mobile spatial search techniques in terms of query speed vs. accuracy are evaluated and presented in this paper.

  4. Linked data querying through FCA-based schema indexing

    OpenAIRE

    Brosius, Dominik; Staab, Steffen

    2016-01-01

    The effciency of SPARQL query evaluation against Linked Open Data may benefit from schema-based indexing. However, many data items come with incomplete schema information or lack schema descriptions entirely. In this position paper, we outline an approach to an indexing of linked data graphs based on schemata induced through Formal Concept Analysis. We show how to map queries onto RDF graphs based on such derived schema information. We sketch next steps for realizing and optimizing the sugges...

  5. Inductive queries for a drug designing robot scientist

    OpenAIRE

    King, Ross D.; Schierz, Amanda; Clare, Amanda; Rowland, Jem; Sparkes, Andrew; Nijssen, Siegfried; Ramon, Jan

    2010-01-01

    It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We in...

  6. Two Dimensional Range Minimum Queries and Fibonacci Lattices

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Davoodi, Pooya; Lewenstein, Moshe

    2012-01-01

    technique—the discrepancy properties of Fibonacci lattices—we give an indexing data structure for 2D-RMQs that uses O(N/c) bits additional space with O(clogc(loglogc)2) query time, for any parameter c, 4 ≤ c ≤ N. Also, when the entries of the input matrix are from {0,1}, we show that the query time can...

  7. [On the seasonality of dermatoses: a retrospective analysis of search engine query data depending on the season].

    Science.gov (United States)

    Köhler, M J; Springer, S; Kaatz, M

    2014-09-01

    The volume of search engine queries about disease-relevant items reflects public interest and correlates with disease prevalence as proven by the example of flu (influenza). Other influences include media attention or holidays. The present work investigates if the seasonality of prevalence or symptom severity of dermatoses correlates with search engine query data. The relative weekly volume of dermatological relevant search terms was assessed by the online tool Google Trends for the years 2009-2013. For each item, the degree of seasonality was calculated via frequency analysis and a geometric approach. Many dermatoses show a marked seasonality, reflected by search engine query volumes. Unexpected seasonal variations of these queries suggest a previously unknown variability of the respective disease prevalence. Furthermore, using the example of allergic rhinitis, a close correlation of search engine query data with actual pollen count can be demonstrated. In many cases, search engine query data are appropriate to estimate seasonal variability in prevalence of common dermatoses. This finding may be useful for real-time analysis and formation of hypotheses concerning pathogenetic or symptom aggravating mechanisms and may thus contribute to improvement of diagnostics and prevention of skin diseases.

  8. Development and Assessment of a Diagnostic Tool to Identify Organic Chemistry Students' Alternative Conceptions Related to Acid Strength

    Science.gov (United States)

    McClary, LaKeisha M.; Bretz, Stacey Lowery

    2012-01-01

    The central goal of this study was to create a new diagnostic tool to identify organic chemistry students' alternative conceptions related to acid strength. Twenty years of research on secondary and college students' conceptions about acids and bases has shown that these important concepts are difficult for students to apply to qualitative problem…

  9. Representation and alignment of sung queries for music information retrieval

    Science.gov (United States)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a ``smooth'' pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.

  10. How to improve your PubMed/MEDLINE searches: 2. display settings, complex search queries and topic searching.

    Science.gov (United States)

    Fatehi, Farhad; Gray, Leonard C; Wootton, Richard

    2014-01-01

    The way that PubMed results are displayed can be changed using the Display Settings drop-down menu in the result screen. There are three groups of options: Format, Items per page and Sort by, which allow a good deal of control. The results from several searches can be temporarily stored on the Clipboard. Records of interest can be selected on the results page using check boxes and can then be combined, for example to form a reference list. The Related Citations is a valuable feature of PubMed that can provide a set of similar articles when you have identified a record of interest among the results. You can easily search for RCTs or reviews using the appropriate filters or field tags. If you are interested in clinical articles, rather than basic science or health service research, then the Clinical Queries tool on the PubMed home page can be used to retrieve them.

  11. TopFed: TCGA tailored federated query processing and linking to LOD.

    Science.gov (United States)

    Saleem, Muhammad; Padmanabhuni, Shanmukha S; Ngomo, Axel-Cyrille Ngonga; Iqbal, Aftab; Almeida, Jonas S; Decker, Stefan; Deus, Helena F

    2014-01-01

    The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinformatics applications to analyse such large dataset is still challenging, as it often requires downloading large archives and parsing the relevant text files. Therefore, it is making it difficult to enable virtual data integration in order to collect the critical co-variates necessary for analysis. We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Format (RDF), link it to relevant datasets in the Linked Open Data (LOD) cloud and further propose an efficient data distribution strategy to host the resulting 20.4 billion triples data via several SPARQL endpoints. Having the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these SPARQL endpoints by proposing a TCGA tailored federated SPARQL query processing engine named TopFed. We compare TopFed with a well established federation engine FedX in terms of source selection and query execution time by using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall) with query execution time equal to one third to that of FedX. With TopFed, we aim to offer biomedical scientists a single-point-of-access through which distributed TCGA data can be accessed in unison. We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA as the amount and diversity of data exceeds the ability of local resources to handle its retrieval and

  12. Identifying insects with incomplete DNA barcode libraries, African fruit flies (Diptera: Tephritidae) as a test case.

    Science.gov (United States)

    Virgilio, Massimiliano; Jordaens, Kurt; Breman, Floris C; Backeljau, Thierry; De Meyer, Marc

    2012-01-01

    We propose a general working strategy to deal with incomplete reference libraries in the DNA barcoding identification of species. Considering that (1) queries with a large genetic distance with their best DNA barcode match are more likely to be misidentified and (2) imposing a distance threshold profitably reduces identification errors, we modelled relationships between identification performances and distance thresholds in four DNA barcode libraries of Diptera (n = 4270), Lepidoptera (n = 7577), Hymenoptera (n = 2067) and Tephritidae (n = 602 DNA barcodes). In all cases, more restrictive distance thresholds produced a gradual increase in the proportion of true negatives, a gradual decrease of false positives and more abrupt variations in the proportions of true positives and false negatives. More restrictive distance thresholds improved precision, yet negatively affected accuracy due to the higher proportions of queries discarded (viz. having a distance query-best match above the threshold). Using a simple linear regression we calculated an ad hoc distance threshold for the tephritid library producing an estimated relative identification error DNA barcodes and should be used as cut-off mark defining whether we can proceed identifying the query with a known estimated error probability (e.g. 5%) or whether we should discard the query and consider alternative/complementary identification methods.

  13. Refining Current Scientific Priorities and Identifying New Scientific Gaps in HIV-Related Heart, Lung, Blood, and Sleep Research.

    Science.gov (United States)

    Twigg, Homer L; Crystal, Ronald; Currier, Judith; Ridker, Paul; Berliner, Nancy; Kiem, Hans-Peter; Rutherford, George; Zou, Shimian; Glynn, Simone; Wong, Renee; Peprah, Emmanuel; Engelgau, Michael; Creazzo, Tony; Colombini-Hatch, Sandra; Caler, Elisabet

    2017-09-01

    The National Heart, Lung, and Blood Institute (NHLBI) AIDS Program's goal is to provide direction and support for research and training programs in areas of HIV-related heart, lung, blood, and sleep (HLBS) diseases. To better define NHLBI current HIV-related scientific priorities and with the goal of identifying new scientific priorities and gaps in HIV-related HLBS research, a wide group of investigators gathered for a scientific NHLBI HIV Working Group on December 14-15, 2015, in Bethesda, MD. The core objectives of the Working Group included discussions on: (1) HIV-related HLBS comorbidities in the antiretroviral era; (2) HIV cure; (3) HIV prevention; and (4) mechanisms to implement new scientific discoveries in an efficient and timely manner so as to have the most impact on people living with HIV. The 2015 Working Group represented an opportunity for the NHLBI to obtain expert advice on HIV/AIDS scientific priorities and approaches over the next decade.

  14. Four Closely Related HIV-1 CRF01_AE/CRF07_BC Recombinant Forms Identified in East China.

    Science.gov (United States)

    Li, Fan; Li, Yuxueyun; Feng, Yi; Hu, Jing; Ruan, Yuhua; Xing, Hui; Shao, Yiming

    2017-07-01

    Five near full-length genomes of novel second-generation HIV-1 recombinant virus (JS150021, JS150029, JS150129, JS150132, and AH150183) were identified from five HIV-positive people in Jiangsu and Anhui province, east China. Phylogenic analyses showed that these five sequences are all composed of two well-established circulating recombinant forms (CRFs) CRF07_BC and CRF01_AE, grouped into four new discovered recombinant forms, which show several very similar but not identical recombinant breakpoints. The four recombinant forms are also identified to be a sort of family or related viruses, seems to be the results of different recombination events. The emergence of a serious new closely related CRF07_BC/CRF01_AE recombinant strain indicates the increasing complexity of sexual transmission of the HIV-1 epidemic in China.

  15. The Shared Health Research Information Network (SHRINE): a prototype federated query tool for clinical data repositories.

    Science.gov (United States)

    Weber, Griffin M; Murphy, Shawn N; McMurry, Andrew J; Macfadden, Douglas; Nigrin, Daniel J; Churchill, Susanne; Kohane, Isaac S

    2009-01-01

    The authors developed a prototype Shared Health Research Information Network (SHRINE) to identify the technical, regulatory, and political challenges of creating a federated query tool for clinical data repositories. Separate Institutional Review Boards (IRBs) at Harvard's three largest affiliated health centers approved use of their data, and the Harvard Medical School IRB approved building a Query Aggregator Interface that can simultaneously send queries to each hospital and display aggregate counts of the number of matching patients. Our experience creating three local repositories using the open source Informatics for Integrating Biology and the Bedside (i2b2) platform can be used as a road map for other institutions. The authors are actively working with the IRBs and regulatory groups to develop procedures that will ultimately allow investigators to obtain identified patient data and biomaterials through SHRINE. This will guide us in creating a future technical architecture that is scalable to a national level, compliant with ethical guidelines, and protective of the interests of the participating hospitals.

  16. Accelerating SPARQL Queries and Analytics on RDF Data

    KAUST Repository

    Al-Harbi, Razen

    2016-11-09

    The complexity of SPARQL queries and RDF applications poses great challenges on distributed RDF management systems. SPARQL workloads are dynamic and con- sist of queries with variable complexities. Hence, systems that use static partitioning su↵er from communication overhead for workloads that generate excessive communi- cation. Concurrently, RDF applications are becoming more sophisticated, mandating analytical operations that extend beyond SPARQL queries. Being primarily designed and optimized to execute SPARQL queries, which lack procedural capabilities, exist- ing systems are not suitable for rich RDF analytics. This dissertation tackles the problem of accelerating SPARQL queries and RDF analytics on distributed shared-nothing RDF systems. First, a distributed RDF en- gine, coined AdPart, is introduced. AdPart uses lightweight hash partitioning for sharding triples using their subject values; rendering its startup overhead very low. The locality-aware query optimizer of AdPart takes full advantage of the partition- ing to (i) support the fully parallel processing of join patterns on subjects and (ii) minimize data communication for general queries by applying hash distribution of intermediate results instead of broadcasting, wherever possible. By exploiting hash- based locality, AdPart achieves better or comparable performance to systems that employ sophisticated partitioning schemes. To cope with workloads dynamism, AdPart is extended to dynamically adapt to workload changes. AdPart monitors the data access patterns and dynamically redis- tributes and replicates the instances of the most frequent patterns among workers.Consequently, the communication cost for future queries is drastically reduced or even eliminated. Experiments with synthetic and real data verify that AdPart starts faster than all existing systems and gracefully adapts to the query load. Finally, to support and accelerate rich RDF analytical tasks, a vertex-centric RDF analytics framework is

  17. Developing a national dissemination plan for collaborative care for depression: QUERI Series

    Directory of Open Access Journals (Sweden)

    Rubenstein Lisa V

    2008-12-01

    Full Text Available Abstract Background Little is known about effective strategies for disseminating and implementing complex clinical innovations across large healthcare systems. This paper describes processes undertaken and tools developed by the U.S. Department of Veterans Affairs (VA Mental Health Quality Enhancement Research Initiative (MH-QUERI to guide its efforts to partner with clinical leaders to prepare for national dissemination and implementation of collaborative care for depression. Methods An evidence-based quality improvement (EBQI process was used to develop an initial set of goals to prepare the VA for national dissemination and implementation of collaborative care. The resulting product of the EBQI process is referred to herein as a "National Dissemination Plan" (NDP. EBQI participants included: a researchers with expertise on the collaborative care model for depression, clinical quality improvement, and implementation science, and b VA clinical and administrative leaders with experience and expertise on how to adapt research evidence to organizational needs, resources and capacity. Based on EBQI participant feedback, drafts of the NDP were revised and refined over multiple iterations before a final version was approved by MH-QUERI leadership. 'Action Teams' were created to address each goal. A formative evaluation framework and related tools were developed to document processes, monitor progress, and identify and act upon barriers and facilitators in addressing NDP goals. Results The National Dissemination Plan suggests that effectively disseminating collaborative care for depression in the VA will likely require attention to: Guidelines and Quality Indicators (4 goals, Training in Clinical Processes and Evidence-based Quality Improvement (6 goals, Marketing (7 goals, and Informatics Support (1 goal. Action Teams are using the NDP as a blueprint for developing infrastructure to support system-wide adoption and sustained implementation of

  18. Parallel Index and Query for Large Scale Data Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver; Howison, Mark; Qiang, Ji; Prabhat,; Austin, Brian; Bethel, E. Wes; Ryne, Rob D.; Shoshani, Arie

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for process- ing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that address these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process mas- sive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for inter- esting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.

  19. Secure Nearest Neighbor Query on Crowd-Sensing Data

    Directory of Open Access Journals (Sweden)

    Ke Cheng

    2016-09-01

    Full Text Available Nearest neighbor queries are fundamental in location-based services, and secure nearest neighbor queries mainly focus on how to securely and quickly retrieve the nearest neighbor in the outsourced cloud server. However, the previous big data system structure has changed because of the crowd-sensing data. On the one hand, sensing data terminals as the data owner are numerous and mistrustful, while, on the other hand, in most cases, the terminals find it difficult to finish many safety operation due to computation and storage capability constraints. In light of they Multi Owners and Multi Users (MOMU situation in the crowd-sensing data cloud environment, this paper presents a secure nearest neighbor query scheme based on the proxy server architecture, which is constructed by protocols of secure two-party computation and secure Voronoi diagram algorithm. It not only preserves the data confidentiality and query privacy but also effectively resists the collusion between the cloud server and the data owners or users. Finally, extensive theoretical and experimental evaluations are presented to show that our proposed scheme achieves a superior balance between the security and query performance compared to other schemes.

  20. Application of Machine Learning Algorithms for the Query Performance Prediction

    Directory of Open Access Journals (Sweden)

    MILICEVIC, M.

    2015-08-01

    Full Text Available This paper analyzes the relationship between the system load/throughput and the query response time in a real Online transaction processing (OLTP system environment. Although OLTP systems are characterized by short transactions, which normally entail high availability and consistent short response times, the need for operational reporting may jeopardize these objectives. We suggest a new approach to performance prediction for concurrent database workloads, based on the system state vector which consists of 36 attributes. There is no bias to the importance of certain attributes, but the machine learning methods are used to determine which attributes better describe the behavior of the particular database server and how to model that system. During the learning phase, the system's profile is created using multiple reference queries, which are selected to represent frequent business processes. The possibility of the accurate response time prediction may be a foundation for automated decision-making for database (DB query scheduling. Possible applications of the proposed method include adaptive resource allocation, quality of service (QoS management or real-time dynamic query scheduling (e.g. estimation of the optimal moment for a complex query execution.

  1. Effective Density Queries of Continuously Moving Objects

    DEFF Research Database (Denmark)

    Jensen, Christian Søndergaard; Lin, D.; Ooi, B.C.

    2006-01-01

    control system, we need to identify the places that are or would be affected by a traffic jam, and report this information to drivers so that they can choose a less congested route. As a naive way to solve the problem is prohibitively expensive, we first introduce a framework which makes the problem...... manageable. Then we propose efficient algorithms to realize this framework based on the Bx-tree index. Our extensive experimentation proves the efficiency of our algorithms....

  2. Feasibility of a self-administered survey to identify primary care patients at risk of medication-related problems

    Directory of Open Access Journals (Sweden)

    Makowsky MJ

    2014-02-01

    Full Text Available Mark J Makowsky,1 Andrew J Cave,2 Scot H Simpson1 1Faculty of Pharmacy and Pharmaceutical Sciences, 2Department of Family Medicine, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada Background and objectives: Pharmacists working in primary care clinics are well positioned to help optimize medication management of community-dwelling patients who are at high risk of experiencing medication-related problems. However, it is often difficult to identify these patients. Our objective was to test the feasibility of a self-administered patient survey, to facilitate identification of patients at high risk of medication-related problems in a family medicine clinic. Methods: We conducted a cross-sectional, paper-based survey at the University of Alberta Hospital Family Medicine Clinic in Edmonton, Alberta, which serves approximately 7,000 patients, with 25,000 consultations per year. Adult patients attending the clinic were invited to complete a ten-item questionnaire, adapted from previously validated surveys, while waiting to be seen by the physician. Outcomes of interest included: time to complete the questionnaire, staff feedback regarding impact on workflow, and the proportion of patients who reported three or more risk factors for medication-related problems. Results: The questionnaire took less than 5 minutes to complete, according to the patient's report on the last page of the questionnaire. The median age (and interquartile range of respondents was 57 (45–69 years; 59% were women; 47% reported being in very good or excellent health; 43 respondents of 100 had three or more risk factors, and met the definition for being at high risk of a medication-related problem. Conclusions: Distribution of a self-administered questionnaire did not disrupt patients, or the clinic workflow, and identified an important proportion of patients at high risk of medication-related problems. Keywords: screening tool, pharmacists, primary

  3. Generating and Executing Complex Natural Language Queries across Linked Data.

    Science.gov (United States)

    Hamon, Thierry; Mougin, Fleur; Grabar, Natalia

    2015-01-01

    With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, SPARQL language, and interfaces in Natural Language question-answering provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions in SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.

  4. Evolutionary Multiobjective Query Workload Optimization of Cloud Data Warehouses

    Science.gov (United States)

    Dokeroglu, Tansel; Sert, Seyyit Alper; Cinar, Muhammet Serkan

    2014-01-01

    With the advent of Cloud databases, query optimizers need to find paretooptimal solutions in terms of response time and monetary cost. Our novel approach minimizes both objectives by deploying alternative virtual resources and query plans making use of the virtual resource elasticity of the Cloud. We propose an exact multiobjective branch-and-bound and a robust multiobjective genetic algorithm for the optimization of distributed data warehouse query workloads on the Cloud. In order to investigate the effectiveness of our approach, we incorporate the devised algorithms into a prototype system. Finally, through several experiments that we have conducted with different workloads and virtual resource configurations, we conclude remarkable findings of alternative deployments as well as the advantages and disadvantages of the multiobjective algorithms we propose. PMID:24892048

  5. Investigation in Query System Framework for High Energy Physics

    CERN Document Server

    Jatuphattharachat, Thanat

    2017-01-01

    We summarize an investigation in query system framework for HEP (High Energy Physics). Our work was an investigation on distributed server part of Femtocode, which is a query language that provides the ability for physicists to make plots and other aggregations in real-time. To make the system more robust and capable of processing large amount of data quickly, it is necessary to deploy the system on a redundant and distributed computing cluster. This project aims to investigate third party coordination and resource management frameworks which fit into the design of real-time distributed query system. Zookeeper, Mesos and Marathon are the main frameworks for this investigation. The results indicate that Zookeeper is good for job coordinator and job tracking as it provides robust, fast, simple and transparent read and write process for all connecting client across distributed Zookeeper server. Furthermore, it also supports high availability access and consistency guarantee within specific time bound.

  6. Identifying mismatches between institutional perceptions of water-related risk drivers and water management strategies in three river basin areas

    Science.gov (United States)

    Räsänen, Aleksi; Juhola, Sirkku; Monge Monge, Adrián; Käkönen, Mira; Kanninen, Markku; Nygren, Anja

    2017-07-01

    Water-related risks and vulnerabilities are driven by variety of stressors, including climate and land use change, as well as changes in socio-economic positions and political landscapes. Hence, water governance, which addresses risks and vulnerabilities, should target multiple stressors. We analyze the institutional perceptions of the drivers and strategies for managing water-related risks and vulnerabilities in three regionally important river basin areas located in Finland, Mexico, and Laos. Our analysis is based on data gathered through participatory workshops and complemented by qualitative content analysis of relevant policy documents. The identified drivers and proposed risk reduction strategies showed the multidimensionality and context-specificity of water-related risks and vulnerabilities across study areas. Most of the identified drivers were seen to increase risks, but some of the drivers were positive trends, and drivers also included also policy instruments that can both increase or decrease risks. Nevertheless, all perceived drivers were not addressed with suggested risk reduction strategies. In particular, most of the risk reduction strategies were incremental adjustments, although many of the drivers classified as most important were large-scale trends, such as climate change, land use changes and increase in foreign investments. We argue that there is a scale mismatch between the identified drivers and suggested strategies, which questions the opportunity to manage the drivers by single-scale incremental adjustments. Our study suggests that for more sustainable risk and vulnerability reduction, the root causes of water-related risks and vulnerabilities should be addressed through adaptive multi-scale governance that carefully considers the context-specificity and the multidimensionality of the associated drivers and stressors.

  7. RDF-GL : a SPARQL-based graphical query language for RDF

    NARCIS (Netherlands)

    Hogenboom, F.P.; Milea, D.V.; Frasincar, F.; Kaymak, U.; Chbeir, R.; Badr, Y.; Abraham, A.; Hassanien, A.-E.

    2010-01-01

    This chapter presents RDF-GL, a graphical query language (GQL) for RDF. The GQL is based on the textual query language SPARQL and mainly focuses on SPARQL SELECT queries. The advantage of a GQL over textual query languages is that complexity is hidden through the use of graphical symbols. RDF-GL is

  8. Evaluating XML-Extended OLAP Queries Based on a Physical Algebra

    DEFF Research Database (Denmark)

    Yin, Xuepeng; Pedersen, Torben Bach

    2006-01-01

    . In this paper, we extend previous work on the logical federation of OLAP and XML data sources by presenting a simplified query semantics, a physical query algebra and a robust OLAP-XML query engine as well as the query evaluation techniques. Performance experiments with a prototypical implementation suggest...

  9. Query Expansion: Is It Necessary In Textual Case-Based Reasoning ...

    African Journals Online (AJOL)

    Query expansion (QE) is the process of transforming a seed query to improve retrieval performance in information retrieval operations. It is often intended to overcome a vocabulary mismatch between the query and the document collection. Query expansion is known to improve retrieval effectiveness of some information ...

  10. Energy-aware SQL query acceleration through FPGA-based dynamic partial reconfiguration

    NARCIS (Netherlands)

    Becher, Andreas; Bauer, Florian; Ziener, Daniel; Teich, Jürgen

    2014-01-01

    In this paper, we propose an approach for energy-aware FPGA-based query acceleration for databases on embedded devices. After the analysis of an incoming query, a query-specific hardware accelerator is generated on-the-fly and loaded on the FPGA for subsequent query execution using partial dynamic

  11. Live Visual Relevance Feedback for Query Formulation

    NARCIS (Netherlands)

    Hoenkamp, E.C.M.; Dinther, G.J.W. van

    2005-01-01

    Users browsing the Internet seem relatively satisfied with the performance of search engines. An optimistic explanation would be the high quality of search engines. A more pessimistic one would be that people just adapt easily to any new technology. A third explanation is people's ignorance about

  12. A systems biology pipeline identifies new immune and disease related molecular signatures and networks in human cells during microgravity exposure.

    Science.gov (United States)

    Mukhopadhyay, Sayak; Saha, Rohini; Palanisamy, Anbarasi; Ghosh, Madhurima; Biswas, Anupriya; Roy, Saheli; Pal, Arijit; Sarkar, Kathakali; Bagh, Sangram

    2016-05-17

    Microgravity is a prominent health hazard for astronauts, yet we understand little about its effect at the molecular systems level. In this study, we have integrated a set of systems-biology tools and databases and have analysed more than 8000 molecular pathways on published global gene expression datasets of human cells in microgravity. Hundreds of new pathways have been identified with statistical confidence for each dataset and despite the difference in cell types and experiments, around 100 of the new pathways are appeared common across the datasets. They are related to reduced inflammation, autoimmunity, diabetes and asthma. We have identified downregulation of NfκB pathway via Notch1 signalling as new pathway for reduced immunity in microgravity. Induction of few cancer types including liver cancer and leukaemia and increased drug response to cancer in microgravity are also found. Increase in olfactory signal transduction is also identified. Genes, based on their expression pattern, are clustered and mathematically stable clusters are identified. The network mapping of genes within a cluster indicates the plausible functional connections in microgravity. This pipeline gives a new systems level picture of human cells under microgravity, generates testable hypothesis and may help estimating risk and developing medicine for space missions.

  13. A systems biology pipeline identifies new immune and disease related molecular signatures and networks in human cells during microgravity exposure

    Science.gov (United States)

    Mukhopadhyay, Sayak; Saha, Rohini; Palanisamy, Anbarasi; Ghosh, Madhurima; Biswas, Anupriya; Roy, Saheli; Pal, Arijit; Sarkar, Kathakali; Bagh, Sangram

    2016-05-01

    Microgravity is a prominent health hazard for astronauts, yet we understand little about its effect at the molecular systems level. In this study, we have integrated a set of systems-biology tools and databases and have analysed more than 8000 molecular pathways on published global gene expression datasets of human cells in microgravity. Hundreds of new pathways have been identified with statistical confidence for each dataset and despite the difference in cell types and experiments, around 100 of the new pathways are appeared common across the datasets. They are related to reduced inflammation, autoimmunity, diabetes and asthma. We have identified downregulation of NfκB pathway via Notch1 signalling as new pathway for reduced immunity in microgravity. Induction of few cancer types including liver cancer and leukaemia and increased drug response to cancer in microgravity are also found. Increase in olfactory signal transduction is also identified. Genes, based on their expression pattern, are clustered and mathematically stable clusters are identified. The network mapping of genes within a cluster indicates the plausible functional connections in microgravity. This pipeline gives a new systems level picture of human cells under microgravity, generates testable hypothesis and may help estimating risk and developing medicine for space missions.

  14. Improving overlay control through proper use of multilevel query APC

    Science.gov (United States)

    Conway, Timothy H.; Carlson, Alan; Crow, David A.

    2003-06-01

    Many state-of-the-art fabs are operating with increasingly diversified product mixes. For example, at Cypress Semiconductor, it is not unusual to be concurrently running multiple technologies and many devices within each technology. This diverse product mix significantly increases the difficulty of manually controlling overlay process corrections. As a result, automated run-to-run feedforward-feedback control has become a necessary and vital component of manufacturing. However, traditional run-to-run controllers rely on highly correlated historical events to forecast process corrections. For example, the historical process events typically are constrained to match the current event for exposure tool, device, process level and reticle ID. This narrowly defined process stream can result in insufficient data when applied to lowvolume or new-release devices. The run-to-run controller implemented at Cypress utilizes a multi-level query (Level-N) correlation algorithm, where each subsequent level widens the search criteria for available historical data. The paper discusses how best to widen the search criteria and how to determine and apply a known bias to account for tool-to-tool and device-to-device differences. Specific applications include offloading lots from one tool to another when the first tool is down for preventive maintenance, utilizing related devices to determine a default feedback vector for new-release devices, and applying bias values to account for known reticle-to-reticle differences. In this study, we will show how historical data can be leveraged from related devices or tools to overcome the limitations of narrow process streams. In particular, this paper discusses how effectively handling narrow process streams allows Cypress to offload lots from a baseline tool to an alternate tool.

  15. Tag cloud generation for results of multiple keywords queries

    DEFF Research Database (Denmark)

    Leginus, Martin; Dolog, Peter; Lage, Ricardo Gomes

    2013-01-01

    In this paper we study tag cloud generation for retrieved results of multiple keyword queries. It is motivated by many real world scenarios such as personalization tasks, surveillance systems and information retrieval tasks defined with multiple keywords. We adjust the state-of-the-art tag cloud...... generation techniques for multiple keywords query results. Consequently, we conduct the extensive evaluation on top of three distinct collaborative tagging systems. The graph-based methods perform significantly better for the Movielens and Bibsonomy datasets. Tag cloud generation based on maximal coverage...

  16. On a Fuzzy Algebra for Querying Graph Databases

    OpenAIRE

    Pivert , Olivier; Thion , Virginie; Jaudoin , Hélène; Smits , Grégory

    2014-01-01

    International audience; This paper proposes a notion of fuzzy graph database and describes a fuzzy query algebra that makes it possible to handle such database, which may be fuzzy or not, in a flexible way. The algebra, based on fuzzy set theory and the concept of a fuzzy graph, is composed of a set of operators that can be used to express preference queries on fuzzy graph databases. The preferences concern i) the content of the vertices of the graph and ii) the structure of the graph. In a s...

  17. Visually defining and querying consistent multi-granular clinical temporal abstractions.

    Science.gov (United States)

    Combi, Carlo; Oliboni, Barbara

    2012-02-01

    the component abstractions. Moreover, we propose a visual query language where different temporal abstractions can be composed to build complex queries: temporal abstractions are visually connected through the usual logical connectives AND, OR, and NOT. The proposed visual language allows one to simply define temporal abstractions by using intuitive metaphors, and to specify temporal intervals related to abstractions by using different temporal granularities. The physician can interact with the designed and implemented tool by point-and-click selections, and can visually compose queries involving several temporal abstractions. The evaluation of the proposed granularity-related metaphors consisted in two parts: (i) solving 30 interpretation exercises by choosing the correct interpretation of a given screenshot representing a possible scenario, and (ii) solving a complex exercise, by visually specifying through the interface a scenario described only in natural language. The exercises were done by 13 subjects. The percentage of correct answers to the interpretation exercises were slightly different with respect to the considered metaphors (54.4--striped wall, 73.3--plastered wall, 61--brick wall, and 61--no wall), but post hoc statistical analysis on means confirmed that differences were not statistically significant. The result of the user's satisfaction questionnaire related to the evaluation of the proposed granularity-related metaphors ratified that there are no preferences for one of them. The evaluation of the proposed logical notation consisted in two parts: (i) solving five interpretation exercises provided by a screenshot representing a possible scenario and by three different possible interpretations, of which only one was correct, and (ii) solving five exercises, by visually defining through the interface a scenario described only in natural language. Exercises had an increasing difficulty. The evaluation involved a total of 31 subjects. Results related to

  18. Evaluation of multiple approaches to identify genome-wide polymorphisms in closely related genotypes of sweet cherry (Prunus avium L.

    Directory of Open Access Journals (Sweden)

    Seanna Hewitt

    Full Text Available Identification of genetic polymorphisms and subsequent development of molecular markers is important for marker assisted breeding of superior cultivars of economically important species. Sweet cherry (Prunus avium L. is an economically important non-climacteric tree fruit crop in the Rosaceae family and has undergone a genetic bottleneck due to breeding, resulting in limited genetic diversity in the germplasm that is utilized for breeding new cultivars. Therefore, it is critical to recognize the best platforms for identifying genome-wide polymorphisms that can help identify, and consequently preserve, the diversity in a genetically constrained species. For the identification of polymorphisms in five closely related genotypes of sweet cherry, a gel-based approach (TRAP, reduced representation sequencing (TRAPseq, a 6k cherry SNParray, and whole genome sequencing (WGS approaches were evaluated in the identification of genome-wide polymorphisms in sweet cherry cultivars. All platforms facilitated detection of polymorphisms among the genotypes with variable efficiency. In assessing multiple SNP detection platforms, this study has demonstrated that a combination of appropriate approaches is necessary for efficient polymorphism identification, especially between closely related cultivars of a species. The information generated in this study provides a valuable resource for future genetic and genomic studies in sweet cherry, and the insights gained from the evaluation of multiple approaches can be utilized for other closely related species with limited genetic diversity in the breeding germplasm. Keywords: Polymorphisms, Prunus avium, Next-generation sequencing, Target region amplification polymorphism (TRAP, Genetic diversity, SNParray, Reduced representation sequencing, Whole genome sequencing (WGS

  19. Patterns of genomic variation in the poplar rust fungus Melampsora larici-populina identify pathogenesis-related factors

    Directory of Open Access Journals (Sweden)

    Antoine ePersoons

    2014-09-01

    Full Text Available Melampsora larici-populina is a fungal pathogen responsible for foliar rust disease on poplar trees, which causes damage to forest plantations worldwide, particularly in Northern Europe. The reference genome of the isolate 98AG31 was previously sequenced using a whole genome shotgun strategy, revealing a large genome of 101 megabases containing 16,399 predicted genes, which included secreted protein genes representing poplar rust candidate effectors. In the present study, the genomes of 15 isolates collected over the past 20 years throughout the French territory, representing distinct virulence profiles, were characterized by massively parallel sequencing to assess genetic variation in the poplar rust fungus. Comparison to the reference genome revealed striking structural variations. Analysis of coverage and sequencing depth identified large missing regions between isolates related to the mating type loci. More than 611,824 single-nucleotide polymorphism (SNP positions were uncovered overall, indicating a remarkable level of polymorphism. Based on the accumulation of non-synonymous substitutions in coding sequences and the relative frequencies of synonymous and non-synonymous polymorphisms (i.e. PN/PS, we identify candidate genes that may be involved in fungal pathogenesis. Correlation between non-synonymous SNPs in genes encoding secreted proteins and pathotypes of the studied isolates revealed candidate genes potentially related to virulences 1, 6 and 8 of the poplar rust fungus.

  20. Genome-Wide Temporal Expression Profiling in Caenorhabditis elegans Identifies a Core Gene Set Related to Long-Term Memory.

    Science.gov (United States)

    Freytag, Virginie; Probst, Sabine; Hadziselimovic, Nils; Boglari, Csaba; Hauser, Yannick; Peter, Fabian; Gabor Fenyves, Bank; Milnik, Annette; Demougin, Philippe; Vukojevic, Vanja; de Quervain, Dominique J-F; Papassotiropoulos, Andreas; Stetak, Attila

    2017-07-12

    The identification of genes related to encoding, storage, and retrieval of memories is a major interest in neuroscience. In the current study, we analyzed the temporal gene expression changes in a neuronal mRNA pool during an olfactory long-term associative memory (LTAM) in Caenorhabditis elegans hermaphrodites. Here, we identified a core set of 712 (538 upregulated and 174 downregulated) genes that follows three distinct temporal peaks demonstrating multiple gene regulation waves in LTAM. Compared with the previously published positive LTAM gene set (Lakhina et al., 2015), 50% of the identified upregulated genes here overlap with the previous dataset, possibly representing stimulus-independent memory-related genes. On the other hand, the remaining genes were not previously identified in positive associative memory and may specifically regulate aversive LTAM. Our results suggest a multistep gene activation process during the formation and retrieval of long-term memory and define general memory-implicated genes as well as conditioning-type-dependent gene sets. SIGNIFICANCE STATEMENT The identification of genes regulating different steps of memory is of major interest in neuroscience. Identification of common memory genes across different learning paradigms and the temporal activation of the genes are poorly studied. Here, we investigated the temporal aspects of Caenorhabditis elegans gene expression changes using aversive olfactory associative long-term memory (LTAM) and identified three major gene activation waves. Like in previous studies, aversive LTAM is also CREB dependent, and CREB activity is necessary immediately after training. Finally, we define a list of memory paradigm-independent core gene sets as well as conditioning-dependent genes. Copyright © 2017 the authors 0270-6474/17/376661-12$15.00/0.