Collaborative OLAP with Tag Clouds: Web 2.0 OLAP Formalism and Experimental Evaluation
Increasingly, business projects are ephemeral. New Business Intelligence tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds are a popular community-driven visualization technique. Hence, we investigate tag-cloud views with support for OLAP operations such as roll-ups, slices, dices, clustering, and drill-downs. As a case study, we implemented an application where users can upload data and immediately navigate through its ad hoc dimensions. To support social networking, views can be easily shared and embedded in other Web sites. Algorithmically, our tag-cloud views are approximate range top-k queries over spontaneous data cubes. We present experimental evidence that iceberg cuboids provide adequate online approximations. We benchmark several browser-oblivious tag-cloud layout optimizations.
2007-01-01
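The abstract frames a tag-cloud view as an approximate top-k query over an iceberg cuboid. The Python sketch below illustrates that idea under stated assumptions, and it is not the authors' implementation. Records are plain dicts. The measure is summed for each value of one dimension. Cells below a hypothetical `min_support` threshold are pruned, which is the iceberg condition. The k heaviest cells are kept, and font sizes are interpolated linearly between two assumed bounds.

```python
from collections import defaultdict


def tag_cloud(records, dimension, measure, k, min_support, min_font=10, max_font=32):
    """Return up to k (tag, weight, font_size) tuples, heaviest first."""
    totals = defaultdict(float)
    for row in records:
        # Aggregate the measure for each value of the chosen dimension.
        totals[row[dimension]] += row[measure]
    # Iceberg condition: drop cells whose aggregate is below the threshold.
    iceberg = {tag: w for tag, w in totals.items() if w >= min_support}
    # Top-k by weight, ties broken alphabetically for a stable layout.
    top = sorted(iceberg.items(), key=lambda item: (-item[1], item[0]))[:k]
    if not top:
        return []
    lightest, heaviest = top[-1][1], top[0][1]
    span = heaviest - lightest
    cloud = []
    for tag, w in top:
        # Linear interpolation of the font size between min_font and max_font.
        frac = (w - lightest) / span if span else 1.0
        cloud.append((tag, w, round(min_font + frac * (max_font - min_font))))
    return cloud


if __name__ == "__main__":
    sales = [
        {"region": "North", "amount": 120},
        {"region": "South", "amount": 80},
        {"region": "North", "amount": 30},
        {"region": "East", "amount": 5},
        {"region": "West", "amount": 60},
    ]
    for tag, weight, size in tag_cloud(sales, "region", "amount", k=3, min_support=10):
        print(f"{tag}: {weight} (font {size})")
```

In this sketch, a roll-up or drill-down would re-run `tag_cloud` on a coarser or finer dimension key, and a slice would filter `records` first.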
Web 2.0 OLAP: From Data Cubes to Tag Clouds
Increasingly, business projects are ephemeral. New Business Intelligence tools must support ad-lib data sources and quick perusal. Meanwhile, tag clouds are a popular community-driven visualization technique. Hence, we investigate tag-cloud views with support for OLAP operations such as roll-ups, slices, dices, clustering, and drill-downs. As a case study, we implemented an application where users can upload data and immediately navigate through its ad hoc dimensions. To support social networking, views can be easily shared and embedded in other Web sites. Algorithmically, our tag-cloud views are approximate range top-k queries over spontaneous data cubes. We present experimental evidence that iceberg cuboids provide adequate online approximations. We benchmark several browser-oblivious tag-cloud layout optimizations.
2009-01-01
Using data warehousing and OLAP in public health care.
UK PubMed Central (United Kingdom)
The paper describes the possibilities of using data warehousing and OLAP technologies in public health care in general and then our own experience with these technologies gained during the implementation ...
2000-01-01
Simrank++: Query rewriting through link analysis of the click graph
We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank as a way to identify queries similar to q, i.e., queries whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced versions of Simrank: one that exploits weights on click graph edges and another that exploits "evidence." We experimentally evaluate our new schemes against Simrank, using actual click graphs and queries from Yahoo!, and using a variety of metrics. Our results show that the enhanced methods can yield more and better query rewrites.
2007-01-01
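For reference, the sketch below shows plain bipartite SimRank. This is the baseline the abstract starts from, not either of its enhanced variants. Two queries are similar when the ads clicked for them are similar, and vice versa. The decay constants `c1` and `c2` and the fixed iteration count are assumed values.

```python
from itertools import product


def simrank_bipartite(edges, c1=0.8, c2=0.8, iterations=7):
    """Plain bipartite SimRank over (query, ad) click edges; returns query-pair scores."""
    ads_of, queries_of = {}, {}
    for q, a in edges:
        ads_of.setdefault(q, set()).add(a)
        queries_of.setdefault(a, set()).add(q)
    queries, ads = sorted(ads_of), sorted(queries_of)

    # Initialisation: each node is fully similar to itself and to nothing else.
    sq = {(x, y): float(x == y) for x, y in product(queries, queries)}
    sa = {(x, y): float(x == y) for x, y in product(ads, ads)}

    for _ in range(iterations):
        new_sq = {}
        for x, y in product(queries, queries):
            if x == y:
                new_sq[x, y] = 1.0
            else:
                # Average similarity of the ads clicked for x and for y, damped by c1.
                total = sum(sa[i, j] for i in ads_of[x] for j in ads_of[y])
                new_sq[x, y] = c1 * total / (len(ads_of[x]) * len(ads_of[y]))
        new_sa = {}
        for x, y in product(ads, ads):
            if x == y:
                new_sa[x, y] = 1.0
            else:
                # Average similarity of the queries that led to clicks on x and y.
                total = sum(sq[i, j] for i in queries_of[x] for j in queries_of[y])
                new_sa[x, y] = c2 * total / (len(queries_of[x]) * len(queries_of[y]))
        # Both sides are updated from the previous iteration's scores.
        sq, sa = new_sq, new_sa
    return sq


if __name__ == "__main__":
    clicks = [
        ("camera", "hp.com"),
        ("digital camera", "hp.com"),
        ("digital camera", "bestbuy.com"),
        ("tv", "bestbuy.com"),
        ("flower", "teleflora.com"),
    ]
    sim = simrank_bipartite(clicks)
    print(round(sim["camera", "digital camera"], 3), round(sim["camera", "tv"], 3))
```

With these toy clicks, "camera" and "digital camera" share an ad and score higher than "camera" and "tv", which are linked only indirectly. "flower" lies in a separate component, so its score with the other queries stays at 0.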
Assisting Consumer Health Information Retrieval with Query Recommendations
UK PubMed Central (United Kingdom)
Objective: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed ...
2006-01-01
IQARIS : a tool for the intelligent querying, analysis, and retrieval from information systems.
Energy Technology Data Exchange (ETDEWEB)
Information glut is one of the primary characteristics of the electronic age. Managing such large volumes of information (e.g., keeping track of the types, where they are, their relationships, who controls them, etc.) can be done efficiently with an intelligent, user-oriented information management system. The purpose of this paper is to describe a concept for managing information resources based on an intelligent information technology system developed by the Argonne National Laboratory for managing digital libraries. The Argonne system, Intelligent Query (IQ), enables users to query digital libraries and view the holdings that match the query from different perspectives.
2002-04-26
Privately Releasing Conjunctions and the Statistical Query Barrier
Suppose we would like to know all answers to a set of statistical queries C on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in C. Can we do any better? We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of C in Kearns' statistical query (SQ) model. This gives a complete answer to the question when running time is not a concern. We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to C can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions. In doing so we also give a new ...
2010-01-01
On Metric Skyline Processing by PM-tree
The task of similarity search in multimedia databases is usually accomplished by range or k-nearest-neighbor queries. However, the expressive power of these "single-example" queries fails when the user's delicate query intent is not available as a single example. Recently, the well-known skyline operator was reused in metric similarity search as a "multi-example" query type. When applied on a multi-dimensional database (i.e., on a multi-attribute table), the traditional skyline operator selects all database objects that are not dominated by other objects. The metric skyline query adapts the skyline operator such that the multiple attributes are represented by distances (similarities) to multiple query examples. Hence, we can view the metric skyline as a set of representative database objects which are as similar to all the examples as possible and, ...
2009-01-01
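The metric skyline definition above can be rendered naively as follows. Each object is mapped to its vector of distances to the query examples. An object is kept unless some other object is at least as close to every example and strictly closer to at least one. This quadratic scan only illustrates the semantics; it does not use the PM-tree indexing the paper is about, and the Euclidean toy data are assumptions.

```python
import math


def metric_skyline(objects, examples, dist):
    """Return the objects not dominated in the space of distances to `examples`."""
    # Map every object to its distances from each query example.
    vectors = [(obj, [dist(obj, q) for q in examples]) for obj in objects]

    def dominates(u, v):
        # u dominates v: no farther from any example, strictly closer to at least one.
        return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

    # An object never dominates itself, so comparing against the full list is safe.
    return [obj for obj, vec in vectors
            if not any(dominates(other, vec) for _, other in vectors)]


if __name__ == "__main__":
    points = [(5, 0), (5, 5), (0, 1), (10, 1), (20, 0)]
    print(metric_skyline(points, [(0, 0), (10, 0)], math.dist))
```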
Mining associations in text in the presence of background knowledge
Energy Technology Data Exchange (ETDEWEB)
This paper describes the FACT system for knowledge discovery from text. It discovers associations (patterns of co-occurrence) among keywords labeling the items in a collection of textual documents. In addition, FACT is able to use background knowledge about the keywords labeling the documents in its discovery process. FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process. Execution of a knowledge-discovery query is structured so that these background-knowledge constraints can be exploited in the search for possible results. Finally, rather than requiring a user to specify an explicit query expression in the knowledge-discovery ...
1996-12-31
Sexual information seeking on web search engines.
Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat room discussions, accessing Websites, or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed. PMID:15006171
2004-02-01
Reformulation of Consumer Health Queries with Professional Terminology: A Pilot Study
UK PubMed Central (United Kingdom)
Background: The Internet is becoming an increasingly important resource for health-information seekers. However, consumers often do not use effective search ...
Monthly electricity usage at Australian Antarctic Stations - NASA
Record Search Query: Science Keywords>HUMAN DIMENSIONS>INFRASTRUCTURE> ELECTRICITY. Monthly electricity usage at Australian Antarctic Stations ...
Efficient Clustering with Limited Distance Information
Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one-versus-all queries that, given a point s in S, return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. We use our algorithm to cluster proteins by sequence similarity. This setting fits our model well because we can use a fast sequence database search program to query a sequence against an entire dataset. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.
2010-01-01
Index structures for structured documents
Energy Technology Data Exchange (ETDEWEB)
Much research has been carried out in order to manage structured documents such as SGML documents and to provide powerful query facilities which exploit document structures as well as document contents. In order to perform structure queries efficiently in a structured document management system, an index structure which supports fast document element access must be provided. However, there has been little research on the index structures for structured documents. In this paper, we propose various kinds of new inverted indexing schemes and signature file schemes for efficient structure query processing. We evaluate the storage requirements and disk access time of our schemes and present the analytical and experimental results.
1996-12-31
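The sketch below is a generic illustration of structure-aware indexing, not the specific inverted-index or signature-file schemes proposed in the paper. It builds postings of the form (document id, element path), so a query can restrict a term to a named element. Two choices here are assumptions: XML parsed with the Python standard library stands in for SGML, and only the text directly inside each element (not tail text) is indexed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_element_index(docs):
    """Map each lower-cased term to sorted (doc_id, element_path) postings."""
    index = defaultdict(set)

    def walk(doc_id, elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        # Index only the text directly inside this element (tail text is ignored).
        for term in (elem.text or "").lower().split():
            index[term].add((doc_id, path))
        for child in elem:
            walk(doc_id, child, path)

    for doc_id, xml_text in docs.items():
        walk(doc_id, ET.fromstring(xml_text), "")
    return {term: sorted(postings) for term, postings in index.items()}


def structure_query(index, term, element_tag):
    """Ids of documents whose `element_tag` element directly contains `term`."""
    return sorted({doc_id for doc_id, path in index.get(term, [])
                   if path.rsplit("/", 1)[-1] == element_tag})


if __name__ == "__main__":
    docs = {
        "d1": "<article><title>Query processing</title><body>Index structures</body></article>",
        "d2": "<article><title>Indexing</title><body>query optimization</body></article>",
    }
    index = build_element_index(docs)
    print(structure_query(index, "query", "title"))
```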
There Goes the Neighborhood: Relational Algebra for Spatial Data Search
We explored ways of doing spatial search within a relational database: (1) hierarchical triangular mesh (a tessellation of the sphere), (2) a zoned bucketing system, and (3) representing areas as disjunctive-normal form constraints. Each of these approaches has merits. They all allow efficient point-in-region queries. A relational representation for regions allows Boolean operations among them and allows quick tests for point-in-region, regions-containing-point, and region-overlap. The speed of these algorithms is much improved by a zone and multi-scale zone-pyramid scheme. The approach has the virtue that the zone mechanism works well on B-Trees native to all SQL systems and integrates naturally with current query optimizers - rather than requiring a new spatial access method and concomitant query optimizer extensions. Over the last 5 years, we have used these techniques extensively in our work on SkyServer.sdss.org, and ...
2004-01-01
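The zone idea can be sketched in a few lines. Objects are bucketed into horizontal declination bands, and a cone search scans only the bands that overlap the search radius before applying an exact great-circle test. In this sketch a Python dict stands in for the B-tree-indexed zone table, and the 0.5-degree zone height is an assumed value. The right-ascension range filter that a production query would add is omitted.

```python
import math
from collections import defaultdict

ZONE_HEIGHT = 0.5  # degrees of declination per zone (assumed value)


def zone_of(dec):
    """Zone number of a declination given in degrees (-90..90)."""
    return math.floor((dec + 90.0) / ZONE_HEIGHT)


def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def build_zones(points):
    """Bucket (obj_id, ra, dec) tuples by zone number."""
    zones = defaultdict(list)
    for obj_id, ra, dec in points:
        zones[zone_of(dec)].append((obj_id, ra, dec))
    return zones


def cone_search(zones, ra, dec, radius):
    """Ids of objects within `radius` degrees of (ra, dec), scanning only overlapping zones."""
    first = zone_of(max(dec - radius, -90.0))
    last = zone_of(min(dec + radius, 90.0))
    hits = []
    for z in range(first, last + 1):
        for obj_id, p_ra, p_dec in zones.get(z, []):
            # Exact test on the candidates that survive the zone filter.
            if angular_distance(ra, dec, p_ra, p_dec) <= radius:
                hits.append(obj_id)
    return sorted(hits)
```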
Quantum query complexity of minor-closed graph properties
We study the quantum query complexity of minor-closed graph properties, which include such problems as determining whether a graph is planar, is a forest, or does not contain a path of a given length. We show that most minor-closed properties (those that cannot be characterized by a finite set of forbidden subgraphs) have quantum query complexity Θ(n^{3/2}). To establish this, we prove an adversary lower bound using a detailed analysis of the structure of minor-closed properties with respect to forbidden topological minors and forbidden subgraphs. On the other hand, we show that minor-closed properties (and more generally, sparse graph properties) that can be characterized by finitely many forbidden subgraphs can be solved strictly faster, in o(n^{3/2}) queries. Our algorithms are a novel application of the quantum walk search framework and give improved upper bounds for several subgraph-finding problems.
2010-01-01
Integrating Query of Relational and Textual Data in Clinical Databases: A Case Study
UK PubMed Central (United Kingdom)
Objectives: The authors designed and implemented a clinical data mart composed of an integrated information retrieval (IR) and relational database management system (RDBMS). Design: ...
2003-01-01
Measuring e-CRM service quality in the library context: a preliminary study
British Library Electronic Table of Contents (United Kingdom)
Purpose - Customer relationship management (CRM) indicates a comprehensive strategy and an interactive process intended to achieve an optimum balance between corporate investment and the satisfaction of customer needs to generate the maximum profit. E-CRM refers to CRM using internet technology plus a database, OLAP, data warehouse, data mining, etc. In order to gain an understanding of the efficiency of implementing an e-CRM system within the library context, to develop theoretically and empirically an evaluation process for the e-CRM system, and to survey its impact on service quality, a pilot scheme was initiated in 2004. The pilot scheme was to design and implement an e-CRM prototype system for a particular academic library in Taiwan and to survey the system's performance. This paper aims...
2008-01-01
Multi-Resolution Modeling of Large Scale Scientific Simulation Data
Energy Technology Data Exchange (ETDEWEB)
Data produced by large-scale scientific simulations, experiments, and observations can easily reach terabytes in size. The ability to examine data-sets of this magnitude, even in moderate detail, is problematic at best. Generally this scientific data consists of multivariate field quantities with complex inter-variable correlations and spatial-temporal structure. To provide scientists and engineers with the ability to explore and analyze such data sets we are using a twofold approach. First, we model the data with the objective of creating a compressed yet manageable representation. Second, with that compressed representation, we provide the user with the ability to query the resulting approximation to obtain approximate yet sufficient answers; a process called ad hoc querying. This paper is concerned with a wavelet modeling technique that seeks to capture the important physical characteristics of the target scientific data. Our approach is ...
2002-02-25
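The abstract does not spell out its wavelet model. The following is therefore only a generic illustration of wavelet-based compression with approximate reconstruction, not the paper's technique. It uses an unnormalized one-dimensional Haar transform and keeps the overall average plus the largest-magnitude detail coefficients. The function names and the thresholding rule are assumptions.

```python
def haar_forward(values):
    """Full unnormalised Haar decomposition; len(values) must be a power of two."""
    coeffs = list(values)
    n = len(coeffs)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    while n > 1:
        half = n // 2
        # Pairwise averages go to the front, pairwise half-differences (details) after them.
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward: rebuild each level as (average + detail, average - detail)."""
    values = list(coeffs)
    n = 1
    while n < len(values):
        rebuilt = []
        for a, d in zip(values[:n], values[n:2 * n]):
            rebuilt.extend([a + d, a - d])
        values[:2 * n] = rebuilt
        n *= 2
    return values


def compress(values, keep):
    """Keep the overall average plus the `keep` largest-magnitude detail coefficients."""
    coeffs = haar_forward(values)
    ranked = sorted(range(1, len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:keep]) | {0}
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]
    print(haar_forward(data))
    print(haar_inverse(compress(data, keep=2)))  # approximate reconstruction
```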
From a set of parts to an indivisible whole. Part II: Operations in an open comparative mode
This paper describes a new method, HGV2C, for pattern analysis. The HGV2C method involves the construction of a computer ego (CE) based on an individual object that can be either a part of the system under analysis or a newly created object based on a certain hypothesis. The CE provides a capability to analyze data from a specific standpoint, e.g. from a viewpoint of a certain object. The CE is constructed from two identical copies of a query object, and its functioning mechanism involves: a hypothesis-parameter (HP) and infothyristor (IT). HP is a parameter that is introduced into an existing set of parameters. The HP value for one of the clones of a query object is set to equal 1, whereas for another clone it is greater than 1. The IT is based on the previously described algorithm of iterative averaging and performs three functions: 1) computation of a similarity matrix for the group of three objects including two clones of a ...
2008-01-01
A New Email Retrieval Ranking Approach
The email retrieval task has recently received much attention; its goal is to help users retrieve the emails related to a submitted query. To our knowledge, existing email retrieval ranking approaches sort the retrieved emails based on heuristic rules, which are either search clues or predefined user criteria rooted in email fields. Unfortunately, the user usually does not know which rule yields the best ranking for a given query. This paper presents a new email retrieval ranking approach to tackle this problem. It ranks the retrieved emails based on a scoring function that depends on crucial email fields, namely subject, content, and sender. The paper also proposes an architecture that allows every user in a network or group of users, where permitted, to identify the most important senders in the network who are interested in the user's query words. The experimental evaluation on the Enron corpus proves that our ...
2010-01-01
SASE: Complex Event Processing over Streams
RFID technology is gaining adoption on an increasing scale for tracking and monitoring purposes. Wide deployments of RFID devices will soon generate an unprecedented volume of data. Emerging applications require the RFID data to be filtered and correlated for complex pattern detection and transformed to events that provide meaningful, actionable information to end applications. In this work, we design and develop SASE, a complex event processing system that performs such data-information transformation over real-time streams. We design a complex event language for specifying application logic for such transformation, devise new query processing techniques to efficiently implement the language, and develop a comprehensive system that collects, cleans, and processes RFID data for delivery of relevant, timely information as well as storing necessary data for future querying. We demonstrate an initial prototype of SASE through a real-world ...
2006-01-01
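To make the kind of pattern SASE targets concrete, the sketch below detects a shoplifting-style sequence over RFID readings. The pattern is a SHELF reading followed by an EXIT reading for the same tag within a time window, with no COUNTER reading in between. The event kinds, the window, and the simplification of tracking only the latest SHELF reading per tag are assumptions for illustration. This is not SASE's query language or its processing engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # reader type, e.g. "SHELF", "COUNTER", "EXIT" (hypothetical)
    tag_id: str   # RFID tag identifier
    ts: float     # timestamp in seconds


def detect_seq_without(stream, first, absent, last, window):
    """Matches of SEQ(first, !absent, last) on the same tag within `window` seconds.

    Returns (tag_id, first_ts, last_ts) tuples in order of detection.
    """
    pending = {}  # tag_id -> timestamp of the latest unmatched `first` event
    matches = []
    for ev in sorted(stream, key=lambda e: e.ts):
        if ev.kind == first:
            pending[ev.tag_id] = ev.ts
        elif ev.kind == absent:
            # The negated event cancels any partial match for this tag.
            pending.pop(ev.tag_id, None)
        elif ev.kind == last and ev.tag_id in pending:
            start = pending.pop(ev.tag_id)
            if ev.ts - start <= window:
                matches.append((ev.tag_id, start, ev.ts))
    return matches
```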
Fluxnet Synthesis Dataset Collaboration Infrastructure
Energy Technology Data Exchange (ETDEWEB)
The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continues to evolve as well. There are on the order of 120 site contacts, and 60 proposals have been approved to use the data. These proposals involve around 120 researchers. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and collaboration support. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. ...
2008-02-06
A preliminary database has been developed that will allow mineralogy and bulk-rock geochemical information to be managed under configuration control and facilitate electronic querying. The database is currently developed in Microsoft Access as a collection of tables, views, and entry forms. Each field and table has been described in a data dictionary.
2008-06-30
New Polynomial Classes for Logic-Based Abduction
We address the problem of propositional logic-based abduction, i.e., the problem of searching for a best explanation for a given propositional observation according to a given propositional knowledge base. We give a general algorithm, based on the notion of projection; then we study restrictions over the representations of the knowledge base and of the query, and find new polynomial classes of abduction problems.
2011-01-01
ADS as Information Management Service in an M-Learning Environment
Leveraging the potential power of even small handheld devices able to communicate wirelessly requires dedicated support. In particular, collaborative applications need sophisticated assistance in terms of querying and exchanging different kinds of data. Using a concrete example from the domain of mobile learning, the general need for information dissemination is motivated. Subsequently, and driven by infrastructural conditions, realization strategies of an appropriate middleware service are discussed.
2007-01-01
Tea catechins prevent contractile dysfunction in unloaded murine soleus muscle: A pilot study
British Library Electronic Table of Contents (United Kingdom)
Objective: Extended periods of muscle disuse, physical inactivity, immobilization, and bedrest result in a loss of muscle mass and a decrease in muscle force, which are accompanied by an increase in oxidative stress. We investigated the effects of the intake of green tea catechins on unloading-induced muscle dysfunction in tail-suspended mice. Methods: Ten-week-old male BALB/c mice were fed a purified control diet or a diet containing 0.5% tea catechins for 14 d. Thereafter, the mice were subjected to continuous tail suspension for 10 d. On the final day, muscle mass, contractile force production, antioxidant potential, and carbonylated protein levels were evaluated. Results: Hind limb unloading caused a loss of soleus muscle weight and muscle force. Intake of tea catechins significantly inhibit...
2011-01-01
Skip-webs: Efficient distributed data structures for multi-dimensional data sets
DEFF Research Database (Denmark)
We present a framework for designing efficient distributed data structures for multi-dimensional data. Our structures, which we call skip-webs, extend and improve previous randomized distributed data structures, including skipnets and skip graphs. Our framework applies to a general class of data querying scenarios, which include linear (one-dimensional) data, such as sorted sets, as well as multi-dimensional data, such as d-dimensional octrees and digital tries of character strings defined over a fixed alphabet. We show how to perform a query over such a set of n items spread among n hosts using O(log n / log log n) messages for one-dimensional data, or O(log n) messages for fixed-dimensional data, while using only O(log n) space per host. We also show how to make such structures dynamic so as to allow for insertions and deletions in O(log n) messages for quadtrees, octrees, and digital tries, and O(log n / log log n) messages for ...
2005-01-01
Dynamic User-Defined Similarity Searching in Semi-Structured Text Retrieval
Modern text retrieval systems often provide a similarity search utility that allows the user to efficiently find a fixed number k of documents in the data set that are most similar to a given query (here a query is either a simple sequence of keywords or the identifier of a full document found in previous searches that is considered of interest). We consider the case of a textual database made of semi-structured documents. Each field, in turn, is modelled with a specific vector space. The problem is more complex when we also allow each such vector space to have an associated user-defined dynamic weight that influences its contribution to the overall dynamic aggregated and weighted similarity. This dynamic problem has been tackled in a recent paper by Singitham et al. in VLDB 2004. Their proposed solution, which we take as baseline, is a variant of the cluster-pruning technique that has the potential for scaling to very large corpora of ...
2007-01-01
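The sketch below is a brute-force baseline for the dynamic weighted problem, not the cluster-pruning variant of Singitham et al. It scores each document as a weighted sum over fields. Each term of that sum is the cosine similarity between the query's and the document's term-frequency vectors for one field. The top k documents are kept. Raw term frequencies (no tf-idf) and the field names are assumptions.

```python
import heapq
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse term-frequency vectors (Counters)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def weighted_top_k(docs, query, weights, k):
    """Ids of the k documents with the highest weighted sum of per-field cosines.

    docs: {doc_id: {field: text}}; query: {field: text}; weights: {field: weight}.
    """
    def score(doc):
        return sum(
            w * cosine(Counter(query.get(f, "").split()), Counter(doc.get(f, "").split()))
            for f, w in weights.items()
        )

    # Brute force: every document is scored at query time; no pruning.
    return heapq.nlargest(k, docs, key=lambda doc_id: score(docs[doc_id]))


if __name__ == "__main__":
    docs = {
        "d1": {"title": "similarity search survey", "body": "inverted files"},
        "d2": {"title": "vector spaces", "body": "cluster pruning"},
        "d3": {"title": "email ranking", "body": "enron corpus"},
    }
    query = {"title": "similarity search", "body": "cluster pruning"}
    print(weighted_top_k(docs, query, {"title": 0.9, "body": 0.1}, k=2))
    print(weighted_top_k(docs, query, {"title": 0.1, "body": 0.9}, k=2))
```

Changing only the field weights reverses the ranking of the two matching toy documents.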
Database management system for instrument data management
Energy Technology Data Exchange (ETDEWEB)
Data from many measuring devices throughout the Savannah River Site (SRS) is transmitted to a central location for processing as a vital component in the SRS emergency preparedness and response program. The data processing is currently accomplished using VAX-based FORTRAN programs with the data stored in Digital's Record Management System (RMS) files which is shared using global COMMON. A program is underway to store and process this data using a Structured Query Language (SQL)-based Database Management System (DBMS). The advantages of replacing the current system with one using an SQL-based DBMS are discussed.
1990-01-01
Verification of knowledge bases based on containment checking
Energy Technology Data Exchange (ETDEWEB)
Building complex knowledge-based applications requires encoding large amounts of domain knowledge. After acquiring knowledge from domain experts, much of the effort in building a knowledge base goes into verifying that the knowledge is encoded correctly. We consider the problem of verifying hybrid knowledge bases that contain both Horn rules and a terminology in a description logic. Our approach to the verification problem is based on showing a close relationship to the problem of query containment. Our first contribution, based on this relationship, is presenting a thorough analysis of the decidability and complexity of the verification problem, for knowledge bases containing recursive rules and the interpreted predicates =, ≤, < and ≠. Second, we show that important new classes of constraints on correct inputs and outputs can be expressed in a hybrid setting, in which a description logic class hierarchy is also considered, and we present the first ...
1996-12-31
Trends in adrenalectomy: a recent national review
British Library Electronic Table of Contents (United Kingdom)
Background: Adrenalectomy remains the definitive therapy for most adrenal neoplasms. Introduced in the 1990s, laparoscopic adrenalectomy is reported to have lower associated morbidity and mortality. This study aimed to evaluate national adrenalectomy trends, including major postoperative complications and perioperative mortality. Methods: The Nationwide Inpatient Sample was queried to identify all adrenalectomies performed during 1998-2006. Univariate and multivariate logistic regression were performed, with adjustments for patient age, sex, comorbidities, indication, year of surgery, laparoscopy, hospital teaching status, and hospital volume. Annual incidence, major in-hospital postoperative complications, and in-hospital mortality were evaluated. Results: Using weighted national estimate, 4...
2010-01-01
The LSST Data Mining Research Agenda
We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.
2008-01-01
Scientific Data Management in the Coming Decade
This is a thought piece on data-intensive science requirements for databases and science centers. It argues that peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. Next-generation science instruments and simulations will generate these peta-scale datasets. The need to publish and share data and the need for generic analysis and visualization tools will finally create a convergence on common metadata standards. Database systems will be judged by their support of these metadata standards and by their ability to manage and access peta-scale datasets. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Non-procedural query and analysis of schematized self-describing data is both easier to use and allows much more parallelism.
2005-01-01
British Library Electronic Table of Contents (United Kingdom)
Many existing agents for diabetes therapy are unable to restore or maintain normal glucose homeostasis or prevent the eventual emergence of hyperglycemia-related complications. Therefore, agents based on novel mechanisms are sought to complement and extend the current therapeutic approaches. Based on initial literature research, we focused on active STAT3 as an attractive pharmacological target for type 2 diabetes. Subsequent text mining with a unique query to identify suppressors but not activators of STAT3 revealed the ERK2/STAT3 pathway as a novel diabetes target. No description of ERK2 inhibitors as a diabetes target was found in our text mining research. The mechanism-based peptide inhibitor for ERK2 was identified using the knowledge of the KIM sequence, which ha...
2011-01-01
Experimental Comparison of Representation Methods and Distance Measures for Time Series Data
The previous decade has brought a remarkable increase in interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on ...
2010-01-01
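Euclidean distance and dynamic time warping (DTW) are two widely used measures of this kind. A minimal sketch of both follows. It uses one common formulation, assumed here rather than taken from the study: squared pointwise cost, a square root at the end, and an optional Sakoe-Chiba band given by the `window` parameter.

```python
import math


def euclidean(x, y):
    """Lock-step Euclidean distance between equal-length series."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y, window=None):
    """DTW distance; `window` is the half-width of a Sakoe-Chiba band (None = unconstrained)."""
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide for the end cell to be reachable.
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


if __name__ == "__main__":
    x = [0, 1, 2, 3, 2, 1, 0]
    y = [0, 0, 1, 2, 3, 2, 1]  # x shifted right by one step
    print(euclidean(x, y), dtw(x, y))
```

On a series and its one-step shift, Euclidean distance penalizes every misaligned point. DTW realigns them and pays only for the final endpoint.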
An analytical framework for data stream mining techniques based on challenges and requirements
A growing number of applications that generate massive streams of data need intelligent data processing and online analysis. Real-time surveillance systems, telecommunication systems, sensor networks, and other dynamic environments are examples. The pressing need to turn such data into useful information and knowledge drives the development of systems, algorithms, and frameworks that address streaming challenges. The storage, querying, and mining of such data sets are highly computationally challenging tasks. Mining data streams is concerned with extracting knowledge structures, represented as models and patterns, from non-stopping streams of information. Generally, the two main challenges are designing fast mining methods for data streams and promptly detecting changes in concepts and data distributions, given the highly dynamic nature of data streams. The goal of this article is to analyze and classify the application of diverse data mining techniques in ...
2011-01-01
An Effective Method of Image Retrieval using Image Mining Techniques
Researchers all over the world have a keen interest in data mining. In particular, mining image data [13] has become essential, since image data plays a vital role in many domains, such as marketing in business, surgery in hospitals, construction in engineering, and publication on the Web. Another area of image mining is Content-Based Image Retrieval (CBIR), which performs retrieval based on similarity defined in terms of extracted features, with greater objectivity. A drawback of CBIR is that only the features of the query image are considered. Hence, a new technique called Image retrieval based on optimum clusters is proposed for improving user interaction with image retrieval systems by fully exploiting the similarity information. The index is created by describing the images according to their color characteristics, with ...
2010-01-01
A CONCEPT FOR NATIONAL NUCLEAR FORENSIC LIBRARIES
International Nuclear Information System (INIS)
The interpretation of data from the nuclear forensic analysis of illicit nuclear material of unknown origin requires comparative data from samples of known origin. One way to provide such comparative data is to create a system of national nuclear forensics libraries, in which each participating country stores information about nuclear or other radioactive material that either resides in or was manufactured by that country. Such national libraries could provide an authoritative record of the material located in or produced by a particular country, and thus form an essential prerequisite for a government to investigate illicit uses of nuclear or other radioactive material within its borders. We describe the concept of the national nuclear forensic library, recommendations for content and structure, and suggested querying methods for utilizing the information for addressing nuclear smuggling.
2010-07-11
Comparing compressed sequences for faster nucleotide BLAST searches.
Molecular biologists, geneticists, and other life scientists use the BLAST homology search package as their first step for discovery of information about unknown or poorly annotated genomic sequences. There are two main variants of BLAST: BLASTP for searching protein collections and BLASTN for nucleotide collections. Surprisingly, BLASTN has received very little attention; for example, the algorithms it uses do not follow those described in the 1997 BLAST paper, and no exact description has been published. It is important that BLASTN be state-of-the-art: nucleotide collections such as GenBank dwarf the protein collections in size, they almost double in size yearly, and they take many minutes to search on modern general-purpose workstations. This paper proposes significant improvements to the BLASTN algorithms. Each of our schemes is based on compressed byte-packed formats that allow queries and collection sequences to be compared four bases at a time, permitting very fast ...
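As an illustration of the byte-packing idea mentioned in this abstract, here is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical 2-bit code (A=0, C=1, G=2, T=3), ignores ambiguity codes such as N, and compares packed sequences only at byte-aligned offsets, so that one byte comparison checks four bases at once. The names `pack` and `matching_words` are illustrative only.

```python
# Hypothetical 2-bit encoding; real BLASTN formats may differ and also handle
# ambiguity codes such as N, which this sketch does not.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four bases (2 bits each) per byte.

    Trailing bases that do not fill a whole byte are dropped in this sketch.
    """
    out = bytearray()
    usable = len(seq) - len(seq) % 4
    for i in range(0, usable, 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return bytes(out)


def matching_words(query: bytes, subject: bytes, offset: int) -> int:
    """Count byte positions where `query` equals `subject` shifted by `offset` bytes.

    Each single byte comparison tests four bases at once.
    """
    count = 0
    for i, qb in enumerate(query):
        j = i + offset
        if 0 <= j < len(subject) and subject[j] == qb:
            count += 1
    return count


if __name__ == "__main__":
    q = pack("ACGTTTTT")
    s = pack("GGGGACGTTTTT")
    print(matching_words(q, s, 1))  # prints 2
```

A real search would also need to find matches that do not start on a byte boundary, for example by also packing the query at each of its four possible phase shifts.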
Towards the end of the 19th century, Kelvin identified two "clouds of physics": 1) the failure of the Michelson-Morley experiment to detect an ether wind, and 2) the violation of the classical mechanical equipartition theorem in statistical thermodynamics. He believed that the removal of these clouds would bring physics to an end. But as we know, their removal led to the two great breakthroughs of modern physics: 1) the theory of relativity and 2) quantum mechanics. Towards the end of the 20th century, more clouds of physics became apparent: 1) the riddle of quantum gravity, 2) superluminal quantum correlations, and 3) the small cosmological constant. Furthermore, there is the riddle of dark energy, making up 70% of the physical universe, the non-baryonic cold dark matter, making up 26%, and the very small initial entropy of the universe. An attempt is made to explain the importance of these clouds for the future of physics. Conjectures for a possible solution ...
2008-01-01
In this work, we query the Chlamydomonas reinhardtii copper regulon at a whole-genome level. Our RNA-Seq data simulation and analysis pipeline validated a 2-fold cutoff and 10 RPKM (reads per kilobase of mappable length per million mapped reads) (~1 mRNA per cell) to reveal 63 CRR1 targets plus another 86 copper-responsive genes. Proteomic and immunoblot analyses captured 25% of the corresponding proteins, whose abundance was also dependent on copper nutrition, validating transcriptional regulation as a major control mechanism for copper signaling in Chlamydomonas. Copper deficiency affected the expression of several O2-dependent enzymes, including enzymes that catalyze steps in lipid modification pathways. Quantitative lipid profiles indicated increased polyunsaturation of fatty acids on thylakoid membrane digalactosyldiglycerides, indicating a global impact of copper deficiency on the photosynthetic apparatus. Discovery of a putative plastid copper chaperone and a membrane ...
2011-04-01
Hawaii technology utilization experiment
A one-year technology-transfer project involving ERDA installations and Hawaii consisted of sending teams from the Lawrence Livermore Laboratory on week-long field trips every two months to test the effectiveness of different methods of transferring technology information from federal sources to civilian clients. The team was questioned primarily on non-energy matters, and the energy questions asked related mostly to individuals or small industries. The team responded to all questions and found that a wide range of knowledge was more effective than having a sequence of experts. Hawaiians considered current major ERDA projects to be irrelevant to their needs. The team was most successful on a one-to-one basis because large groups and state agencies tend to be more policy- than action-oriented. Personal follow-up was considered essential. The team also learned that their visits generated ten times as many inquiries as were received unsolicited by the laboratory. Most inquiries involved ...
1976-12-08
Decision tree modeling with relational views
Data mining is a useful decision support technique that can be used to discover production rules in warehouses or corporate data. Much data mining research has focused on applying various mining algorithms efficiently to large databases. However, a serious problem in their practical application is the long processing time of such algorithms. Nowadays, one of the key challenges is to integrate data mining methods within the framework of traditional database systems. Indeed, such implementations can take advantage of the efficiency provided by SQL engines. In this paper, we propose an approach for integrating decision trees within a classical database system. In other words, we try to discover knowledge from relational databases, in the form of production rules, via a procedure embedding SQL queries. The obtained decision tree is defined by successive, related relational views. Each view corresponds to a given population in the underlying decision tree. We selected ...
2002-01-01
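To illustrate how a decision tree node can be represented as a relational view, the following Python sketch uses the standard `sqlite3` module. The table `customers`, its columns, the view names `node_root` and `node_young`, and the use of ID3-style information gain as the split criterion are all hypothetical; the abstract does not specify the schema or the split measure used in the paper.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (age_band TEXT, income TEXT, buys TEXT);
INSERT INTO customers VALUES
  ('young', 'high', 'no'), ('young', 'low', 'yes'), ('old', 'high', 'yes'),
  ('old', 'low', 'yes'), ('young', 'high', 'no');
-- Hypothetical tree: each node is a view over its parent node's population.
CREATE VIEW node_root AS SELECT * FROM customers;
CREATE VIEW node_young AS SELECT * FROM node_root WHERE age_band = 'young';
""")


def class_counts(view: str) -> dict:
    """Class distribution of the population behind a node view."""
    rows = conn.execute(
        f"SELECT buys, COUNT(*) FROM {view} GROUP BY buys ORDER BY buys"
    )
    return dict(rows.fetchall())


def entropy(counts) -> float:
    """Shannon entropy (bits) of a list of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def info_gain(view: str, attr: str) -> float:
    """Information gain of splitting the node's population on `attr`.

    `view` and `attr` are interpolated into SQL, so pass trusted identifiers only.
    """
    parent = entropy(class_counts(view).values())
    rows = conn.execute(
        f"SELECT {attr}, buys, COUNT(*) FROM {view} GROUP BY {attr}, buys"
    ).fetchall()
    groups: dict = {}
    for value, _cls, n in rows:
        groups.setdefault(value, []).append(n)
    total = sum(sum(g) for g in groups.values())
    children = sum(sum(g) / total * entropy(g) for g in groups.values())
    return parent - children


if __name__ == "__main__":
    print(class_counts("node_young"))  # {'no': 2, 'yes': 1}
    print(round(info_gain("node_young", "income"), 3))  # 0.918
```

Because each child view selects from its parent view, the predicates along a root-to-leaf path accumulate, which is one way a production rule can be read off a leaf.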
Data Stream Clustering: Challenges and Issues
Very large databases are required to store massive amounts of data that are continuously inserted and queried. Analyzing huge data sets and extracting valuable patterns in many applications are of interest to researchers. We can identify two main groups of techniques for mining huge databases. One group treats the data as a stream and applies mining techniques to it, whereas the second group attempts to solve the problem directly with efficient algorithms. Recently, many researchers have focused on data streams as an efficient strategy for mining huge databases instead of mining the entire database. The main problem in data stream mining is that evolving data is more difficult to detect with these techniques; therefore, unsupervised methods should be applied. Clustering techniques, in particular, can help discover hidden information. In this survey, we try to clarify: first, the different problem definitions related to data stream clustering in general; second, the specific ...
2010-01-01
Energy Technology Data Exchange (ETDEWEB)
Ontario's Countdown Acid Rain program was formulated in 1985 and placed an annual SO2 emissions cap of 885 kilotonnes over the province from 1994. Caps were placed on emissions from the Inco nickel/copper smelter in Sudbury, the Falconbridge nickel/copper smelter in Sudbury, the Algoma iron ore sintering plant in Wawa, and Ontario Hydro fossil-fuel plants province-wide. Semi-annual reports are required of the companies, with the 6th report outlining the final planning phase and the 7th and later reports outlining implementation progress. This document summarizes the contents of the 7th set of reports and the responses of the government review team to each company's efforts. All companies have begun implementing abatement plans outlined in the final planning phase reports of December 1988 at a rapid pace and are generally on track in controlling emissions. However, all 4 company reports lack the necessary detail for a complete review of the implementation plan in ...
1990-01-01
Computational AstroStatistics: Fast and Efficient Tools for Analysing Huge Astronomical Data Sources
I present here a review of past and present multidisciplinary research of the Pittsburgh Computational AstroStatistics (PiCA) group. This group is dedicated to developing fast and efficient statistical algorithms for analysing huge astronomical data sources. I begin with a short review of multi-resolutional kd-trees, which are the building blocks for many of our algorithms, for example quick range queries and fast n-point correlation functions. I will present new results from the use of Mixture Models (Connolly et al. 2000) in density estimation of multi-color data from the Sloan Digital Sky Survey (SDSS), specifically the selection of quasars and the automated identification of X-ray sources. I will also present a brief overview of the False Discovery Rate (FDR) procedure (Miller et al. 2001a) and show how it has been used in the detection of "Baryon Wiggles" in the local galaxy power spectrum and source identification in radio data. Finally, I will look ...
2001-01-01
Compressed Neighbor Discovery for Wireless Networks
This paper studies the neighbor discovery problem in wireless networks. A novel scheme called compressed neighbor discovery is proposed, which assigns each node a unique signature and lets nodes transmit their signatures simultaneously during the discovery period. The querying node then determines, based on the superposition of the signatures, which small number of nodes, out of a large number of nodes in the network, are its neighbors. This is fundamentally a sparse recovery problem. Using the proposed scheme, a single frame time suffices to achieve reliable discovery for large networks. This is in contrast to conventional schemes, where each node repeatedly transmits its identity with random delay, so that a receiver can identify each neighbor at least once without collision. Two practical, low-complexity discovery schemes are studied. The first scheme assigns sparse pseudo-random on-off signatures to the nodes, so that each node can listen to the channel during its own ...
2010-01-01
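The superposition idea in this abstract can be sketched in Python under simplifying assumptions that are not taken from the paper: a noiseless on-off channel in which a slot is busy if any neighbor transmits in it (a Boolean OR), and a simple group-testing decoder (often called COMP) swapped in for the paper's sparse-recovery decoder. All function names are hypothetical, and the half-duplex constraint mentioned in the abstract (a node listening during its own off slots) is ignored.

```python
import random


def make_signatures(num_nodes: int, frame_len: int, ones_per_sig: int, seed: int = 0):
    """Give each node a sparse pseudo-random on-off signature.

    A signature is the set of slot indices in which the node transmits.
    """
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(frame_len), ones_per_sig))
            for _ in range(num_nodes)]


def superpose(signatures, neighbors):
    """Noiseless OR channel (assumption): a slot is busy if any neighbor transmits."""
    busy = set()
    for n in neighbors:
        busy |= signatures[n]
    return busy


def decode(signatures, busy):
    """COMP-style decoding: declare a node a neighbor iff all its 'on' slots are busy."""
    return {i for i, sig in enumerate(signatures) if sig <= busy}


if __name__ == "__main__":
    sigs = make_signatures(num_nodes=1000, frame_len=200, ones_per_sig=10)
    true_neighbors = {3, 42, 517}
    busy = superpose(sigs, true_neighbors)
    print(sorted(decode(sigs, busy)))
```

In this noiseless model the decoder never misses a true neighbor, but it can report false positives when every 'on' slot of a non-neighbor happens to be covered by real neighbors' transmissions; longer frames relative to the number of neighbors reduce that risk.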
We describe a closed-loop brain-computer interface that re-ranks an image database by iterating between user-generated 'interest' scores and computer-vision-generated visual similarity measures. The interest scores are based on decoding the electroencephalographic (EEG) correlates of target detection, attentional shifts and self-monitoring processes, which result from the user paying attention to target images interspersed in rapid serial visual presentation (RSVP) sequences. The highest scored images are passed to a semi-supervised computer vision system that reorganizes the image database accordingly, using a graph-based representation that captures visual similarity between images. The system can either query the user for more information, by adaptively resampling the database to create additional RSVP sequences, or it can converge to a 'done' state. The done state includes a final ranking of the image database and also a 'guess' of the user's chosen category of ...
2011-05-12
A Robust and Efficient Trust Management Scheme for Peer-to-Peer Networks
Studies of large-scale peer-to-peer (P2P) networks such as Gnutella have shown the presence of a large number of free riders. Moreover, the open and decentralized nature of P2P networks is exploited by malicious users who distribute unauthentic or harmful content. Despite the existence of a number of trust management schemes in the literature for combating free riding and the distribution of malicious files, these mechanisms are not scalable due to their high computational, communication and storage overhead. These schemes also do not consider the effect of trust management on the quality of service (QoS) of the search. This paper presents a trust management scheme for P2P networks that minimizes the distribution of spurious files by a novel technique called topology adaptation. It also reduces search time, since most of the queries are resolved within the community of trustworthy peers. Simulation results indicate that the trust management overhead due to the proposed ...
2010-01-01
Diverter/bop system and method for a bottom supported offshore drilling rig
Energy Technology Data Exchange (ETDEWEB)
A system is described adapted for alternative use as a diverter or a blowout preventer for a bottom supported drilling rig and adapted for connection to a permanent housing attached to rig structural members beneath a drilling rig rotary table, the permanent housing having an outlet connectable to a rig fluid system flow line. The system consists of: a fluid flow controller having a controller housing with a lower cylindrical opening and an upper cylindrical opening and a vertical path therebetween and a first outlet passage and a second outlet passage provided in its wall, a packing element disposed within the controller housing, and annular piston means adapted for moving from a first position to a second position, whereby in the first position the piston means wall prevents interior fluid from communicating with the outlet passages in the controller housing wall and in the second position the piston means wall allows fluid communication of interior fluid with the outlet passages and ...
1986-07-01
Energy Technology Data Exchange (ETDEWEB)
The objective of the research is, first, to build a highly parallel processing system using 100 personal computers and an ATM switch. The former is a computing commodity, while the latter can be regarded as a commodity for future communication systems. The second objective is to implement a parallel relational database management system and a parallel data mining system on the 100-PC cluster. The third is to run decision-support queries typical of data warehouses, to run association rule mining, and to prove the effectiveness of the proposed architecture as a next-generation parallel database/data mining server. The performance/cost ratio of PCs has improved significantly compared with workstations and proprietary systems owing to mass production. The cost of ATM switches is also decreasing considerably, since ATM is being widely accepted as a communication infrastructure. By combining 100 PCs as computing commodities and an ATM switch as a communication commodity, we built large-scale ...
1997-03-01
Distributed Data Integration Infrastructure
Energy Technology Data Exchange (ETDEWEB)
The Internet is becoming the preferred method for disseminating scientific data from a variety of disciplines. This can result in information overload on the part of scientists, who are unable to query all of the relevant sources, even if they know where to find them, what they contain, how to interact with them, and how to interpret the results. A related issue is that keeping up with current trends in information technology often taxes the end user's expertise and time. Thus, instead of benefiting from this information-rich environment, scientists become experts on a small number of sources and technologies, use them almost exclusively, and develop a resistance to innovations that could enhance their productivity. Enabling information-based scientific advances, in domains such as functional genomics, requires fully utilizing all available information and the latest technologies. To address this problem, we are developing an end-user-centric, ...
2003-02-24
Data interpretation in nuclear forensics
International Nuclear Information System (INIS)
Nuclear forensics is a key element in the response process that is initiated after detection of illicit nuclear or other radioactive material. Credible nuclear forensics relies on appropriate sampling procedures, on validated analytical methods and on thorough data analysis and interpretation. Nuclear forensics aims at providing clues on the history and the potential origin of the material. The elemental and isotopic composition of the material, as well as its macroscopic and microscopic appearance, reflect the technological processes used for its fabrication. Nuclear forensic analysis first of all yields measurement data. Through appropriate processing of these data, information on the nature and the history of the material can be obtained. A number of data evaluation techniques serving this purpose are conceivable and have been applied. On the one hand, statistical methods such as principal component analysis (PCA) or classification and regression trees (CART) can be ...
Biogenic iron oxyhydroxide formation at mid-ocean ridge hydrothermal vents: Juan de Fuca Ridge
Energy Technology Data Exchange (ETDEWEB)
Here we examine Fe speciation within Fe-encrusted biofilms formed during 2-month seafloor incubations of sulfide mineral assemblages at the Main Endeavor Segment of the Juan de Fuca Ridge. The biofilms were distributed heterogeneously across the surface of the incubated sulfide and were composed primarily of particles with a twisted stalk morphology resembling those produced by some aerobic Fe-oxidizing microorganisms. Our objectives were to determine the form of biofilm-associated Fe and to identify the sulfide minerals associated with microbial growth. We used micro-focused synchrotron-radiation X-ray fluorescence mapping (μXRF), X-ray absorption spectroscopy (μEXAFS), and X-ray diffraction (μXRD) in conjunction with focused ion beam (FIB) sectioning and high-resolution transmission electron microscopy (HRTEM). The chemical and mineralogical composition of an Fe-encrusted biofilm was examined at different spatial scales, and the spatial relationship between primary ...
2008-05-22
DoseWatchers - A computer based X ray dose monitoring project in paediatric radiology
International Nuclear Information System (INIS)
Introduction. Children, especially premature infants and neonates, are at a much higher risk of X-ray-induced harm, particularly cancer. This is due, on the one hand, to their longer life expectancy and, on the other hand, to their higher cell proliferation rate. The paediatric radiology unit of the Inselspital Bern recently installed some of the most advanced X-ray equipment currently available. It is based on the two latest digital technologies: double-read computed radiography (CR) and direct digital radiography (DR). Only these digital radiography systems permit digital acquisition and subsequent analysis of the acquired data. The systematic analysis of large amounts of biometric and exposure data is the basis for further dose reduction and for systematic quality control (QC). Patients are increasingly critical of radiation exposure, especially parents regarding their children. Besides an ...
2006-11-13
WorldWideScience.org is maintained by the U.S. Department of Energy's
Office of Scientific and Technical Information as the Operating Agent
for the WorldWideScience Alliance.
