WorldWideScience

Sample records for accurate similarity search

  1. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

Full Text Available Abstract Background Protein structural data has increased exponentially, so that fast and accurate tools are necessary for structure similarity searching. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. Such search tools should be applicable to automated and high-throughput functional annotation or prediction for the ever-increasing number of published protein structures in this post-genomic era.
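To make the linear-encoding idea concrete, here is a minimal Python sketch: each residue's (phi, psi) dihedral pair is assigned to the nearest cluster on a Ramachandran map and emitted as one letter, so a 3D backbone becomes a searchable text string. The cluster centres and letters below are illustrative placeholders, not SARST's actual codebook, and the nearest-centre assignment stands in for its nearest-neighbor clustering.

```python
import math

# Hypothetical Ramachandran cluster centres (phi, psi in degrees) -> code letter.
# These values are placeholders for illustration only.
CLUSTERS = {
    "H": (-63.0, -43.0),   # alpha-helical region
    "E": (-120.0, 130.0),  # beta-sheet region
    "P": (-75.0, 150.0),   # polyproline-II region
    "L": (60.0, 45.0),     # left-handed helix region
}

def angular_dist(a, b):
    """Distance between two angles in degrees, respecting wrap-around."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def encode_residue(phi, psi):
    """Assign a residue's (phi, psi) pair to the nearest cluster letter."""
    return min(
        CLUSTERS,
        key=lambda c: math.hypot(
            angular_dist(phi, CLUSTERS[c][0]),
            angular_dist(psi, CLUSTERS[c][1]),
        ),
    )

def encode_structure(dihedrals):
    """Turn a list of per-residue (phi, psi) pairs into a 1-D text string."""
    return "".join(encode_residue(phi, psi) for phi, psi in dihedrals)

# Two toy backbones: mostly helical vs. mostly extended.
helix = [(-60.0, -45.0)] * 8
sheet = [(-118.0, 128.0)] * 8
print(encode_structure(helix))  # 'HHHHHHHH'
print(encode_structure(sheet))  # 'EEEEEEEE'
```

The resulting strings can then be fed to any classical sequence alignment or database search tool, with a substitution matrix derived from structural data as the abstract describes.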

  2. Semantically enabled image similarity search

    Science.gov (United States)

    Casterline, May V.; Emerick, Timothy; Sadeghi, Kolia; Gosse, C. A.; Bartlett, Brent; Casey, Jason

    2015-05-01

Georeferenced data of various modalities are increasingly available for intelligence and commercial use; however, effectively exploiting these sources demands a unified data space capable of capturing the unique contribution of each input. This work presents a suite of software tools for representing geospatial vector data and overhead imagery in a shared high-dimensional vector or "embedding" space that supports fused learning and similarity search across dissimilar modalities. While the approach is suitable for fusing arbitrary input types, including free text, the present work exploits the obvious but computationally difficult relationship between GIS and overhead imagery. GIS provides temporally smoothed but information-limited content, while overhead imagery provides an information-rich but temporally limited perspective. This processing framework includes some important extensions of concepts in the literature but, more critically, presents a means to accomplish them as a unified framework at scale on commodity cloud architectures.

  3. Similarity search processing. Parallelization and indexing technologies.

    Directory of Open Access Journals (Sweden)

    Eder Dos Santos

    2015-08-01

    This scientific-technical report addresses similarity search and the implementation of metric structures on parallel environments. It also presents the state of the art in similarity search on metric structures and parallelism technologies. Comparative analyses are also presented, seeking to characterize the behavior of a set of metric spaces and metric structures on multicore-based and GPU-based processing platforms.

  4. Effective semantic search using thematic similarity

    Directory of Open Access Journals (Sweden)

    Sharifullah Khan

    2014-07-01

    Full Text Available Most existing semantic search systems expand search keywords using domain ontology to deal with semantic heterogeneity. They focus on matching the semantic similarity of individual keywords in a multiple-keywords query; however, they ignore the semantic relationships that exist among the keywords of the query themselves. The systems return less relevant answers for these types of queries. More relevant documents for a multiple-keywords query can be retrieved if the systems know the relationships that exist among multiple keywords in the query. The proposed search methodology matches patterns of keywords for capturing the context of keywords, and then the relevant documents are ranked according to their pattern relevance score. A prototype system has been implemented to validate the proposed search methodology. The system has been compared with existing systems for evaluation. The results demonstrate improvement in precision and recall of search.

  5. Molecular fingerprint similarity search in virtual screening.

    Science.gov (United States)

    Cereto-Massagué, Adrià; Ojeda, María José; Valls, Cristina; Mulero, Miquel; Garcia-Vallvé, Santiago; Pujadas, Gerard

    2015-01-01

    Molecular fingerprints have been used for a long time now in drug discovery and virtual screening. Their ease of use (requiring little to no configuration) and the speed at which substructure and similarity searches can be performed with them - paired with a virtual screening performance similar to other more complex methods - is the reason for their popularity. However, there are many types of fingerprints, each representing a different aspect of the molecule, which can greatly affect search performance. This review focuses on commonly used fingerprint algorithms, their usage in virtual screening, and the software packages and online tools that provide these algorithms.
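The workhorse comparison behind fingerprint-based similarity searching is the Tanimoto coefficient on bit strings. Below is a minimal, self-contained sketch; the fingerprints are toy sets of on-bit indices rather than the output of any real fingerprinting algorithm, and the 0.5 cutoff is an arbitrary illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy fingerprints: indices of the set bits.
query = {1, 4, 7, 20, 33}
library = {
    "mol_A": {1, 4, 7, 21, 33},
    "mol_B": {2, 5, 9},
}

# Rank library molecules by similarity to the query; keep hits above a cutoff.
hits = sorted(
    ((tanimoto(query, fp), name) for name, fp in library.items()),
    reverse=True,
)
for score, name in hits:
    if score >= 0.5:
        print(f"{name}: {score:.2f}")  # mol_A: 0.67
```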

  6. Gene functional similarity search tool (GFSST)

    Directory of Open Access Journals (Sweden)

    Russo James J

    2006-03-01

    Full Text Available Abstract Background With the completion of the genome sequences of human, mouse, and other species and the advent of high throughput functional genomic research technologies such as biomicroarray chips, more and more genes and their products have been discovered and their functions have begun to be understood. Increasing amounts of data about genes, gene products and their functions have been stored in databases. To facilitate selection of candidate genes for gene-disease research, genetic association studies, biomarker and drug target selection, and animal models of human diseases, it is essential to have search engines that can retrieve genes by their functions from proteome databases. In recent years, the development of Gene Ontology (GO) has established structured, controlled vocabularies describing gene functions, which makes it possible to develop novel tools to search genes by functional similarity. Results By using a statistical model to measure the functional similarity of genes based on the Gene Ontology directed acyclic graph, we developed a novel Gene Functional Similarity Search Tool (GFSST) to identify genes with related functions from annotated proteome databases. This search engine lets users design their search targets by gene functions. Conclusion An implementation of GFSST which works on the UniProt (Universal Protein Resource) for the human and mouse proteomes is available at GFSST Web Server. GFSST provides functions not only for similar gene retrieval but also for gene search by one or more GO terms. This represents a powerful new approach for selecting similar genes and gene products from proteome databases according to their functions.

  7. Web Search Results Summarization Using Similarity Assessment

    Directory of Open Access Journals (Sweden)

    Sawant V.V.

    2014-06-01

    Full Text Available Nowadays the internet has become part of our life; the WWW is the most important service of the internet because it allows presenting information such as documents, images, etc. The WWW grows rapidly and caters to diversified levels and categories of users. Web search results are extracted for user-specified queries. With millions of pieces of information pouring online, users have no time to surf the contents completely; moreover, much of the available information is repeated or duplicated. This issue has created the necessity to restructure search results so that they can be summarized. The proposed approach comprises extraction of different features of web pages. Web page visual similarity assessment has been employed to address problems in different fields, including phishing, web archiving, and web search engines. In this approach, the search results for a user query are first stored. The Earth Mover's Distance (EMD) is used to assess web page visual similarity: each web page is treated as a low-resolution image, a signature of that page image is created from color and coordinate features, and the distance between web pages is calculated by applying EMD. The layout similarity value is computed using a tag comparison algorithm and a template comparison algorithm. Textual similarity is computed using cosine similarity, and hyperlink analysis is performed to compare outward links. The final similarity value is calculated by fusing the layout, text, hyperlink, and EMD values. Once the similarity matrix is found, clustering is employed with the help of connected components. Finally, groups of similar web pages, i.e., summarized results, are displayed to the user. Experiments were conducted to demonstrate the effectiveness of the four methods in generating summarized results for different web pages and user queries.
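As a rough illustration of two ingredients of such a pipeline, the sketch below computes cosine textual similarity from simple term-frequency vectors and fuses it with stand-in scores for the EMD, layout, and hyperlink components; the fusion weights and the stub scores are invented, not taken from the paper.

```python
import math
from collections import Counter

def cosine_sim(text_a: str, text_b: str) -> float:
    """Cosine similarity of simple term-frequency vectors."""
    ta, tb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

text_sim = cosine_sim("cheap flights to rome", "cheap flights rome hotels")
emd_sim, layout_sim, link_sim = 0.7, 0.8, 0.5   # stand-ins for the other scores

# Assumed fusion weights; the paper does not publish these values here.
weights = {"emd": 0.3, "layout": 0.3, "text": 0.3, "link": 0.1}
final = (weights["emd"] * emd_sim + weights["layout"] * layout_sim
         + weights["text"] * text_sim + weights["link"] * link_sim)
print(f"text {text_sim:.2f}, fused {final:.2f}")
```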

  8. SEAL: Spatio-Textual Similarity Search

    CERN Document Server

    Fan, Ju; Zhou, Lizhu; Chen, Shanshan; Hu, Jun

    2012-01-01

    Location-based services (LBS) have become more and more ubiquitous recently. Existing methods focus on finding relevant points-of-interest (POIs) based on users' locations and query keywords. Nowadays, modern LBS applications generate a new kind of spatio-textual data, regions-of-interest (ROIs), containing region-based spatial information and textual description, e.g., mobile user profiles with active regions and interest tags. To satisfy search requirements on ROIs, we study a new research problem, called spatio-textual similarity search: Given a set of ROIs and a query ROI, we find the similar ROIs by considering spatial overlap and textual similarity. Spatio-textual similarity search has many important applications, e.g., social marketing in location-aware social networks. It calls for an efficient search method to support large scales of spatio-textual data in LBS systems. To this end, we introduce a filter-and-verification framework to compute the answers. In the filter step, we generate signatures for ...

  9. Similarity searching in large combinatorial chemistry spaces

    Science.gov (United States)

    Rarey, Matthias; Stahl, Martin

    2001-06-01

    We present a novel algorithm, called Ftrees-FS, for similarity searching in large chemistry spaces based on dynamic programming. Given a query compound, the algorithm generates sets of compounds from a given chemistry space that are similar to the query. The similarity search is based on the feature tree similarity measure representing molecules by tree structures. This descriptor allows handling combinatorial chemistry spaces as a whole instead of looking at subsets of enumerated compounds. Within a few minutes of computing time, the algorithm is able to find the most similar compound in very large spaces as well as sets of compounds at an arbitrary similarity level. In addition, the diversity among the generated compounds can be controlled. A set of 17,000 fragments of known drugs, generated by the RECAP procedure from the World Drug Index, was used as the search chemistry space. These fragments can be combined into more than 10^18 compounds of reasonable size. For validation, known antagonists/inhibitors of several targets including dopamine D4, histamine H1, and COX2 are used as queries. Comparison of the compounds created by Ftrees-FS to other known actives demonstrates the ability of the method to jump between structurally unrelated molecule classes.

  10. Predicting the performance of fingerprint similarity searching.

    Science.gov (United States)

    Vogt, Martin; Bajorath, Jürgen

    2011-01-01

    Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the "background." By quantifying the difference in feature distribution using the Kullback-Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
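A minimal sketch of the underlying computation: treat each fingerprint bit as an independent Bernoulli feature, estimate per-bit frequencies for the reference set and for the screening database, and compute the Kullback-Leibler divergence between the two distributions. The bit-independence treatment and the smoothing constant are simplifications for illustration, not the paper's exact formulation.

```python
import math

def bit_frequencies(fingerprints, n_bits, eps=1e-6):
    """Per-bit relative frequency of 'on' across a set of fingerprints (smoothed)."""
    counts = [0] * n_bits
    for fp in fingerprints:
        for i in fp:
            counts[i] += 1
    n = len(fingerprints)
    return [min(max(c / n, eps), 1 - eps) for c in counts]

def kl_divergence(p, q):
    """KL divergence between two vectors of independent Bernoulli bit frequencies."""
    return sum(
        pi * math.log(pi / qi) + (1 - pi) * math.log((1 - pi) / (1 - qi))
        for pi, qi in zip(p, q)
    )

refs = [{0, 2}, {0, 3}, {0, 2, 3}]   # reference (active) fingerprints, toy data
db = [{1}, {1, 2}, {3}, {0, 1}]      # screening-database fingerprints, toy data
p = bit_frequencies(refs, n_bits=4)
q = bit_frequencies(db, n_bits=4)
# Larger divergence -> actives stand out more from the background,
# i.e. higher expected compound recovery rates.
print(f"D_KL = {kl_divergence(p, q):.3f}")
```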

  11. New similarity search based glioma grading

    Energy Technology Data Exchange (ETDEWEB)

    Haegler, Katrin; Brueckmann, Hartmut; Linn, Jennifer [Ludwig-Maximilians-University of Munich, Department of Neuroradiology, Munich (Germany); Wiesmann, Martin; Freiherr, Jessica [RWTH Aachen University, Department of Neuroradiology, Aachen (Germany); Boehm, Christian [Ludwig-Maximilians-University of Munich, Department of Computer Science, Munich (Germany); Schnell, Oliver; Tonn, Joerg-Christian [Ludwig-Maximilians-University of Munich, Department of Neurosurgery, Munich (Germany)

    2012-08-15

    MR-based differentiation between low- and high-grade gliomas is predominately based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information of CE-T1w and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian Mixture Models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (= probability-based classifier) was defined and used for grading. Biopsy was used as gold standard, and similarity based grading was compared to grading solely based on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integration of perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves the potential of MR-based glioma grading compared to CE-T1w data alone. (orig.)

  12. Efficient Video Similarity Measurement and Search

    Energy Technology Data Exchange (ETDEWEB)

    Cheung, S-C S

    2002-12-19

    The amount of information on the world wide web has grown enormously since its creation in 1990. Duplication of content is inevitable because there is no central management on the web. Studies have shown that many similar versions of the same text documents can be found throughout the web. This redundancy problem is more severe for multimedia content such as web video sequences, as they are often stored in multiple locations and different formats to facilitate downloading and streaming. Similar versions of the same video can also be found, unknown to content creators, when web users modify and republish original content using video editing tools. Identifying similar content can benefit many web applications and content owners. For example, it will reduce the number of similar answers to a web search and identify inappropriate use of copyright content. In this dissertation, they present a system architecture and corresponding algorithms to efficiently measure, search, and organize similar video sequences found on any large database such as the web.

  13. Ultra-accurate collaborative information filtering via directed user similarity

    Science.gov (United States)

    Guo, Q.; Song, W.-J.; Liu, J.-G.

    2014-07-01

    A key challenge of collaborative filtering (CF) information filtering is how to obtain reliable and accurate results with the help of peers' recommendations. Since the similarities from small-degree users to large-degree users are larger than the ones in the opposite direction, the large-degree users' selections are recommended extensively by traditional second-order CF algorithms. By considering the users' similarity direction and the second-order correlations to suppress the influence of mainstream preferences, we present the directed second-order CF (HDCF) algorithm specifically to address the challenge of accuracy and diversity of the CF algorithm. The numerical results for two benchmark data sets, MovieLens and Netflix, show that the accuracy of the new algorithm outperforms the state-of-the-art CF algorithms. Compared with the CF algorithm based on random walks proposed by Liu et al. (Int. J. Mod. Phys. C, 20 (2009) 285), the average ranking score reaches 0.0767 and 0.0402, an improvement of 27.3% and 19.1% for MovieLens and Netflix, respectively. In addition, the diversity, precision and recall are also greatly enhanced. Without relying on any context-specific information, tuning the similarity direction of CF algorithms can yield accurate and diverse recommendations. This work suggests that the user similarity direction is an important factor in improving personalized recommendation performance.
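The degree asymmetry the abstract describes is easy to see in a toy example. The sketch below uses one simple asymmetric normalization (item overlap divided by the source user's degree), which is an assumption for illustration and not necessarily the paper's exact definition of directed similarity.

```python
# Toy user -> item-set data; degrees differ deliberately.
users = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "c", "d", "e", "f"},  # large-degree user
    "u3": {"a", "f"},                      # small-degree user
}

def directed_sim(src: str, dst: str) -> float:
    """Similarity from src to dst: overlap normalised by the source user's degree."""
    common = users[src] & users[dst]
    return len(common) / len(users[src])

# The small-degree -> large-degree similarity exceeds the reverse direction:
print(directed_sim("u3", "u2"))  # 1.0
print(directed_sim("u2", "u3"))  # ~0.33
```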

  14. Outsourced similarity search on metric data assets

    KAUST Repository

    Yiu, Man Lung

    2012-02-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

  15. Earthquake detection through computationally efficient similarity search

    Science.gov (United States)

    Yoon, Clara E.; O’Reilly, Ossian; Bergen, Karianne J.; Beroza, Gregory C.

    2015-01-01

    Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms. Earthquake detection—identification of seismic events in continuous data—is a fundamental operation for observational seismology. We developed an efficient method to detect earthquakes using waveform similarity that overcomes the disadvantages of existing detection methods. Our method, called Fingerprint And Similarity Thresholding (FAST), can analyze a week of continuous seismic waveform data in less than 2 hours, or 140 times faster than autocorrelation. FAST adapts a data mining algorithm, originally designed to identify similar audio clips within large databases; it first creates compact “fingerprints” of waveforms by extracting key discriminative features, then groups similar fingerprints together within a database to facilitate fast, scalable search for similar fingerprint pairs, and finally generates a list of earthquake detections. FAST detected most (21 of 24) cataloged earthquakes and 68 uncataloged earthquakes in 1 week of continuous data from a station located near the Calaveras Fault in central California, achieving detection performance comparable to that of autocorrelation, with some additional false detections. FAST is expected to realize its full potential when applied to extremely long duration data sets over a distributed network of seismic stations. The widespread application of FAST has the potential to aid in the discovery of unexpected seismic signals, improve seismic monitoring, and promote a greater understanding of a variety of earthquake processes. PMID:26665176
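Schematically, FAST's grouping step works like locality-sensitive hashing of audio-style fingerprints. The sketch below uses a generic banding scheme over toy binary fingerprints: fingerprints that agree on any band land in a shared bucket, so candidate similar pairs emerge without an all-pairs scan. The band hashing here is a stand-in for illustration, not FAST's actual implementation.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def band_keys(fingerprint, n_bands=4):
    """Split a binary fingerprint into bands and hash each band; similar
    fingerprints share at least one band key with high probability."""
    step = max(1, len(fingerprint) // n_bands)
    keys = []
    for b in range(n_bands):
        chunk = str(fingerprint[b * step:(b + 1) * step]).encode()
        keys.append((b, hashlib.md5(chunk).hexdigest()))
    return keys

# Toy binary fingerprints extracted from waveform windows (placeholders).
fingerprints = {
    "win_001": [1, 0, 1, 1, 0, 0, 1, 0],
    "win_002": [1, 0, 1, 1, 0, 1, 1, 0],  # near-duplicate of win_001
    "win_003": [0, 1, 0, 0, 1, 1, 0, 1],
}

buckets = defaultdict(set)
for name, fp in fingerprints.items():
    for key in band_keys(fp):
        buckets[key].add(name)

# Candidate pairs are those sharing a bucket -- no all-pairs comparison needed.
pairs = set()
for members in buckets.values():
    pairs.update(combinations(sorted(members), 2))
print(pairs)  # expected: {('win_001', 'win_002')}
```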

  16. Subvoxel accurate graph search using non-Euclidean graph space.

    Directory of Open Access Journals (Sweden)

    Michael D Abràmoff

    Full Text Available Graph search is attractive for the quantitative analysis of volumetric medical images, and especially for layered tissues, because it allows globally optimal solutions in low-order polynomial time. However, because nodes of graphs typically encode evenly distributed voxels of the volume with arcs connecting orthogonally sampled voxels in Euclidean space, segmentation cannot achieve greater precision than a single unit, i.e. the distance between two adjoining nodes, and partial volume effects are ignored. We generalize the graph to non-Euclidean space by allowing non-equidistant spacing between nodes, so that subvoxel accurate segmentation is achievable. Because the number of nodes and edges in the graph remains the same, running time and memory use are similar, while all the advantages of graph search, including global optimality and computational efficiency, are retained. A deformation field calculated from the volume data adaptively changes regional node density so that node density varies with the inverse of the expected cost. We validated our approach using optical coherence tomography (OCT) images of the retina and 3-D MR images of the arterial wall, and achieved statistically significant increased accuracy. Our approach allows improved accuracy in volume data acquired with the same hardware, and also preserved accuracy with lower resolution, more cost-effective image acquisition equipment. The method is not limited to any specific imaging modality and is readily extensible to higher dimensions.

  17. Outsourced Similarity Search on Metric Data Assets

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Assent, Ira; Jensen, Christian S.

    2012-01-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example...

  18. A Similarity Search Using Molecular Topological Graphs

    Directory of Open Access Journals (Sweden)

    Yoshifumi Fukunishi

    2009-01-01

    Full Text Available A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The combination of these two matrices gave a similarity measure. This method was applied to in silico drug screening, and the results showed that it was effective as a similarity measure.
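A minimal sketch of the two-matrix idea: compare adjacency matrices for local structure and all-pairs path-length matrices (computed here with Floyd-Warshall) for global structure, then combine the two scores. The entry-matching similarity and the equal-weight average are illustrative assumptions; a real measure must also handle atom correspondence and the partial-charge term the abstract mentions.

```python
def floyd_warshall(adj):
    """All-pairs shortest path lengths from a 0/1 adjacency matrix."""
    n = len(adj)
    INF = float("inf")
    d = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def matrix_similarity(m1, m2):
    """Fraction of matching entries (assumes equal-sized, atom-aligned matrices)."""
    n = len(m1)
    return sum(m1[i][j] == m2[i][j] for i in range(n) for j in range(n)) / (n * n)

# Two toy 4-atom molecules: a chain and a ring.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
ring  = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]

local_sim = matrix_similarity(chain, ring)                                # adjacency
global_sim = matrix_similarity(floyd_warshall(chain), floyd_warshall(ring))  # paths
print(f"local {local_sim:.2f}, global {global_sim:.2f}, "
      f"combined {(local_sim + global_sim) / 2:.2f}")
```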

  19. Learning Style Similarity for Searching Infographics

    OpenAIRE

    Saleh, Babak; Dontcheva, Mira; Hertzmann, Aaron; Liu, Zhicheng

    2015-01-01

    Infographics are complex graphic designs integrating text, images, charts and sketches. Despite the increasing popularity of infographics and the rapid growth of online design portfolios, little research investigates how we can take advantage of these design resources. In this paper we present a method for measuring the style similarity between infographics. Based on human perception data collected from crowdsourced experiments, we use computer vision and machine learning algorithms to learn ...

  20. Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Khandakar Ahmed

    2015-03-01

    Full Text Available The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric-based similarity searching (DCSMSS) is proposed. DCSMSS draws motivation from a vector distance index, called iDistance, in order to transform the issue of similarity searching into the problem of an interval search in one dimension. In addition, a sector-based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries.
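The iDistance transformation that DCSMSS builds on can be sketched in a few lines: each point is keyed by its distance to the nearest reference point plus a large per-partition offset, so a metric range query becomes one interval scan per partition (candidates must still be verified with the true distance). The reference points and the stretch constant below are arbitrary choices for illustration.

```python
import math

C = 1000.0  # stretch constant separating partitions (assumed large enough)
refs = [(0.0, 0.0), (10.0, 10.0)]  # reference points, one per partition (toy values)

def idistance_key(p):
    """1-D key: partition offset plus distance to the nearest reference point."""
    d = [math.dist(p, r) for r in refs]
    i = d.index(min(d))
    return i * C + d[i]

points = [(1.0, 1.0), (9.0, 10.0), (2.0, 0.0)]
keyed = sorted((idistance_key(p), p) for p in points)

# A range query (q, r) maps to one key interval per partition; the resulting
# candidates would then be verified against the true distance.
q, r = (0.5, 0.5), 1.5
for i, ref in enumerate(refs):
    dq = math.dist(q, ref)
    lo, hi = i * C + max(0.0, dq - r), i * C + dq + r
    hits = [p for k, p in keyed if lo <= k <= hi]
    print(f"partition {i}: scan keys [{lo:.2f}, {hi:.2f}] -> candidates {hits}")
```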

  1. Visual similarity is stronger than semantic similarity in guiding visual search for numbers.

    Science.gov (United States)

    Godwin, Hayward J; Hout, Michael C; Menneer, Tamaryn

    2014-06-01

    Using a visual search task, we explored how behavior is influenced by both visual and semantic information. We recorded participants' eye movements as they searched for a single target number in a search array of single-digit numbers (0-9). We examined the probability of fixating the various distractors as a function of two key dimensions: the visual similarity between the target and each distractor, and the semantic similarity (i.e., the numerical distance) between the target and each distractor. Visual similarity estimates were obtained using multidimensional scaling based on independent observers' similarity ratings. A linear mixed-effects model demonstrated that both visual and semantic similarity influenced the probability that distractors would be fixated. However, the visual similarity effect was substantially larger than the semantic similarity effect. We close by discussing the potential value of using this novel methodological approach and the implications for both simple and complex visual search displays.

  2. How Google Web Search copes with very similar documents

    NARCIS (Netherlands)

    Mettrop, W.; Nieuwenhuysen, P.; Smulders, H.

    2006-01-01

    A significant portion of the computer files that carry documents, multimedia, programs etc. on the Web are identical or very similar to other files on the Web. How do search engines cope with this? Do they perform some kind of "deduplication"? How should users take into account that web search results…

  3. RAPSearch: a fast protein similarity search tool for short reads

    National Research Council Canada - National Science Library

    Ye, Yuzhen; Choi, Jeong-Hyeon; Tang, Haixu

    2011-01-01

    ... of the very sizes of the short read datasets. We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads...

  4. Fast and accurate protein substructure searching with simulated annealing and GPUs

    Directory of Open Access Journals (Sweden)

    Stivala Alex D

    2010-09-01

    Full Text Available Abstract Background Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy with, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.

  5. Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    CERN Document Server

    Yuan, Ye; Chen, Lei; Wang, Haixun

    2012-01-01

    Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works assuming that edges in an uncertain graph are independent of each other, we study the uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete, thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase,we develop tight lower and u...

  6. Using SQL Databases for Sequence Similarity Searching and Analysis.

    Science.gov (United States)

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
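The flavor of this approach can be shown with an in-memory SQLite sketch (standing in for the unit's seqdb_demo/search_demo schemas, which differ in detail): load similarity-search hits into a relational table, then summarize homology relationships with an ordinary SQL join. The accessions, organisms, and E-values are placeholders.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE protein (acc TEXT PRIMARY KEY, organism TEXT);
CREATE TABLE hit (query_acc TEXT, subject_acc TEXT, evalue REAL);
""")
con.executemany("INSERT INTO protein VALUES (?, ?)", [
    ("P00001", "E. coli"), ("P00002", "H. sapiens"), ("P00003", "S. cerevisiae"),
])
con.executemany("INSERT INTO hit VALUES (?, ?, ?)", [
    ("P00001", "P00002", 1e-30), ("P00001", "P00003", 1e-12),
])

# How many significant homologs does each query have, per organism?
for row in con.execute("""
    SELECT h.query_acc, p.organism, COUNT(*) AS n
    FROM hit h JOIN protein p ON p.acc = h.subject_acc
    WHERE h.evalue < 1e-5
    GROUP BY h.query_acc, p.organism
"""):
    print(row)
```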

  7. Exact score distribution computation for ontological similarity searches

    Directory of Open Access Journals (Sweden)

    Schulz Marcel H

    2011-11-01

    Full Text Available Abstract Background Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., finding functionally related proteins with the Gene Ontology or phenotypically similar diseases with the Human Phenotype Ontology (HPO). We have recently shown that the performance of semantic similarity searches can be improved by ranking results according to the probability of obtaining a given score at random rather than by the scores themselves. However, to date, there are no algorithms for computing the exact distribution of semantic similarity scores, which is necessary for computing the exact P-value of a given score. Results In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik's definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the HPO. It is shown that exact P-value calculation improves clinical diagnosis using the HPO compared to approaches based on sampling. Conclusions The new algorithm enables for the first time exact P-value calculation via exact score distribution computation for ontology similarity searches. The approach is applicable to any ontology for which the annotation-propagation rule holds and can improve any bioinformatic method that makes only use of the raw similarity scores. The algorithm was implemented in Java, supports any ontology in OBO format, and is available for non-commercial and academic usage under: https://compbio.charite.de/svn/hpo/trunk/src/tools/significance/
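For context, here is a minimal sketch of the Resnik similarity over which such score distributions are computed: IC(t) = -log p(t), where p(t) is a term's annotation frequency with annotations propagated to ancestors, and sim(t1, t2) is the IC of the most informative common ancestor. The toy DAG and frequencies below are invented.

```python
import math

# Toy ontology DAG (child -> parents) and annotation frequencies p(t).
parents = {"t_leaf1": {"t_mid"}, "t_leaf2": {"t_mid"}, "t_mid": {"root"}, "root": set()}
p = {"root": 1.0, "t_mid": 0.2, "t_leaf1": 0.05, "t_leaf2": 0.1}

def ancestors(t):
    """All ancestors of t, including t itself."""
    out, stack = {t}, [t]
    while stack:
        for par in parents[stack.pop()]:
            if par not in out:
                out.add(par)
                stack.append(par)
    return out

def ic(t):
    return -math.log(p[t])

def resnik(t1, t2):
    """IC of the most informative common ancestor."""
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))

print(f"sim = {resnik('t_leaf1', 't_leaf2'):.3f}")  # IC of t_mid = -ln 0.2
```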

  8. Search Profiles Based on User to Cluster Similarity

    Directory of Open Access Journals (Sweden)

    Saša Bošnjak

    2009-06-01

    Full Text Available Privacy of web users' query search logs has, since the AOL dataset release a few years ago, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also attracted a lot of attention in the different communities surrounding search engines. This paper examines the usage of clustering methods for providing low-level contextual search while retaining a high privacy-utility tradeoff. By using only the user's cluster membership, the search query terms need no longer be retained, raising fewer privacy concerns for both users and companies. The paper presents a lightweight framework for combining query words, user similarities, and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous attempts at privacy preservation, which anonymize the queries instead of the users.

  9. SEARCH PROFILES BASED ON USER TO CLUSTER SIMILARITY

    Directory of Open Access Journals (Sweden)

    Ilija Subasic

    2007-12-01

    Full Text Available Privacy of web users' query search logs has, since last year's AOL dataset release, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also attracted a lot of attention in the different communities surrounding search engines. This paper examines the usage of clustering methods for providing low-level contextual search while retaining a high privacy/utility tradeoff. By using only the user's cluster membership, the search query terms need no longer be retained, raising fewer privacy concerns for both users and companies. The paper presents a lightweight framework for combining query words, user similarities, and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous attempts at privacy preservation, which anonymize the queries instead of the users.

  10. RAPSearch: a fast protein similarity search tool for short reads

    Directory of Open Access Journals (Sweden)

    Choi Jeong-Hyeon

    2011-05-01

    Full Text Available Abstract Background Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the very sizes of the short read datasets. Results We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For the short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had a significant loss of sensitivity as compared to RAPSearch and BLAST. Conclusions RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.

  11. Robust hashing with local models for approximate similarity search.

    Science.gov (United States)

    Song, Jingkuan; Yang, Yi; Li, Xuelong; Huang, Zi; Yang, Yang

    2014-07-01

    Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing l2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.

  12. Online multiple kernel similarity learning for visual search.

    Science.gov (United States)

    Xia, Hao; Hoi, Steven C H; Jin, Rong; Zhao, Peilin

    2014-03-01

    Recent years have witnessed a number of studies on distance metric learning to improve visual similarity search in content-based image retrieval (CBIR). Despite their successes, most existing methods on distance metric learning are limited in two aspects. First, they usually assume the target proximity function follows the family of Mahalanobis distances, which limits their capacity of measuring similarity of complex patterns in real applications. Second, they often cannot effectively handle the similarity measure of multimodal data that may originate from multiple resources. To overcome these limitations, this paper investigates an online kernel similarity learning framework for learning kernel-based proximity functions which goes beyond the conventional linear distance metric learning approaches. Based on the framework, we propose a novel online multiple kernel similarity (OMKS) learning method which learns a flexible nonlinear proximity function with multiple kernels to improve visual similarity search in CBIR. We evaluate the proposed technique for CBIR on a variety of image data sets in which encouraging results show that OMKS outperforms the state-of-the-art techniques significantly.

  13. Similarity Search and Locality Sensitive Hashing using TCAMs

    CERN Document Server

    Shinde, Rajendra; Gupta, Pankaj; Dutta, Debojyoti

    2010-01-01

    Similarity search methods are widely used as kernels in various machine learning applications. Nearest neighbor search (NNS) algorithms are often used to retrieve similar entries, given a query. While there exist efficient techniques for exact query lookup using hashing, similarity search using exact nearest neighbors is known to be a hard problem and in high dimensions, best known solutions offer little improvement over a linear scan. Fast solutions to the approximate NNS problem include Locality Sensitive Hashing (LSH) based techniques, which need storage polynomial in $n$ with exponent greater than $1$, and query time sublinear, but still polynomial in $n$, where $n$ is the size of the database. In this work we present a new technique of solving the approximate NNS problem in Euclidean space using a Ternary Content Addressable Memory (TCAM), which needs near linear space and has O(1) query time. In fact, this method also works around the best known lower bounds in the cell probe model for the query time us...

  14. Semantic similarity measure in biomedical domain leverage web search engine.

    Science.gov (United States)

    Chen, Chi-Huang; Hsieh, Sheau-Ling; Weng, Yung-Ching; Chang, Wen-Yung; Lai, Feipei

    2010-01-01

    Semantic similarity measures play an essential role in information retrieval and natural language processing. In this paper we propose a page-count-based semantic similarity measure and apply it in biomedical domains. Previous research in semantic-web-related applications has deployed various semantic similarity measures. Despite the usefulness of the measurements in those applications, measuring semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by the web search engine. We define various similarity scores for two given terms P and Q, using the page counts for querying P, Q, and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated by adapting support vector machines, to leverage the robustness of semantic similarity measures. Experimental results on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen with physician scores, and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
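Two of the standard page-count association measures used in this line of work are WebJaccard and WebPMI, sketched below; the page counts and the index size N are invented values, and a real system would obtain the counts from a search-engine API.

```python
import math

N = 1e10  # assumed number of pages indexed by the engine

def web_jaccard(h_p, h_q, h_pq):
    """WebJaccard: co-occurrence count over the union of the two hit counts."""
    return 0.0 if h_pq == 0 else h_pq / (h_p + h_q - h_pq)

def web_pmi(h_p, h_q, h_pq):
    """WebPMI: pointwise mutual information estimated from page counts."""
    if h_pq == 0:
        return 0.0
    return math.log2((h_pq / N) / ((h_p / N) * (h_q / N)))

# hits("car"), hits("automobile"), hits("car AND automobile") -- invented numbers.
h_p, h_q, h_pq = 4.0e8, 1.2e8, 3.0e7
print(f"WebJaccard = {web_jaccard(h_p, h_q, h_pq):.3f}")
print(f"WebPMI     = {web_pmi(h_p, h_q, h_pq):.3f}")
```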

  15. Computing Semantic Similarity Measure Between Words Using Web Search Engine

    Directory of Open Access Journals (Sweden)

    Pushpa C N

    2013-05-01

    Full Text Available Semantic similarity measures between words play an important role in information retrieval, natural language processing, and various tasks on the web. In this paper, we propose a Modified Pattern Extraction Algorithm to compute the supervised semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use Sequential Minimal Optimization (SMO) support vector machines (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word pairs and non-synonymous word pairs. The proposed Modified Pattern Extraction Algorithm achieves a correlation value of 89.8 percent, outperforming existing approaches.

  16. Fast and accurate database searches with MS-GF+Percolator

    Energy Technology Data Exchange (ETDEWEB)

    Granholm, Viktor; Kim, Sangtae; Navarro, Jose' C.; Sjolund, Erik; Smith, Richard D.; Kall, Lukas

    2014-02-28

    To identify peptides and proteins from the large number of fragmentation spectra in mass spectrometry-based proteomics, researchers commonly employ so-called database search engines. Additionally, post-processors like Percolator have been used on the results from such search engines to assess confidence, infer peptides, and generally increase the number of identifications. A recent search engine, MS-GF+, has previously been shown to outperform these classical search engines in terms of the number of identified spectra. However, MS-GF+ generates only limited statistical estimates of the results, hampering the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of datasets. In addition, Percolator directly reports false discovery rate estimates, such as q values and posterior error probabilities, as well as p values, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.

  17. CLIP: similarity searching of 3D databases using clique detection.

    Science.gov (United States)

    Rhodes, Nicholas; Willett, Peter; Calvet, Alain; Dunbar, James B; Humblet, Christine

    2003-01-01

    This paper describes a program for 3D similarity searching, called CLIP (for Candidate Ligand Identification Program), that uses the Bron-Kerbosch clique detection algorithm to find those structures in a file that have large structures in common with a target structure. Structures are characterized by the geometric arrangement of pharmacophore points and the similarity between two structures calculated using modifications of the Simpson and Tanimoto association coefficients. This modification takes into account the fact that a distance tolerance is required to ensure that pairs of interatomic distances can be regarded as equivalent during the clique-construction stage of the matching algorithm. Experiments with HIV assay data demonstrate the effectiveness and the efficiency of this approach to virtual screening.

  18. SHOP: scaffold hopping by GRID-based similarity searches

    DEFF Research Database (Denmark)

    Bergmann, Rikke; Linusson, Anna; Zamora, Ismael

    2007-01-01

    A new GRID-based method for scaffold hopping (SHOP) is presented. In a fully automatic manner, scaffolds were identified in a database based on three types of 3D descriptors. SHOP's ability to recover scaffolds was assessed and validated by searching a database spiked with fragments of known ligands of three different protein targets relevant for drug discovery, using a rational approach based on statistical experimental design. Five out of eight and seven out of eight thrombin scaffolds and all seven HIV protease scaffolds were recovered within the top 10 and 31, and 31 out of 31 neuraminidase scaffolds were among the 31 top-ranked scaffolds. SHOP also identified new scaffolds with substantially different chemotypes from the queries. Docking analysis indicated that the new scaffolds would have binding modes similar to those of the respective query scaffolds observed in X-ray structures...

  19. Fast and accurate database searches with MS-GF+Percolator.

    Science.gov (United States)

    Granholm, Viktor; Kim, Sangtae; Navarro, José C F; Sjölund, Erik; Smith, Richard D; Käll, Lukas

    2014-02-07

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.

  20. Keyword Search over Data Service Integration for Accurate Results

    CERN Document Server

    Zemleris, Vidmantas; Gwadera, Robert

    2013-01-01

    Virtual data integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn the query language and to get acquainted with data organization, which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  1. Efficient searching and annotation of metabolic networks using chemical similarity

    OpenAIRE

    Pertusi, Dante A.; Stine, Andrew E.; Broadbelt, Linda J.; Tyo, Keith E. J.

    2014-01-01

    Motivation: The urgent need for efficient and sustainable biological production of fuels and high-value chemicals has elicited a wave of in silico techniques for identifying promising novel pathways to these compounds in large putative metabolic networks. To date, these approaches have primarily used general graph search algorithms, which are prohibitively slow as putative metabolic networks may exceed 1 million compounds. To alleviate this limitation, we report two methods—SimIndex (SI) and ...

  2. An accurate algorithm to calculate the Hurst exponent of self-similar processes

    Energy Technology Data Exchange (ETDEWEB)

    Fernández-Martínez, M., E-mail: fmm124@ual.es [Department of Mathematics, Faculty of Science, Universidad de Almería, 04120 Almería (Spain); Sánchez-Granero, M.A., E-mail: misanche@ual.es [Department of Mathematics, Faculty of Science, Universidad de Almería, 04120 Almería (Spain); Trinidad Segovia, J.E., E-mail: jetrini@ual.es [Department of Accounting and Finance, Faculty of Economics and Business, Universidad de Almería, 04120 Almería (Spain); Román-Sánchez, I.M., E-mail: iroman@ual.es [Department of Accounting and Finance, Faculty of Economics and Business, Universidad de Almería, 04120 Almería (Spain)

    2014-06-27

    In this paper, we introduce a new approach which generalizes the GM2 algorithm (introduced in Sánchez-Granero et al. (2008) [52]) as well as fractal dimension algorithms (FD1, FD2 and FD3) (first appeared in Sánchez-Granero et al. (2012) [51]), providing an accurate algorithm to calculate the Hurst exponent of self-similar processes. We prove that this algorithm performs properly in the case of short time series when fractional Brownian motions and Lévy stable motions are considered. We conclude the paper with a dynamic study of the Hurst exponent evolution in the S&P500 index stocks. - Highlights: • We provide a new approach to properly calculate the Hurst exponent. • This generalizes FD algorithms and GM2, introduced previously by the authors. • This method (FD4) results especially appropriate for short time series. • FD4 may be used in both unifractal and multifractal contexts. • As an empirical application, we show that S&P500 stocks improved their efficiency.
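As a point of reference, the scaling idea behind such estimators can be sketched with the classical aggregated-variance method, which is simpler than the FD/GM2 algorithms the paper develops: for a self-similar increment process, Var(X^(m)) ~ m^(2H-2), where X^(m) is the series of block means of size m, so a log-log regression of block-mean variance against block size recovers H.

```python
import math
import random
from statistics import variance

def hurst_aggvar(x, block_sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate H from the scaling Var(X^(m)) ~ m^(2H - 2) of block means."""
    logs_m, logs_v = [], []
    for m in block_sizes:
        means = [sum(x[i:i + m]) / m for i in range(0, len(x) - len(x) % m, m)]
        logs_m.append(math.log(m))
        logs_v.append(math.log(variance(means)))
    # Least-squares slope of log-variance against log-block-size.
    n = len(logs_m)
    mx, my = sum(logs_m) / n, sum(logs_v) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(logs_m, logs_v))
             / sum((a - mx) ** 2 for a in logs_m))
    return 1.0 + slope / 2.0

random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
print(f"H ~ {hurst_aggvar(white_noise):.2f}")  # close to 0.5 for uncorrelated increments
```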

  3. Density-based similarity measures for content based search

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don R [Los Alamos National Laboratory; Porter, Reid B [Los Alamos National Laboratory; Ruggiero, Christy E [Los Alamos National Laboratory

    2009-01-01

    We consider the query by multiple example problem where the goal is to identify database samples whose content is similar to a collection of query samples. To assess the similarity we use a relative content density which quantifies the relative concentration of the query distribution to the database distribution. If the database distribution is a mixture of the query distribution and a background distribution then it can be shown that database samples whose relative content density is greater than a particular threshold {rho} are more likely to have been generated by the query distribution than the background distribution. We describe an algorithm for predicting samples with relative content density greater than {rho} that is computationally efficient and possesses strong performance guarantees. We also show empirical results for applications in computer network monitoring and image segmentation.

  4. Accurate estimation of influenza epidemics using Google search data via ARGO.

    Science.gov (United States)

    Yang, Shihao; Santillana, Mauricio; Kou, S C

    2015-11-24

    Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data as input from publicly available Google Trends and Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in people's online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.
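In spirit, ARGO is an autoregression augmented with exogenous search-volume regressors. The sketch below fits such a model on synthetic data with plain least squares; ARGO itself uses L1 regularization, many query terms, and dynamic retraining on real Google Trends/Correlate inputs, all of which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 120
# Synthetic weekly flu activity and a noisy search-volume proxy for it.
flu = np.sin(np.arange(T) * 2 * np.pi / 52) + 0.1 * rng.standard_normal(T)
search = flu + 0.2 * rng.standard_normal(T)

lags = 3
rows, targets = [], []
for t in range(lags, T):
    # Design row: intercept, autoregressive lags, contemporaneous search volume.
    rows.append([1.0, *flu[t - lags:t], search[t]])
    targets.append(flu[t])
X, y = np.array(rows), np.array(targets)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

pred = X @ coef
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"in-sample RMSE: {rmse:.3f}")
```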

  5. Content-Based Search on a Database of Geometric Models: Identifying Objects of Similar Shape

    Energy Technology Data Exchange (ETDEWEB)

    XAVIER, PATRICK G.; HENRY, TYSON R.; LAFARGE, ROBERT A.; MEIRANS, LILITA; RAY, LAWRENCE P.

    2001-11-01

    The Geometric Search Engine is a software system for storing and searching a database of geometric models. The database may be searched for modeled objects similar in shape to a target model supplied by the user. The database models are generally derived from CAD models, while the target model may be either a CAD model or a model generated from range data collected from a physical object. This document describes key generation, database layout, and search of the database.

  6. Perceptual Grouping in Haptic Search: The Influence of Proximity, Similarity, and Good Continuation

    Science.gov (United States)

    Overvliet, Krista E.; Krampe, Ralf Th.; Wagemans, Johan

    2012-01-01

    We conducted a haptic search experiment to investigate the influence of the Gestalt principles of proximity, similarity, and good continuation. We expected faster search when the distractors could be grouped. We chose edges at different orientations as stimuli because they are processed similarly in the haptic and visual modality. We therefore…

  7. Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Directory of Open Access Journals (Sweden)

    Caprez Adam

    2011-01-01

    Full Text Available Abstract Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" proteins. Our Comparison of Protein Active-Site Structures (CPASS) v.2 database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability, and to enable job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean-square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores

  8. MUSE: An Efficient and Accurate Verifiable Privacy-Preserving Multikeyword Text Search over Encrypted Cloud Data

    Directory of Open Access Journals (Sweden)

    Zhu Xiangyang

    2017-01-01

    Full Text Available With the development of cloud computing, services outsourcing in clouds has become a popular business model. However, because data storage and computing are completely outsourced to the cloud service provider, the sensitive data of data owners is exposed, which could lead to serious privacy disclosure. In addition, unexpected events, such as software bugs and hardware failure, could cause incomplete or incorrect results to be returned from clouds. In this paper, we propose MUSE, an efficient and accurate verifiable privacy-preserving multikeyword text search over encrypted cloud data based on hierarchical agglomerative clustering. To improve the efficiency of text searching, we propose a novel index structure, the HAC-tree, which is based on a hierarchical agglomerative clustering method and tends to gather high-relevance documents into clusters. Based on the HAC-tree, a noncandidate-pruning depth-first search algorithm is proposed, which can filter out unqualified subtrees and thus accelerate the search process. The secure inner product algorithm is used to encrypt the HAC-tree index and the query vector. Meanwhile, a completeness verification algorithm is given to verify search results. Experiment results demonstrate that the proposed method outperforms the existing works DMRS and MRSE-HCI in efficiency and accuracy, respectively.
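
    Leaving aside the encryption layer, the cluster-then-prune search idea can be sketched in plaintext. The sketch below is our construction, not MUSE's HAC-tree: the Euclidean metric, the flat clusters, and the triangle-inequality bound are all assumptions. It skips any cluster whose lower bound already exceeds the best hit found so far.

        # Cluster-based pruning for nearest-neighbor search (plaintext sketch).
        import numpy as np

        def make_cluster(points):
            centroid = points.mean(axis=0)
            radius = max(np.linalg.norm(p - centroid) for p in points)
            return centroid, radius, points

        def cluster_pruned_nn(query, clusters):
            best, best_d = None, float("inf")
            # visit clusters closest-centroid-first so the bound tightens quickly
            for centroid, radius, points in sorted(
                    clusters, key=lambda c: np.linalg.norm(query - c[0])):
                if np.linalg.norm(query - centroid) - radius >= best_d:
                    continue          # triangle inequality: nothing inside can win
                for p in points:      # scan only the surviving clusters
                    d = np.linalg.norm(query - p)
                    if d < best_d:
                        best, best_d = p, d
            return best, best_d

        rng = np.random.default_rng(4)
        clusters = [make_cluster(rng.normal(loc=c, size=(200, 8))) for c in (0.0, 10.0)]
        print(cluster_pruned_nn(rng.normal(loc=10.0, size=8), clusters)[1])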

  9. δ-Similar Elimination to Enhance Search Performance of Multiobjective Evolutionary Algorithms

    Science.gov (United States)

    Aguirre, Hernán; Sato, Masahiko; Tanaka, Kiyoshi

    In this paper, we propose δ-similar elimination to improve the search performance of multiobjective evolutionary algorithms in combinatorial optimization problems. This method eliminates similar individuals in objective space to fairly distribute selection among the different regions of the instantaneous Pareto front. We investigate four elimination methods, analyzing their effects using NSGA-II. In addition, we compare the search performance of NSGA-II enhanced by our method with that of NSGA-II enhanced by controlled elitism.
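
    One plausible reading of the elimination step, as a sketch (the paper investigates four variants; this greedy version and the δ value are our assumptions): keep an individual only if no already-kept individual lies within δ of it in objective space.

        # Greedy delta-similar elimination over objective-space vectors.
        import numpy as np

        def delta_similar_elimination(objectives, delta):
            kept = []
            for i, obj in enumerate(objectives):
                # keep i only if it is farther than delta from every kept individual
                if all(np.linalg.norm(obj - objectives[j]) > delta for j in kept):
                    kept.append(i)
            return kept

        front = np.array([[0.00, 1.00], [0.01, 0.99], [0.50, 0.50], [1.00, 0.00]])
        print(delta_similar_elimination(front, delta=0.05))   # -> [0, 2, 3]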

  10. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space

    KAUST Repository

    Tao, Yufei

    2010-07-01

    Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query cost, or abandon its theoretical guarantee on the quality of query results. Motivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms an LSB-forest that has strong quality guarantees, but dramatically improves the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space, supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (i) iDistance (a famous technique for exact NN search) by two orders of magnitude, and (ii) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results. As a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, a goal that most existing solutions fail to meet. We show that, using an LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, yet still ensuring very good quality. In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial
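
    The LSB-tree builds on locality-sensitive hashing; a toy random-projection LSH conveys the underlying idea. This is generic LSH, not the LSB-tree itself: the hyperplane count, the single hash table, and the fallback to a full scan are illustrative assumptions.

        # Toy random-projection LSH: hash points into buckets, probe one bucket.
        import numpy as np
        from collections import defaultdict

        rng = np.random.default_rng(2)
        dim, n_bits = 32, 12
        planes = rng.normal(size=(n_bits, dim))          # random hyperplanes

        def lsh_key(x):
            return tuple((planes @ x > 0).astype(int))   # one sign bit per hyperplane

        data = rng.normal(size=(10_000, dim))
        buckets = defaultdict(list)
        for i, x in enumerate(data):
            buckets[lsh_key(x)].append(i)

        query = data[0] + 0.01 * rng.normal(size=dim)
        # probe only the query's bucket; fall back to a full scan if it is empty
        candidates = buckets.get(lsh_key(query), range(len(data)))
        best = min(candidates, key=lambda i: np.linalg.norm(data[i] - query))
        print(len(candidates), best)                     # best is very likely index 0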

  11. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2004-10-01

    Full Text Available Abstract Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high-throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high-performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into
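
    The query-segmentation approach itself is easy to sketch: split the query set across workers, run the same search code on each chunk, and merge the results. In the sketch below the "search" is a plain substring scan standing in for a wrapped tool such as BLAST; the toy sequences and the static two-way split are assumptions.

        # Query segmentation: distribute query chunks across worker processes.
        from multiprocessing import Pool

        DATABASE = ["ACGTACGTGACC", "TTGACCAAGT", "GGGTACGTTT"]

        def search_chunk(queries):
            # stand-in for invoking a real search tool on one query chunk
            hits = []
            for q in queries:
                hits += [(q, i) for i, seq in enumerate(DATABASE) if q in seq]
            return hits

        if __name__ == "__main__":
            queries = ["ACGT", "GACC", "AAGT", "TTTT"]
            chunks = [queries[0::2], queries[1::2]]       # static load splitting
            with Pool(2) as pool:
                results = [h for part in pool.map(search_chunk, chunks) for h in part]
            print(results)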

  12. Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases

    DEFF Research Database (Denmark)

    Assent, Ira; Wichterich, Marc; Meisen, Tobias

    2008-01-01

    Multimedia similarity search in large databases requires efficient query processing. The Earth mover's distance, introduced in computer vision, is successfully used as a similarity model in a number of small-scale applications. Its computational complexity hindered its adoption in large multimedia...
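
    As a one-dimensional illustration of the Earth mover's distance as a similarity model (SciPy's wasserstein_distance computes the 1-D EMD; the paper targets the harder multi-dimensional case, and the histograms here are invented):

        # 1-D EMD between feature histograms, e.g. quantized color bins.
        import numpy as np
        from scipy.stats import wasserstein_distance

        bins = np.arange(8)
        hist_a = np.array([4, 3, 1, 0, 0, 0, 1, 1], dtype=float)
        hist_b = np.array([0, 1, 3, 4, 1, 1, 0, 0], dtype=float)
        hist_c = np.array([4, 3, 1, 0, 0, 0, 1, 1], dtype=float)

        print(wasserstein_distance(bins, bins, hist_a, hist_b))  # > 0: mass must move
        print(wasserstein_distance(bins, bins, hist_a, hist_c))  # 0.0: identical histograms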

  13. MEASURING THE PERFORMANCE OF SIMILARITY PROPAGATION IN A SEMANTIC SEARCH ENGINE

    Directory of Open Access Journals (Sweden)

    S. K. Jayanthi

    2013-10-01

    Full Text Available In the current scenario, web page result personalization plays a vital role. Nearly 80% of users expect the best results on the first page itself, without the persistence to browse longer in URL mode. This research work focuses on two main themes: semantic web search conducted online and domain-based search conducted offline. The first part aims to find an effective method that allows grouping similar results together using the BookShelf data structure and organizing the various clusters. The second part focuses on academic domain-based search conducted offline. This paper focuses on finding documents which are similar and on how the vector space model can be used to solve this problem. More weight is therefore given to the principles and working methodology of similarity propagation. The cosine similarity measure is used for finding the relevancy among documents.
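
    The cosine relevancy measure used here is straightforward to reproduce; a minimal example with TF-IDF vectors follows (the library choice and the toy corpus are ours, not the paper's).

        # Rank documents against a query by cosine similarity of TF-IDF vectors.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        docs = ["semantic web search engine",
                "web page ranking for search engines",
                "protein structure similarity"]
        query = ["semantic search on the web"]

        vec = TfidfVectorizer()
        doc_m = vec.fit_transform(docs)
        sims = cosine_similarity(vec.transform(query), doc_m).ravel()
        print(sims.argsort()[::-1])   # document indices ranked by cosine relevancy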

  14. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Directory of Open Access Journals (Sweden)

    Viswa S S

    2013-06-01

    Full Text Available Nowadays, web-scale image search engines (e.g., Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in the hope of finding a certain type of image. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of the returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft use no visual information for image search. The idea is to use visual information to re-rank and improve text-based image search results. This improves the precision of the text-based image search ranking by incorporating the information conveyed by the visual modality. The typical assumption that the top-ranked images in the text-based search result are equally relevant is relaxed by linking the relevance of the images to their initial rank positions. Then, a number of images from the initial search result are employed as prototypes that serve to visually represent the query and that are subsequently used to construct meta re-rankers, i.e., the most relevant images are found by visual similarity and the average scores are calculated. By applying different meta re-rankers to an image from the initial result, re-ranking scores are generated, which are then used to find the new rank position for an image in the re-ranked search result. Human supervision is introduced to learn the model weights offline, prior to the online re-ranking process. While model learning requires manual labelling of the results for a few queries, the resulting model is query independent and therefore applicable to any other query. The experimental results on a representative web image search dataset comprising 353 queries demonstrate that the proposed method outperforms the existing supervised and unsupervised re-ranking approaches. Moreover, it improves the performance over the text-based image search engine by more than 25.48%.

  15. Similarity

    Science.gov (United States)

    Apostol, Tom M. (Editor)

    1990-01-01

    In this 'Project Mathematics!' series, sponsored by the California Institute of Technology (Caltech), the mathematical concept of similarity is presented. The history of the concept and its real-life applications are discussed using actual film footage and computer animation. The terms used and various concepts of size, shape, ratio, area, and volume are demonstrated. The similarity of polygons, solids, congruent triangles, internal ratios, perimeters, and line segments is shown using the previously mentioned concepts.

  16. A comparison of field-based similarity searching methods: CatShape, FBSS, and ROCS.

    Science.gov (United States)

    Moffat, Kirstin; Gillet, Valerie J; Whittle, Martin; Bravi, Gianpaolo; Leach, Andrew R

    2008-04-01

    Three field-based similarity methods are compared in retrospective virtual screening experiments. The methods are the CatShape module of CATALYST, ROCS, and an in-house program developed at the University of Sheffield called FBSS. The programs are used in both rigid and flexible searches carried out in the MDL Drug Data Report. UNITY 2D fingerprints are also used to provide a comparison with a more traditional approach to similarity searching, and similarity based on simple whole-molecule properties is used to provide a baseline for the more sophisticated searches. Overall, UNITY 2D fingerprints and ROCS with the chemical force field option gave comparable performance and were superior to the shape-only 3D methods. When the flexible methods were compared with the rigid methods, it was generally found that the flexible methods gave slightly better results than their respective rigid methods; however, the increased performance did not justify the additional computational cost required.

  17. Similarities and differences between Web search procedure and searching in the pre-web information retrieval systems

    Directory of Open Access Journals (Sweden)

    Yazdan Mansourian

    2004-08-01

    Full Text Available This paper presents an introductory discussion of the commonalities and dissimilarities between the Web search procedure and the search process in pre-Web online information retrieval systems, including classic information retrieval systems and databases. The paper attempts to explain which factors make these two groups different, why investigating the search process in the Web environment is important, how much we know about this procedure, and what the main lines of research open to researchers in this area of study and practice are. After presenting the major factors involved, the paper concludes that although the information seeking process on the Web is fairly similar to that of pre-Web systems in some ways, there are notable differences between them as well. These differences may provide Web searchers and Web researchers with some opportunities and challenges.

  18. A New Retrieval Model Based on TextTiling for Document Similarity Search

    Institute of Scientific and Technical Information of China (English)

    Xiao-Jun Wan; Yu-Xin Peng

    2005-01-01

    Document similarity search is the task of finding documents similar to a given query document and returning a ranked list of similar documents to users; it is widely used in many text and web systems, such as digital libraries and search engines. Traditional retrieval models, including Okapi's BM25 model and Smart's vector space model with length normalization, can handle this problem to some extent by taking the query document as a long query. In practice, the Cosine measure is considered the best model for document similarity search because of its good ability to measure similarity between two documents. In this paper, the quantitative performances of the above models are compared experimentally. Because the Cosine measure is not able to reflect the structural similarity between documents, a new retrieval model based on TextTiling is proposed in the paper. The proposed model takes into account the subtopic structures of documents. It first splits the documents into text segments with TextTiling and calculates the similarities for different pairs of text segments in the documents. Lastly, the overall similarity between the documents is returned by combining the similarities of different pairs of text segments with an optimal matching method. Experiments are performed and the results show: 1) the popular retrieval models (Okapi's BM25 model and Smart's vector space model with length normalization) do not perform well for document similarity search; 2) the proposed model based on TextTiling is effective and outperforms other models, including the Cosine measure; 3) the methods for the three components in the proposed model are validated to be appropriately employed.
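
    The combination step can be sketched directly: given the similarities between text segments of two documents (the segmentation itself, e.g. by TextTiling, is assumed already done, and the matrix below is invented), take an optimal one-to-one matching of segments and combine the matched similarities into a document-level score.

        # Optimal matching of segment pairs via the assignment problem.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        seg_sim = np.array([[0.9, 0.1, 0.2],      # seg_sim[i, j]: similarity between
                            [0.2, 0.8, 0.3],      # segment i of doc A and j of doc B
                            [0.1, 0.4, 0.7]])

        rows, cols = linear_sum_assignment(-seg_sim)   # negate to maximize similarity
        doc_similarity = seg_sim[rows, cols].mean()
        print([(int(r), int(c)) for r, c in zip(rows, cols)],
              round(doc_similarity, 2))                # diagonal match, score 0.8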

  19. Molecular fingerprint recombination: generating hybrid fingerprints for similarity searching from different fingerprint types.

    Science.gov (United States)

    Nisius, Britta; Bajorath, Jürgen

    2009-11-01

    Molecular fingerprints have a long history in computational medicinal chemistry and continue to be popular tools for similarity searching. Over the years, a variety of fingerprint types have been introduced. We report an approach to identify preferred bit subsets in fingerprints of different design and "recombine" these bit segments into "hybrid fingerprints". These compound class-directed fingerprint representations are found to increase the similarity search performance of their parental fingerprints, which can be rationalized by the often complementary nature of distinct fingerprint features.
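
    A sketch of the recombination idea follows; the bit-scoring heuristic (per-bit frequency contrast between actives and inactives), the random fingerprints, and the segment sizes are our assumptions, not the authors' selection criterion.

        # Pick the most class-discriminating bits of two parent fingerprints
        # and concatenate them into a hybrid fingerprint.
        import numpy as np

        def top_bits(fp_actives, fp_inactives, k):
            gap = np.abs(fp_actives.mean(0) - fp_inactives.mean(0))  # per-bit contrast
            return np.argsort(gap)[::-1][:k]

        rng = np.random.default_rng(3)
        fpA_act, fpA_inact = rng.integers(0, 2, (50, 64)), rng.integers(0, 2, (50, 64))
        fpB_act, fpB_inact = rng.integers(0, 2, (50, 32)), rng.integers(0, 2, (50, 32))

        selA = top_bits(fpA_act, fpA_inact, 16)   # preferred bits of fingerprint A
        selB = top_bits(fpB_act, fpB_inact, 8)    # preferred bits of fingerprint B

        def hybrid(fpA, fpB):
            return np.concatenate([fpA[selA], fpB[selB]])  # 24-bit hybrid fingerprint

        print(hybrid(fpA_act[0], fpB_act[0]))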

  20. Web Image Search Re-ranking with Click-based Similarity and Typicality.

    Science.gov (United States)

    Yang, Xiaopeng; Mei, Tao; Zhang, Yong Dong; Liu, Jie; Satoh, Shin'ichi

    2016-07-20

    In image search re-ranking, besides the well-known semantic gap, the intent gap, which is the gap between the representation of users' query/demand and the real intent of the users, is becoming a major problem restricting the development of image retrieval. To reduce human effort, in this paper we use image click-through data, which can be viewed as "implicit feedback" from users, to help overcome the intent gap and further improve image search performance. Generally, the hypothesis that visually similar images should be close in a ranking list and the strategy that images with higher relevance should be ranked higher than others are widely accepted. To obtain satisfying search results, image similarity and the level of relevance typicality are therefore the determining factors. However, when measuring image similarity and typicality, conventional re-ranking approaches only consider visual information and the initial ranks of images, while overlooking the influence of click-through data. This paper presents a novel re-ranking approach, named spectral clustering re-ranking with click-based similarity and typicality (SCCST). First, to learn an appropriate similarity measurement, we propose a click-based multi-feature similarity learning algorithm (CMSL), which conducts metric learning based on click-based triplet selection and integrates multiple features into a unified similarity space via multiple kernel learning. Then, based on the learnt click-based image similarity measure, we conduct spectral clustering to group visually and semantically similar images into the same clusters, and obtain the final re-ranked list by calculating click-based cluster typicality and within-cluster click-based image typicality in descending order. Our experiments conducted on two real-world query-image datasets with diverse representative queries show that our proposed re-ranking approach can significantly improve initial search results, and outperforms several existing re-ranking approaches.

  1. Similarity-based search of model organism, disease and drug effect phenotypes

    KAUST Repository

    Hoehndorf, Robert

    2015-02-19

    Background: Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity. Results: We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet. Conclusions: Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.

  2. Using homology relations within a database markedly boosts protein sequence similarity search.

    Science.gov (United States)

    Tong, Jing; Sadreyev, Ruslan I; Pei, Jimin; Kinch, Lisa N; Grishin, Nick V

    2015-06-02

    Inference of homology from protein sequences provides an essential tool for analyzing protein structure, function, and evolution. Current sequence-based homology search methods are still unable to detect many similarities evident from protein spatial structures. In computer science a search engine can be improved by considering networks of known relationships within the search database. Here, we apply this idea to protein-sequence-based homology search and show that it dramatically enhances the search accuracy. Our new method, COMPADRE (COmparison of Multiple Protein sequence Alignments using Database RElationships) assesses the relationship between the query sequence and a hit in the database by considering the similarity between the query and hit's known homologs. This approach increases detection quality, boosting the precision rate from 18% to 83% at half-coverage of all database homologs. The increased precision rate allows detection of a large fraction of protein structural relationships, thus providing structure and function predictions for previously uncharacterized proteins. Our results suggest that this general approach is applicable to a wide variety of methods for detection of biological similarities. The web server is available at prodata.swmed.edu/compadre.
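
    The core idea can be conveyed with a toy score (the Jaccard stand-in for a real alignment score, the sequences, and the max-combination rule are all invented for illustration): a database hit is rated not only by its direct similarity to the query but also by the query's best similarity to the hit's known homologs.

        # Boost a query-hit score using the hit's known homology network.
        seqs = {"query": "ACDEFGH", "hit": "ACDXYZW",
                "homA": "ACDEFYZ", "homB": "QWERTYU"}
        homologs = {"hit": ["homA", "homB"]}   # known relationships in the database

        def jaccard(a, b):                      # toy stand-in for an alignment score
            return len(set(a) & set(b)) / len(set(a) | set(b))

        def boosted(query, hit):
            direct = jaccard(seqs[query], seqs[hit])
            via = max((jaccard(seqs[query], seqs[h]) for h in homologs.get(hit, [])),
                      default=0.0)              # best path through the hit's homologs
            return max(direct, via)

        print(round(jaccard(seqs["query"], seqs["hit"]), 2),   # weak direct signal
              round(boosted("query", "hit"), 2))               # stronger via homolog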

  3. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    Directory of Open Access Journals (Sweden)

    Chen Ke

    2008-05-01

    Full Text Available Abstract Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as its structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition-, and physicochemical-properties-based features, along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods based on support vector machines, logistic regression, and ensembles of classifiers. Conclusion SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is

  4. RScan: fast searching structural similarities for structured RNAs in large databases

    Directory of Open Access Journals (Sweden)

    Liu Guo-Ping

    2007-07-01

    Full Text Available Abstract Background Many RNAs have evolutionarily conserved secondary structures rather than conserved primary sequences. Recently, an increasing number of methods have been developed that focus on structural alignments for finding conserved secondary structures as well as common structural motifs in pair-wise or multiple sequences. A challenging task is to quickly search for similar structures for structured RNA sequences in large genomic databases, since existing methods are too slow to be used on large databases. Results An implementation of a fast structural alignment algorithm, RScan, is proposed to fulfill this task. RScan is developed by leveraging the advantages of both hashing algorithms and local alignment algorithms. In our experiment, on average, searching for a tRNA and an rRNA in the randomized A. pernix genome takes only 256 seconds and 832 seconds respectively using RScan, but 3,178 seconds and 8,951 seconds respectively using the existing method RSEARCH. Remarkably, RScan can handle large database queries, taking less than 4 minutes to search for similar structures for a microRNA precursor in human chromosome 21. Conclusion These results indicate that RScan is a preferable choice for real-life applications of searching structural similarities for structured RNAs in large databases. RScan software is freely available at http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm.

  5. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation

    Directory of Open Access Journals (Sweden)

    Li Gong-Hua

    2010-08-01

    Full Text Available Abstract Background The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in the Protein Data Bank (PDB); thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need to be improved further, for example by enhancing their accuracy, sensitivity, and computational speed. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown functions of proteins based on local protein structural similarity. The algorithm has been evaluated by building a test set of 164 enzyme families and by comparison to other methods. Results The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in large-scale functional annotation. Compared to both sequence-based and global structure-based methods, CMASA can not only find remote homologous proteins but also detect active-site convergence. Compared to other local structure comparison-based methods, CMASA achieves better performance than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is more sensitive than PINTS and more accurate than JESS (both local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites have been suggested; these sites cannot be observed in the Catalytic Site Atlas (CSA). Conclusions CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. CMASA can be used in large-scale enzyme active site annotation. The CMASA can be available by the

  6. Manifold Learning for Multivariate Variable-Length Sequences With an Application to Similarity Search.

    Science.gov (United States)

    Ho, Shen-Shyang; Dai, Peng; Rudzicz, Frank

    2016-06-01

    Multivariate variable-length sequence data are becoming ubiquitous with the technological advancement in mobile devices and sensor networks. Such data are difficult to compare, visualize, and analyze due to the nonmetric nature of data sequence similarity measures. In this paper, we propose a general manifold learning framework for arbitrary-length multivariate data sequences driven by similarity/distance (parameter) learning in both the original data sequence space and the learned manifold. Our proposed algorithm transforms the data sequences in a nonmetric data sequence space into feature vectors in a manifold that preserves the data sequence space structure. In particular, the feature vectors in the manifold representing similar data sequences remain close to one another and far from the feature points corresponding to dissimilar data sequences. To achieve this objective, we assume a semisupervised setting in which we have knowledge about whether some of the data sequences are similar or dissimilar, called instance-level constraints. Using this information, one learns the similarity measure for the data sequence space and the distance measures for the manifold. Moreover, we describe an approach to handle the similarity search problem given user-defined instance-level constraints in the learned manifold using a consensus voting scheme. Experimental results on both synthetic data and real tropical cyclone sequence data are presented to demonstrate the feasibility of our manifold learning framework and the robustness of performing similarity search in the learned manifold.

  7. Software Suite for Gene and Protein Annotation Prediction and Similarity Search.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2015-01-01

    In the computational biology community, machine learning algorithms are key instruments for many applications, including the prediction of gene functions based upon the available biomolecular annotations. Additionally, they may also be employed to compute similarity between genes or proteins. Here, we describe and discuss a software suite we developed to implement and make publicly available some of such prediction methods and a computational technique based upon Latent Semantic Indexing (LSI), which leverages both inferred and available annotations to search for semantically similar genes. The suite consists of three components. BioAnnotationPredictor is a computational software module to predict new gene functions based upon Singular Value Decomposition of available annotations. SimilBio is a Web module that leverages annotations available or predicted by BioAnnotationPredictor to discover similarities between genes via LSI. The suite also includes SemSim, a new Web service built upon these modules to allow accessing them programmatically. We integrated SemSim in the Bio Search Computing framework (http://www.bioinformatics.deib.polimi.it/bio-seco/seco/), where users can exploit the Search Computing technology to run multi-topic complex queries on multiple integrated Web services. Accordingly, researchers may obtain ranked answers involving the computation of the functional similarity between genes in support of biomedical knowledge discovery.

  8. NSSRF: global network similarity search with subgraph signatures and its applications.

    Science.gov (United States)

    Zhang, Jiao; Kwong, Sam; Jia, Yuheng; Wong, Ka-Chun

    2017-06-01

    The exponential growth of biological network databases has increasingly rendered global network similarity search (NSS) computationally intensive. Given a query network and a network database, the task is to find the networks in the database most similar to the query network based on a topological similarity measure of interest. With the advent of big network data, existing search methods may become unsuitable, since some of them can render queries unsuccessful by returning empty answers or imposing arbitrary query restrictions. Therefore, the design of NSS algorithms remains challenging under the dilemma between accuracy and efficiency. We propose a global NSS method based on regression, denoted NSSRF, which boosts search speed without any significant sacrifice in practical performance. As motivated by the nature of the problem, subgraph signatures are heavily involved. Two phases are proposed in NSSRF: an offline model building phase and a similarity query phase. In the offline model building phase, subgraph signatures and cosine similarity scores are used for efficient random forest regression (RFR) model training. In the similarity query phase, the trained regression model is queried to return similar networks. We have extensively validated NSSRF on biological pathways and molecular structures; NSSRF demonstrates competitive performance over the state of the art. Remarkably, NSSRF works especially well for large networks, which indicates that the proposed approach can be promising in the era of big data. Case studies have proven the efficiency and uniqueness of NSSRF, which could be missed by the existing state of the art. The source code of two versions of NSSRF is freely available for download at https://github.com/zhangjiaobxy/nssrfBinary and https://github.com/zhangjiaobxy/nssrfPackage . kc.w@cityu.edu.hk. Supplementary data are available at Bioinformatics online.

  9. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

    Science.gov (United States)

    Othman, Razib M; Deris, Safaai; Illias, Rosli M

    2008-02-01

    A genetic similarity algorithm is introduced in this study to find a group of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure algorithm with a parallel genetic algorithm. The semantic similarity measure algorithm is used to compute the similitude strength between the Gene Ontology terms. Then, the parallel genetic algorithm is employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of the existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. Computational results and comparisons with other related tools are presented to show the effectiveness of the proposed algorithm and tools.

  10. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    Directory of Open Access Journals (Sweden)

    Matija Korpar

    Full Text Available In recent years we have witnessed growth in sequencing yield, in the number of samples sequenced, and, as a result, in the size of publicly maintained sequence databases. This ever-increasing volume of data has placed high demands on protein similarity search algorithms, which face two opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to their quadratic time complexity, alignments of a query to the whole database are usually too slow. Therefore, the majority of protein similarity search methods apply heuristics before the exact local alignment to reduce the number of candidate sequences in the database. However, there is still a need to align a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase, in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and, at the time of writing, was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++, and more than 20 times faster than SSW, using multiple queries on the Swiss-Prot and UniRef90 databases.
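
    For reference, the quadratic kernel that SW#db accelerates is the textbook Smith-Waterman recurrence; a plain-Python scoring version follows (the match, mismatch, and gap parameters are arbitrary choices here).

        # Textbook Smith-Waterman local alignment score (no traceback).
        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            rows, cols = len(a) + 1, len(b) + 1
            H = [[0] * cols for _ in range(rows)]
            best = 0
            for i in range(1, rows):
                for j in range(1, cols):
                    diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                    # local alignment: scores are clamped at zero
                    H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                    best = max(best, H[i][j])
            return best

        print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))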

  11. Applying Statistical Models and Parametric Distance Measures for Music Similarity Search

    Science.gov (United States)

    Lukashevich, Hanna; Dittmar, Christian; Bastuck, Christoph

    Automatically deriving similarity relations between music pieces is an inherent field of music information retrieval research. Due to the nearly unrestricted amount of musical data, real-world similarity search algorithms have to be highly efficient and scalable. A possible solution is to represent each music excerpt with a statistical model (e.g., a Gaussian mixture model) and thus reduce the computational cost by applying parametric distance measures between the models. In this paper we discuss combinations of different parametric modelling techniques and distance measures and weigh the benefits of each against the others.
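
    As a concrete instance of a parametric distance, the KL divergence between two univariate Gaussians has a closed form (full Gaussian mixture models require approximations, so the single-Gaussian case here is a simplifying assumption); a symmetrized version is shown since KL itself is asymmetric.

        # Closed-form KL divergence between univariate Gaussian feature models.
        import math

        def kl_gauss(mu0, var0, mu1, var1):
            return 0.5 * (var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0
                          + math.log(var1 / var0))

        def sym_kl(p, q):               # symmetrize, since KL(p, q) != KL(q, p)
            return kl_gauss(*p, *q) + kl_gauss(*q, *p)

        excerpt_a, excerpt_b = (0.0, 1.0), (0.5, 2.0)   # (mean, variance) of features
        print(round(sym_kl(excerpt_a, excerpt_b), 3))   # parametric "distance"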

  12. WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTERN RETRIEVAL ALGORITHM

    Directory of Open Access Journals (Sweden)

    Pushpa C N

    2013-02-01

    Full Text Available Semantic similarity measures play an important role in information retrieval, natural language processing, and various web tasks such as relation extraction, community mining, document clustering, and automatic meta-data extraction. In this paper, we propose a Pattern Retrieval Algorithm [PRA] to compute the semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word pairs and non-synonymous word pairs. The proposed approach aims to improve the correlation values, precision, recall, and F-measures compared to existing methods. The proposed algorithm achieves a correlation value of 89.8%.
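
    The page-count side of such methods can be illustrated with hypothetical hit counts (real use would query a search engine's API; the counts and index size below are invented): classic association measures between words P and Q are computed from the counts of P, Q, and "P AND Q".

        # Association measures from web page counts (hypothetical numbers).
        import math

        N = 1e10                           # assumed number of indexed pages
        cP, cQ, cPQ = 8.0e6, 5.0e6, 1.2e6  # hit counts for P, Q, and "P AND Q"

        jaccard = cPQ / (cP + cQ - cPQ)
        dice = 2 * cPQ / (cP + cQ)
        pmi = math.log2((cPQ / N) / ((cP / N) * (cQ / N)))  # pointwise mutual information

        print(round(jaccard, 3), round(dice, 3), round(pmi, 2))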

  13. Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA.

    Science.gov (United States)

    Mrozek, Dariusz; Brożek, Miłosz; Małysiak-Mrozek, Bożena

    2014-02-01

    Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT ("GPU-CASSERT") parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.

  14. Similarity searching and scaffold hopping in synthetically accessible combinatorial chemistry spaces.

    Science.gov (United States)

    Boehm, Markus; Wu, Tong-Ying; Claussen, Holger; Lemmen, Christian

    2008-04-24

    Large collections of combinatorial libraries are an integral element in today's pharmaceutical industry. It is of great interest to perform similarity searches against all virtual compounds that are synthetically accessible by any such library. Here we describe the successful application of a new software tool, CoLibri, on 358 combinatorial libraries based on validated reaction protocols to create a single chemistry space containing over 10^12 possible products. Similarity searching with FTrees-FS allows the systematic exploration of this space without the need to enumerate all product structures. The search result is a set of virtual hits which are synthetically accessible via one or more of the existing reaction protocols. Grouping these virtual hits by their synthetic protocols allows the rapid design and synthesis of multiple follow-up libraries. Such library ideas support hit-to-lead design efforts for tasks like follow-up from high-throughput screening hits or scaffold hopping from one hit to another attractive series.

  15. Prospective and retrospective ECG-gating for CT coronary angiography perform similarly accurate at low heart rates

    Energy Technology Data Exchange (ETDEWEB)

    Stolzmann, Paul, E-mail: paul.stolzmann@usz.ch [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland); Goetti, Robert; Baumueller, Stephan [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland); Plass, Andre; Falk, Volkmar [Clinic for Cardiovascular Surgery, University Hospital Zurich (Switzerland); Scheffel, Hans; Feuchtner, Gudrun; Marincek, Borut [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland); Alkadhi, Hatem [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland); Cardiac MR PET CT Program, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Leschka, Sebastian [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland)

    2011-07-15

    Objective: To compare, in patients with suspicion of coronary artery disease (CAD) and low heart rates, image quality, diagnostic performance, and radiation dose values of prospectively and retrospectively electrocardiography (ECG)-gated dual-source computed tomography coronary angiography (CTCA) for the diagnosis of significant coronary stenoses. Materials and methods: Two hundred consecutive patients with heart rates ≤70 bpm were retrospectively enrolled: 100 patients underwent prospectively ECG-gated CTCA (group 1) and 100 patients underwent retrospectively gated CTCA (group 2). Coronary artery segments were assessed for image quality and significant luminal diameter narrowing. Sensitivity, specificity, positive predictive values (PPV), negative predictive values (NPV), and accuracy of both CTCA groups were determined using conventional catheter angiography (CCA) as the reference standard. Radiation dose values were calculated. Results: Both groups were comparable regarding gender, body weight, cardiovascular risk profile, severity of CAD, mean heart rate, heart rate variability, and Agatston score (all p > 0.05). There was no significant difference in the rate of non-assessable coronary segments between group 1 (1.6%, 24/1404) and group 2 (1.4%, 19/1385; p = 0.77); non-diagnostic image quality was significantly (p < 0.001) more often attributed to stair-step artifacts in group 1. Segment-based sensitivity, specificity, PPV, NPV, and accuracy were 98%, 98%, 88%, 100%, and 100% in group 1, and 96%, 99%, 90%, 100%, and 98% in group 2, respectively. Parameters of diagnostic performance were similar (all p > 0.05). The mean effective radiation dose of prospectively ECG-gated CTCA (2.2 ± 0.4 mSv) was significantly (p < 0.0001) smaller than that of retrospectively ECG-gated CTCA (8.1 ± 0.6 mSv). Conclusion: Prospectively ECG-gated CTCA yields similar image quality and performs as accurately as retrospectively ECG-gated CTCA in patients having heart rates ≤70 bpm

  16. Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

    Directory of Open Access Journals (Sweden)

    Kim Sunghwan

    2012-11-01

    Full Text Available Abstract Background To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the efficient use of PubChem3D resources requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules. Results The present study investigated the effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically-tested compounds (the 10-K set) and between non-inactive compounds in a given biological assay (the 156-K set). When the "best-conformer-pair" approach, in which the 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from a compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive–non-inactive (NN) pairs for a given assay by amounts comparable to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores compared to the average increase for the random compound pairs. Conclusion These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non

  17. Semantic similarity measures in the biomedical domain by leveraging a web search engine.

    Science.gov (United States)

    Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching

    2013-07-01

    Various research efforts on web-related semantic similarity measures have been deployed. However, measuring the semantic similarity between two terms remains a challenging task. Traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s). Unfortunately, in practice, this assumption is not always applicable. On the other hand, if the corpus is sufficiently adequate, corpus-based methodologies can overcome this limitation. The web is now a continuously and enormously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, as well as P AND Q, and as the web search hit counts of defined lexico-syntactic patterns. The similarity scores of different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of semantic similarity measures. Experimental results validated against two datasets (dataset 1 provided by A. Hliaoutakis; dataset 2 provided by T. Pedersen) are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficients (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores compared with the measures of other methods. However, the correlation coefficients (SNOMED-CT: 0.496; MeSH: 0.539) with coder scores showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to those of physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.

  18. Protein structure alignment and fast similarity search using local shape signatures.

    Science.gov (United States)

    Can, Tolga; Wang, Yuan-Fang

    2004-03-01

    We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariancy of the shape signatures allows us to improve similarity searching efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique we screen out unlikely candidates and perform detailed pairwise alignments only for a small number of candidates that survive the screening process. Contrary to other hashing based techniques, our technique employs domain specific information (not just geometric information) in constructing the hash key, and hence, is more tuned to the domain of biology. Furthermore, the invariancy, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on the average) than a well-known method while keeping the quality of the query results at an approximately similar level.

  19. SiMPSON: Efficient Similarity Search in Metric Spaces over P2P Structured Overlay Networks

    Science.gov (United States)

    Vu, Quang Hieu; Lupu, Mihai; Wu, Sai

    Similarity search in metric spaces over centralized systems has been significantly studied in the database research community. However, not so much work has been done in the context of P2P networks. This paper introduces SiMPSON: a P2P system supporting similarity search in metric spaces. The aim is to answer queries faster and using less resources than existing systems. For this, each peer first clusters its own data using any off-the-shelf clustering algorithms. Then, the resulting clusters are mapped to one-dimensional values. Finally, these one-dimensional values are indexed into a structured P2P overlay. Our method slightly increases the indexing overhead, but allows us to greatly reduce the number of peers and messages involved in query processing: we trade a small amount of overhead in the data publishing process for a substantial reduction of costs in the querying phase. Based on this architecture, we propose algorithms for processing range and kNN queries. Extensive experimental results validate the claims of efficiency and effectiveness of SiMPSON.

  20. World Climate Classification and Search: Data Mining Approach Utilizing Dynamic Time Warping Similarity Function

    Science.gov (United States)

    Stepinski, T. F.; Netzel, P.; Jasiewicz, J.

    2014-12-01

    We have developed a novel method for the classification and search of climate over the global land surface excluding Antarctica. Our method classifies climate on the basis of the outcome of time series segmentation and clustering. We use the WorldClim 30 arc sec. (approx. 1 km) resolution grid data, which are based on 50 years of climatic observations. Each cell in the grid is assigned a 12-month series consisting of 50-year monthly averages of mean, maximum, and minimum temperatures as well as total precipitation. The presented method introduces several innovations in comparison with existing data-driven methods of world climate classification. First, it uses only climatic rather than bioclimatic data. Second, it employs an object-oriented methodology: the grid is first segmented before climatic segments are classified. Third, and most importantly, the similarity between the climates of two given cells is computed using the dynamic time warping (DTW) measure instead of the Euclidean distance. DTW is known to be superior to the Euclidean distance for time series, but has not been utilized before in the classification of global climate. To account for the computational expense of DTW we use the highly efficient GeoPAT software (http://sil.uc.edu/gitlist/) that, in the first step, segments the grid into local regions of uniform climate. In the second step, the segments are classified. We also introduce climate search, a GeoWeb-based method for interactive presentation of global climate information in the form of query-and-retrieval. A user selects a geographical location and the system returns a global map indicating the level of similarity between local climates and the climate at the selected location. The results of the search for the location "University of Cincinnati, Main Campus" are presented on the accompanying map. We have compared the results of our method to the Koeppen classification scheme
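
    The similarity function at the heart of the method is standard dynamic time warping; a plain implementation over two invented 12-month temperature profiles follows (WorldClim cells carry four such series per cell, so this single-variable version is a simplifying assumption).

        # Dynamic time warping distance between two monthly climate profiles.
        def dtw(s, t):
            n, m = len(s), len(t)
            D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
            D[0][0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(s[i-1] - t[j-1])
                    # extend the cheapest warping path reaching this cell
                    D[i][j] = cost + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
            return D[n][m]

        site_a = [-2, 0, 5, 11, 17, 22, 25, 24, 19, 12, 5, 0]   # monthly mean temps
        site_b = [ 0, 2, 7, 13, 18, 23, 26, 25, 20, 13, 7, 2]
        print(dtw(site_a, site_b))   # small value -> similar annual climate cycles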

  1. iSARST: an integrated SARST web server for rapid protein structural similarity searches.

    Science.gov (United States)

    Lo, Wei-Cheng; Lee, Che-Yu; Lee, Chi-Ching; Lyu, Ping-Chiang

    2009-07-01

    iSARST is a web server for efficient protein structural similarity searches. It is a multi-processor, batch-processing, and integrated implementation of several structural comparison tools and two database searching methods: SARST for common structural homologs and CPSARST for homologs with circular permutations. iSARST allows users to submit multiple PDB/SCOP entry IDs or an archive file containing many structures. After scanning the target database using SARST/CPSARST, the ordering of hits is refined with conventional structure alignment tools such as FAST, TM-align and SAMO, which are run on a PC cluster. In this way, iSARST achieves a high running speed while preserving the high precision of the refinement engines. The final outputs include tables listing co-linear or circularly permuted homologs of the query proteins and a functional summary of the best hits. Superimposed structures can be examined through an interactive and informative visualization tool. iSARST provides the first batch-mode structural comparison web service for both co-linear homologs and circular permutants. It can serve as a rapid annotation system for functionally unknown or hypothetical proteins, which are increasing rapidly in this post-genomic era. The server can be accessed at http://sarst.life.nthu.edu.tw/iSARST/.

  2. Massive problem reports mining and analysis based parallelism for similar search

    Science.gov (United States)

    Zhou, Ya; Hu, Cailin; Xiong, Han; Wei, Xiafei; Li, Ling

    2017-05-01

    Massive problem reports and solutions, accumulated over time and continuously collected in XML Spreadsheet (XMLSS) format from enterprises and organizations, record comprehensive descriptions of problems that can help technicians trace problems and their solutions. Effectively managing and analyzing these massive semi-structured data - to provide solutions to similar problems, support decisions on immediate problems, and assist product optimization during hardware and software maintenance - is a significant and challenging issue. For this purpose, we build a data management system to manage, mine and analyze these data; its search results are categorized and organized into several categories so that users can quickly find where the results of interest are located. Experimental results demonstrate that this system greatly outperforms a traditional centralized management system in both performance and adaptability to heterogeneous data. Besides, by re-extracting topics, it enables each cluster to be described more precisely and reasonably.

  3. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: Sensitivity and Specificity analysis.

    Energy Technology Data Exchange (ETDEWEB)

    Kapp, Eugene; Schutz, Frederick; Connolly, Lisa M.; Chakel, John A.; Meza, Jose E.; Miller, Christine A.; Fenyo, David; Eng, Jimmy K.; Adkins, Joshua N.; Omenn, Gilbert; Simpson, Richard

    2005-08-01

    MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Due to their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens from the HUPO Plasma Proteome Project, we have evaluated five search algorithms with respect to their sensitivity and specificity, and have also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but were inferior to MASCOT, X-Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified most peptides based on a specified FP rate. The rescoring algorithm, Peptide Prophet, enhanced the overall performance of the SEQUEST algorithm, as well as provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or, minimally, derived from a reversed-sequence search, as demonstrated in this study based on a validated data set. The availability of open-source search algorithms, such as X-Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.

  4. Gene network homology in prokaryotes using a similarity search approach: queries of quorum sensing signal transduction.

    Directory of Open Access Journals (Sweden)

    David N Quan

    Full Text Available Bacterial cell-cell communication is mediated by small signaling molecules known as autoinducers. Importantly, autoinducer-2 (AI-2) is synthesized via the enzyme LuxS in over 80 species, some of which mediate their pathogenicity by recognizing and transducing this signal in a cell density dependent manner. AI-2-mediated phenotypes are not well understood, however, as the means for signal transduction appears varied among species, while AI-2 synthesis processes appear conserved. Approaches to reveal the recognition pathways of AI-2 will shed light on pathogenicity, as we believe recognition of the signal is likely as important, if not more so, than the signal synthesis. LMNAST (Local Modular Network Alignment Similarity Tool) uses a local similarity search heuristic to study gene order, generating homology hits for the genomic arrangement of a query gene sequence. We develop and apply this tool for the E. coli lac and LuxS regulated (Lsr) systems. Lsr is of great interest as it mediates AI-2 uptake and processing. Both test searches generated results that were subsequently analyzed through a number of different lenses, each with its own level of granularity, from a binary phylogenetic representation down to trackback plots that preserve genomic organizational information. Through a survey of these results, we demonstrate the identification of orthologs, paralogs, hitchhiking genes, gene loss, gene rearrangement within an operon context, and also horizontal gene transfer (HGT). We found a variety of operon structures that are consistent with our hypothesis that the signal can be perceived and transduced by homologous protein complexes, while their regulation may be key to defining subsequent phenotypic behavior.

  5. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    Science.gov (United States)

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC). As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both datasets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.

  6. PHOG-BLAST – a new generation tool for fast similarity search of protein families

    Directory of Open Access Journals (Sweden)

    Mironov Andrey A

    2006-06-01

    Full Text Available Abstract Background The need to compare protein profiles frequently arises in various protein research areas: comparison of protein families, domain searches, resolution of orthology and paralogy. The existing fast algorithms can only compare a protein sequence with a protein sequence and a profile with a sequence. Algorithms to compare profiles use dynamic programming and complex scoring functions. Results We developed a new algorithm called PHOG-BLAST for fast similarity search of profiles. This algorithm uses profile discretization to convert a profile to a finite alphabet and utilizes hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best if its parameters are chosen so that 20 profile clusters are obtained, which can be interpreted as ancestral amino acid residues. With these clusters, fewer than 2% of columns in multiple alignments fall outside the clusters. We tested the performance of PHOG-BLAST vs. PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance; on PFAM and BALIBASE PHOG-BLAST was much superior to PSI-BLAST. PHOG-BLAST required 10–20 times less computer memory and computation time than PSI-BLAST. Conclusion Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing protein complements of different species and orthologous groups of different taxons on a personal computer in reasonable time. When it is applied to detect weak similarity between protein families, PHOG-BLAST is less

  7. PSimScan: algorithm and utility for fast protein similarity search.

    Directory of Open Access Journals (Sweden)

    Anna Kaznadzey

    Full Text Available In the era of metagenomics and diagnostic sequencing, the importance of protein comparison methods of boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that can be further used by the community to improve the speed and lower hardware requirements of bioinformatics software. The optimization starts at the lookup table construction; then the initial lookup table-based hits are passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects 'similarity zones' aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan performs 5 to 100 times faster than the standard NCBI BLASTP, depending on the chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to those of NCBI BLASTP and decrease as speed increases, yet stay at levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences. Comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI's non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with the NCBI BLASTP takes over 66 hours. We describe innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.

  8. Unbounded Binary Search for a Fast and Accurate Maximum Power Point Tracking

    Science.gov (United States)

    Kim, Yong Sin; Winston, Roland

    2011-12-01

    This paper presents a technique for maximum power point tracking (MPPT) of a concentrating photovoltaic system using cell-level power optimization. Perturb and observe (P&O) has been a standard for MPPT, but it introduces a tradeoff between the tracking speed and the accuracy of the maximum power delivered. The P&O algorithm is not suitable for rapid changes in environmental conditions caused by partial shading and self-shading, because its tracking time is linear in the length of the voltage range. Some research has addressed fast tracking, but the proposed methods come with internal ad hoc parameters. In this paper, by using the proposed unbounded binary search algorithm for the MPPT, the tracking time becomes a logarithmic function of the voltage search range, without ad hoc parameters.
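
    The abstract does not spell out the algorithm, so the following is only a generic sketch of how an unbounded (exponentially growing) search bracket followed by interval refinement can find the peak of a unimodal power-voltage curve in a number of probes logarithmic in the search range; the function names and the toy PV curve are hypothetical.

        def mppt_unbounded_search(power, v_step=0.1, tol=1e-3):
            """Locate the peak of a unimodal power(V) curve without a fixed
            upper voltage bound: grow the bracket exponentially, then refine."""
            # Phase 1: double the probe voltage until power stops increasing
            lo, hi = 0.0, v_step
            while power(hi) > power(lo):
                lo, hi = hi, hi * 2
            # The peak now lies in [lo / 2, hi]; Phase 2: shrink the bracket
            left, right = lo / 2, hi
            while right - left > tol:
                m1 = left + (right - left) / 3
                m2 = right - (right - left) / 3
                if power(m1) < power(m2):
                    left = m1
                else:
                    right = m2
            return (left + right) / 2

        pv = lambda v: v * max(0.0, 5 - 0.5 * v ** 2)  # toy current model, I = 5 - 0.5 V^2
        print(round(mppt_unbounded_search(pv), 3))     # peak near sqrt(10/3) ~ 1.826

    The number of probes grows with the logarithm of the bracket size, which is the property the abstract attributes to the proposed MPPT.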

  9. Meta-Storms: efficient search for similar microbial communities based on a novel indexing scheme and similarity score for metagenomic data.

    Science.gov (United States)

    Su, Xiaoquan; Xu, Jian; Ning, Kang

    2012-10-01

    It has long intrigued scientists to effectively compare different microbial communities (also referred to as 'metagenomic samples' here) on a large scale: given a set of unknown samples, find similar metagenomic samples from a large repository and examine how similar these samples are. With the metagenomic samples accumulated to date, it is possible to build a database of metagenomic samples of interest. Any metagenomic sample could then be searched against this database to find the most similar metagenomic sample(s). However, on one hand, current databases with a large number of metagenomic samples mostly serve as data repositories that offer few functionalities for analysis; and on the other hand, methods to measure the similarity of metagenomic data work well only for small sets of samples by pairwise comparison. It is not yet clear how to efficiently search for metagenomic samples against a large metagenomic database. In this study, we have proposed a novel method, Meta-Storms, that can systematically and efficiently organize and search metagenomic data. It includes the following components: (i) creating a database of metagenomic samples based on their taxonomical annotations, (ii) efficient indexing of samples in the database based on a hierarchical taxonomy indexing strategy, (iii) searching for a metagenomic sample against the database by a fast scoring function based on quantitative phylogeny and (iv) managing the database by index export, index import, data insertion, data deletion and database merging. We have collected more than 1300 metagenomic datasets from the public domain and in-house facilities, and tested the Meta-Storms method on them. Our experimental results show that Meta-Storms is capable of database creation and effective searching for a large number of metagenomic samples, and it can achieve accuracies similar to those of the current popular significance testing-based methods. Meta-Storms method would serve as a suitable

  10. Application of 3D Zernike descriptors to shape-based ligand similarity searching

    Directory of Open Access Journals (Sweden)

    Venkatraman Vishwesh

    2009-12-01

    Full Text Available Abstract Background The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. Results In this study, we have examined the efficacy of the alignment-independent three-dimensional Zernike descriptor (3DZD) for fast shape-based similarity searching. The performance of this approach was compared with several other methods, including the statistical moments based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance under a situation that simulates actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparably to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape-based methods for specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability. Conclusion The 3DZD has a unique ability for fast comparison of the three-dimensional shape of compounds. The examples analyzed illustrate the advantages of, and the room for improvement in, the 3DZD.

  11. Accurate Image Search using Local Descriptors into a Compact Image Representation

    Directory of Open Access Journals (Sweden)

    Soumia Benkrama

    2013-01-01

    Full Text Available Despite progress in image retrieval using low-level features such as color, texture and shape, performance remains unsatisfactory, as there are gaps between low-level features and high-level semantic concepts. In this work, we present an improved implementation of the bag of visual words approach. We propose an image retrieval system based on the bag-of-features (BoF) model using the scale invariant feature transform (SIFT) and speeded up robust features (SURF). In the literature, SIFT and SURF give good results. Based on this observation, we decided to use a bag-of-features approach over quaternion Zernike moments (QZM) and to compare the results of SIFT and SURF with those of QZM. We propose an indexing method for the content-based search task that retrieves a collection of images and returns a ranked list of objects in response to a query image. Experimental results with the Coil-100 and Corel-1000 image databases demonstrate that QZM produces better performance than the known representations (SIFT and SURF).
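
    For context, here is a minimal sketch of the standard bag-of-features pipeline the abstract builds on, assuming the local descriptors (SIFT or SURF vectors) have already been extracted by a feature detector; this is illustrative scaffolding, not the authors' implementation.

        import numpy as np
        from sklearn.cluster import KMeans

        def build_vocabulary(descriptor_sets, k=100, seed=0):
            """Cluster all local descriptors (e.g. SIFT/SURF vectors) into k visual words."""
            return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(np.vstack(descriptor_sets))

        def bof_histogram(descriptors, vocab):
            """Quantize one image's descriptors against the vocabulary; L1-normalize."""
            words = vocab.predict(descriptors)
            hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
            return hist / hist.sum()

        def rank_by_similarity(query_hist, db_hists):
            """Smaller histogram distance = more similar image."""
            dists = [np.linalg.norm(query_hist - h, ord=1) for h in db_hists]
            return np.argsort(dists)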

  12. Target-distractor similarity has a larger impact on visual search in school-age children than spacing.

    Science.gov (United States)

    Huurneman, Bianca; Boonstra, F Nienke

    2015-01-22

    In typically developing children, crowding decreases with increasing age. The influence of target-distractor similarity with respect to orientation and element spacing on visual search performance was investigated in 29 school-age children with normal vision (4- to 6-year-olds [N = 16], 7- to 8-year-olds [N = 13]). Children were instructed to search for a target E among distractor Es (feature search: all flanking Es pointing right; conjunction search: flankers in three orientations). Orientation of the target was manipulated in four directions: right (target absent), left (inversed), up, and down (vertical). Spacing was varied in four steps: 0.04°, 0.5°, 1°, and 2°. During feature search, high target-distractor similarity had a stronger impact on performance than spacing: Orientation affected accuracy until spacing was 1°, and spacing only influenced accuracy for identifying inversed targets. Spatial analyses showed that orientation affected oculomotor strategy: Children made more fixations in the "inversed" target area (4.6) than the vertical target areas (1.8 and 1.9). Furthermore, age groups differed in fixation duration: 4- to 6-year-old children showed longer fixation durations than 7- to 8-year-olds at the two largest element spacings (p = 0.039 and p = 0.027). Conjunction search performance was unaffected by spacing. Four conclusions can be drawn from this study: (a) Target-distractor similarity governs visual search performance in school-age children, (b) children make more fixations in target areas when target-distractor similarity is high, (c) 4- to 6-year-olds show longer fixation durations than 7- to 8-year-olds at 1° and 2° element spacing, and (d) spacing affects feature but not conjunction search, a finding that might indicate top-down control ameliorates crowding in children.

  13. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Directory of Open Access Journals (Sweden)

    Viswa S S

    2013-06-01

    Full Text Available Nowadays, web-scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in hope of finding a certain type of images. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of the returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft use no visual information for searching of images. The idea is to use visual information to re-rank and improve text-based image search results. This improves the precision of the text-based image search ranking by incorporating the information conveyed by the visual modality. The typical assumption that the top images in the text-based search result are equally relevant is relaxed by linking the relevance of the images to their initial rank positions. Then, a number of images from the initial search result are employed as prototypes that serve to visually represent the query and that are subsequently used to construct meta re-rankers; i.e., the most relevant images are found by visual similarity and the average scores are calculated. By applying different meta re-rankers to an image from the initial result, re-ranking scores are generated, which are then used to find the new rank position for an image in the re-ranked search result. Human supervision is introduced to learn the model weights offline, prior to the online re-ranking process. While model learning requires manual labelling of the results for a few queries, the resulting model is query independent and therefore applicable to any other query. The experimental results on a representative web image search dataset comprising 353 queries demonstrate that the proposed method outperforms the existing supervised and unsupervised re-ranking approaches. Moreover, it improves the performance over the text-based image search engine by more than 25.48%

  14. In Search of an Accurate Evaluation of Intrahepatic Cholestasis of Pregnancy

    Directory of Open Access Journals (Sweden)

    Manuela Martinefski

    2012-01-01

    Full Text Available Until now, the biochemical parameter most used for diagnosis of intrahepatic cholestasis of pregnancy (ICP) is a rise of total serum bile acids (TSBA) above the upper normal limit of 11 μM. However, differential diagnosis is very difficult, since overlapping values calculated from bile acid determinations are observed in different conditions of pregnancy, including the benign condition of pruritus gravidarum. The aim of this work was to determine better markers for a precise diagnosis of ICP, together with parameters associated with severity of symptoms and treatment evaluation. Serum bile acid profiles were evaluated using capillary electrophoresis in 38 healthy pregnant women and 32 ICP patients, and the sensitivity, specificity, accuracy, predictive values and relationships of certain individual bile acids in pregnant women were calculated in order to replace TSBA determinations. The evaluation of the results shows that LCA and the UDCA/LCA ratio provided information for a more complete and accurate diagnosis and evaluation of ICP than calculation of TSBA levels alone in pregnant women.

  15. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities.

    Directory of Open Access Journals (Sweden)

    Eric Venner

    Full Text Available High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks.

  16. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities.

    Science.gov (United States)

    Venner, Eric; Lisewski, Andreas Martin; Erdin, Serkan; Ward, R Matthew; Amin, Shivas R; Lichtarge, Olivier

    2010-12-13

    High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks.
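
    The competitive diffusion idea can be sketched as generic label propagation over a weighted similarity network, with known annotations clamped and the highest-scoring label winning at each node; this is a simplified stand-in for the authors' method and omits their z-score calibration.

        import numpy as np

        def diffuse_labels(adj, seed_labels, n_labels, alpha=0.85, iters=50):
            """Competing function labels spread over a similarity network;
            seeds stay clamped and the top-scoring label wins at each node.
            adj: (n, n) symmetric float weight matrix; seed_labels: {node: label}."""
            n = adj.shape[0]
            deg = adj.sum(axis=1, keepdims=True)
            W = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
            Y = np.zeros((n, n_labels))
            for node, lab in seed_labels.items():
                Y[node, lab] = 1.0
            F = Y.copy()
            for _ in range(iters):
                F = alpha * (W @ F) + (1 - alpha) * Y  # propagate, then re-inject seeds
                for node, lab in seed_labels.items():  # clamp known annotations
                    F[node] = 0.0
                    F[node, lab] = 1.0
            return F.argmax(axis=1)  # winner-take-all assignment per node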

  17. Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction

    DEFF Research Database (Denmark)

    Wichterich, Marc; Assent, Ira; Philipp, Kranen

    2008-01-01

    The Earth Mover's Distance (EMD) was developed in computer vision as a flexible similarity model that utilizes similarities in feature space to define a high quality similarity measure in feature representation space. It has been successfully adopted in a multitude of applications with low to med...

  18. Finding and Reusing Learning Materials with Multimedia Similarity Search and Social Networks

    Science.gov (United States)

    Little, Suzanne; Ferguson, Rebecca; Ruger, Stefan

    2012-01-01

    The authors describe how content-based multimedia search technologies can be used to help learners find new materials and learning pathways by identifying semantic relationships between educational resources in a social learning network. This helps users--both learners and educators--to explore and find material to support their learning aims.…

  20. Target-distractor similarity has a larger impact on visual search in school-age children than spacing

    NARCIS (Netherlands)

    Huurneman, B.; Boonstra, F.N.

    2015-01-01

    In typically developing children, crowding decreases with increasing age. The influence of target-distractor similarity with respect to orientation and element spacing on visual search performance was investigated in 29 school-age children with normal vision (4- to 6-year-olds [N = 16], 7- to 8-year

  1. Breast cancer stories on the internet : improving search facilities to help patients find stories of similar others

    NARCIS (Netherlands)

    Overberg, Regina Ingrid

    2013-01-01

    The primary aim of this thesis is to gain insight into which search facilities for spontaneously published stories facilitate breast cancer patients in finding stories by other patients in a similar situation. According to the narrative approach, social comparison theory, and social cognitive theory

  2. Application of belief theory to similarity data fusion for use in analog searching and lead hopping.

    Science.gov (United States)

    Muchmore, Steven W; Debe, Derek A; Metz, James T; Brown, Scott P; Martin, Yvonne C; Hajduk, Philip J

    2008-05-01

    A wide variety of computational algorithms have been developed that strive to capture the chemical similarity between two compounds for use in virtual screening and lead discovery. One limitation of such approaches is that, while a returned similarity value reflects the perceived degree of relatedness between any two compounds, there is no direct correlation between this value and the expectation or confidence that any two molecules will in fact be equally active. A lack of a common framework for interpretation of similarity measures also confounds the reliable fusion of information from different algorithms. Here, we present a probabilistic framework for interpreting similarity measures that directly correlates the similarity value to a quantitative expectation that two molecules will in fact be equipotent. The approach is based on extensive benchmarking of 10 different similarity methods (MACCS keys, Daylight fingerprints, maximum common subgraphs, rapid overlay of chemical structures (ROCS) shape similarity, and six connectivity-based fingerprints) against a database of more than 150,000 compounds with activity data against 23 protein targets. Given this unified and probabilistic framework for interpreting chemical similarity, principles derived from decision theory can then be applied to combine the evidence from different similarity measures in such a way that both capitalizes on the strengths of the individual approaches and maintains a quantitative estimate of the likelihood that any two molecules will exhibit similar biological activity.

  3. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data.

    Science.gov (United States)

    Zhao, Yongan; Tang, Haixu; Ye, Yuzhen

    2012-01-01

    With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a ~20-90-fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds, due to its use of a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index the similarity search database. The utilization of an optimized data structure further speeds up the similarity search by another 2-3 times. We also implemented multi-threading in RAPSearch2, and the multi-thread modes achieve significant acceleration (e.g. 3.5X for 4-thread mode). RAPSearch2 requires up to 2 GB of memory when running in single-thread mode, or up to 3.5 GB in 4-thread mode. Implemented in C++, the source code is freely available for download at the RAPSearch2 website: http://omics.informatics.indiana.edu/mg/RAPSearch2/.
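
    As a toy illustration of hash-based seed indexing for protein similarity search, the sketch below maps k-mers to database positions and tallies seed hits per subject; RAPSearch2's actual reduced-alphabet seeds and collision-free table are considerably more sophisticated, and all sequences here are made up.

        from collections import defaultdict

        def build_seed_index(db_seqs, k=4):
            """Map every length-k subword to its (sequence id, offset) positions."""
            index = defaultdict(list)
            for sid, seq in enumerate(db_seqs):
                for i in range(len(seq) - k + 1):
                    index[seq[i:i + k]].append((sid, i))
            return index

        def seed_hits(query, index, k=4):
            """Collect candidate subjects for a query read, most-seeded first."""
            hits = defaultdict(int)
            for i in range(len(query) - k + 1):
                for sid, _ in index.get(query[i:i + k], ()):
                    hits[sid] += 1
            return sorted(hits.items(), key=lambda x: -x[1])

        index = build_seed_index(["MKVLAAGIV", "MKQLVDRIV"])
        print(seed_hits("MKVLV", index))  # [(0, 1)]: one shared seed with sequence 0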

  4. FSim: A Novel Functional Similarity Search Algorithm and Tool for Discovering Functionally Related Gene Products

    Directory of Open Access Journals (Sweden)

    Qiang Hu

    2014-01-01

    Full Text Available Background. During the analysis of genomics data, it is often necessary to quantify the functional similarity of genes and their products based on annotation information from gene ontology (GO), which has a hierarchical structure. A flexible and user-friendly way to estimate the functional similarity of genes utilizing GO annotation is therefore highly desired. Results. We propose a novel algorithm using a level coefficient-weighted model to measure the functional similarity of gene products based on multiple ontologies of hierarchical GO annotations. The performance of our algorithm was evaluated and found to be superior to the other tested methods. We implemented the proposed algorithm in a software package, FSim, based on the R statistical and computing environment. It can be used to discover functionally related genes for a given gene, group of genes, or set of function terms. Conclusions. FSim is a flexible tool to analyze functional gene groups based on the GO annotation databases.

  5. SHOP: receptor-based scaffold hopping by GRID-based similarity searches

    DEFF Research Database (Denmark)

    Bergmann, Rikke; Liljefors, Tommy; Sørensen, Morten D

    2009-01-01

    A new field-derived 3D method for receptor-based scaffold hopping, implemented in the software SHOP, is presented. Information from a protein-ligand complex is utilized to substitute a fragment of the ligand with another fragment from a database of synthetically accessible scaffolds. A GRID-based interaction profile of the receptor and geometrical descriptions of a ligand scaffold are used to obtain new scaffolds with different structural features that are able to replace the original scaffold in the protein-ligand complex. An enrichment study was successfully performed verifying the ability of SHOP to find known active CDK2 scaffolds in a database. Additionally, SHOP was used for suggesting new inhibitors of p38 MAP kinase. Four p38 complexes were used to perform six scaffold searches. Several new scaffolds were suggested, and the resulting compounds were successfully docked into the query proteins.

  6. Proposal for a Similar Question Search System on a Q&A Site

    Directory of Open Access Journals (Sweden)

    Katsutoshi Kanamori

    2014-06-01

    Full Text Available A Q&A site is a service that helps Internet users obtain answers to specific questions. A Q&A site is very useful for the Internet user, but posted questions are often not answered immediately. This delay occurs because, in most cases, another site user answers the question manually. In this study, we propose a system that can present a question similar to one posted by a user. An advantage of this system is that the user can refer to the answer to the similar question. This research measures the similarity of candidate questions based on words and dependency parsing. In an experiment, we examined the effectiveness of the proposed system on questions actually posted on a Q&A site. The result indicates that the system can show the questioner the answer to a similar question. However, the system still has a number of aspects that should be improved.

  7. Development of a fingerprint reduction approach for Bayesian similarity searching based on Kullback-Leibler divergence analysis.

    Science.gov (United States)

    Nisius, Britta; Vogt, Martin; Bajorath, Jürgen

    2009-06-01

    The contribution of individual fingerprint bit positions to similarity search performance is systematically evaluated. A method is introduced to determine bit significance on the basis of Kullback-Leibler divergence analysis of bit distributions in active and database compounds. Bit divergence analysis and Bayesian compound screening share a common methodological foundation. Hence, given the significance ranking of all individual bit positions comprising a fingerprint, subsets of bits are evaluated in the context of Bayesian screening, and minimal fingerprint representations are determined that meet or exceed the search performance of unmodified fingerprints. For fingerprints of different design evaluated on many compound activity classes, we consistently find that subsets of fingerprint bit positions are responsible for search performance. In part, these subsets are very small and contain in some cases only a few fingerprint bit positions. Structural or pharmacophore patterns captured by preferred bit positions can often be directly associated with characteristic features of active compounds. In some cases, reduced fingerprint representations clearly exceed the search performance of the original fingerprints. Thus, fingerprint reduction likely represents a promising approach for practical applications.
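
    A minimal sketch of the core computation, assuming binary fingerprint matrices for active and database compounds: each bit position is treated as a Bernoulli variable, and the per-bit Kullback-Leibler divergence between the two distributions ranks positions by significance. The matrix shapes and bit counts are illustrative only.

        import numpy as np

        def bit_kl_divergence(active_fps, database_fps, eps=1e-6):
            """Rank fingerprint bit positions by the Kullback-Leibler divergence
            between their frequencies in active vs. database compounds."""
            p = np.clip(active_fps.mean(axis=0), eps, 1 - eps)    # P(bit=1 | active)
            q = np.clip(database_fps.mean(axis=0), eps, 1 - eps)  # P(bit=1 | database)
            kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
            return np.argsort(kl)[::-1]  # most discriminating bits first

        rng = np.random.default_rng(0)
        actives = rng.random((50, 166)) < 0.4    # hypothetical 166-bit fingerprints
        database = rng.random((5000, 166)) < 0.2
        order = bit_kl_divergence(actives, database)
        reduced_bits = order[:32]                # keep a small significant subset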

  8. SIMILARITY SEARCH FOR TRAJECTORIES OF RFID TAGS IN SUPPLY CHAIN TRAFFIC

    Directory of Open Access Journals (Sweden)

    Sabu Augustine

    2016-06-01

    Full Text Available In this fast-developing period, the use of RFID has become more significant in many application domains due to the drastic cut in the price of RFID tags. This technology is evolving as a means of tracking objects and inventory items. One such diversified application domain is Supply Chain Management, where RFID is applied because manufacturers and distributors need to analyse product and logistic information in order to get the right quantity of products arriving at the right time to the right locations. Usually the RFID tag information collected from RFID readers is stored in a remote database, and the RFID data is analyzed by querying data from this database based on a path encoding method that uses the properties of prime numbers. In this paper we propose an improved encoding scheme that encodes the flows of objects in RFID tag movement. A trajectory of moving RFID tags consists of a sequence of tags that changes over time. With the integration of wireless communications and positioning technologies, the concept of the Trajectory Database has become increasingly important, and has posed great challenges to the data mining community. The support of efficient trajectory similarity techniques is indisputably very important for the quality of data analysis tasks in supply chain traffic, enabling the discovery of similar product movements.
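
    To make the prime-number encoding concrete, here is a hedged toy example: each location is assigned a distinct prime, a path is encoded as the product of its location primes, and "did this tag pass location X" becomes a divisibility test. The location names are hypothetical, and a bare product loses visit order, which is one reason improved schemes like the one proposed add further positional information.

        from math import prod

        # Hypothetical supply-chain locations mapped to distinct primes
        PRIMES = {"factory": 2, "warehouse": 3, "distributor": 5, "retailer": 7}

        def encode_path(path):
            """Encode a set of visited locations as the product of their primes."""
            return prod(PRIMES[loc] for loc in path)

        def visited(code, loc):
            """A location was visited iff its prime divides the path code."""
            return code % PRIMES[loc] == 0

        code = encode_path(["factory", "warehouse", "retailer"])         # 2 * 3 * 7 = 42
        print(visited(code, "warehouse"), visited(code, "distributor"))  # True False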

  9. Identifying Potential Protein Targets for Toluene Using a Molecular Similarity Search, in Silico Docking and in Vitro Validation

    Science.gov (United States)

    2015-01-01


  10. SymDex: increasing the efficiency of chemical fingerprint similarity searches for comparing large chemical libraries by using query set indexing.

    Science.gov (United States)

    Tai, David; Fang, Jianwen

    2012-08-27

    The large sizes of today's chemical databases require efficient algorithms to perform similarity searches, and it can be very time consuming to compare two large chemical databases. This paper seeks to build upon existing research efforts by describing a novel strategy for accelerating existing search algorithms for comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms by creating heuristics for searching an individual chemical against a chemical library, detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute searches for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup over these algorithms by indexing the set of all query chemicals, so that redundant calculations that arise in the case of sequential searches are eliminated. We implement this novel algorithm in a similarity search program called Symmetric inDexing, or SymDex. SymDex shows a maximum speedup of over 232% compared to the state-of-the-art single-query search algorithm over real data for various fingerprint lengths. Considerable speedup is seen even for batch searches where query set sizes are relatively small compared to typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.

  11. Web Similarity

    NARCIS (Netherlands)

    Cohen, A.R.; Vitányi, P.M.B.

    2015-01-01

    Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a similarity on a scale fr
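
    For reference, the distance can be computed directly from aggregate page counts; the sketch below assumes the standard formula from the normalized Google distance literature, with made-up counts for illustration.

        from math import log

        def nwd(f_x, f_y, f_xy, n_pages):
            """Normalized web distance from aggregate page counts:
            f_x, f_y: hits for each search term; f_xy: hits for both terms;
            n_pages: total pages indexed (all counts must be positive)."""
            lx, ly, lxy = log(f_x), log(f_y), log(f_xy)
            return (max(lx, ly) - lxy) / (log(n_pages) - min(lx, ly))

        # Illustrative, made-up counts: frequently co-occurring terms give a small distance
        print(round(nwd(f_x=2.0e6, f_y=1.5e6, f_xy=1.2e6, n_pages=5.0e10), 3))  # ~0.049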

  12. Identification of aggregation breakers for bevacizumab (Avastin®) self-association through similarity searching and interaction studies.

    Science.gov (United States)

    Westermaier, Y; Veurink, M; Riis-Johannessen, T; Guinchard, S; Gurny, R; Scapozza, L

    2013-11-01

    Aggregation is a common challenge in the optimization of therapeutic antibody formulations. Since initial self-association of two monomers is typically a reversible process, the aim of this study is to identify different excipients that are able to shift this equilibrium to the monomeric state. The hypothesis is that a specific interaction between excipient and antibody may hinder two monomers from approaching each other, based on previous work in which dexamethasone phosphate showed the ability to partially reverse formed aggregates of the monoclonal IgG1 antibody bevacizumab back into monomers. The current study focuses on the selection of therapeutically inactive compounds with similar properties. Adenosine monophosphate, adenosine triphosphate, sucrose-6-phosphate and guanosine monophosphate were selected in silico through similarity searching and docking. All four compounds were predicted to bind to a protein-protein interaction hotspot on the Fc region of bevacizumab and thereby breaking dimer formation. The predictions were supported in vitro: An interaction between AMP and bevacizumab with a dissociation constant of 9.59±0.15 mM was observed by microscale thermophoresis. The stability of the antibody at elevated temperature (40 °C) in a 51 mM phosphate buffer pH 7 was investigated in presence and absence of the excipients. Quantification of the different aggregation species by asymmetrical flow field-flow fractionation and size exclusion chromatography demonstrates that all four excipients are able to partially overcome the initial self-association of bevacizumab monomers.

  13. Efficient generation, storage, and manipulation of fully flexible pharmacophore multiplets and their use in 3-D similarity searching.

    Science.gov (United States)

    Abrahamian, Edmond; Fox, Peter C; Naerum, Lars; Christensen, Inge Thøger; Thøgersen, Henning; Clark, Robert D

    2003-01-01

    Pharmacophore triplets and quartets have been used by many groups in recent years, primarily as a tool for molecular diversity analysis. In most cases, slow processing speeds and the very large size of the bitsets generated have forced researchers to compromise in terms of how such multiplets were stored, manipulated, and compared, e.g., by using simple unions to represent multiplets for sets of molecules. Here we report using bitmaps in place of bitsets to reduce storage demands and to improve processing speed. Here, a bitset is taken to mean a fully enumerated string of zeros and ones, from which a compressed bitmap is obtained by replacing uniform blocks ("runs") of digits in the bitset with a pair of values identifying the content and length of the block (run-length encoding compression). High-resolution multiplets involving four features are enabled by using 64-bit executables to create and manipulate bitmaps, which "connect" to the 32-bit executables used for database access and feature identification via an extensible mark-up language (XML) data stream. The encoding system used supports simple pairs, triplets, and quartets; multiplets in which a privileged substructure is used as an anchor point; and augmented multiplets in which an additional vertex is added to represent a contingent feature such as a hydrogen bond extension point linked to a complementary feature (e.g., a donor or an acceptor atom) in a base pair or triplet. It can readily be extended to larger, more complex multiplets as well. Database searching is one particular potential application for this technology. Consensus bitmaps built up from active ligands identified in preliminary screening can be used to generate hypothesis bitmaps, a process that allows differential weighting so that greater emphasis can be placed on bits arising from multiplets expected to be particularly discriminating. Such hypothesis bitmaps are shown to be useful queries for database searching.
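
    The bitset-to-bitmap compression described above reduces to run-length encoding, sketched below on an arbitrary bit string; because multiplet bitsets are typically sparse, long zero runs collapse to single (content, length) pairs.

        from itertools import groupby

        def rle_compress(bits):
            """Replace uniform runs of a bitset with (bit, run_length) pairs."""
            return [(b, len(list(run))) for b, run in groupby(bits)]

        def rle_decompress(runs):
            return "".join(b * n for b, n in runs)

        bitset = "0" * 40 + "1" * 3 + "0" * 57  # a sparse 100-bit bitset
        runs = rle_compress(bitset)
        print(runs)                             # [('0', 40), ('1', 3), ('0', 57)]
        assert rle_decompress(runs) == bitset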

  14. Retrieval of very large numbers of items in the Web of Science: an exercise to develop accurate search strategies

    CERN Document Server

    Arencibia-Jorge, Ricardo; Chinchilla-Rodriguez, Zaida; Rousseau, Ronald; Paris, Soren W

    2009-01-01

    The current communication presents a simple exercise with the aim of solving a singular problem: the retrieval of extremely large numbers of items via the Web of Science interface. As is known, the Web of Science interface allows a user to obtain at most 100,000 items from a single query. But what about queries that yield more than 100,000 items? The exercise developed one possible way to achieve this objective. The case study is the retrieval of the entire scientific production of the United States in a specific year. Different sections of items were retrieved using the Source field of the database. Then, a simple Boolean statement was created with the aim of eliminating overlap and improving the accuracy of the search strategy. The importance of team work in the development of advanced search strategies is noted.

  15. Introduction of the conditional correlated Bernoulli model of similarity value distributions and its application to the prospective prediction of fingerprint search performance.

    Science.gov (United States)

    Vogt, Martin; Bajorath, Jürgen

    2011-10-24

    A statistical approach named the conditional correlated Bernoulli model is introduced for modeling of similarity scores and predicting the potential of fingerprint search calculations to identify active compounds. Fingerprint features are rationalized as dependent Bernoulli variables and conditional distributions of Tanimoto similarity values of database compounds given a reference molecule are assessed. The conditional correlated Bernoulli model is utilized in the context of virtual screening to estimate the position of a compound obtaining a certain similarity value in a database ranking. Through the generation of receiver operating characteristic curves from cumulative distribution functions of conditional similarity values for known active and random database compounds, one can predict how successful a fingerprint search might be. The comparison of curves for different fingerprints makes it possible to identify fingerprints that are most likely to identify new active molecules in a database search given a set of known reference molecules.

  16. Design and implementation of search engine technology for accurate decision-making in the hospital computer system

    Institute of Scientific and Technical Information of China (English)

    赵立川

    2016-01-01

    With the progress and development of hospital decision-making systems, most hospitals have established data display platforms based on visual analysis functions such as indicators, reports and OLAP. This paper designs and implements a search engine technology for accurate decision-making in the hospital computer system. The proposed search engine builds an index over structured data and matches the semantic keywords entered by the user using Chinese word segmentation; data with high similarity are then aggregated and presented to the user in a visual search page to support accurate decision-making in the hospital.
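
    A hedged sketch of the pipeline described: segment Chinese text into words, build an inverted index over the records, and rank records by keyword overlap with the segmented query. The open-source jieba segmenter stands in here for the unspecified segmentation component, and all record data are hypothetical.

        from collections import defaultdict
        import jieba  # open-source Chinese word segmenter, used as a stand-in

        def index_records(records):
            """Build an inverted index from segmented words to record ids."""
            index = defaultdict(set)
            for rid, text in enumerate(records):
                for word in jieba.cut(text):
                    index[word].add(rid)
            return index

        def search(query, index):
            """Segment the query and rank records by shared-keyword count."""
            scores = defaultdict(int)
            for term in jieba.cut(query):
                for rid in index.get(term, ()):
                    scores[rid] += 1
            return sorted(scores, key=scores.get, reverse=True)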

  17. Novel DOCK clique driven 3D similarity database search tools for molecule shape matching and beyond: adding flexibility to the search for ligand kin.

    Science.gov (United States)

    Good, Andrew C

    2007-10-01

    With readily available CPU power and copious disk storage, it is now possible to undertake rapid comparison of 3D properties derived from explicit ligand overlay experiments. With this in mind, shape software tools originally devised in the 1990s are revisited, modified and applied to the problem of ligand database shape comparison. The utility of Connolly surface data is highlighted using the program MAKESITE, which leverages surface normal data to a create ligand shape cast. This cast is applied directly within DOCK, allowing the program to be used unmodified as a shape searching tool. In addition, DOCK has undergone multiple modifications to create a dedicated ligand shape comparison tool KIN. Scoring has been altered to incorporate the original incarnation of Gaussian function derived shape description based on STO-3G atomic electron density. In addition, a tabu-like search refinement has been added to increase search speed by removing redundant starting orientations produced during clique matching. The ability to use exclusion regions, again based on Gaussian shape overlap, has also been integrated into the scoring function. The use of both DOCK with MAKESITE and KIN in database screening mode is illustrated using a published ligand shape virtual screening template. The advantages of using a clique-driven search paradigm are highlighted, including shape optimization within a pharmacophore constrained framework, and easy incorporation of additional scoring function modifications. The potential for further development of such methods is also discussed.

  18. Visual search for real world targets under conditions of high target-background similarity: Exploring training and transfer in younger and older adults.

    Science.gov (United States)

    Neider, Mark B; Boot, Walter R; Kramer, Arthur F

    2010-05-01

    Real world visual search tasks often require observers to locate a target that blends in with its surrounding environment. However, studies of the effect of target-background similarity on search processes have been relatively rare and have ignored potential age-related differences. We trained younger and older adults to search displays comprised of real world objects on either homogenous backgrounds or backgrounds that camouflaged the target. Training was followed by a transfer session in which participants searched for novel camouflaged objects. Although older adults were slower to locate the target compared to younger adults, all participants improved substantially with training. Surprisingly, camouflage-trained younger and older adults showed no performance decrements when transferred to novel camouflage displays, suggesting that observers learned age-invariant, generalizable skills relevant for searching under conditions of high target-background similarity. Camouflage training benefits at transfer for older adults appeared to be related to improvements in attentional guidance and target recognition rather than a more efficient search strategy.

  19. Developing Molecular Interaction Database and Searching for Similar Pathways (MOLECULAR BIOLOGY AND INFORMATION-Biological Information Science)

    OpenAIRE

    Kawashima, Shuichi; Katayama, Toshiaki; Kanehisa, Minoru

    1998-01-01

    We have developed a database named BRITE, which contains knowledge of interacting molecules and/or genes concerning the cell cycle and early development. Here, we report an overview of the database and a method for automatically searching for functionally common sub-pathways between two biological pathways in BRITE.

  20. Devices used by automated milking systems are similarly accurate in estimating milk yield and in collecting a representative milk sample compared with devices used by farms with conventional milk recording

    NARCIS (Netherlands)

    Kamphuis, Claudia; Dela Rue, B.; Turner, S.A.; Petch, S.

    2015-01-01

    Information on accuracy of milk-sampling devices used on farms with automated milking systems (AMS) is essential for development of milk recording protocols. The hypotheses of this study were (1) devices used by AMS units are similarly accurate in estimating milk yield and in collecting

  1. Removal of redundant contigs from de novo RNA-Seq assemblies via homology search improves accurate detection of differentially expressed genes.

    Science.gov (United States)

    Ono, Hanako; Ishii, Kazuo; Kozaki, Toshinori; Ogiwara, Isao; Kanekatsu, Motoki; Yamada, Tetsuya

    2015-12-04

    For plant species with unsequenced genomes, cDNA contigs created by de novo assembly of RNA-Seq reads are used as reference sequences for comparative analysis of RNA-Seq datasets and the detection of differentially expressed genes (DEGs). Redundancies in such contigs are evident in previous RNA-Seq studies, and such redundancies can lead to difficulties in subsequent analysis. Nevertheless, the effects of removing redundancy from contig assemblies on comparative RNA-Seq analysis have not been evaluated. Here we describe a method for removing redundancy from raw contigs that were primarily created by de novo assembly of Arabidopsis thaliana RNA-Seq reads. Specifically, the contigs with the highest bit scores were selected from the raw contigs by a homology search against the gene dataset in the TAIR10 database. Two existing methods for removal of redundancy, based on contig length or clustering analysis, were also used to eliminate redundancies from the raw contigs. Contig number was reduced most effectively with the method based on homology search. In a comparative analysis of RNA-Seq datasets, DEGs detected in contigs that underwent redundancy removal via the homology search method showed the highest identity to the DEGs detected when the TAIR10 gene dataset was used as an exact reference. Redundancy in raw contigs could also be removed by a homology search against integrated protein datasets from several plant species other than A. thaliana. DEGs detected using contigs that underwent such redundancy removal also showed high homology to DEGs detected using the TAIR10 gene dataset. Here we describe a method for removing redundant contigs within raw contigs; this method involves a homology search against a gene or protein database. In principle, this method can be used with unsequenced plant genomes that lack a well-developed gene database. Redundant contigs were not removed adequately via either of two existing methods, but our method allowed for removal of all redundant contigs
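
    The best-hit selection step can be sketched as a single pass over BLAST tabular output (-outfmt 6), keeping for each reference gene only the contig with the highest bit score; the file name is hypothetical, and this illustrates the idea rather than reproducing the authors' script.

        import csv

        def best_contig_per_gene(blast_tsv):
            """From BLAST tabular output (-outfmt 6; query = contig, subject = gene),
            keep only the highest-bit-score contig for each reference gene."""
            best = {}  # gene id -> (bit score, contig id)
            with open(blast_tsv) as fh:
                for row in csv.reader(fh, delimiter="\t"):
                    contig, gene, bitscore = row[0], row[1], float(row[11])
                    if gene not in best or bitscore > best[gene][0]:
                        best[gene] = (bitscore, contig)
            return {contig for _, contig in best.values()}  # non-redundant contig set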

  2. Using argumentation to retrieve articles with similar citations: an inquiry into improving related articles search in the MEDLINE digital library.

    Science.gov (United States)

    Tbahriti, Imad; Chichester, Christine; Lisacek, Frédérique; Ruch, Patrick

    2006-06-01

    The aim of this study is to investigate the relationships between citations and the scientific argumentation found in abstracts. We design a related article search task and observe how the argumentation can affect the search results. We extracted citation lists from a set of 3200 full-text papers originating from a narrow domain. In parallel, we recovered the corresponding MEDLINE records for analysis of the argumentative moves. Our argumentative model is founded on four classes: PURPOSE, METHODS, RESULTS and CONCLUSION. A Bayesian classifier trained on explicitly structured MEDLINE abstracts generates these argumentative categories. The categories are used to generate four different argumentative indexes. A fifth index contains the complete abstract, together with the title and the list of Medical Subject Headings (MeSH) terms. To appraise the relationship of the moves to the citations, the citation lists were used as the criteria for determining relatedness of articles, establishing a benchmark; that is, two articles are considered "related" if they share a significant set of co-citations. Our results show that the average precision of queries with the PURPOSE and CONCLUSION features is the highest, while the precision of the RESULTS and METHODS features is relatively low. A linear weighting combination of the moves is proposed, which significantly improves retrieval of related articles.

  3. eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape

    Science.gov (United States)

    Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki

    2007-01-01

    We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek). PMID:17567616

  4. PhenoMeter: a metabolome database search tool using statistical similarity matching of metabolic phenotypes for high-confidence detection of functional links

    Directory of Open Access Journals (Sweden)

    Adam James Carroll

    2015-07-01

    Full Text Available This article describes PhenoMeter, a new type of metabolomics database search that accepts metabolite response patterns as queries and searches the MetaPhen database of reference patterns for responses that are statistically significantly similar or inverse for the purposes of detecting functional links. To identify a similarity measure that would detect functional links as reliably as possible, we compared the performance of four statistics in correctly top-matching metabolic phenotypes of Arabidopsis thaliana metabolism mutants affected in different steps of the photorespiration metabolic pathway to reference phenotypes of mutants affected in the same enzymes by independent mutations. The best performing statistic, the PhenoMeter Score (PM Score), was a function of both Pearson correlation and Fisher’s Exact Test of directional overlap. This statistic outperformed Pearson correlation, biweight midcorrelation and Fisher’s Exact Test used alone. To demonstrate general applicability, we show that PhenoMeter reliably retrieved the most closely functionally-linked response in the database when queried with responses to a wide variety of environmental and genetic perturbations. Attempts to match metabolic phenotypes between independent studies were met with varying success and possible reasons for this are discussed. Overall, our results suggest that integration of pattern-based search tools into metabolomics databases will aid functional annotation of newly recorded metabolic phenotypes analogously to the way sequence similarity search algorithms have aided the functional annotation of genes and proteins. PhenoMeter is freely available at MetabolomeExpress (https://www.metabolome-express.org/phenometer.php).
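
    The abstract names the two ingredients of the PM Score but not its exact formula. The sketch below shows one plausible way to combine Pearson correlation with a Fisher's Exact Test of directional overlap; it is purely illustrative and is not the published scoring function:

        # Illustrative (not the published) combination of the two PM Score
        # ingredients: Pearson correlation and Fisher's Exact Test on the
        # 2x2 table of response directions shared by query and reference.
        import math
        from scipy import stats

        def pm_like_score(query, reference):
            r, _ = stats.pearsonr(query, reference)
            up_up = sum(q > 0 and s > 0 for q, s in zip(query, reference))
            up_down = sum(q > 0 and s <= 0 for q, s in zip(query, reference))
            down_up = sum(q <= 0 and s > 0 for q, s in zip(query, reference))
            down_down = sum(q <= 0 and s <= 0 for q, s in zip(query, reference))
            _, p = stats.fisher_exact([[up_up, up_down], [down_up, down_down]])
            # Weight the correlation by the significance of the directional overlap.
            return r * min(-math.log10(max(p, 1e-300)), 10.0)

        print(pm_like_score([1.2, -0.5, 2.0, -1.1], [0.9, -0.4, 1.5, -0.8]))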

  5. Similarity-potency trees: a method to search for SAR information in compound data sets and derive SAR rules.

    Science.gov (United States)

    Wawer, Mathias; Bajorath, Jürgen

    2010-08-23

    An intuitive and generally applicable analysis method, termed similarity-potency tree (SPT), is introduced to mine structure-activity relationship (SAR) information in compound data sets of any source. Only compound potency values and nearest-neighbor similarity relationships are considered. Rather than analyzing a data set as a whole, partly overlapping compound neighborhoods are systematically generated and represented as SPTs. This local analysis scheme simplifies the evaluation of SAR information, and SPTs of high SAR information content are easily identified. By inspecting only a limited number of compound neighborhoods, it is also straightforward to determine whether data sets contain only little or no interpretable SAR information. Interactive analysis of SPTs is facilitated by reading the trees in two directions, which makes it possible to extract SAR rules, if available, in a consistent manner. The simplicity and interpretability of the data structure and the ease of calculation are characteristic features of this approach. We apply the methodology to high-throughput screening and lead optimization data sets, compare the approach to standard clustering techniques, illustrate how SAR rules are derived, and provide some practical guidance on how best to utilize the methodology. The SPT program is made freely available to the scientific community.

  6. A relook on using the Earth Similarity Index for searching habitable zones around solar and extrasolar planets

    Science.gov (United States)

    Biswas, S.; Shome, A.; Raha, B.; Bhattacharya, A. B.

    2017-01-01

    To study the distribution of Earth-like planets and to locate the habitable zone around extrasolar planets and their known satellites, we have emphasized in this paper the consideration of the Earth similarity index (ESI) as a multi-parameter quick assessment of Earth-likeness with a value between zero and one. Weight exponent values for four planetary properties have been taken into account to determine the ESI. A plot of surface ESI against interior ESI exhibits some interesting results which provide further information when confirmed planets are examined. From the analysis of the available catalog and existing theory, none of the solar planets achieves an ESI value greater than 0.8. The planet Mercury has a value of about 0.6, Mars exhibits a value between 0.6 and 0.8, and Venus shows a value near 0.5. Finally, the locations of the habitable zone around different types of stars are critically examined and discussed.
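
    For orientation, the ESI in this literature is usually computed as a weighted geometric mean of per-property similarities to Earth. A sketch using that commonly cited form, with indicative weight exponents from the literature (the paper's exact exponents are not reproduced here):

        # Weighted-geometric-mean ESI over planetary properties, following the
        # commonly cited form: ESI = prod_i (1 - |(x_i - x0_i)/(x_i + x0_i)|)^(w_i/n).
        def esi(values, earth_values, weights):
            n = len(values)
            result = 1.0
            for x, x0, w in zip(values, earth_values, weights):
                result *= (1.0 - abs((x - x0) / (x + x0))) ** (w / n)
            return result

        # Radius, bulk density, escape velocity (Earth units), surface temperature (K).
        earth = [1.00, 1.00, 1.00, 288.0]
        weights = [0.57, 1.07, 0.70, 5.58]  # indicative exponents (assumption)
        mars = [0.53, 0.73, 0.45, 210.0]    # approximate Mars values
        print(round(esi(mars, earth, weights), 2))  # ~0.66, consistent with 0.6-0.8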

  7. A spin transfer torque magnetoresistance random access memory-based high-density and ultralow-power associative memory for fully data-adaptive nearest neighbor search with current-mode similarity evaluation and time-domain minimum searching

    Science.gov (United States)

    Ma, Yitao; Miura, Sadahiko; Honjo, Hiroaki; Ikeda, Shoji; Hanyu, Takahiro; Ohno, Hideo; Endoh, Tetsuo

    2017-04-01

    A high-density nonvolatile associative memory (NV-AM) based on spin transfer torque magnetoresistive random access memory (STT-MRAM), which achieves highly concurrent and ultralow-power nearest neighbor search with full adaptivity of the template data format, has been proposed and fabricated using the 90 nm CMOS/70 nm perpendicular-magnetic-tunnel-junction hybrid process. A truly compact current-mode circuitry is developed to realize flexibly controllable and highly parallel similarity evaluation, which makes the NV-AM adaptable to any dimensionality and component bit-width of template data. A compact dual-stage time-domain minimum searching circuit is also developed, which can freely extend the system to more template data by connecting multiple NV-AM cores without additional circuits for integrated processing. Both the embedded STT-MRAM module and the computing circuit modules in this NV-AM chip are synchronously power-gated to completely eliminate standby power and to minimize operation power by activating only the currently accessed circuit blocks. The operation of a prototype chip at 40 MHz is demonstrated by measurement. The average operation power is only 130 µW, and the circuit density is less than 11 µm2/bit. Compared with the latest conventional works in both volatile and nonvolatile approaches, the circuit area is reduced by more than 31.3% and the power consumption by more than 99.2%. Further power performance analyses are discussed, which verify the particular suitability of the proposed NV-AM for low-power and large-memory-based VLSIs.

  8. An Accurate FOA and TOA Estimation Algorithm for Galileo Search and Rescue Signal

    Institute of Scientific and Technical Information of China (English)

    王堃; 吴嗣亮; 韩月涛

    2011-01-01

    According to the high-precision requirements for Frequency of Arrival (FOA) and Time of Arrival (TOA) estimation in the Galileo search and rescue (SAR) system, and considering the fact that the message bit width is unknown in real received beacons, a new FOA and TOA estimation algorithm is proposed which combines a multi-dimensional joint maximum likelihood estimation algorithm with a barycenter calculation algorithm. The principle of the algorithm is derived after the signal model is introduced, and the concrete realization of the estimation algorithm is given. Monte Carlo simulation results and measurement results show that when the CNR equals the threshold of 34.8 dBHz, the FOA and TOA estimation root-mean-square errors of this algorithm are within 0.03 Hz and 9.5 μs respectively, better than the system requirements of 0.05 Hz and 11 μs. This algorithm has been applied to the Galileo Medium-altitude Earth Orbit Local User Terminal (MEOLUT station).
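
    Of the two combined components, the barycenter step is the easier one to picture: after a coarse maximum-likelihood grid search locates a correlation peak, the estimate is refined as the power-weighted centroid of the bins around that peak. A schematic sketch of only this refinement, with hypothetical bin values:

        import numpy as np

        # Barycenter (power-weighted centroid) refinement of a coarse peak
        # estimate; one reading of the barycenter step named in the abstract.
        def barycenter_refine(bin_positions, bin_powers):
            powers = np.asarray(bin_powers, dtype=float)
            return float(np.sum(np.asarray(bin_positions) * powers) / np.sum(powers))

        # Correlation powers in frequency bins around a coarse FOA estimate (hypothetical):
        freq_offsets = np.array([-0.10, -0.05, 0.00, 0.05, 0.10])  # Hz
        powers = np.array([0.2, 0.7, 1.0, 0.8, 0.3])
        print("refined FOA offset: %+.4f Hz" % barycenter_refine(freq_offsets, powers))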

  9. Megraft: a software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes and similar environmental datasets.

    Science.gov (United States)

    Bengtsson, Johan; Hartmann, Martin; Unterseher, Martin; Vaishampayan, Parag; Abarenkov, Kessy; Durso, Lisa; Bik, Elisabeth M; Garey, James R; Eriksson, K Martin; Nilsson, R Henrik

    2012-07-01

    Metagenomic libraries represent subsamples of the total DNA found at a study site and offer unprecedented opportunities to study ecological and functional aspects of microbial communities. To examine the depth of a community sequencing effort, rarefaction analysis of the ribosomal small subunit (SSU/16S/18S) gene in the metagenome is usually performed. The fragmentary, non-overlapping nature of SSU sequences in metagenomic libraries poses a problem for this analysis, however. We introduce a software package - Megraft - that grafts SSU fragments onto full-length SSU sequences, accounting for observed and unobserved variability, for accurate assessment of species richness and sequencing depth in metagenomics endeavors.

  10. Combination of 2D/3D Ligand-Based Similarity Search in Rapid Virtual Screening from Multimillion Compound Repositories. Selection and Biological Evaluation of Potential PDE4 and PDE5 Inhibitors

    Directory of Open Access Journals (Sweden)

    Krisztina Dobi

    2014-05-01

    Full Text Available Rapid in silico selection of target focused libraries from commercial repositories is an attractive and cost-effective approach. If structures of active compounds are available, rapid 2D similarity searches can be performed on multimillion-compound databases, but the generated library requires further focusing by various 2D/3D chemoinformatics tools. We report here a combination of the 2D approach with a ligand-based 3D method (Screen3D) which applies flexible matching to align reference and target compounds in a dynamic manner and thus to assess their structural and conformational similarity. In the first case study we compared the 2D and 3D similarity scores on an existing dataset derived from the biological evaluation of a PDE5 focused library. Based on the obtained similarity metrics, a fusion score was proposed. The fusion score was applied to refine the 2D similarity search in a second case study where we aimed at selecting and evaluating a PDE4B focused library. The application of this fused 2D/3D similarity measure led to an increase of the hit rate from 8.5% (1st round, 47% inhibition at 10 µM) to 28.5% (2nd round, 50% inhibition at 10 µM), and the best two hits had 53 nM inhibitory activities.

  11. Compression-based Similarity

    CERN Document Server

    Vitanyi, Paul M B

    2011-01-01

    First we consider pair-wise distances for literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned, normalized, and can be viewed as similarity distances. Second, we consider pair-wise distances between names of objects, like "red" or "christianity." In this case the distances are based on searches of the Internet. Such a search can be performed by any search engine that returns aggregate page counts. We can extract a code length from the numbers returned, use the same formula as before, and derive a similarity or relative semantics between names for objects. The theory is based on Kolmogorov complexity. We test both similarities extensively experimentally.
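
    The compression-based distance at the heart of this line of work is the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. A minimal sketch using zlib (chosen only for portability; the literature typically uses stronger compressors):

        import zlib

        # Normalized compression distance; values near 0 indicate high similarity.
        def ncd(x: bytes, y: bytes) -> float:
            cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
            cxy = len(zlib.compress(x + y))
            return (cxy - min(cx, cy)) / max(cx, cy)

        a = b"the quick brown fox jumps over the lazy dog" * 20
        b = b"the quick brown fox leaps over the lazy dog" * 20
        c = b"colorless green ideas sleep furiously" * 20
        print(ncd(a, b))  # small: near-duplicate texts
        print(ncd(a, c))  # larger: unrelated texts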

  12. Search of phenotype related candidate genes using gene ontology-based semantic similarity and protein interaction information: application to Brugada syndrome.

    Science.gov (United States)

    Massanet, Raimon; Gallardo-Chacon, Joan-Josep; Caminal, Pere; Perera, Alexandre

    2009-01-01

    This work presents a methodology for finding phenotype candidate genes starting from a set of known related genes. This is accomplished by automatically mining and organizing the available scientific literature using Gene Ontology-based semantic similarity. As a case study, Brugada syndrome related genes have been used as input in order to obtain a list of other possible candidate genes related with this disease. The Brugada anomaly produces a typical alteration in the electrocardiogram, and carriers of the disease show an increased probability of sudden death. The results yield a set of semantically coherent proteins related to synaptic transmission and muscle contraction physiological processes.

  13. Personalized Search

    CERN Document Server

    AUTHOR|(SzGeCERN)749939

    2015-01-01

    As the volume of electronically available information grows, relevant items become harder to find. This work presents an approach to personalizing search results in scientific publication databases, focusing on re-ranking search results from existing search engines like Solr or ElasticSearch. It also includes the development of Obelix, a new recommendation system used to re-rank search results. The project was proposed and performed at CERN, using the scientific publications available on the CERN Document Server (CDS). This work experiments with re-ranking using offline and online evaluation of users and documents in CDS. The experiments conclude that the personalized search results outperform both latest-first and word-similarity ranking in terms of click position in the search results for global search in CDS.

  14. Active browsing using similarity pyramids

    Science.gov (United States)

    Chen, Jau-Yuen; Bouman, Charles A.; Dalton, John C.

    1998-12-01

    In this paper, we describe a new approach to managing large image databases, which we call active browsing. Active browsing integrates relevance feedback into the browsing environment, so that users can modify the database's organization to suit the desired task. Our method is based on a similarity pyramid data structure, which hierarchically organizes the database, so that it can be efficiently browsed. At coarse levels, the similarity pyramid allows users to view the database as large clusters of similar images. Alternatively, users can 'zoom into' finer levels to view individual images. We discuss relevance feedback for the browsing process, and argue that it is fundamentally different from relevance feedback for more traditional search-by-query tasks. We propose two fundamental operations for active browsing: pruning and reorganization. Both of these operations depend on a user-defined relevance set, which represents the image or set of images desired by the user. We present statistical methods for accurately pruning the database, and we propose a new 'worm hole' distance metric for reorganizing the database, so that members of the relevance set are grouped together.

  15. Efficient Video Similarity Measurement and Search

    Science.gov (United States)

    2002-01-01


  16. Similarity Scaling

    Science.gov (United States)

    Schnack, Dalton D.

    In Lecture 10, we introduced a non-dimensional parameter called the Lundquist number, denoted by S. This is just one of many non-dimensional parameters that can appear in the formulations of both hydrodynamics and MHD. These generally express the ratio of the time scale associated with some dissipative process to the time scale associated with either wave propagation or transport by flow. These are important because they define regions in parameter space that separate flows with different physical characteristics. All flows that have the same non-dimensional parameters behave in the same way. This property is called similarity scaling.
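
    As a concrete instance, the Lundquist number is the ratio of the resistive diffusion time to the Alfvén transit time; in the standard MHD notation (stated here from the general literature, not quoted from this lecture):

        S \;=\; \frac{\tau_\eta}{\tau_A} \;=\; \frac{L^2/\eta}{L/v_A} \;=\; \frac{L\,v_A}{\eta}

    where L is a characteristic length, v_A the Alfvén speed, and \eta the magnetic diffusivity. Large S means resistive dissipation is slow compared with Alfvénic dynamics.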

  17. Proteomic analysis of cellular soluble proteins from human bronchial smooth muscle cells by combining nondenaturing micro 2DE and quantitative LC-MS/MS. 2. Similarity search between protein maps for the analysis of protein complexes.

    Science.gov (United States)

    Jin, Ya; Yuan, Qi; Zhang, Jun; Manabe, Takashi; Tan, Wen

    2015-09-01

    Human bronchial smooth muscle cell soluble proteins were analyzed by a combined method of nondenaturing micro 2DE, grid gel-cutting, and quantitative LC-MS/MS and a native protein map was prepared for each of the identified 4323 proteins [1]. A method to evaluate the degree of similarity between the protein maps was developed since we expected the proteins comprising a protein complex would be separated together under nondenaturing conditions. The following procedure was employed using Excel macros; (i) maps that have three or more squares with protein quantity data were selected (2328 maps), (ii) within each map, the quantity values of the squares were normalized setting the highest value to be 1.0, (iii) in comparing a map with another map, the smaller normalized quantity in two corresponding squares was taken and summed throughout the map to give an "overlap score," (iv) each map was compared against all the 2328 maps and the largest overlap score, obtained when a map was compared with itself, was set to be 1.0 thus providing 2328 "overlap factors," (v) step (iv) was repeated for all maps providing 2328 × 2328 matrix of overlap factors. From the matrix, protein pairs that showed overlap factors above 0.65 from both protein sides were selected (431 protein pairs). Each protein pair was searched in a database (UniProtKB) on complex formation and 301 protein pairs, which comprise 35 protein complexes, were found to be documented. These results demonstrated that native protein maps and their similarity search would enable simultaneous analysis of multiple protein complexes in cells.
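
    Steps (ii)-(iv) reduce to normalizing each map to a peak of 1.0, scoring a pair as the sum of elementwise minima, and dividing each row by its self-score. A compact numpy sketch of the same computation on hypothetical map data (for all 2328 maps a loop over pairs would replace the broadcast):

        import numpy as np

        # Overlap factors between native protein maps, following steps (ii)-(iv):
        # normalize each map to max 1.0, score pairs by summed elementwise minima,
        # then scale rows so that each self-comparison equals 1.0.
        def overlap_factors(maps):
            norm = maps / maps.max(axis=1, keepdims=True)                        # (ii)
            scores = np.minimum(norm[:, None, :], norm[None, :, :]).sum(axis=2)  # (iii)
            return scores / np.diag(scores)[:, None]                             # (iv)

        maps = np.random.default_rng(0).random((5, 100))  # 5 hypothetical maps, 100 squares
        f = overlap_factors(maps)
        # Step (v)-style selection: pairs exceeding 0.65 from both protein sides.
        print([(i, j) for i in range(5) for j in range(i + 1, 5)
               if f[i, j] > 0.65 and f[j, i] > 0.65])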

  18. Proteomic analysis of cellular soluble proteins from human bronchial smooth muscle cells by combining nondenaturing micro 2DE and quantitative LC‐MS/MS. 2. Similarity search between protein maps for the analysis of protein complexes

    Science.gov (United States)

    Jin, Ya; Yuan, Qi; Zhang, Jun; Manabe, Takashi

    2015-01-01

    Human bronchial smooth muscle cell soluble proteins were analyzed by a combined method of nondenaturing micro 2DE, grid gel‐cutting, and quantitative LC‐MS/MS and a native protein map was prepared for each of the identified 4323 proteins [1]. A method to evaluate the degree of similarity between the protein maps was developed since we expected the proteins comprising a protein complex would be separated together under nondenaturing conditions. The following procedure was employed using Excel macros; (i) maps that have three or more squares with protein quantity data were selected (2328 maps), (ii) within each map, the quantity values of the squares were normalized setting the highest value to be 1.0, (iii) in comparing a map with another map, the smaller normalized quantity in two corresponding squares was taken and summed throughout the map to give an “overlap score,” (iv) each map was compared against all the 2328 maps and the largest overlap score, obtained when a map was compared with itself, was set to be 1.0 thus providing 2328 “overlap factors,” (v) step (iv) was repeated for all maps providing 2328 × 2328 matrix of overlap factors. From the matrix, protein pairs that showed overlap factors above 0.65 from both protein sides were selected (431 protein pairs). Each protein pair was searched in a database (UniProtKB) on complex formation and 301 protein pairs, which comprise 35 protein complexes, were found to be documented. These results demonstrated that native protein maps and their similarity search would enable simultaneous analysis of multiple protein complexes in cells. PMID:26031785

  19. Wavelet transform in similarity paradigm

    NARCIS (Netherlands)

    Z.R. Struzik; A.P.J.M. Siebes (Arno)

    1998-01-01

    [INS-R9802] Searching for similarity in time series finds still broader applications in data mining. However, due to the very broad spectrum of data involved, there is no possibility of defining one single notion of similarity suitable to serve all applications. We present a powerful

  20. Applying ligands profiling using multiple extended electron distribution based field templates and feature trees similarity searching in the discovery of new generation of urea-based antineoplastic kinase inhibitors.

    Directory of Open Access Journals (Sweden)

    Eman M Dokla

    Full Text Available This study provides a comprehensive computational procedure for the discovery of novel urea-based antineoplastic kinase inhibitors while focusing on diversification of both chemotype and selectivity pattern. It presents a systematic structural analysis of the different binding motifs of urea-based kinase inhibitors and the corresponding configurations of the kinase enzymes. The computational model depends on simultaneous application of two protocols. The first protocol applies multiple consecutive validated virtual screening filters including SMARTS, support vector-machine model (ROC = 0.98), Bayesian model (ROC = 0.86) and structure-based pharmacophore filters based on urea-based kinase inhibitors complexes retrieved from literature. This is followed by hits profiling against different extended electron distribution (XED) based field templates representing different kinase targets. The second protocol enables cancericidal activity verification by using the algorithm of feature trees (Ftrees) similarity searching against NCI database. Being a proof-of-concept study, this combined procedure was experimentally validated by its utilization in developing a novel series of urea-based derivatives of strong anticancer activity. This new series is based on 3-benzylbenzo[d]thiazol-2(3H)-one scaffold which has interesting chemical feasibility and wide diversification capability. Antineoplastic activity of this series was assayed in vitro against NCI 60 tumor-cell lines showing very strong inhibition of GI(50) as low as 0.9 uM. Additionally, its mechanism was unleashed using KINEX™ protein kinase microarray-based small molecule inhibitor profiling platform and cell cycle analysis showing a peculiar selectivity pattern against Zap70, c-src, Mink1, csk and MeKK2 kinases. Interestingly, it showed activity on syk kinase confirming the recent studies finding of the high activity of diphenyl urea containing compounds against this kinase. Allover, the new series

  1. Applying ligands profiling using multiple extended electron distribution based field templates and feature trees similarity searching in the discovery of new generation of urea-based antineoplastic kinase inhibitors.

    Science.gov (United States)

    Dokla, Eman M; Mahmoud, Amr H; Elsayed, Mohamed S A; El-Khatib, Ahmed H; Linscheid, Michael W; Abouzid, Khaled A

    2012-01-01

    This study provides a comprehensive computational procedure for the discovery of novel urea-based antineoplastic kinase inhibitors while focusing on diversification of both chemotype and selectivity pattern. It presents a systematic structural analysis of the different binding motifs of urea-based kinase inhibitors and the corresponding configurations of the kinase enzymes. The computational model depends on simultaneous application of two protocols. The first protocol applies multiple consecutive validated virtual screening filters including SMARTS, support vector-machine model (ROC = 0.98), Bayesian model (ROC = 0.86) and structure-based pharmacophore filters based on urea-based kinase inhibitors complexes retrieved from literature. This is followed by hits profiling against different extended electron distribution (XED) based field templates representing different kinase targets. The second protocol enables cancericidal activity verification by using the algorithm of feature trees (Ftrees) similarity searching against NCI database. Being a proof-of-concept study, this combined procedure was experimentally validated by its utilization in developing a novel series of urea-based derivatives of strong anticancer activity. This new series is based on 3-benzylbenzo[d]thiazol-2(3H)-one scaffold which has interesting chemical feasibility and wide diversification capability. Antineoplastic activity of this series was assayed in vitro against NCI 60 tumor-cell lines showing very strong inhibition of GI(50) as low as 0.9 uM. Additionally, its mechanism was unleashed using KINEX™ protein kinase microarray-based small molecule inhibitor profiling platform and cell cycle analysis showing a peculiar selectivity pattern against Zap70, c-src, Mink1, csk and MeKK2 kinases. Interestingly, it showed activity on syk kinase confirming the recent studies finding of the high activity of diphenyl urea containing compounds against this kinase. Allover, the new series, which is based on

  2. Custom Search Engines: Tools & Tips

    Science.gov (United States)

    Notess, Greg R.

    2008-01-01

    Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…

  3. Different predictors of multiple-target search accuracy between nonprofessional and professional visual searchers.

    Science.gov (United States)

    Biggs, Adam T; Mitroff, Stephen R

    2014-01-01

    Visual search, locating target items among distractors, underlies daily activities ranging from critical tasks (e.g., looking for dangerous objects during security screening) to commonplace ones (e.g., finding your friends in a crowded bar). Both professional and nonprofessional individuals conduct visual searches, and the present investigation is aimed at understanding how they perform similarly and differently. We administered a multiple-target visual search task to both professional (airport security officers) and nonprofessional participants (members of the Duke University community) to determine how search abilities differ between these populations and what factors might predict accuracy. There were minimal overall accuracy differences, although the professionals were generally slower to respond. However, the factors that predicted accuracy varied drastically between groups; variability in search consistency-how similarly an individual searched from trial to trial in terms of speed-best explained accuracy for professional searchers (more consistent professionals were more accurate), whereas search speed-how long an individual took to complete a search when no targets were present-best explained accuracy for nonprofessional searchers (slower nonprofessionals were more accurate). These findings suggest that professional searchers may utilize different search strategies from those of nonprofessionals, and that search consistency, in particular, may provide a valuable tool for enhancing professional search accuracy.

  4. Speaking Fluently And Accurately

    Institute of Scientific and Technical Information of China (English)

    Joseph DeVeto

    2004-01-01

    Even after many years of study, students make frequent mistakes in English. In addition, many students still need a long time to think of what they want to say. For some reason, in spite of all the studying, students are still not quite fluent. When I teach, I use one technique that helps students speak not only more accurately, but also more fluently. That technique is dictation.

  5. Accurate backgrounds to Higgs production at the LHC

    CERN Document Server

    Kauer, N

    2007-01-01

    Corrections of 10-30% for backgrounds to the H → WW → l⁺l⁻ + missing-p_T search in vector boson and gluon fusion at the LHC are reviewed to make the case for precise and accurate theoretical background predictions.

  6. Domain similarity based orthology detection.

    Science.gov (United States)

    Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich

    2015-05-13

    Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows quadratically. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures, a cosine and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The qualities of the cosine and the maximal weight matching similarity measures are compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of the search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is written in Python and C++ and is available under the GNU GPL licence, version 3, at http://www.bornberglab.org/pages/porthoda .
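
    The domain-content comparison can be sketched by treating each protein as a bag of domain identifiers and computing cosine similarity over the count vectors. The domain accessions below are hypothetical placeholders:

        from collections import Counter
        from math import sqrt

        # Cosine similarity over domain content: proteins are compared as
        # count vectors of their constituent domain identifiers.
        def domain_cosine(domains_a, domains_b):
            a, b = Counter(domains_a), Counter(domains_b)
            dot = sum(a[d] * b[d] for d in a.keys() & b.keys())
            norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
            return dot / norm if norm else 0.0

        # Hypothetical Pfam-style domain strings for two proteins:
        print(domain_cosine(["PF00069", "PF07714", "PF00069"], ["PF00069", "PF00433"]))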

  7. Efficient protein structure search using indexing methods.

    Science.gov (United States)

    Kim, Sungchul; Sael, Lee; Yu, Hwanjo

    2013-01-01

    Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by the 3D-Zernike Descriptor (3DZD), which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests for structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of the 3DZDs of protein structures. To retrieve the top-k similar structures, the top-10 × k similar structures are first found using the reduced index, and the top-k structures are then selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points within distance θ of the query point. The results show that both iDistance and iKernel significantly enhance the search speed. In top-k nearest neighbor search, the search time is reduced by 69.6%, 77%, 77.4% and 87.9%, respectively, using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor search, the search time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
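
    The extended-index idea is a filter-and-refine scheme: shortlist the top-10×k candidates using only the first few attributes of the 3DZD vectors, then re-rank the shortlist with the full vectors. A brute-force numpy sketch of that two-stage logic (standing in for the actual iDistance/iKernel lookups):

        import numpy as np

        # Two-stage top-k search: a cheap shortlist on a reduced descriptor,
        # followed by exact ranking of the shortlist on the full descriptor.
        def two_stage_topk(db, query, k, reduced_dims=8):
            d_reduced = np.linalg.norm(db[:, :reduced_dims] - query[:reduced_dims], axis=1)
            shortlist = np.argsort(d_reduced)[:10 * k]          # top-10*k candidates
            d_full = np.linalg.norm(db[shortlist] - query, axis=1)
            return shortlist[np.argsort(d_full)[:k]]            # final top-k

        rng = np.random.default_rng(1)
        db = rng.random((10_000, 121))  # 3DZDs are typically ~121-dimensional
        print(two_stage_topk(db, rng.random(121), k=5))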

  8. Contextual Bandits with Similarity Information

    CERN Document Server

    Slivkins, Aleksandrs

    2009-01-01

    In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well-understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms. We consider similarity information in the setting of "contextual bandits", a natural extension of the basic MAB problem where before each round an algorithm is given the "context" -- a hint about the payoffs in this round. Contextual bandits are directly motivated by placing advertisements on webpages, one of the crucial problems in sponsored search. A particularly simple way to represent similarity information in the contextual bandit setting is via a "similarity distance...
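
    The similarity information in this setting is typically a distance D under which expected payoffs vary smoothly; in the contextual formulation this is the Lipschitz condition (the standard form in this literature, paraphrased rather than quoted from the paper):

        \left| \mu(x, y) - \mu(x', y') \right| \;\le\; D\bigl( (x, y),\, (x', y') \bigr)

    where x, x' are contexts, y, y' are arms, and \mu is the expected payoff.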

  9. Professional Microsoft search fast search, Sharepoint search, and search server

    CERN Document Server

    Bennett, Mark; Kehoe, Miles; Voskresenskaya, Natalya

    2010-01-01

    Use Microsoft's latest search-based technology, FAST search, to plan, customize, and deploy your search solution. FAST is Microsoft's latest intelligent search-based technology that boasts robustness and an ability to integrate business intelligence with search. This in-depth guide provides you with advanced coverage on FAST search and shows you how to use it to plan, customize, and deploy your search solution, with an emphasis on SharePoint 2010 and Internet-based search solutions. With a particular appeal for anyone responsible for implementing and managing enterprise search, this book presents t

  10. BIOACCESSIBILITY TESTS ACCURATELY ESTIMATE ...

    Science.gov (United States)

    Hazards of soil-borne Pb to wild birds may be more accurately quantified if the bioavailability of that Pb is known. To better understand the bioavailability of Pb to birds, we measured blood Pb concentrations in Japanese quail (Coturnix japonica) fed diets containing Pb-contaminated soils. Relative bioavailabilities were expressed by comparison with blood Pb concentrations in quail fed a Pb acetate reference diet. Diets containing soil from five Pb-contaminated Superfund sites had relative bioavailabilities from 33%-63%, with a mean of about 50%. Treatment of two of the soils with P significantly reduced the bioavailability of Pb. The bioaccessibility of the Pb in the test soils was then measured in six in vitro tests and regressed on bioavailability. They were: the “Relative Bioavailability Leaching Procedure” (RBALP) at pH 1.5, the same test conducted at pH 2.5, the “Ohio State University In vitro Gastrointestinal” method (OSU IVG), the “Urban Soil Bioaccessible Lead Test”, the modified “Physiologically Based Extraction Test” and the “Waterfowl Physiologically Based Extraction Test.” All regressions had positive slopes. Based on criteria of slope and coefficient of determination, the RBALP pH 2.5 and OSU IVG tests performed very well. Speciation by X-ray absorption spectroscopy demonstrated that, on average, most of the Pb in the sampled soils was sorbed to minerals (30%), bound to organic matter 24%, or present as Pb sulfate 18%. Ad

  11. Search Cloud

    Science.gov (United States)


  12. Distance learning for similarity estimation.

    Science.gov (United States)

    Yu, Jie; Amores, Jaume; Sebe, Nicu; Radeva, Petia; Tian, Qi

    2008-03-01

    In this paper, we present a general guideline for finding a better distance measure for similarity estimation based on statistical analysis of distribution models and distance functions. A new set of distance measures is derived from the harmonic distance, the geometric distance, and their generalized variants according to maximum likelihood theory. These measures can provide a more accurate feature model than the classical Euclidean and Manhattan distances. We also find that the feature elements are often from heterogeneous sources that may have different influences on similarity estimation. Therefore, the assumption of a single isotropic distribution model is often inappropriate. To alleviate this problem, we use a boosted distance measure framework that finds multiple distance measures which best fit the distribution of selected feature elements for accurate similarity estimation. The new distance measures for similarity estimation are tested on two applications: stereo matching and motion tracking in video sequences. The performance of the boosted distance measure is further evaluated on several benchmark data sets from the UCI repository and two image retrieval applications. In all the experiments, robust results are obtained based on the proposed methods.
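
    Structurally, the boosted framework amounts to a weighted combination of component distance measures. In the paper the weights come from boosting over selected feature elements; the sketch below fixes them by hand purely to show the shape of the computation:

        import numpy as np

        # Schematic combined distance over heterogeneous component measures.
        # The fixed weights are placeholders for boosting-learned ones.
        def manhattan(x, y):
            return float(np.sum(np.abs(x - y)))

        def euclidean(x, y):
            return float(np.linalg.norm(x - y))

        COMPONENTS = [(0.6, euclidean), (0.4, manhattan)]  # illustrative weights

        def combined_distance(x, y):
            return sum(w * d(x, y) for w, d in COMPONENTS)

        x, y = np.array([1.0, 2.0, 3.0]), np.array([1.5, 1.0, 3.5])
        print(round(combined_distance(x, y), 3))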

  13. CADC Advanced Search

    Science.gov (United States)

    Jenkins, D. N.

    2012-09-01

    The Canadian Astronomy Data Centre's (CADC) Advanced Search web application is a modern search tool to access data across the CADC archives. It allows searching in different units, and handles wildcard characters and numeric operations well. Search results are displayed in a sortable and filterable manner, allowing quick and accurate access to downloadable data. The Advanced Search interface makes extensive use of the Astronomical Data Query Language (ADQL) to scour the Common Archive Observation Model (CAOM) Table Access Protocol (TAP) query service and the vast CADC Archive Data (AD) storage system. A new tabular view of the query form and the results data makes it easy to view the query, then return to the query form to make further changes, or, alternatively, filter the data from the paginated table. Results are displayed using a rich, open-source, JavaScript-based VOTable viewer called voview.

  14. Groundwater recharge: Accurately representing evapotranspiration

    CSIR Research Space (South Africa)

    Bugan, Richard DH

    2011-09-01

    Full Text Available Groundwater recharge is the basis for accurate estimation of groundwater resources, for determining the modes of water allocation and groundwater resource susceptibility to climate change. Accurate estimations of groundwater recharge with models...

  15. Search Recipes

    Science.gov (United States)


  16. Search Patterns

    CERN Document Server

    Morville, Peter

    2010-01-01

    What people are saying about Search Patterns "Search Patterns is a delight to read -- very thoughtful and thought provoking. It's the most comprehensive survey of designing effective search experiences I've seen." --Irene Au, Director of User Experience, Google "I love this book! Thanks to Peter and Jeffery, I now know that search (yes, boring old yucky who cares search) is one of the coolest ways around of looking at the world." --Dan Roam, author, The Back of the Napkin (Portfolio Hardcover) "Search Patterns is a playful guide to the practical concerns of search interface design. It cont

  17. Niche Genetic Algorithm with Accurate Optimization Performance

    Institute of Scientific and Technical Information of China (English)

    LIU Jian-hua; YAN De-kun

    2005-01-01

    Based on a crowding mechanism, a novel niche genetic algorithm is proposed that records the evolutionary direction dynamically during evolution. After evolution, the precision of the solutions can be greatly improved by local searching along the recorded direction. Simulation shows that this algorithm can not only keep population diversity but also find accurate solutions. Although this method takes more time than the standard GA, it is well worth applying in cases that demand high solution precision.

  18. Query Language for Complex Similarity Queries

    CERN Document Server

    Budikova, Petra; Zezula, Pavel

    2012-01-01

    For complex data types such as multimedia, traditional data management methods are not suitable. Instead of attribute-matching approaches, access methods based on object similarity are becoming popular. Recently, this has resulted in intensive research on indexing and searching methods for similarity-based retrieval. Nowadays, many efficient methods are already available, but using them to build an actual search system still requires specialists who tune the methods and build the system manually. Several attempts have already been made to provide a more convenient high-level interface in the form of query languages for such systems, but these are limited to supporting only basic similarity queries. In this paper, we propose a new language that allows content-based queries to be formulated in a flexible way, taking into account the functionality offered by the particular search engine in use. To ensure this, the language is based on a general data model with an abstract set of operations. Consequently, the language s...

  19. Similarity transformations of MAPs

    Directory of Open Access Journals (Sweden)

    Andersen Allan T.

    1999-01-01

    Full Text Available We introduce the notion of similar Markovian Arrival Processes (MAPs) and show that the event-stationary point processes related to two similar MAPs are stochastically equivalent. This holds true for the time-stationary point processes too. We show that several well-known stochastic equivalences, e.g. that between the H2 renewal process and the Interrupted Poisson Process (IPP), can be expressed by similarity transformations of MAPs. In the appendix, the valid region of similarity transformations for two-state MAPs is characterized.
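
    A MAP is specified by a pair of rate matrices (D0, D1), and the similarity transformations in question act on both matrices at once. In the standard notation of general MAP theory (not quoted from this paper), two MAPs are similar when an invertible matrix T exists with

        \tilde{D}_0 = T^{-1} D_0 T, \qquad \tilde{D}_1 = T^{-1} D_1 T

    and the abstract's result is that such similar MAPs generate stochastically equivalent stationary point processes.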

  20. Search Combinators

    CERN Document Server

    Schrijvers, Tom; Wuille, Pieter; Samulowitz, Horst; Stuckey, Peter J

    2012-01-01

    The ability to model search in a constraint solver can be an essential asset for solving combinatorial problems. However, existing infrastructure for defining search heuristics is often inadequate. Either modeling capabilities are extremely limited or users are faced with a general-purpose programming language whose features are not tailored towards writing search heuristics. As a result, major improvements in performance may remain unexplored. This article introduces search combinators, a lightweight and solver-independent method that bridges the gap between a conceptually simple modeling language for search (high-level, functional and naturally compositional) and an efficient implementation (low-level, imperative and highly non-modular). By allowing the user to define application-tailored search strategies from a small set of primitives, search combinators effectively provide a rich domain-specific language (DSL) for modeling search to the user. Remarkably, this DSL comes at a low implementation cost to the...

  1. Measuring Personalization of Web Search

    DEFF Research Database (Denmark)

    Hannak, Aniko; Sapiezynski, Piotr; Kakhki, Arash Molavi

    2013-01-01

    Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing personalization is leading to concerns about Filter Bubble effects, where certain users are simply unable to access information that the search engines' algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions. First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users

  2. Clustering by Pattern Similarity

    Institute of Scientific and Technical Information of China (English)

    Hai-xun Wang; Jian Pei

    2008-01-01

    The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. However, in most of these models the concept of similarity is often based on such metrics as Manhattan distance, Euclidean distance or other Lp distances. In other words, similar objects must have close values in at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we proposed, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it is able to capture not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. In addition to the novel similarity model, this paper also introduces an effective and efficient algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its performance.
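
    The pCluster criterion referenced here is evaluated on 2x2 submatrices: for objects x, y and attributes a, b with entries d, the coherence of the pattern is measured by (formulation as in the pCluster literature)

        \mathrm{pScore} = \bigl| (d_{xa} - d_{xb}) - (d_{ya} - d_{yb}) \bigr|

    and a subset of objects and dimensions forms a \delta-pCluster when every such 2x2 submatrix satisfies \mathrm{pScore} \le \delta, so that the rows rise and fall together even if their absolute values are far apart.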

  3. Judgments of brand similarity

    NARCIS (Netherlands)

    Bijmolt, THA; Wedel, M; Pieters, RGM; DeSarbo, WS

    This paper provides empirical insight into the way consumers make pairwise similarity judgments between brands, and how familiarity with the brands, serial position of the pair in a sequence, and the presentation format affect these judgments. Within the similarity judgment process both the

  4. New Similarity Functions

    DEFF Research Database (Denmark)

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Kwasnicka, Halina

    2016-01-01

    In data science, there are important parameters that affect the accuracy of the algorithms used. Some of these parameters are: the type of data objects, the membership assignments, and distance or similarity functions. This paper discusses similarity functions as fundamental elements in membership...

  5. Judgments of brand similarity

    NARCIS (Netherlands)

    Bijmolt, THA; Wedel, M; Pieters, RGM; DeSarbo, WS

    1998-01-01

    This paper provides empirical insight into the way consumers make pairwise similarity judgments between brands, and how familiarity with the brands, serial position of the pair in a sequence, and the presentation format affect these judgments. Within the similarity judgment process both the formatio

  6. Visual search

    NARCIS (Netherlands)

    Toet, A.; Bijl, P.

    2003-01-01

    Visual search, with or without the aid of optical or electro-optical instruments, plays a significant role in various types of military and civilian operations (e.g., reconnaissance, surveillance, and search and rescue). Advance knowledge of human visual search and target acquisition performance is

  7. New Similarity Functions

    DEFF Research Database (Denmark)

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Kwasnicka, Halina

    2016-01-01

    In data science, there are important parameters that affect the accuracy of the algorithms used. Some of these parameters are: the type of data objects, the membership assignments, and distance or similarity functions. This paper discusses similarity functions as fundamental elements in membership...... assignments. The paper introduces Weighted Feature Distance (WFD), and Prioritized Weighted Feature Distance (PWFD), two new distance functions that take into account the diversity in feature spaces. WFD functions perform better in supervised and unsupervised methods by comparing data objects on their feature...... spaces, in addition to their similarity in the vector space. Prioritized Weighted Feature Distance (PWFD) works similarly as WFD, but provides the ability to give priorities to desirable features. The accuracy of the proposed functions are compared with other similarity functions on several data sets...

  8. The semantic similarity ensemble

    Directory of Open Access Journals (Sweden)

    Andrea Ballatore

    2013-12-01

    Full Text Available Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble's member. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.
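
    Operationally, an ensemble of this kind can be as simple as averaging the scores of its member measures. A minimal sketch with two trivial placeholder members standing in for real geo-semantic similarity measures:

        # Ensemble similarity as the mean of member measures; the members
        # below are placeholders for real similarity functions.
        def measure_a(t1, t2):
            return 1.0 if t1 == t2 else 0.4  # placeholder

        def measure_b(t1, t2):
            return 1.0 if t1 == t2 else 0.6  # placeholder

        MEMBERS = [measure_a, measure_b]

        def ensemble_similarity(t1, t2):
            return sum(m(t1, t2) for m in MEMBERS) / len(MEMBERS)

        print(ensemble_similarity("river", "stream"))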

  9. Similar component analysis

    Institute of Scientific and Technical Information of China (English)

    ZHANG Hong; WANG Xin; LI Junwei; CAO Xianguang

    2006-01-01

    A new unsupervised feature extraction method called similar component analysis (SCA) is proposed in this paper. SCA has a self-aggregation property: in theory, the data objects move towards each other through SCA to form clusters, which can reveal the inherent pattern of similarity hidden in the dataset. The inputs of SCA are just the pairwise similarities of the dataset, which makes it easier to apply to time series analysis given the variable lengths of time series. Our experimental results on many problems have verified the effectiveness of SCA in several engineering applications.

  10. Trajectory similarity join in spatial networks

    KAUST Repository

    Shang, Shuo

    2017-09-07

    The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus, given two sets of trajectories and a threshold θ, the TS-Join returns all pairs of trajectories from the two sets with similarity above θ. This join targets applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm's per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost. An empirical study with real data offers insight into the performance of the algorithm and demonstrates that it is capable of outperforming a well-designed baseline algorithm by an order of magnitude.
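
    The two-phase structure (independent per-trajectory searches followed by a constant-cost merge) parallelizes directly. A schematic sketch with a placeholder similarity function; the paper's network-based spatiotemporal similarity and pruning bounds are more involved:

        from concurrent.futures import ThreadPoolExecutor

        # Schematic TS-Join: phase 1 runs the independent per-trajectory
        # searches in parallel; phase 2 merges the partial results.
        # similarity() is a placeholder for the paper's network-based measure.
        def similarity(t1, t2):
            shared = len(set(t1) & set(t2))  # shared road segments (placeholder)
            return shared / max(len(set(t1) | set(t2)), 1)

        def search_one(args):
            i, query, candidates, theta = args
            return [(i, j) for j, c in enumerate(candidates) if similarity(query, c) >= theta]

        def ts_join(set_p, set_q, theta):
            tasks = [(i, t, set_q, theta) for i, t in enumerate(set_p)]
            with ThreadPoolExecutor() as pool:
                partial = pool.map(search_one, tasks)             # phase 1, in parallel
            return [pair for chunk in partial for pair in chunk]  # phase 2, cheap merge

        P = [["e1", "e2", "e3"], ["e4", "e5"]]  # trajectories as road-segment ids
        Q = [["e2", "e3", "e6"], ["e7"]]
        print(ts_join(P, Q, theta=0.3))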

  11. Gender similarities and differences.

    Science.gov (United States)

    Hyde, Janet Shibley

    2014-01-01

    Whether men and women are fundamentally different or similar has been debated for more than a century. This review summarizes major theories designed to explain gender differences: evolutionary theories, cognitive social learning theory, sociocultural theory, and expectancy-value theory. The gender similarities hypothesis raises the possibility of theorizing gender similarities. Statistical methods for the analysis of gender differences and similarities are reviewed, including effect sizes, meta-analysis, taxometric analysis, and equivalence testing. Then, relying mainly on evidence from meta-analyses, gender differences are reviewed in cognitive performance (e.g., math performance), personality and social behaviors (e.g., temperament, emotions, aggression, and leadership), and psychological well-being. The evidence on gender differences in variance is summarized. The final sections explore applications of intersectionality and directions for future research.

  12. Cluster Tree Based Hybrid Document Similarity Measure

    Directory of Open Access Journals (Sweden)

    M. Varshana Devi

    2015-10-01

    Full Text Available A similarity measure is established to measure hybrid similarity. In a cluster tree, the hybrid similarity measure can be calculated even for random data that do not co-occur, and it can generate different views. Different views of the tree can be combined, choosing the one that is most significant in cost. A method is proposed to combine the multiple views, in which multiple views represented by different distance measures are merged into a single cluster. Compared with traditional statistical methods, the cluster-tree-based hybrid similarity gives better feasibility for intelligent search. It helps improve dimensionality reduction and semantic analysis.

  13. NNLOPS accurate associated HW production

    CERN Document Server

    Astill, William; Re, Emanuele; Zanderighi, Giulia

    2016-01-01

    We present a next-to-next-to-leading order accurate description of associated HW production consistently matched to a parton shower. The method is based on reweighting events obtained with the HW plus one jet NLO accurate calculation implemented in POWHEG, extended with the MiNLO procedure, to reproduce NNLO accurate Born distributions. Since the Born kinematics is more complex than the cases treated before, we use a parametrization of the Collins-Soper angles to reduce the number of variables required for the reweighting. We present phenomenological results at 13 TeV, with cuts suggested by the Higgs Cross Section Working Group.

  14. Music Retrieval based on Melodic Similarity

    NARCIS (Netherlands)

    Typke, R.

    2007-01-01

    This thesis introduces a method for measuring melodic similarity for notated music such as MIDI files. This music search algorithm views music as sets of notes that are represented as weighted points in the two-dimensional space of time and pitch. Two point sets can be compared by calculating how much ...

  15. Efficient Similarity Retrieval in Music Databases

    DEFF Research Database (Denmark)

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard

    2006-01-01

    Audio music is increasingly becoming available in digital form, and the digital music collections of individuals continue to grow. Addressing the need for effective means of retrieving music from such collections, this paper proposes new techniques for content-based similarity search. Each music ...

  16. Improved Search Techniques

    Science.gov (United States)

    Albornoz, Caleb Ronald

    2012-01-01

    Billions of documents are stored and updated daily on the World Wide Web, and most of this information is not organized efficiently enough to build knowledge from the stored data. Nowadays, search engines are mainly used by users who rely on their own skills to look for the information they need. This paper presents different techniques that search engine users can apply in Google Search to improve the relevancy of search results. According to the Pew Research Center, the average person spends eight hours a month searching for the right information. For instance, a company that employs 1000 people wastes $2.5 million on looking for nonexistent or not-found information. The cost is very high because decisions are made based on the information that is readily available to use. Whenever the information necessary to formulate an argument is not available or found, poor decisions may be made and mistakes become more likely. Also, the survey indicates that only 56% of Google users feel confident in their current search skills. Moreover, just 76% of the information that is available on the Internet is accurate.

  17. Similarity or difference?

    DEFF Research Database (Denmark)

    Villadsen, Anders Ryom

    2013-01-01

    While the organizational structures and strategies of public organizations have attracted substantial research attention among public management scholars, little research has explored how these organizational core dimensions are interconnected and influenced by pressures for similarity. ... In this paper I address this topic by exploring the relation between expenditure strategy isomorphism and structure isomorphism in Danish municipalities. Different literatures suggest that organizations exist in concurrent pressures for being similar to and different from other organizations in their field. ... -shaped relation exists between expenditure strategy isomorphism and structure isomorphism in a longitudinal quantitative study of Danish municipalities.

  18. Segmentation Similarity and Agreement

    CERN Document Server

    Fournier, Chris

    2012-01-01

    We propose a new segmentation evaluation metric, called segmentation similarity (S), that quantifies the similarity between two segmentations as the proportion of boundaries that are not transformed when comparing them using edit distance, essentially using edit distance as a penalty function and scaling penalties by segmentation size. We propose several adapted inter-annotator agreement coefficients that use S and are suitable for segmentation. We show that S is configurable enough to suit a wide variety of segmentation evaluations, and is an improvement upon the state of the art. We also propose using inter-annotator agreement coefficients to evaluate automatic segmenters in terms of human performance.
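
    As a rough illustration, the sketch below computes a simplified S over segmentations given as lists of segment sizes, counting only boundary insertions and deletions; the published metric additionally charges reduced penalties for near-miss transpositions, which this sketch omits.

        def boundaries(masses):
            # Convert segment sizes, e.g. [3, 2, 4], to boundary positions {3, 5}.
            pos, out = 0, set()
            for m in masses[:-1]:
                pos += m
                out.add(pos)
            return out

        def s_similarity(a, b):
            # Simplified S: boundary edit distance restricted to additions and
            # deletions, scaled by the number of potential boundary positions.
            assert sum(a) == sum(b), "segmentations must cover the same items"
            edits = len(boundaries(a) ^ boundaries(b))
            potential = sum(a) - 1
            return 1.0 - edits / potential

        print(s_similarity([3, 2, 4], [3, 6]))  # -> 0.875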

  19. Quantum search by measurement

    CERN Document Server

    Childs, A M; Farhi, E; Goldstone, J; Gutmann, S; Landahl, A J; Childs, Andrew M.; Deotto, Enrico; Farhi, Edward; Goldstone, Jeffrey; Gutmann, Sam; Landahl, Andrew J.

    2002-01-01

    We propose a quantum algorithm for solving combinatorial search problems that uses only a sequence of measurements. The algorithm is similar in spirit to quantum computation by adiabatic evolution, in that the goal is to remain in the ground state of a time-varying Hamiltonian. Indeed, we show that the running times of the two algorithms are closely related. We also show how to achieve the quadratic speedup for Grover's unstructured search problem with only two measurements. Finally, we discuss some similarities and differences between the adiabatic and measurement algorithms.

  20. MOST: most-similar ligand based approach to target prediction.

    Science.gov (United States)

    Huang, Tao; Mi, Hong; Lin, Cheng-Yuan; Zhao, Ling; Zhong, Linda L D; Liu, Feng-Bin; Zhang, Ge; Lu, Ai-Ping; Bian, Zhao-Xiang

    2017-03-11

    Many computational approaches have been used for target prediction, including machine learning, reverse docking, bioactivity spectra analysis, and chemical similarity searching. Recent studies have suggested that chemical similarity searching may be driven by the most-similar ligand. However, the extent of bioactivity of most-similar ligands has been oversimplified or even neglected in these studies, and this has impaired the prediction power. Here we propose the MOst-Similar ligand-based Target inference approach, namely MOST, which uses fingerprint similarity and the explicit bioactivity of the most-similar ligands to predict targets of the query compound. The performance of MOST was evaluated using combinations of different fingerprint schemes, machine learning methods, and bioactivity representations. In sevenfold cross-validation with a benchmark Ki dataset from CHEMBL release 19, containing 61,937 bioactivity data points for 173 human targets, MOST achieved high average prediction accuracy (0.95 for pKi ≥ 5, and 0.87 for pKi ≥ 6). The Morgan fingerprint was shown to be slightly better than FP2. Logistic Regression and Random Forest methods performed better than Naïve Bayes. In a temporal validation, the Ki dataset from CHEMBL19 was used to train models and predict the bioactivity of ligands newly deposited in CHEMBL20. MOST also performed well, with high accuracy (0.90 for pKi ≥ 5, and 0.76 for pKi ≥ 6), when Logistic Regression and the Morgan fingerprint were employed. Furthermore, the p values associated with explicit bioactivity were found to be a robust index for removing false positive predictions, a capability that implicit bioactivity did not offer. Finally, p values generated with Logistic Regression, the Morgan fingerprint, and explicit activity were integrated with a false discovery rate (FDR) control procedure to reduce false positives in the multiple-target prediction scenario, and the success of this strategy was demonstrated with a case of fluanisone.
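
    A minimal sketch of the core inference step, assuming fingerprints represented as sets of on-bit indices and a library of (fingerprint, target, pKi) records; the fingerprint generation, model training, and p-value/FDR machinery of the paper are not reproduced, and the target names are made up.

        def tanimoto(fp1, fp2):
            # Fingerprints as sets of "on" bit indices.
            inter = len(fp1 & fp2)
            return inter / (len(fp1) + len(fp2) - inter)

        def most_predict(query_fp, library, activity_cutoff=5.0):
            # Use the single most-similar ligand, gated by its explicit
            # bioactivity -- the idea distinguishing MOST from plain
            # nearest-neighbour similarity searching.
            fp, target, pki = max(library, key=lambda rec: tanimoto(query_fp, rec[0]))
            return target if pki >= activity_cutoff else None

        library = [({1, 4, 9}, "5-HT2A", 7.2), ({2, 3, 8}, "D2", 4.1)]
        print(most_predict({1, 4, 8}, library))  # -> '5-HT2A'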

  1. Faceted Search

    CERN Document Server

    Tunkelang, Daniel

    2009-01-01

    We live in an information age that requires us, more than ever, to represent, access, and use information. Over the last several decades, we have developed a modern science and technology for information retrieval, relentlessly pursuing the vision of a "memex" that Vannevar Bush proposed in his seminal article, "As We May Think." Faceted search plays a key role in this program. Faceted search addresses weaknesses of conventional search approaches and has emerged as a foundation for interactive information retrieval. User studies demonstrate that faceted search provides more ...

  2. Incremental Similarity and Turbulence

    DEFF Research Database (Denmark)

    Barndorff-Nielsen, Ole E.; Hedevang, Emil; Schmiegel, Jürgen

    This paper discusses the mathematical representation of an empirically observed phenomenon, referred to as Incremental Similarity. We discuss this feature from the viewpoint of stochastic processes and present a variety of non-trivial examples, including those that are of relevance for turbulence...

  3. An efficient and accurate 3D displacements tracking strategy for digital volume correlation

    KAUST Repository

    Pan, Bing

    2014-07-01

    Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need to update the Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure that the 3D IC-GN algorithm converges accurately and rapidly and to avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer an accurate and complete initial guess of deformation to each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexities of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to the necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost. © 2014 Elsevier Ltd.

  4. The Application of Similar Image Retrieval in Electronic Commerce

    Science.gov (United States)

    Hu, YuPing; Yin, Hua; Han, Dezhi; Yu, Fei

    2014-01-01

    Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers' experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system. PMID:24883411

  5. The Application of Similar Image Retrieval in Electronic Commerce

    Directory of Open Access Journals (Sweden)

    YuPing Hu

    2014-01-01

    Full Text Available Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers’ experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system.

  6. The application of similar image retrieval in electronic commerce.

    Science.gov (United States)

    Hu, YuPing; Yin, Hua; Han, Dezhi; Yu, Fei

    2014-01-01

    Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers' experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system.

  7. Accurate guitar tuning by cochlear implant musicians.

    Directory of Open Access Journals (Sweden)

    Thomas Lu

    Full Text Available Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show the unexpected result that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than with his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger, at ∼30 Hz, for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  8. More Similar Than Different

    DEFF Research Database (Denmark)

    Pedersen, Mogens Jin

    2015-01-01

    What role do employee features play into the success of different personnel management practices for serving high performance? Using data from a randomized survey experiment among 5,982 individuals of all ages, this article examines how gender conditions the compliance effects of different ... incentive treatments—each relating to the basic content of distinct types of personnel management practices. The findings show that males and females are more similar than different in terms of the incentive treatments’ effects: Significant average effects are found for three out of five incentive ...

  9. SiRen: Leveraging Similar Regions for Efficient and Accurate Variant Calling

    Science.gov (United States)

    2015-05-30

    Cloudera, EMC2, Ericsson, Facebook, Guavus, HP, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Schlumberger, Splunk, Virdata and VMware ...

  10. Similar dissection of sets

    CERN Document Server

    Akiyama, Shigeki; Okazaki, Ryotaro; Steiner, Wolfgang; Thuswaldner, Jörg

    2010-01-01

    In 1994, Martin Gardner stated a set of questions concerning the dissection of a square or an equilateral triangle in three similar parts. Meanwhile, Gardner's questions have been generalized and some of them are already solved. In the present paper, we solve more of his questions and treat them in a much more general context. Let $D \subset \mathbb{R}^d$ be a given set and let $f_1, \dots, f_k$ be injective continuous mappings. Does there exist a set $X$ such that $D = X \cup f_1(X) \cup \dots \cup f_k(X)$ is satisfied with a non-overlapping union? We prove that such a set $X$ exists for certain choices of $D$ and $\{f_1, \dots, f_k\}$. The solutions $X$ often turn out to be attractors of iterated function systems with condensation in the sense of Barnsley. Coming back to Gardner's setting, we use our theory to prove that an equilateral triangle can be dissected in three similar copies whose areas have ratio $1:1:a$ for $a \ge (3+\sqrt{5})/2$.

  11. Dual-target cost in visual search for multiple unfamiliar faces.

    Science.gov (United States)

    Mestry, Natalie; Menneer, Tamaryn; Cave, Kyle R; Godwin, Hayward J; Donnelly, Nick

    2017-08-01

    The efficiency of visual search for one (single-target) and either of two (dual-target) unfamiliar faces was explored to understand the manifestations of capacity and guidance limitations in face search. The visual similarity of distractor faces to target faces was manipulated using morphing (Experiments 1 and 2) and multidimensional scaling (Experiment 3). A dual-target cost was found in all experiments, evidenced by slower and less accurate search in dual- than single-target conditions. The dual-target cost was unequal across the targets, with performance being maintained on one target and reduced on the other, which we label "preferred" and "non-preferred" respectively. We calculated the capacity for each target face and show reduced capacity for representing the non-preferred target face. However, results show that the capacity for the non-preferred target can be increased when the dual-target condition is conducted after participants complete the single-target conditions. Analyses of eye movements revealed evidence for weak guidance of fixations in single-target search, and when searching for the preferred target in dual-target search. Overall, the experiments show dual-target search for faces is capacity- and guidance-limited, leading to superior search for one face over the other in dual-target search. However, learning faces individually may improve capacity with the second face.

  12. Similarity transformed semiclassical dynamics

    Science.gov (United States)

    Van Voorhis, Troy; Heller, Eric J.

    2003-12-01

    In this article, we employ a recently discovered criterion for selecting important contributions to the semiclassical coherent state propagator [T. Van Voorhis and E. J. Heller, Phys. Rev. A 66, 050501 (2002)] to study the dynamics of many dimensional problems. We show that the dynamics are governed by a similarity transformed version of the standard classical Hamiltonian. In this light, our selection criterion amounts to using trajectories generated with the untransformed Hamiltonian as approximate initial conditions for the transformed boundary value problem. We apply the new selection scheme to some multidimensional Henon-Heiles problems and compare our results to those obtained with the more sophisticated Herman-Kluk approach. We find that the present technique gives near-quantitative agreement with the standard results, but that the amount of computational effort is less than Herman-Kluk requires, even when sophisticated integral smoothing techniques are employed in the latter.

  13. Predicting user click behaviour in search engine advertisements

    Science.gov (United States)

    Daryaie Zanjani, Mohammad; Khadivi, Shahram

    2015-10-01

    According to the specific requirements and interests of users, search engines select and display advertisements that match user needs and have a higher probability of attracting users' attention, based on their previous search history. New objects such as users, advertisements, or queries cause a deterioration of precision in targeted advertising due to their lack of history. This article addresses this challenge. In the case of new objects, we first extract observed objects similar to the new object and then use their history as the history of the new object. Similarity between objects is measured based on correlation, which is a relation between user and advertisement when the advertisement is displayed to the user. This method is used for all objects, so it helps us accurately select relevant advertisements for users' queries. In our proposed model, we assume that similar users behave in a similar manner. We find that users with few queries are similar to new users. We show that the correlation between users and advertisements' keywords is high. Thus, users who pay attention to advertisements' keywords click similar advertisements. In addition, users who pay attention to specific brand names might have similar behaviours too.

  14. Autonomous search

    CERN Document Server

    Hamadi, Youssef; Saubion, Frédéric

    2012-01-01

    Autonomous combinatorial search (AS) represents a new field in combinatorial problem solving. Its major standpoint, and its originality, is that problem solvers must be capable of self-improvement operations. This is the first book dedicated to AS.

  15. Accurate pose estimation for forensic identification

    Science.gov (United States)

    Merckx, Gert; Hermans, Jeroen; Vandermeulen, Dirk

    2010-04-01

    In forensic authentication, one aims to identify the perpetrator among a series of suspects or distractors. A fundamental problem in any recognition system that aims for identification of subjects in a natural scene is the lack of constraints on viewing and imaging conditions. In forensic applications, identification proves even more challenging, since most surveillance footage is of abysmal quality. In this context, robust methods for pose estimation are paramount. In this paper we therefore present a new pose estimation strategy for very low quality footage. Our approach uses 3D-2D registration of a textured 3D face model with the surveillance image to obtain accurate far-field pose alignment. Starting from an inaccurate initial estimate, the technique uses novel similarity measures based on the monogenic signal to guide a pose optimization process. We illustrate the descriptive strength of the introduced similarity measures by using them directly as a recognition metric. Through validation, using both real and synthetic surveillance footage, our pose estimation method is shown to be accurate, and robust to lighting changes and image degradation.

  16. New generation of the multimedia search engines

    Science.gov (United States)

    Mijes Cruz, Mario Humberto; Soto Aldaco, Andrea; Maldonado Cano, Luis Alejandro; López Rodríguez, Mario; Rodríguez Vázqueza, Manuel Antonio; Amaya Reyes, Laura Mariel; Cano Martínez, Elizabeth; Pérez Rosas, Osvaldo Gerardo; Rodríguez Espejo, Luis; Flores Secundino, Jesús Abimelek; Rivera Martínez, José Luis; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Sánchez Valenzuela, Juan Carlos; Montoya Obeso, Abraham; Ramírez Acosta, Alejandro Álvaro

    2016-09-01

    Current search engines are based upon search methods that involve the combination of words (text-based search), which has been efficient until now. However, the Internet's growing demand indicates that there is more diversity on it with each passing day. Text-based searches are becoming limited, as most of the information on the Internet consists of different types of content, denominated multimedia content (images, audio files, video files). Indeed, what needs to be improved in current search engines is search content and precision, as well as an accurate display of the search results expected by the user. Any search can be made more precise by using more text parameters, but that does not improve the content or the speed of the search itself. One solution is to improve search engines through the characterization of the content of multimedia files. In this article, an analysis of the new generation of multimedia search engines is presented, focusing on the needs arising from new technologies. Multimedia content has become a central part of the flow of information in our daily life. This reflects the necessity of having multimedia search engines, as well as knowing the real tasks they must fulfil. Through this analysis, it is shown that there are not many search engines that can perform content searches. The research area of new-generation multimedia search engines is a multidisciplinary area in constant growth, generating tools that satisfy the different needs of new-generation systems.

  17. $A$ searches

    CERN Document Server

    Beacham, James

    The Standard Model of particle physics encompasses three of the four known fundamental forces of nature, and has remarkably withstood all of the experimental tests of its predictions, a fact solidified by the discovery, in 2012, of the Higgs boson. However, it cannot be the complete picture. Many measurements have been made that hint at physics beyond the Standard Model, and the main task of the high-energy experimental physics community is to conduct searches for new physics in as many different places and regimes as possible. I present three searches for new phenomena in three different high-energy collider experiments, namely, a search for events with at least three photons in the final state, which is sensitive to an exotic decay of a Higgs boson into four photons via intermediate pseudoscalar particles, a, with ATLAS, at the Large Hadron Collider; a search for a dark photon, also known as an A0, with APEX, at Thomas Jefferson National Accelerator Facility; and a search for a Higgs decaying into four ...

  18. Efficient and accurate fragmentation methods.

    Science.gov (United States)

    Pruitt, Spencer R; Bertoni, Colleen; Brorsen, Kurt R; Gordon, Mark S

    2014-09-16

    Conspectus: Three novel fragmentation methods that are available in the electronic structure program GAMESS (general atomic and molecular electronic structure system) are discussed in this Account. The fragment molecular orbital (FMO) method can be combined with any electronic structure method to perform accurate calculations on large molecular species with no reliance on capping atoms or empirical parameters. The FMO method is highly scalable and can take advantage of massively parallel computer systems. For example, the method has been shown to scale nearly linearly on up to 131 000 processor cores for calculations on large water clusters. There have been many applications of the FMO method to large molecular clusters, to biomolecules (e.g., proteins), and to materials that are used as heterogeneous catalysts. The effective fragment potential (EFP) method is a model potential approach that is fully derived from first principles and has no empirically fitted parameters. Consequently, an EFP can be generated for any molecule by a simple preparatory GAMESS calculation. The EFP method provides accurate descriptions of all types of intermolecular interactions, including Coulombic interactions, polarization/induction, exchange repulsion, dispersion, and charge transfer. The EFP method has been applied successfully to the study of liquid water, π-stacking in substituted benzenes and in DNA base pairs, solvent effects on positive and negative ions, electronic spectra and dynamics, non-adiabatic phenomena in electronic excited states, and nonlinear excited state properties. The effective fragment molecular orbital (EFMO) method is a merger of the FMO and EFP methods, in which interfragment interactions are described by the EFP potential, rather than the less accurate electrostatic potential. The use of EFP in this manner facilitates the use of a smaller value for the distance cut-off (Rcut). Rcut determines the distance at which EFP interactions replace fully quantum ...

  19. Accurate determination of antenna directivity

    DEFF Research Database (Denmark)

    Dich, Mikael

    1997-01-01

    The derivation of a formula for accurate estimation of the total radiated power from a transmitting antenna, for which the radiated power density is known in a finite number of points on the far-field sphere, is presented. The main application of the formula is determination of directivity from power-pattern measurements. The derivation is based on the theory of spherical wave expansion of electromagnetic fields, which also establishes a simple criterion for the required number of samples of the power density. An array antenna consisting of Hertzian dipoles is used to test the accuracy and rate of convergence ...

  20. Integrated Semantic Similarity Model Based on Ontology

    Institute of Scientific and Technical Information of China (English)

    LIU Ya-Jun; ZHAO Yun

    2004-01-01

    To solve the problem of the inadequacy of semantic processing in intelligent question answering systems, an integrated semantic similarity model, which calculates semantic similarity using geometric distance and information content, is presented in this paper. With the help of the interrelationships between concepts, the information content of concepts, and the strength of the edges in the ontology network, we can calculate the semantic similarity between two concepts and provide information for the further calculation of the semantic similarity between a user's question and the answers in the knowledge base. The results of experiments on the prototype have shown that the semantic problem in natural language processing can also be solved with the help of the knowledge and the abundant semantic information in the ontology. More than 90% accuracy with less than 50 ms average searching time has been reached in the intelligent question answering prototype system based on ontology. The results are very satisfactory.
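
    The abstract does not give the formula, so the sketch below shows one plausible blend of a geometric-distance term and an information-content term (Lin-style, via the least common subsumer); the concept names, frequencies, and the weight alpha are illustrative assumptions.

        import math

        def ic(c, freq, total):
            # Information content of a concept: -log p(c) from corpus counts.
            return -math.log(freq[c] / total)

        def integrated_similarity(c1, c2, path_len, lcs, freq, total, alpha=0.5):
            # Blend a geometric-distance term (shorter ontology path means more
            # similar) with a Lin-style information-content term through the
            # least common subsumer `lcs`; `alpha` weights the two terms.
            geo = 1.0 / (1.0 + path_len)
            lin = 2.0 * ic(lcs, freq, total) / (ic(c1, freq, total) + ic(c2, freq, total))
            return alpha * geo + (1.0 - alpha) * lin

        freq = {"animal": 1000, "dog": 100, "cat": 120}
        print(integrated_similarity("dog", "cat", path_len=2, lcs="animal",
                                    freq=freq, total=1220))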

  1. Capacity Planning for Vertical Search Engines

    CERN Document Server

    Badue, Claudine; Almeida, Virgilio; Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier; Ziviani, Artur; Ziviani, Nivio

    2010-01-01

    Vertical search engines focus on specific slices of content, such as the Web of a single country or the document collection of a large corporation. Despite this, like general open web search engines, they are expensive to maintain, expensive to operate, and hard to design. Because of this, predicting the response time of a vertical search engine is usually done empirically through experimentation, requiring a costly setup. An alternative is to develop a model of the search engine for predicting performance. However, this alternative is of interest only if its predictions are accurate. In this paper we propose a methodology for analyzing the performance of vertical search engines. Applying the proposed methodology, we present a capacity planning model based on a queueing network for search engines with a scale typically suitable for the needs of large corporations. The model is simple and yet reasonably accurate and, in contrast to previous work, considers the imbalance in query service times among homogeneous...
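
    The abstract does not reproduce the model's equations; as a toy stand-in, the sketch below applies the textbook M/M/1 mean response-time formula for a single service centre, the kind of building block a queueing-network capacity model composes per index server (the paper's model additionally captures the imbalance in query service times).

        def mm1_response_time(arrival_rate, service_rate):
            # Mean response time of an M/M/1 queue: R = 1 / (mu - lambda).
            assert arrival_rate < service_rate, "queue must be stable (lambda < mu)"
            return 1.0 / (service_rate - arrival_rate)

        # E.g. a server receiving 80 queries/s with a mean service rate of 100/s:
        print(mm1_response_time(80.0, 100.0))  # -> 0.05 s mean response time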

  2. A Novel Personalized Web Search Model

    Institute of Scientific and Technical Information of China (English)

    ZHU Zhengyu; XU Jingqiu; TIAN Yunyan; REN Xiang

    2007-01-01

    A novel personalized Web search model is proposed. The new system, as a middleware between a user and a Web search engine, is set up on the client machine. It can learn a user's preference implicitly and then generate the user profile automatically. When the user inputs query keywords, the system can automatically generate a few personalized expansion words by computing the term-term associations according to the current user profile, and then these words together with the query keywords are submitted to a popular search engine such as Yahoo or Google. These expansion words help to express accurately the user's search intention. The new Web search model can make a common search engine personalized, that is, the search engine can return different search results to different users who input the same keywords. The experimental results show the feasibility and applicability of the presented work.
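
    A minimal sketch of implicit-profile query expansion via term-term associations, under the assumption that the profile is a bag of browsed documents and that association is plain co-occurrence; the paper's actual weighting is not specified in the abstract.

        from collections import Counter
        from itertools import combinations

        def term_associations(documents):
            # Co-occurrence counts over the user's browsed documents (the
            # implicit profile); each document is a list of terms.
            pair_counts = Counter()
            for doc in documents:
                for a, b in combinations(sorted(set(doc)), 2):
                    pair_counts[(a, b)] += 1
            return pair_counts

        def expand(query_terms, pair_counts, k=3):
            # Score candidate expansion terms by association with any query term.
            scores = Counter()
            for (a, b), n in pair_counts.items():
                if a in query_terms and b not in query_terms:
                    scores[b] += n
                elif b in query_terms and a not in query_terms:
                    scores[a] += n
            return [t for t, _ in scores.most_common(k)]

        profile = [["python", "pandas", "dataframe"], ["python", "numpy", "array"]]
        print(expand({"python"}, term_associations(profile)))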

  3. Face Search at Scale.

    Science.gov (United States)

    Wang, Dayong; Otto, Charles; Jain, Anil K

    2016-06-20

    ... persons of interest among the billions of shared photos on these websites. Despite significant progress in face recognition, searching a large collection of unconstrained face images remains a difficult problem. To address this challenge, we propose a face search system which combines a fast search procedure, coupled with a state-of-the-art commercial off the shelf (COTS) matcher, in a cascaded framework. Given a probe face, we first filter the large gallery of photos to find the top-k most similar faces using features learned by a convolutional neural network. The k retrieved candidates are re-ranked by combining similarities based on deep features and those output by the COTS matcher. We evaluate the proposed face search system on a gallery containing 80 million web-downloaded face images. Experimental results demonstrate that while the deep features perform worse than the COTS matcher on a mugshot dataset (93.7% vs. 98.6% TAR@FAR of 0.01%), fusing the deep features with the COTS matcher improves the overall performance (99.5% TAR@FAR of 0.01%). This shows that the learned deep features provide complementary information over representations used in state-of-the-art face matchers. On the unconstrained face image benchmarks, the performance of the learned deep features is competitive with reported accuracies. LFW database: 98.20% accuracy under the standard protocol and 88.03% TAR@FAR of 0.1% under the BLUFR protocol; IJB-A benchmark: 51.0% TAR@FAR of 0.1% (verification), rank 1 retrieval of 82.2% (closed-set search), 61.5% FNIR@FAR of 1% (open-set search). The proposed face search system offers an excellent trade-off between accuracy and scalability on galleries with millions of images. Additionally, in a face search experiment involving photos of the Tsarnaev brothers, convicted of the Boston Marathon bombing, the proposed cascade face search system could find the younger brother's (Dzhokhar Tsarnaev) photo at rank 1 in 1 second on a 5M gallery and at rank 8 in 7 ...
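
    A minimal sketch of the cascaded filter-then-re-rank idea, assuming deep feature vectors as a NumPy array and a `cots_score` callable standing in for the commercial matcher; the fusion weight and k are illustrative.

        import numpy as np

        def cascade_search(probe_feat, gallery_feats, cots_score, k=100, w=0.5):
            # Stage 1 (cheap filter): cosine similarity on deep features over
            # the full gallery; keep the top-k candidates.
            g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
            p = probe_feat / np.linalg.norm(probe_feat)
            deep = g @ p
            top_k = np.argsort(-deep)[:k]
            # Stage 2 (re-rank): fuse deep similarity with the commercial
            # matcher's score, computed on the short list only.
            fused = [(int(i), w * deep[i] + (1 - w) * cots_score(int(i)))
                     for i in top_k]
            return sorted(fused, key=lambda t: -t[1])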

  4. Fast and accurate marker-based projective registration method for uncalibrated transmission electron microscope tilt series.

    Science.gov (United States)

    Lee, Ho; Lee, Jeongjin; Shin, Yeong Gil; Lee, Rena; Xing, Lei

    2010-06-21

    This paper presents a fast and accurate marker-based automatic registration technique for aligning uncalibrated projections taken from a transmission electron microscope (TEM) with different tilt angles and orientations. Most of the existing TEM image alignment methods estimate the similarity between images using the projection model with a least-squares metric and guess alignment parameters by computationally expensive nonlinear optimization schemes. Approaches based on the least-squares metric, which is sensitive to outliers, may cause misalignment, since automatic tracking methods, though reliable, can produce a few incorrect trajectories due to the large number of marker points. To decrease the influence of outliers, we propose a robust similarity measure using the projection model with a Gaussian weighting function. This function is very effective in suppressing outliers that are far from correct trajectories and thus provides a more robust metric. In addition, we suggest a fast search strategy based on the non-gradient Powell's multidimensional optimization scheme to speed up optimization, as only meaningful parameters are considered during iterative projection model estimation. Experimental results show that our method brings more accurate alignment with less computational cost compared to conventional automatic alignment methods.
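
    The abstract does not give the exact form of the weighting, so the sketch below uses one plausible Gaussian weighting (a Welsch-type robust kernel) to show why such a measure suppresses outlier marker trajectories; the projection function `project` and all parameters are assumed placeholders.

        import numpy as np

        def robust_cost(params, project, markers_3d, observed_2d, sigma):
            # Under plain least squares a residual contributes r^2, so a single
            # bad marker trajectory can dominate the fit. With a Gaussian
            # weighting each residual's contribution saturates at 1, so
            # far-off outliers barely influence the alignment.
            pred = project(params, markers_3d)     # assumed projection model
            r2 = np.sum((pred - observed_2d) ** 2, axis=1)
            return np.sum(1.0 - np.exp(-r2 / (2.0 * sigma ** 2)))

        # The cost can then be minimised with a non-gradient scheme, e.g.
        # scipy.optimize.minimize(robust_cost, x0, args=(...), method="Powell").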

  5. Internet Search Engines

    OpenAIRE

    Fatmaa El Zahraa Mohamed Abdou

    2004-01-01

    A general study of internet search engines. The study deals with 7 main points: the difference between search engines and search directories, the components of search engines, the percentage of sites covered by search engines, the cataloging of sites, the time needed for sites to appear in search engines, search capabilities, and types of search engines.

  6. Internet Search Engines

    Directory of Open Access Journals (Sweden)

    Fatmaa El Zahraa Mohamed Abdou

    2004-09-01

    Full Text Available A general study of internet search engines. The study deals with 7 main points: the difference between search engines and search directories, the components of search engines, the percentage of sites covered by search engines, the cataloging of sites, the time needed for sites to appear in search engines, search capabilities, and types of search engines.

  7. Perceptual training for visual search.

    Science.gov (United States)

    Schuster, David; Rivera, Javier; Sellers, Brittany C; Fiore, Stephen M; Jentsch, Florian

    2013-01-01

    People are better at visual search than the best fully automated methods. Despite this, visual search remains a difficult perceptual task. The goal of this investigation was to experimentally test the ways in which visual search performance could be improved through two categories of training interventions: perceptual training and conceptual training. To determine the effects of each training on a later performance task, the two types of trainings were manipulated using a between-subjects design (conceptual vs. perceptual × training present vs. training absent). Perceptual training led to speed and accuracy improvements in visual search. Issues with the design and administration of the conceptual training limited conclusions on its effectiveness but provided useful lessons for conceptual training design. The results suggest that when the visual search task involves detecting heterogeneous or otherwise unpredictable stimuli, perceptual training can improve visual search performance. Similarly, careful consideration of the performance task and training design is required to evaluate the effectiveness of conceptual training. Visual search is a difficult, yet critical, task in industries such as baggage screening and radiology. This study investigated the effectiveness of perceptual training for visual search. The results suggest that when visual search involves detecting heterogeneous or otherwise unpredictable stimuli, perceptual training may improve the speed and accuracy of visual search.

  8. Practical fulltext search in medical records

    Directory of Open Access Journals (Sweden)

    Vít Volšička

    2015-09-01

    Full Text Available Performing a search through previously existing documents, including medical reports, is an integral part of acquiring new information and of educational processes. Unfortunately, finding relevant information is not always easy, since many documents are saved in free-text formats, thereby making it difficult to search through them. A full-text search is a viable solution for searching through documents. The full-text search makes it possible to search through large numbers of documents efficiently and to find those that contain specific search phrases in a short time. All leading database systems currently offer full-text search, but some do not support the complex morphology of the Czech language. Apache Solr provides good support of the Czech language in the basic installation, and a wide range of settings and options for deployment on any platform. The library was satisfactorily tested using real data from hospitals. Solr provided useful, fast, and accurate searches. However, there is still a need to make adjustments to obtain effective search results, particularly by correcting typographical errors made not only in the text but also when entering words in the search box, and by creating a list of frequently used abbreviations and synonyms for more accurate results.
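
    For illustration, a query against a Solr core through its standard /select handler might look like the following; the host, core name, and field names are assumptions, and Czech morphology would come from the analyzer chain configured in the core's schema.

        import requests

        # Query a Solr core over its standard /select handler (names are
        # illustrative placeholders, not from the paper).
        resp = requests.get(
            "http://localhost:8983/solr/medical_records/select",
            params={"q": "report_text:pneumonie", "rows": 10, "wt": "json"},
        )
        for doc in resp.json()["response"]["docs"]:
            print(doc.get("id"))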

  9. Accurate ab initio spin densities

    CERN Document Server

    Boguslawski, Katharina; Legeza, Örs; Reiher, Markus

    2012-01-01

    We present an approach for the calculation of spin density distributions for molecules that require very large active spaces for a qualitatively correct description of their electronic structure. Our approach is based on the density-matrix renormalization group (DMRG) algorithm to calculate the spin density matrix elements as basic quantity for the spatially resolved spin density distribution. The spin density matrix elements are directly determined from the second-quantized elementary operators optimized by the DMRG algorithm. As an analytic convergence criterion for the spin density distribution, we employ our recently developed sampling-reconstruction scheme [J. Chem. Phys. 2011, 134, 224101] to build an accurate complete-active-space configuration-interaction (CASCI) wave function from the optimized matrix product states. The spin density matrix elements can then also be determined as an expectation value employing the reconstructed wave function expansion. Furthermore, the explicit reconstruction of a CA...

  10. The Accurate Particle Tracer Code

    CERN Document Server

    Wang, Yulei; Qin, Hong; Yu, Zhi

    2016-01-01

    The Accurate Particle Tracer (APT) code is designed for large-scale particle simulations on dynamical systems. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and non-linear problems. Under the well-designed integrated and modularized framework, APT serves as a universal platform for researchers from different fields, such as plasma physics, accelerator physics, space science, fusion energy research, computational mathematics, software engineering, and high-performance computation. The APT code consists of seven main modules, including the I/O module, the initialization module, the particle pusher module, the parallelization module, the field configuration module, the external force-field module, and the extendible module. The I/O module, supported by Lua and Hdf5 projects, provides a user-friendly interface for both numerical simulation and data analysis. A series of new geometric numerical methods...

  11. Accurate Modeling of Advanced Reflectarrays

    DEFF Research Database (Denmark)

    Zhou, Min

    Analysis and optimization methods for the design of advanced printed reflectarrays have been investigated, and the study is focused on developing an accurate and efficient simulation tool. For the analysis, a good compromise between accuracy and efficiency can be obtained using the spectral domain ... to the POT. The GDOT can optimize for the size as well as the orientation and position of arbitrarily shaped array elements. Both co- and cross-polar radiation can be optimized for multiple frequencies, dual polarization, and several feed illuminations. Several contoured beam reflectarrays have been designed ... using the GDOT to demonstrate its capabilities. To verify the accuracy of the GDOT, two offset contoured beam reflectarrays that radiate a high-gain beam on a European coverage have been designed and manufactured, and subsequently measured at the DTU-ESA Spherical Near-Field Antenna Test Facility ...

  12. Learning deep similarity in fundus photography

    Science.gov (United States)

    Chudzik, Piotr; Al-Diri, Bashir; Caliva, Francesco; Ometto, Giovanni; Hunter, Andrew

    2017-02-01

    Similarity learning is one of the most fundamental tasks in image analysis. The ability to extract similar images in the medical domain, as part of content-based image retrieval (CBIR) systems, has been researched for many years. The vast majority of methods used in CBIR systems are based on hand-crafted feature descriptors. The approximation of a similarity mapping for medical images is difficult due to the large variety of pixel-level structures of interest. In fundus photography (FP) analysis, a subtle difference in, e.g., lesion and vessel shape and size can result in a different diagnosis. In this work, we demonstrated how to learn a similarity function for image patches derived directly from FP image data, without the need for manually designed feature descriptors. We used a convolutional neural network (CNN) with a novel architecture adapted for similarity learning to accomplish this task. Furthermore, we explored and studied multiple CNN architectures. We show that our method can approximate the similarity between FP patches more efficiently and accurately than state-of-the-art feature descriptors, including SIFT and SURF, using a publicly available dataset. Finally, we observe that our approach, which is purely data-driven, learns that features such as vessel calibre and orientation are important discriminative factors, which resembles the way humans reason about similarity. To the best of the authors' knowledge, this is the first attempt to approximate a visual similarity mapping in FP.
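
    As a rough illustration of patch-based similarity learning (not the paper's architecture), the sketch below pairs a small PyTorch CNN embedder with a contrastive loss; the layer sizes, the loss choice, and all names are assumptions.

        import torch
        import torch.nn as nn

        class PatchEmbedder(nn.Module):
            # Small CNN mapping an image patch to an embedding vector.
            def __init__(self, dim=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
                )

            def forward(self, x):
                return self.net(x)

        def contrastive_loss(e1, e2, same, margin=1.0):
            # Pull embeddings of similar patches together; push dissimilar
            # ones at least `margin` apart.
            d = torch.norm(e1 - e2, dim=1)
            return (same * d.pow(2)
                    + (1 - same) * (margin - d).clamp(min=0).pow(2)).mean()

        model = PatchEmbedder()
        x1, x2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
        same = torch.randint(0, 2, (8,)).float()
        loss = contrastive_loss(model(x1), model(x2), same)
        loss.backward()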

  13. Accurate thickness measurement of graphene

    Science.gov (United States)

    Shearer, Cameron J.; Slattery, Ashley D.; Stapleton, Andrew J.; Shapter, Joseph G.; Gibson, Christopher T.

    2016-03-01

    Graphene has emerged as a material with a vast variety of applications. The electronic, optical and mechanical properties of graphene are strongly influenced by the number of layers present in a sample. As a result, the dimensional characterization of graphene films is crucial, especially with the continued development of new synthesis methods and applications. A number of techniques exist to determine the thickness of graphene films including optical contrast, Raman scattering and scanning probe microscopy techniques. Atomic force microscopy (AFM), in particular, is used extensively since it provides three-dimensional images that enable the measurement of the lateral dimensions of graphene films as well as the thickness, and by extension the number of layers present. However, in the literature AFM has proven to be inaccurate with a wide range of measured values for single layer graphene thickness reported (between 0.4 and 1.7 nm). This discrepancy has been attributed to tip-surface interactions, image feedback settings and surface chemistry. In this work, we use standard and carbon nanotube modified AFM probes and a relatively new AFM imaging mode known as PeakForce tapping mode to establish a protocol that will allow users to accurately determine the thickness of graphene films. In particular, the error in measuring the first layer is reduced from 0.1-1.3 nm to 0.1-0.3 nm. Furthermore, in the process we establish that the graphene-substrate adsorbate layer and imaging force, in particular the pressure the tip exerts on the surface, are crucial components in the accurate measurement of graphene using AFM. These findings can be applied to other 2D materials.

  14. Accurate thickness measurement of graphene.

    Science.gov (United States)

    Shearer, Cameron J; Slattery, Ashley D; Stapleton, Andrew J; Shapter, Joseph G; Gibson, Christopher T

    2016-03-29

    Graphene has emerged as a material with a vast variety of applications. The electronic, optical and mechanical properties of graphene are strongly influenced by the number of layers present in a sample. As a result, the dimensional characterization of graphene films is crucial, especially with the continued development of new synthesis methods and applications. A number of techniques exist to determine the thickness of graphene films including optical contrast, Raman scattering and scanning probe microscopy techniques. Atomic force microscopy (AFM), in particular, is used extensively since it provides three-dimensional images that enable the measurement of the lateral dimensions of graphene films as well as the thickness, and by extension the number of layers present. However, in the literature AFM has proven to be inaccurate with a wide range of measured values for single layer graphene thickness reported (between 0.4 and 1.7 nm). This discrepancy has been attributed to tip-surface interactions, image feedback settings and surface chemistry. In this work, we use standard and carbon nanotube modified AFM probes and a relatively new AFM imaging mode known as PeakForce tapping mode to establish a protocol that will allow users to accurately determine the thickness of graphene films. In particular, the error in measuring the first layer is reduced from 0.1-1.3 nm to 0.1-0.3 nm. Furthermore, in the process we establish that the graphene-substrate adsorbate layer and imaging force, in particular the pressure the tip exerts on the surface, are crucial components in the accurate measurement of graphene using AFM. These findings can be applied to other 2D materials.

  15. Accurate upper body rehabilitation system using kinect.

    Science.gov (United States)

    Sinha, Sanjana; Bhowmick, Brojeshwar; Chakravarty, Kingshuk; Sinha, Aniruddha; Das, Abhijit

    2016-08-01

    The growing importance of Kinect as a tool for clinical assessment and rehabilitation is due to its portability, low cost and markerless system for human motion capture. However, the accuracy of Kinect in measuring three-dimensional body joint center locations often fails to meet clinical standards of accuracy when compared to marker-based motion capture systems such as Vicon. The length of the body segment connecting any two joints, measured as the distance between three-dimensional Kinect skeleton joint coordinates, has been observed to vary with time. The orientation of the line connecting adjoining Kinect skeletal coordinates has also been seen to differ from the actual orientation of the physical body segment. Hence we have proposed an optimization method that utilizes Kinect Depth and RGB information to search for the joint center location that satisfies constraints on body segment length as well as orientation. An experimental study has been carried out on ten healthy participants performing upper body range of motion exercises. The results report a 72% reduction in body segment length variance and a 2° improvement in Range of Motion (ROM) angle, hence enabling more accurate measurements for upper limb exercises.
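
    A minimal sketch of the constrained refinement idea, assuming a `depth_cost` callable that scores disagreement with the Depth/RGB observations; the actual cost terms and optimiser of the paper are not specified in the abstract.

        import numpy as np
        from scipy.optimize import minimize

        def refine_joint(kinect_xyz, neighbor_xyz, bone_length, depth_cost, w=1.0):
            # Search near the raw Kinect estimate for a joint position that
            # keeps the connecting segment at its expected length while
            # staying consistent with the sensor observations.
            def cost(p):
                length_err = (np.linalg.norm(p - neighbor_xyz) - bone_length) ** 2
                return depth_cost(p) + w * length_err
            return minimize(cost, kinect_xyz, method="Nelder-Mead").x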

  16. Textual and chemical information processing: different domains but similar algorithms

    Directory of Open Access Journals (Sweden)

    Peter Willett

    2000-01-01

    Full Text Available This paper discusses the extent to which algorithms developed for the processing of textual databases are also applicable to the processing of chemical structure databases, and vice versa. Applications discussed include: an algorithm for distribution sorting that has been applied to the design of screening systems for rapid chemical substructure searching; the use of measures of inter-molecular structural similarity for the analysis of hypertext graphs; a genetic algorithm for calculating term weights for relevance feedback searching, applied to determining whether a molecule is likely to exhibit biological activity; and the use of data fusion to combine the results of different chemical similarity searches.

  17. Conjunctive Wildcard Search over Encrypted Data

    NARCIS (Netherlands)

    Bösch, Christoph; Brinkman, Richard; Hartel, Pieter; Jonker, Willem; Jonker, Willem; Petkovic, Milan

    2011-01-01

    Searchable encryption allows a party to search over encrypted data without decrypting it. Prior schemes in the symmetric setting deal only with exact or similar keyword matches. We describe a scheme for the problem of wildcard searches over encrypted data to make search queries more flexible, providing ...

  18. Notions of similarity for computational biology models

    KAUST Repository

    Waltemath, Dagmar

    2016-03-21

    Computational models used in biology are rapidly increasing in complexity, size, and numbers. To build such large models, researchers need to rely on software tools for model retrieval, model combination, and version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of similarity may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here, we introduce a general notion of quantitative model similarities, survey the use of existing model comparison methods in model building and management, and discuss potential applications of model comparison. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on different model aspects. Potentially relevant aspects of a model comprise its references to biological entities, network structure, mathematical equations and parameters, and dynamic behaviour. Future similarity measures could combine these model aspects in flexible, problem-specific ways in order to mimic users' intuition about model similarity, and to support complex model searches in databases.

  19. Estimating similarity of XML Schemas using path similarity measure

    Directory of Open Access Journals (Sweden)

    Veena Trivedi

    2012-07-01

    Full Text Available In this paper, an attempt has been made to develop an algorithm which estimates the similarity of XML Schemas using multiple similarity measures. To perform this task, the XML Schema element information has been represented in the form of strings, and four different similarity measure approaches have been employed. To further improve the similarity measure, an overall similarity measure has also been calculated. The approach used in this paper is a distinctive one, as it calculates the similarity between two XML schemas using four approaches and gives an integrated value for the similarity measure.
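
    As one simple instance of a path-based measure (the paper combines four and then integrates them), the sketch below compares two schemas by the Jaccard overlap of their root-to-leaf element paths; the path strings are illustrative.

        def path_similarity(paths_a, paths_b):
            # Jaccard overlap of root-to-leaf element paths, each path
            # flattened to a string.
            inter = len(paths_a & paths_b)
            union = len(paths_a | paths_b)
            return inter / union if union else 1.0

        a = {"order/id", "order/item/name", "order/item/price"}
        b = {"order/id", "order/item/title", "order/item/price"}
        print(path_similarity(a, b))  # -> 0.5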

  20. Search for $\nu_\mu \rightarrow \nu_e$ oscillations

    OpenAIRE

    Astier, P.; Autiero, D.; Baldisseri, A.; Baldo-Ceolin, M.; Banner, M.; Bassompierre, G.; Benslama, K.; Besson, N.; Bird, I.; Blumenfeld, B.; Bobisut, F.; J. Bouchez; Boyd, S.; A. Bueno; Bunyatov, S.

    2003-01-01

    We present the results of a search for nu_mu → nu_e oscillations in the NOMAD experiment at CERN. The experiment looked for the appearance of nu_e in a predominantly nu_mu wide-band neutrino beam at the CERN SPS. No evidence for oscillations was found. The 90% confidence limits obtained are Delta m^2 ~ 10 eV^2.

  1. Search for $\nu_\mu \rightarrow \nu_e$ oscillations

    CERN Document Server

    Astier, Pierre; Baldisseri, Alberto; Baldo-Ceolin, Massimilla; Banner, M; Bassompierre, Gabriel; Benslama, K; Besson, N; Bird, I; Blumenfeld, B; Bobisut, F; Bouchez, J; Boyd, S; Bueno, A G; Bunyatov, S A; Camilleri, L L; Cardini, A; Cattaneo, Paolo Walter; Cavasinni, V; Cervera-Villanueva, A; Challis, R C; Chukanov, A; Collazuol, G; Conforto, G; Conta, C; Contalbrigo, M; Cousins, R D; Daniels, D; De Santo, A; Degaudenzi, H M; Del Prete, T; Di Lella, L; Dignan, T; Dumarchez, J; Feldman, G J; Ferrari, A; Ferrari, R; Ferrère, D; Flaminio, Vincenzo; Fraternali, M; Gaillard, J M; Gangler, E; Geiser, A; Geppert, D; Gibin, D; Gninenko, S N; Godley, A; Gosset, J; Gouanère, M; Grant, A; Graziani, G; Guglielmi, A M; Gómez-Cadenas, J J; Gössling, C; Hagner, C; Hernando, J; Hong, T M; Hubbard, D B; Hurst, P; Hyett, N; Iacopini, E; Joseph, C L; Juget, F R; Kent, N; Kirsanov, M M; Klimov, O; Kokkonen, J; Kovzelev, A; Krasnoperov, A V; Kustov, D; La Rotonda, L; Lacaprara, S; Lachaud, C; Lakic, B; Lanza, A; Laveder, M; Letessier-Selvon, A A; Linssen, Lucie; Ljubicic, A; Long, J; Lupi, A; Lévy, J M; Marchionni, A; Martelli, F; Mendiburu, J P; Meyer, J P; Mezzetto, Mauro; Mishra, S R; Moorhead, G F; Méchain, X; Naumov, D V; Nefedov, Yu A; Nguyen-Mau, C; Nédélec, P; Orestano, D; Pastore, F; Peak, L S; Pennacchio, E; Pessard, H; Petti, R; Placci, A; Polesello, G; Pollmann, D; Polyarush, A Yu; Popov, B; Poulsen, C; Rebuffi, L; Renò, R; Rico, J; Riemann, P; Roda, C; Rubbia, André; Salvatore, F; Schahmaneche, K; Schmidt, B; Schmidt, T; Sconza, A; Sevior, M E; Shih, D; Sillou, D; Soler, F J P; Sozzi, G; Steele, D; Stiegler, U; Stipcevic, M; Stolarczyk, T; Tareb-Reyes, M; Taylor, G N; Tereshchenko, V V; Toropin, A N; Touchard, A M; Tovey, Stuart N; Tran, M T; Tsesmelis, E; Ulrichs, J; Vacavant, L; Valdata-Nappi, M; Valuev, V Yu; Vannucci, François; Varvell, K E; Veltri, M; Vercesi, V; Vidal-Sitjes, G; Vieira, J M; Vinogradova, T G; Weber, F V; Weisse, T; Wilson, F F; Winton, L J; Yabsley, B D; Zaccone, Henri; Zuber, K; Zuccon, P; do Couto e Silva, E

    2003-01-01

    We present the results of a search for nu_mu → nu_e oscillations in the NOMAD experiment at CERN. The experiment looked for the appearance of nu_e in a predominantly nu_mu wide-band neutrino beam at the CERN SPS. No evidence for oscillations was found. The 90% confidence limits obtained are Delta m^2 ~ 10 eV^2.

  2. Search for $\nu_\mu \rightarrow \nu_e$ oscillations in the NOMAD experiment

    CERN Document Server

    Astier, Pierre; Baldisseri, Alberto; Baldo-Ceolin, Massimilla; Banner, M; Bassompierre, Gabriel; Benslama, K; Besson, N; Bird, I; Blumenfeld, B; Bobisut, F; Bouchez, J; Boyd, S; Bueno, A G; Bunyatov, S; Camilleri, L L; Cardini, A; Cattaneo, Paolo Walter; Cavasinni, V; Cervera-Villanueva, A; Challis, R C; Chukanov, A; Collazuol, G; Conforto, G; Conta, C; Contalbrigo, M; Cousins, R; Daniels, D; Degaudenzi, H M; Del Prete, T; De Santo, A; Dignan, T; Di Lella, L; do Couto e Silva, E; Dumarchez, J; Ellis, M; Feldman, G J; Ferrari, R; Ferrère, D; Flaminio, Vincenzo; Fraternali, M; Gaillard, J M; Gangler, E; Geiser, A; Geppert, D; Gibin, D; Gninenko, S N; Godley, A; Gómez-Cadenas, J J; Gosset, J; Gössling, C; Gouanère, M; Grant, A; Graziani, G; Guglielmi, A M; Hagner, C; Hernando, J A; Hubbard, D B; Hurst, P; Hyett, N; Iacopini, E; Joseph, C L; Juget, F R; Kent, N; Kirsanov, M M; Klimov, O; Kokkonen, J; Kovzelev, A; Krasnoperov, A V; Kustov, D; Lacaprara, S; Lachaud, C; Lakic, B; Lanza, A; La Rotonda, L; Laveder, M; Letessier-Selvon, A A; Lévy, J M; Linssen, Lucie; Ljubicic, A; Long, J; Lupi, A; Marchionni, A; Martelli, F; Méchain, X; Mendiburu, J P; Meyer, J P; Mezzetto, Mauro; Mishra, S R; Moorhead, G F; Naumov, D V; Nédélec, P; Nefedov, Yu A; Nguyen-Mau, C; Orestano, D; Pastore, F; Peak, L S; Pennacchio, E; Pessard, H; Petti, R; Placci, A; Polesello, G; Pollmann, D; Polyarush, A Yu; Popov, B; Poulsen, C; Rebuffi, L; Renò, R; Rico, J; Riemann, P; Roda, C; Rubbia, André; Salvatore, F; Schahmaneche, K; Schmidt, B; Schmidt, T; Sconza, A; Sevior, M E; Sillou, D; Soler, F J P; Sozzi, G; Steele, D; Stiegler, U; Stipcevic, M; Stolarczyk, T; Tareb-Reyes, M; Taylor, G; Tereshchenko, V V; Toropin, A N; Touchard, A M; Tovey, Stuart N; Tran, M T; Tsesmelis, E; Ulrichs, J; Vacavant, L; Valdata-Nappi, M; Valuev, V Y; Vannucci, François; Varvell, K E; Veltri, M; Vercesi, V; Vidal-Sitjes, G; Vieira, J M; Vinogradova, T G; Weber, F V; Weisse, T; Wilson, F F; Winton, L J; Yabsley, B D; Zaccone, Henri; Zuber, K; Zuccon, P

    2001-01-01

    We present the results of a search for nu_mu → nu_e oscillations in the NOMAD experiment at CERN. The experiment looked for the appearance of nu_e in a predominantly nu_mu wide-band neutrino beam at the CERN SPS. No evidence for oscillations was found. The 90% confidence limits obtained are Delta m^2 ~ 10 eV^2.

  3. A More Accurate Fourier Transform

    CERN Document Server

    Courtney, Elya

    2015-01-01

    Fourier transform methods are used to analyze functions and data sets to provide frequencies, amplitudes, and phases of underlying oscillatory components. Fast Fourier transform (FFT) methods offer speed advantages over evaluation of explicit integrals (EI) that define Fourier transforms. This paper compares frequency, amplitude, and phase accuracy of the two methods for well resolved peaks over a wide array of data sets including cosine series with and without random noise and a variety of physical data sets, including atmospheric $\mathrm{CO_2}$ concentrations, tides, temperatures, sound waveforms, and atomic spectra. The FFT uses MIT's FFTW3 library. The EI method uses the rectangle method to compute the areas under the curve via complex math. Results support the hypothesis that EI methods are more accurate than FFT methods. Errors range from 5 to 10 times higher when determining peak frequency by FFT, 1.4 to 60 times higher for peak amplitude, and 6 to 10 times higher for phase under a peak. The ability t...
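    The contrast between the two estimators is easy to reproduce. The sketch below is a minimal stand-in for the paper's setup (numpy's FFT replaces FFTW3, and the signal and frequency grid are invented for illustration): it estimates the frequency of an off-bin cosine by FFT and by a rectangle-rule explicit integral evaluated on a finer grid.

```python
# Sketch: frequency of a cosine via FFT vs. an explicit rectangle-rule
# Fourier integral on a fine frequency grid (numpy stands in for FFTW3).
import numpy as np

n, dt, f0 = 1024, 0.01, 3.7          # samples, spacing, true frequency (Hz)
t = np.arange(n) * dt
x = np.cos(2 * np.pi * f0 * t)

# FFT estimate: frequency of the largest-magnitude bin.
spec = np.fft.rfft(x)
freqs = np.fft.rfftfreq(n, dt)
f_fft = freqs[np.argmax(np.abs(spec))]

# Explicit-integral estimate: rectangle rule on a finer frequency grid.
grid = np.linspace(3.0, 4.5, 3001)
amps = [abs(np.sum(x * np.exp(-2j * np.pi * f * t)) * dt) for f in grid]
f_ei = grid[int(np.argmax(amps))]

print(f"true {f0}, FFT {f_fft:.3f}, explicit integral {f_ei:.4f}")
```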

  4. Pentaquark searches with ALICE

    CERN Document Server

    Bobulska, Dana

    2016-01-01

    In this report we present the results of the data analysis for searching for possible invariant mass signals from pentaquarks in the ALICE data. The analysis was based on filtered data from real p-Pb events at sqrt(s_NN) = 5.02 TeV collected in 2013. The motivation for this project was the recent discovery of pentaquark states by the LHCb collaboration (a c̄cuud resonance P_c^+) [1]. The search for similar, not yet observed pentaquarks is an interesting research topic [2]. In this analysis we searched for an s̄suud pentaquark resonance P_s^+ and its possible decay channel to a φ meson and a proton. The ALICE detector is well suited for the search of certain candidates thanks to its low material budget and strong PID capabilities. Additionally, we might expect the production of such particles in ALICE, as in heavy-ion and proton-ion collisions the thermal models describe the particle yields and ratios well [3]. Therefore it is reasonable to expect other species of hadrons, including also possible pentaquarks, to be produced w...

  5. Autonomous Search

    CERN Document Server

    Hamadi, Youssef; Saubion, Frédéric

    2012-01-01

    Decades of innovations in combinatorial problem solving have produced better and more complex algorithms. These new methods are better since they can solve larger problems and address new application domains. They are also more complex, which means that they are hard to reproduce and often harder to fine-tune to the peculiarities of a given problem. This last point has created a paradox where efficient tools are out of reach of practitioners. Autonomous search (AS) represents a new research field defined to precisely address the above challenge. Its major strength and originality consist in the

  6. Search in

    OpenAIRE

    Gaona Román, Alejandro

    2015-01-01

    "Search in" consiste en una instalación artística compuesta por escultura y video con un trasfondo conceptual sobre la identidad. Es una obra que invita al espectador a rodearla e introducirse en ella viéndose así como parte de la obra, al igual que el concepto de identidad puede vivirse desde la sensación del “yo” separado del mundo y a su vez desde el “yo” como parte de la sociedad. Nos hace viajar desde nuestros inicios como sociedad y seres conscientes hasta la actualidad, la era de las c...

  7. Similarity measures for protein ensembles

    DEFF Research Database (Denmark)

    Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper

    2009-01-01

    Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformatio...

  8. A new adaptive fast motion estimation algorithm based on local motion similarity degree (LMSD)

    Institute of Scientific and Technical Information of China (English)

    LIU Long; HAN Chongzhao; BAI Yan

    2005-01-01

    In the motion vector field adaptive search technique (MVFAST) and the predictive motion vector field adaptive search technique (PMVFAST), the size of the largest motion vector from the three adjacent blocks (left, top, top-right) is compared with a threshold to select among different search schemes. However, a suitable search center and search pattern will not be selected by this adaptive search technique when the adjacent motion vectors are not coherent in the local region. This paper presents an efficient adaptive search algorithm. The motion vector variation degree (MVVD) is considered a reasonable factor for adaptive search selection. Using the relationship between the local motion similarity degree (LMSD) and the motion vector variation degree (MVVD), motion vectors are classified into three categories according to their corresponding LMSD; different proposed search schemes are then adopted for motion estimation. The experimental results show that the proposed algorithm achieves a significant computational speedup compared with the MVFAST and PMVFAST algorithms, and offers similar, even better, performance.
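    The selection step described above can be pictured in a few lines of code. The following sketch is a hedged illustration, not the paper's algorithm: it measures the spread of the three neighbouring motion vectors and picks a search scheme accordingly, with thresholds and scheme names chosen arbitrarily.

```python
# Hedged sketch: classify a block by the variation of its neighbouring
# motion vectors and pick a search scheme accordingly. Thresholds and
# scheme names are hypothetical, not those of the paper.
import numpy as np

def select_search_scheme(neighbour_mvs, t_low=1.0, t_high=4.0):
    """neighbour_mvs: motion vectors of the left/top/top-right blocks."""
    mvs = np.asarray(neighbour_mvs, dtype=float)
    variation = float(np.mean(np.std(mvs, axis=0)))  # spread of the MVs
    if variation < t_low:
        return "small-diamond search around the median predictor"
    elif variation < t_high:
        return "large-diamond search"
    return "full/extended search"

print(select_search_scheme([(1, 0), (1, 1), (2, 0)]))  # coherent -> small diamond
```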

  10. Functional Similarity and Interpersonal Attraction.

    Science.gov (United States)

    Neimeyer, Greg J.; Neimeyer, Robert A.

    1981-01-01

    Students participated in dyadic disclosure exercises over a five-week period. Results indicated members of high functional similarity dyads evidenced greater attraction to one another than did members of low functional similarity dyads. "Friendship" pairs of male undergraduates displayed greater functional similarity than did "nominal" pairs from…

  11. 38 CFR 4.46 - Accurate measurement.

    Science.gov (United States)

    2010-07-01

    38 Pensions, Bonuses, and Veterans' Relief; RATING DISABILITIES; Disability Ratings; The Musculoskeletal System; § 4.46 Accurate measurement. Accurate measurement of the length of stumps, excursion of joints, dimensions and location of scars with respect...

  12. Perceived and actual similarities in biological and adoptive families: does perceived similarity bias genetic inferences?

    Science.gov (United States)

    Scarr, S; Scarf, E; Weinberg, R A

    1980-09-01

    Critics of the adoption method to estimate the relative effects of genetic and environmental differences on behavioral development claim that important biases are created by the knowledge of biological relatedness or adoptive status. Since the 1950s, agency policy has led to nearly all adopted children knowing that they are adopted. To test the hypothesis that knowledge of biological or adoptive status influences actual similarity, we correlated absolute differences in objective test scores with ratings of similarity by adolescents and their parents in adoptive and biological families. Although biological family members see themselves as more similar than adoptive family members, there are also important generational and gender differences in perceived similarity that cut across family type. There is moderate agreement among family members on the degree of perceived similarity, but there is no correlation between perceived and actual similarity in intelligence or temperament. However, family members are more accurate about shared social attitudes. Knowledge of adoptive or biological relatedness is related to the degree of perceived similarity, but perceptions of similarity are not related to objective similarities and thus do not constitute a bias in comparisons of measured differences in intelligence or temperament in adoptive and biological families.

  13. Search Engines Selection Based on Relevance Terms

    Institute of Scientific and Technical Information of China (English)

    欧洁

    2003-01-01

    Metasearch can effectively search distributed, immense electronic resources. It is built on top of several search engines, providing the user with uniform access to these engines. Metasearch first passes the user's query to the underlying useful search engines, and then collects and reorganizes the results from the search engines used. Selecting the underlying useful search engines is called search engine selection. In this paper, we present a statistical method based on relevance terms to estimate the usefulness of a search engine for any given query, which is suitable for both Boolean queries and vector queries. Experimental results indicate that the proposed estimation method is quite accurate, especially when the critical similarity between the query and the results is high.
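    A minimal way to picture such engine selection: summarize each engine offline by per-term relevance weights and score engines by how well they cover the query terms. The sketch below is an assumption-laden toy, not the paper's statistics; engine names and weights are invented.

```python
# Toy sketch of relevance-term-based engine selection, assuming each
# engine is summarised offline by per-term relevance weights.
engine_term_weights = {
    "engineA": {"protein": 0.8, "similarity": 0.6},
    "engineB": {"similarity": 0.4, "travel": 0.9},
}

def usefulness(engine, query_terms):
    weights = engine_term_weights[engine]
    return sum(weights.get(t, 0.0) for t in query_terms)

query = ["protein", "similarity"]
ranked = sorted(engine_term_weights, key=lambda e: usefulness(e, query), reverse=True)
print(ranked)  # engines ordered by estimated usefulness for the query
```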

  14. Web Search Engines: Search Syntax and Features.

    Science.gov (United States)

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  16. Similarity-Based Prediction of Travel Times for Vehicles Traveling on Known Routes

    DEFF Research Database (Denmark)

    Tiesyte, Dalia; Jensen, Christian Søndergaard

    2008-01-01

    The use of centralized, real-time position tracking is proliferating in the areas of logistics and public transportation. Real-time positions can be used to provide up-to-date information to a variety of users, and they can also be accumulated for uses in subsequent data analyses. In particular, historical data in combination with real-time data may be used to predict the future travel times of vehicles more accurately, thus improving the experience of the users who rely on such information. We propose a Nearest-Neighbor Trajectory (NNT) technique that identifies the historical trajectory of vehicles that travel along known routes. In empirical studies with real data from buses, we evaluate how well the proposed distance functions are capable of predicting future vehicle movements. Second, we propose a main-memory index structure that enables incremental similarity search and that is capable...
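    To make the nearest-neighbour idea concrete, the sketch below matches a partially observed trajectory against historical trajectories with a mean pointwise Euclidean distance. This is a simplification under stated assumptions; the paper's distance functions and index structure are richer.

```python
# Hedged sketch of the nearest-neighbour idea: pick the historical
# trajectory closest (mean pointwise distance) to the partial trajectory
# observed so far.
import numpy as np

def mean_pointwise_distance(partial, full):
    k = len(partial)
    diffs = np.asarray(full[:k], dtype=float) - np.asarray(partial, dtype=float)
    return float(np.mean(np.linalg.norm(diffs, axis=1)))

def nearest_trajectory(partial, history):
    return min(history, key=lambda h: mean_pointwise_distance(partial, h))

history = [[(0, 0), (1, 0), (2, 0), (3, 0)], [(0, 0), (0, 1), (0, 2), (0, 3)]]
observed = [(0, 0), (0.9, 0.1)]
print(nearest_trajectory(observed, history))  # the eastbound trajectory matches best
```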

  17. The development of organized visual search

    Science.gov (United States)

    Woods, Adam J.; Goksun, Tilbe; Chatterjee, Anjan; Zelonis, Sarah; Mehta, Anika; Smith, Sabrina E.

    2013-01-01

    Visual search plays an important role in guiding behavior. Children have more difficulty performing conjunction search tasks than adults. The present research evaluates whether developmental differences in children's ability to organize serial visual search (i.e., search organization skills) contribute to performance limitations in a typical conjunction search task. We evaluated 134 children between the ages of 2 and 17 on separate tasks measuring search for targets defined by a conjunction of features or by distinct features. Our results demonstrated that children organize their visual search better as they get older. As children's skills at organizing visual search improve, they become more accurate at locating targets with a conjunction of features amongst distractors, but not targets with distinct features. Developmental limitations in children's abilities to organize their visual search of the environment are an important component of poor conjunction search in young children. In addition, our findings provide preliminary evidence that, like other visuospatial tasks, exposure to reading may influence children's spatial orientation to the visual environment when performing a visual search. PMID:23584560

  18. Comparing NEO Search Telescopes

    CERN Document Server

    Myhrvold, Nathan

    2015-01-01

    Multiple terrestrial and space-based telescopes have been proposed for detecting and tracking near-Earth objects (NEOs). Detailed simulations of the search performance of these systems have used complex computer codes that are not widely available, which hinders accurate cross-comparison of the proposals and obscures whether they have consistent assumptions. Moreover, some proposed instruments would survey infrared (IR) bands, whereas others would operate in the visible band, and differences among asteroid thermal and visible light models used in the simulations further complicate like-to-like comparisons. I use simple physical principles to estimate basic performance metrics for the ground-based Large Synoptic Survey Telescope and three space-based instruments - Sentinel, NEOCam, and a Cubesat constellation. The performance is measured against two different NEO distributions, the Bottke et al. distribution of general NEOs, and the Veres et al. distribution of Earth-impacting NEOs. The results of the comparis...

  19. Quantum-Accurate Molecular Dynamics Potential for Tungsten

    Energy Technology Data Exchange (ETDEWEB)

    Wood, Mitchell; Thompson, Aidan P.

    2017-03-01

    The purpose of this short contribution is to report on the development of a Spectral Neighbor Analysis Potential (SNAP) for tungsten. We have focused on the characterization of elastic and defect properties of the pure material in order to support molecular dynamics simulations of plasma-facing materials in fusion reactors. A parallel genetic algorithm approach was used to efficiently search for fitting parameters optimized against a large number of objective functions. In addition, we have shown that this many-body tungsten potential can be used in conjunction with a simple helium pair potential [1] to produce accurate defect formation energies for the W-He binary system.

  20. Fundamentals of database indexing and searching

    CERN Document Server

    Bhattacharya, Arnab

    2014-01-01

    Fundamentals of Database Indexing and Searching presents well-known database searching and indexing techniques. It focuses on similarity search queries, showing how to use distance functions to measure the notion of dissimilarity. After defining database queries and similarity search queries, the book organizes the most common and representative index structures according to their characteristics. The author first describes low-dimensional index structures, memory-based index structures, and hierarchical disk-based index structures. He then outlines useful distance measures and index structures

  1. A COMPARISON OF SEMANTIC SIMILARITY MODELS IN EVALUATING CONCEPT SIMILARITY

    Directory of Open Access Journals (Sweden)

    Q. X. Xu

    2012-08-01

    Full Text Available Semantic similarities are important in concept definition, recognition, categorization, interpretation, and integration. Many semantic similarity models have been established to evaluate the semantic similarities of objects and/or concepts. To find out the suitability and performance of different models in evaluating concept similarities, we compare four main types of models in this paper: the geometric model, the feature model, the network model, and the transformational model. Fundamental principles and main characteristics of these models are introduced and compared first. Land use and land cover concepts of NLCD92 are employed as examples in the case study. The results demonstrate that correlations between these models are very high, a possible reason being that all these models are designed to simulate the similarity judgement of the human mind.

  2. a Comparison of Semantic Similarity Models in Evaluating Concept Similarity

    Science.gov (United States)

    Xu, Q. X.; Shi, W. Z.

    2012-08-01

    Semantic similarities are important in concept definition, recognition, categorization, interpretation, and integration. Many semantic similarity models have been established to evaluate the semantic similarities of objects and/or concepts. To find out the suitability and performance of different models in evaluating concept similarities, we compare four main types of models in this paper: the geometric model, the feature model, the network model, and the transformational model. Fundamental principles and main characteristics of these models are introduced and compared first. Land use and land cover concepts of NLCD92 are employed as examples in the case study. The results demonstrate that correlations between these models are very high, a possible reason being that all these models are designed to simulate the similarity judgement of the human mind.

  3. Learning Multi-modal Similarity

    CERN Document Server

    McFee, Brian

    2010-01-01

    In many applications involving multimedia data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multimedia similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.

  4. Renewing the Respect for Similarity

    Directory of Open Access Journals (Sweden)

    Shimon eEdelman

    2012-07-01

    Full Text Available In psychology, the concept of similarity has traditionally evoked a mixture of respect, stemming from its ubiquity and intuitive appeal, and concern, due to its dependence on the framing of the problem at hand and on its context. We argue for a renewed focus on similarity as an explanatory concept, by surveying established results and new developments in the theory and methods of similarity-preserving associative lookup and dimensionality reduction — critical components of many cognitive functions, as well as of intelligent data management in computer vision. We focus in particular on the growing family of algorithms that support associative memory by performing hashing that respects local similarity, and on the uses of similarity in representing structured objects and scenes. Insofar as these similarity-based ideas and methods are useful in cognitive modeling and in AI applications, they should be included in the core conceptual toolkit of computational neuroscience.
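    Locality-sensitive hashing of the kind surveyed above is easy to sketch. The example below uses random-hyperplane LSH, a standard construction for cosine similarity (not necessarily the authors' specific algorithm): similar vectors agree on most hash bits, while unrelated ones agree at roughly chance level.

```python
# Minimal sketch of similarity-respecting hashing: random-hyperplane LSH,
# where nearby vectors (by cosine) tend to share hash bits.
import numpy as np

rng = np.random.default_rng(0)
planes = rng.normal(size=(16, 64))      # 16 random hyperplanes in R^64

def lsh_signature(v):
    return tuple((planes @ v > 0).astype(int))

x = rng.normal(size=64)
y = x + 0.05 * rng.normal(size=64)      # near-duplicate of x
z = rng.normal(size=64)                 # unrelated vector

print(sum(a == b for a, b in zip(lsh_signature(x), lsh_signature(y))))  # high agreement
print(sum(a == b for a, b in zip(lsh_signature(x), lsh_signature(z))))  # ~chance (about 8)
```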

  5. Similarity Learning of Manifold Data.

    Science.gov (United States)

    Chen, Si-Bao; Ding, Chris H Q; Luo, Bin

    2015-09-01

    Without constructing adjacency graph for neighborhood, we propose a method to learn similarity among sample points of manifold in Laplacian embedding (LE) based on adding constraints of linear reconstruction and least absolute shrinkage and selection operator type minimization. Two algorithms and corresponding analyses are presented to learn similarity for mix-signed and nonnegative data respectively. The similarity learning method is further extended to kernel spaces. The experiments on both synthetic and real world benchmark data sets demonstrate that the proposed LE with new similarity has better visualization and achieves higher accuracy in classification.

  6. A study of Consistency in the Selection of Search Terms and Search Concepts: A Case Study in National Taiwan University

    Directory of Open Access Journals (Sweden)

    Mu-hsuan Huang

    2001-12-01

    Full Text Available This article analyzes the consistency in the selection of search terms and search concepts among college and graduate students at National Taiwan University when using the PsycLIT CD-ROM database. 31 students conducted pre-assigned searches, performing 59 searches that generated 609 search terms. The study finds that the consistency in the selection of search terms is 22.14% at the first level and 35% at the second level. These results are similar to those of other studies. Regarding consistency in search concepts, both the overlap of retrieved articles and the overlap of articles judged relevant are lower than in other studies. [Article content in Chinese

  7. Improved Scatter Search Using Cuckoo Search

    Directory of Open Access Journals (Sweden)

    Ahmed T.Sadiq Al-Obaidi

    2013-02-01

    Full Text Available The Scatter Search (SS) is a deterministic strategy that has been applied successfully to some combinatorial and continuous optimization problems. Cuckoo Search (CS) is a heuristic search algorithm inspired by the reproduction strategy of cuckoos. This paper presents an enhanced Scatter Search algorithm that uses the CS algorithm. The improvement provides Scatter Search with random exploration of the problem's search space and greater diversity and intensification around promising solutions. The original and improved Scatter Search have been tested on the Traveling Salesman Problem. A computational experiment with benchmark instances is reported. The results demonstrate that the improved Scatter Search algorithm produces better performance than the original Scatter Search algorithm. The improvement in the value of average fitness is 23.2% compared with the original SS. The developed algorithm has been compared with other algorithms for the same problem, and the result was competitive with some algorithms and weaker than others.

  8. Multiple-Goal Heuristic Search

    CERN Document Server

    Davidov, D; 10.1613/jair.1940

    2011-01-01

    This paper presents a new framework for anytime heuristic search where the task is to achieve as many goals as possible within the allocated resources. We show the inadequacy of traditional distance-estimation heuristics for tasks of this type and present alternative heuristics that are more appropriate for multiple-goal search. In particular, we introduce the marginal-utility heuristic, which estimates the cost and the benefit of exploring a subtree below a search node. We developed two methods for online learning of the marginal-utility heuristic. One is based on local similarity of the partial marginal utility of sibling nodes, and the other generalizes marginal-utility over the state feature space. We apply our adaptive and non-adaptive multiple-goal search algorithms to several problems, including focused crawling, and show their superiority over existing methods.

  9. Attacks on Local Searching Tools

    CERN Document Server

    Nielson, Seth James; Wallach, Dan S

    2011-01-01

    The Google Desktop Search is an indexing tool, currently in beta testing, designed to allow users fast, intuitive searching for local files. The principal interface is provided through a local web server which supports an interface similar to Google.com's normal web page. Indexing of local files occurs when the system is idle, and the tool understands a number of common file types. An optional feature is that Google Desktop can integrate a short summary of local search results with Google.com web searches. This summary includes 30-40 character snippets of local files. We have uncovered a vulnerability that would release private local data to an unauthorized remote entity. Using two different attacks, we expose the small snippets of private local data to a remote third party.

  10. The Search for Directed Intelligence

    CERN Document Server

    Lubin, Philip

    2016-01-01

    We propose a search for sources of directed energy systems such as those now becoming technologically feasible on Earth. Recent advances in our own abilities allow us to foresee our own capability that will radically change our ability to broadcast our presence. We show that systems of this type have the ability to be detected at vast distances and indeed can be detected across the entire horizon. This profoundly changes the possibilities for searches for extra-terrestrial technology of advanced civilizations. We show that even modest searches can be extremely effective at detecting or limiting many civilization classes. We propose a search strategy that will observe more than 10^12 stellar and planetary systems, with possible extensions to more than 10^20 systems, allowing us to test the hypothesis that other similarly or more advanced civilizations with this same capability exist and are broadcasting.

  11. Dynamic similarity in erosional processes

    Science.gov (United States)

    Scheidegger, A.E.

    1963-01-01

    A study is made of the dynamic similarity conditions obtaining in a variety of erosional processes. The pertinent equations for each type of process are written in dimensionless form; the similarity conditions can then easily be deduced. The processes treated are: raindrop action, slope evolution and river erosion. © 1963 Istituto Geofisico Italiano.

  12. Similarity of samples and trimming

    CERN Document Server

    Álvarez-Esteban, Pedro C; Cuesta-Albertos, Juan A; Matrán, Carlos; 10.3150/11-BEJ351

    2012-01-01

    We say that two probabilities are similar at level $\alpha$ if they are contaminated versions (up to an $\alpha$ fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.

  13. Self-similar aftershock rates

    Science.gov (United States)

    Davidsen, Jörn; Baiesi, Marco

    2016-08-01

    In many important systems exhibiting crackling noise—an intermittent avalanchelike relaxation response with power-law and, thus, self-similar distributed event sizes—the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is particularly true for the case of seismicity, and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high-resolution earthquake data from Southern California we find excellent agreement, providing particularly clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved framework for time-dependent seismic hazard assessment and earthquake forecasting.

  14. Unmixing of spectrally similar minerals

    CSIR Research Space (South Africa)

    Debba, Pravesh

    2009-01-01

    Full Text Available ...-bearing oxide/hydroxide/sulfate minerals in complex mixtures: can their abundances be obtained using hyperspectral data? [Presentation slides, MERAKA 2009.] The talk covers methods of spectral unmixing, including problems with the older Linear Spectral Mixture Analysis (LSMA...

  15. Self-similar aftershock rates

    CERN Document Server

    Davidsen, Jörn

    2016-01-01

    In many important systems exhibiting crackling noise --- intermittent avalanche-like relaxation response with power-law and, thus, self-similar distributed event sizes --- the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is in particular true for the case of seismicity and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high resolution earthquake data from Southern California we find excellent agreement, providing in particular clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved way of time-dependent seismic hazard assessment and earthquake forecasting.

  16. Contextual Factors for Finding Similar Experts

    DEFF Research Database (Denmark)

    Hofmann, Katja; Balog, Krisztian; Bogers, Toine;

    2010-01-01

    Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. An important outcome are models that identify factors that influence expert finding. Expertise retrieval addresses the same problem, expert finding, but from a system-centered perspective; the factors identified by expertise-seeking models are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content...

  17. Community Detection by Neighborhood Similarity

    Institute of Scientific and Technical Information of China (English)

    LIU Xu; XIE Zheng; YI Dong-Yun

    2012-01-01

    Detection of the community structure in a network is important for understanding the structure and dynamics of the network. By exploring the neighborhood of vertices, a local similarity metric is proposed, which can be quickly computed. The resulting similarity matrix retains the same support as the adjacency matrix. Based on local similarity, an agglomerative hierarchical clustering algorithm is proposed for community detection. The algorithm is implemented by an efficient max-heap data structure and runs in nearly linear time, thus is capable of dealing with large sparse networks with tens of thousands of nodes. Experiments on synthesized and real-world networks demonstrate that our method is efficient to detect community structures, and the proposed metric is the most suitable one among all the tested similarity indices.
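    One common way to realize such a neighborhood similarity is the Jaccard overlap of closed neighbour sets, sketched below for a toy graph of two triangles joined by a bridge. This is an illustrative stand-in; the paper's metric and max-heap clustering are more elaborate.

```python
# Illustrative sketch: neighbourhood-based local similarity between
# adjacent vertices as the Jaccard overlap of their closed neighbour sets.
adjacency = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

def neighbourhood_similarity(u, v):
    nu = adjacency[u] | {u}
    nv = adjacency[v] | {v}
    return len(nu & nv) / len(nu | nv)

print(round(neighbourhood_similarity(1, 2), 3))  # same triangle: 1.0
print(round(neighbourhood_similarity(3, 4), 3))  # bridge edge: 0.333
```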

  18. Similarity measures for protein ensembles

    DEFF Research Database (Denmark)

    Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper

    2009-01-01

    Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations...... a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single...

  19. Fit3D: a web application for highly accurate screening of spatial residue patterns in protein structure data.

    Science.gov (United States)

    Kaiser, Florian; Eisold, Alexander; Bittrich, Sebastian; Labudde, Dirk

    2016-03-01

    The clarification of linkage between protein structure and function is still a demanding process and can be supported by comparison of spatial residue patterns, so-called structural motifs. However, versatile up-to-date resources to search for local structure similarities are rare. We present Fit3D, an easily accessible web application for highly accurate screening of structural motifs in 3D protein data. The web application is accessible at https://biosciences.hs-mittweida.de/fit3d and program sources of the command line version were released under the terms of GNU GPLv3. Platform-independent binaries and documentation for offline usage are available at https://bitbucket.org/fkaiser/fit3d. Contact: florian.kaiser@hs-mittweida.de. Supplementary data are available at Bioinformatics online.

  20. Generating personalized web search using semantic context.

    Science.gov (United States)

    Xu, Zheng; Chen, Hai-Yan; Yu, Jie

    2015-01-01

    The "one size fits the all" criticism of search engines is that when queries are submitted, the same results are returned to different users. In order to solve this problem, personalized search is proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on the long-term and independent user profile, and thus reduce the effectiveness of personalized search. In this paper, the method captures the user context to provide accurate preferences of users for effectively personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click through data of users. Finally, a forgetting factor is introduced to merge the independent user context in a user session, which maintains the evolution of user preferences. Experimental results fully confirm that our approach can successfully represent user context according to individual user information needs.

  1. Similarity of atoms in molecules

    Energy Technology Data Exchange (ETDEWEB)

    Cioslowski, J.; Nanayakkara, A. (Florida State Univ., Tallahassee, FL (United States))

    1993-12-01

    Similarity of atoms in molecules is quantitatively assessed with a measure that employs electron densities within respective atomic basins. This atomic similarity measure does not rely on arbitrary assumptions concerning basis functions or 'atomic orbitals', is relatively inexpensive to compute, and has a straightforward interpretation. Inspection of similarities between pairs of carbon, hydrogen, and fluorine atoms in the CH4, CH3F, CH2F2, CHF3, CF4, C2H2, C2H4, and C2H6 molecules, calculated at the MP2/6-311G** level of theory, reveals that the atomic similarity is greatly reduced by a change in the number or the character of ligands (i.e. the atoms with nuclei linked through bond paths to the nucleus of the atom in question). On the other hand, atoms with formally identical (i.e. having the same nuclei and numbers of ligands) ligands resemble each other to a large degree, with similarity indices greater than 0.95 for hydrogens and 0.99 for non-hydrogens. 19 refs., 6 tabs.

  2. Quantifying Similarity in Seismic Polarizations

    Science.gov (United States)

    Eaton, D. W. S.; Jones, J. P.; Caffagni, E.

    2015-12-01

    Measuring similarity in seismic attributes can help identify tremor, low S/N signals, and converted or reflected phases, in addition to diagnosing site noise and sensor misalignment in arrays. Polarization analysis is a widely accepted method for studying the orientation and directional characteristics of seismic phases via computed attributes, but similarity is ordinarily discussed using qualitative comparisons with reference values. Here we introduce a technique for quantitative polarization similarity that uses weighted histograms computed in short, overlapping time windows, drawing on methods adapted from the image processing and computer vision literature. Our method accounts for ambiguity in azimuth and incidence angle and variations in signal-to-noise (S/N) ratio. Using records of the Mw=8.3 Sea of Okhotsk earthquake from CNSN broadband sensors in British Columbia and Yukon Territory, Canada, and vertical borehole array data from a monitoring experiment at Hoadley gas field, central Alberta, Canada, we demonstrate that our method is robust to station spacing. Discrete wavelet analysis extends polarization similarity to the time-frequency domain in a straightforward way. Because histogram distance metrics are bounded by [0 1], clustering allows empirical time-frequency separation of seismic phase arrivals on single-station three-component records. Array processing for automatic seismic phase classification may be possible using subspace clustering of polarization similarity, but efficient algorithms are required to reduce the dimensionality.

  3. Reading and visual search: a developmental study in normal children.

    Directory of Open Access Journals (Sweden)

    Magali Seassau

    Full Text Available Studies dealing with developmental aspects of binocular eye movement behaviour during reading are scarce. In this study we have explored binocular strategies during reading and during visual search tasks in a large population of normal young readers. Binocular eye movements were recorded using an infrared video-oculography system in sixty-nine children (aged 6 to 15) and in a group of 10 adults (aged 24 to 39). The main findings are: (i) in both tasks the number of progressive saccades (to the right) and regressive saccades (to the left) decreases with age; (ii) the amplitude of progressive saccades increases with age in the reading task only; (iii) in both tasks, the duration of fixations as well as the total duration of the task decreases with age; (iv) in both tasks, the amplitude of disconjugacy recorded during and after the saccades decreases with age; (v) children are significantly more accurate in reading than in visual search after 10 years of age. Data reported here confirms and expands previous studies on children's reading. The new finding is that younger children show poorer coordination than adults, both while reading and while performing a visual search task. Both reading skills and binocular saccade coordination improve with age, and children reach a level similar to adults after the age of 10. This finding is most likely related to the fact that learning mechanisms responsible for saccade yoking develop during childhood until adolescence.

  4. Searching chemical space with the Bayesian Idea Generator.

    Science.gov (United States)

    van Hoorn, Willem P; Bell, Andrew S

    2009-10-01

    The Pfizer Global Virtual Library (PGVL) is defined as a set of compounds that could be synthesized using validated protocols and monomers. However, it is too large (10^12 compounds) to search by brute-force methods for close analogues of a given input structure. In this paper the Bayesian Idea Generator is described, which is based on a novel application of Bayesian statistics to narrow down the search space to a prioritized set of existing library arrays (the default is 16). For each of these libraries the 6 closest neighbors are retrieved from the existing compound file, resulting in a screenable hypothesis of 96 compounds. Using the Bayesian models for library space, the Pfizer file of singleton compounds has been mapped to library space and is optionally searched as well. The method is >99% accurate in retrieving known library provenance from an independent test set. The compounds retrieved strike a balance between similarity and diversity, resulting in frequent scaffold hops. Four examples of how the Bayesian Idea Generator has been successfully used in drug discovery are provided. The methodology of the Bayesian Idea Generator can be used for any collection of compounds containing distinct clusters, and an example using compound vendor catalogues has been included.

  5. Google Ajax Search API

    CERN Document Server

    Fitzgerald, Michael

    2007-01-01

    Use the Google Ajax Search API to integrate web search, image search, local search, and other types of search into your web site by embedding a simple, dynamic search box to display search results in your own web pages using a few lines of JavaScript. For those who do not want to write code, the search wizards and solutions built with the Google Ajax Search API generate code to accomplish common tasks like adding local search results to a Google Maps API mashup, adding video search thumbnails to your web site, or adding a news reel with the latest up-to-date stories to your blog. More advanced users can

  6. A Survey of Binary Similarity and Distance Measures

    Directory of Open Access Journals (Sweden)

    Seung-Seok Choi

    2010-02-01

    Full Text Available The binary feature vector is one of the most common representations of patterns and measuring similarity and distance measures play a critical role in many problems such as clustering, classification, etc. Ever since Jaccard proposed a similarity measure to classify ecological species in 1901, numerous binary similarity and distance measures have been proposed in various fields. Applying appropriate measures results in more accurate data analysis. Notwithstanding, few comprehensive surveys on binary measures have been conducted. Hence we collected 76 binary similarity and distance measures used over the last century and reveal their correlations through the hierarchical clustering technique.
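    Many of the surveyed measures can be written from the standard 2x2 contingency counts of two binary vectors. The sketch below shows two classics, Jaccard and simple matching, as a hint of how such measures are constructed; it is illustrative, not the survey's code.

```python
# Two classic binary measures written from the a/b/c/d contingency
# counts of two binary vectors.
def contingency(x, y):
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # both 1
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # both 0
    return a, b, c, d

def jaccard(x, y):
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c)

def simple_matching(x, y):
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)

x, y = [1, 1, 0, 1, 0], [1, 0, 0, 1, 0]
print(jaccard(x, y), simple_matching(x, y))  # 0.666..., 0.8
```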

  7. Haptic Influence on Visual Search

    Directory of Open Access Journals (Sweden)

    Marcia Grabowecky

    2011-10-01

    Full Text Available Information from different sensory modalities influences perception and attention in tasks such as visual search. We have previously reported identity-based crossmodal influences of audition on visual search (Iordanescu, Guzman-Martinez, Grabowecky, & Suzuki, 2008; Iordanescu, Grabowecky, Franconeri, Theeuwes, & Suzuki, 2010; Iordanescu, Grabowecky, & Suzuki, 2011). Here, we extend those results and demonstrate a novel crossmodal interaction between haptic shape information and visual attention. Manually explored, but unseen, shapes facilitated visual search for similarly shaped objects. This effect manifests as a reduction in both overall search times and initial saccade latencies when the haptic shape (e.g., a sphere) is consistent with a visual target (e.g., an orange) compared to when it is inconsistent with a visual target (e.g., a hockey puck). This haptic-visual interaction occurred even though the manually held shapes were not predictive of the visual target's shape or location, suggesting that the interaction occurs automatically. Furthermore, when the haptic shape was consistent with a distracter in the visual search array (instead of with the target), initial saccades toward the target were disrupted. Together, these results demonstrate a robust shape-specific haptic influence on visual search.

  8. Similarity measures for face recognition

    CERN Document Server

    Vezzetti, Enrico

    2015-01-01

    Face recognition has several applications, including security (authentication and identification of device users and criminal suspects) and medicine (corrective surgery and diagnosis). Facial recognition programs rely on algorithms that can compare and compute the similarity between two sets of images. This eBook explains some of the similarity measures used in facial recognition systems in a single volume. Readers will learn about various measures including Minkowski distances, Mahalanobis distances, Hausdorff distances, and cosine-based distances, among other methods. The book also summarizes errors that may occur in face recognition methods. Computer scientists "facing face" and looking to select and test different methods of computing similarities will benefit from this book. The book is also a useful tool for students undertaking computer vision courses.

  9. An intelligent method for geographic Web search

    Science.gov (United States)

    Mei, Kun; Yuan, Ying

    2008-10-01

    While the electronically available information on the World-Wide Web is growing explosively, the difficulty of finding relevant information is also increasing for search engine users. In this paper we discuss how to constrain web queries geographically. A number of search queries are associated with geographical locations, either explicitly or implicitly. Accurately and effectively detecting the locations that search queries are truly about has huge potential impact on increasing search relevance, bringing better targeted search results, and improving search user satisfaction. Our approach focuses both on the way geographic information is extracted from the web and on the way it is integrated into query processing. This paper gives an overview of a spatially aware search engine for semantic querying of web documents. It also illustrates algorithms for extracting locations from web documents and query requests using location ontologies to encode and reason about the formal semantics of geographic web search. Based on a real-world scenario of tourism guide search, the application of our approach shows that geographic information retrieval can be efficiently supported.

  10. Search Engine For Ebook Portal

    Directory of Open Access Journals (Sweden)

    Prashant Kanade

    2015-08-01

    Full Text Available The purpose of this paper is to establish the textual analytics involved in developing a search engine for an ebook portal. We have extracted our dataset from Project Gutenberg using a robot harvester. Textual analytics is used for efficient search retrieval. The entire dataset is represented using the Vector Space Model, where each document is a vector in the vector space. Further, for computational purposes, we represent our dataset in the form of a Term Frequency-Inverse Document Frequency (tf-idf) matrix. The first step involves obtaining the most coherent sequence of words of the search query entered. The entered query is processed using front-end algorithms; this includes a spell checker, text segmentation, and language modeling. Back-end processing includes similarity modeling, clustering, indexing, and retrieval. The relationship between documents and words is established using cosine similarity measured between the documents and words in vector space. Clustering is used to suggest books that are similar to the search query entered by the user. Lastly, the Lucene-based Elasticsearch engine is used for indexing the documents. This allows faster retrieval of data. Elasticsearch returns a dictionary and creates a tf-idf matrix. The processed query is compared with the dictionary obtained, and the tf-idf matrix is used to calculate the score for each match to give the most relevant results.
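    The core of the pipeline above (tf-idf vectors plus cosine similarity) fits in a few lines. The sketch below uses scikit-learn as a stand-in for the Lucene/Elasticsearch indexing layer, with a toy corpus; it illustrates the scoring, not the portal's actual stack.

```python
# Minimal sketch of the tf-idf + cosine pipeline, with scikit-learn
# standing in for Elasticsearch/Lucene.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

books = [
    "a tale of two cities",
    "the adventures of sherlock holmes",
    "great expectations",
]
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(books)          # documents as tf-idf vectors

query_vec = vectorizer.transform(["sherlock adventures"])
scores = cosine_similarity(query_vec, matrix)[0]  # cosine against every book
print(scores.argmax(), scores.max())              # index and score of the best match
```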

  11. Distance learning for similarity estimation

    NARCIS (Netherlands)

    Yu, J.; Amores, J.; Sebe, N.; Radeva, P.; Tian, Q.

    2008-01-01

    In this paper, we present a general guideline to find a better distance measure for similarity estimation based on statistical analysis of distribution models and distance functions. A new set of distance measures are derived from the harmonic distance, the geometric distance, and their generalized

  12. Revisiting Inter-Genre Similarity

    DEFF Research Database (Denmark)

    Sturm, Bob L.; Gouyon, Fabien

    2013-01-01

    We revisit the idea of ``inter-genre similarity'' (IGS) for machine learning in general, and music genre recognition in particular. We show analytically that the probability of error for IGS is higher than naive Bayes classification with zero-one loss (NB). We show empirically that IGS does...

  13. Comparison of hydrological similarity measures

    Science.gov (United States)

    Rianna, Maura; Ridolfi, Elena; Manciola, Piergiorgio; Napolitano, Francesco; Russo, Fabio

    2016-04-01

    The use of a traditional at-site approach for the statistical characterization and simulation of spatio-temporal precipitation fields has a major recognized drawback: the estimation of rare events suffers from the uncertainty of at-site sample statistical inference, because of the limited length of records. In order to overcome the lack of at-site observations, the regional frequency approach uses the idea of substituting space for time to estimate design floods. Conventional regional frequency analysis estimates quantile values at a specific site from a multi-site analysis. The main idea is that homogeneous sites, once pooled together, have similar probability distribution curves of extremes, except for a scaling factor. The method for pooling groups of sites can be based on geographical or climatological considerations. In this work the region of influence (ROI) pooling method is compared with an entropy-based one. The ROI is a flexible pooling group approach which defines for each site its own "region" formed by a unique set of similar stations. The similarity is found through the Euclidean distance metric in the attribute space. Here an alternative approach based on entropy is introduced to cluster homogeneous sites. The core idea is that homogeneous sites share a redundant (i.e. similar) amount of information. Homogeneous sites are pooled through a hierarchical selection based on the mutual information index (i.e. a measure of redundancy). The method is tested on precipitation data in the Central Italy area.

  14. HOW DISSIMILARLY SIMILAR ARE BIOSIMILARS?

    Directory of Open Access Journals (Sweden)

    Ramshankar Vijayalakshmi

    2012-05-01

    Full Text Available Recently, biopharmaceuticals are the new chemotherapeutical agents called "biosimilars" or "follow-on protein products" by the European Medicines Agency (EMA) and the American regulatory agency (Food and Drug Administration), respectively. Biosimilars are extremely similar to the reference molecule but not identical, however close their similarities may be. A regulatory framework is therefore in place to assess applications for marketing authorisation of biosimilars. When a biosimilar is similar to the reference biopharmaceutical in terms of safety, quality, and efficacy, it can be registered. It is important to document data from clinical trials with a view to similar safety and efficacy. If the development time for a generic medicine is around 3 years, a biosimilar takes about 6-9 years. Generic medicines need to demonstrate bioequivalence only, unlike biosimilars, which need to conduct phase I and phase III clinical trials. In this review, different biosimilars that are already being used successfully in the field of oncology are discussed, along with their similarities, differences, and the guidelines to be followed before a clinically informed decision is taken. More importantly, the regulatory guidelines that are operational in India are discussed, with a workflow for making a biosimilar and the relevant dos and don'ts. For a large, populous country like India, where the ageing population is increasing with improved treatments in all sectors, including oncology, we need more new, cheaper, and effective biosimilars in the market for the health care of this sector. It therefore becomes important to understand the regulatory guidelines and the steps needed to come up with more biosimilars, and more information is essential for practicing clinicians to translate these effectively into clinical practice.

  15. AN FFT-BASED SELF-SIMILAR TRAFFIC GENERATOR

    Institute of Scientific and Technical Information of China (English)

    施建俊; 薛质; 诸鸿文

    2001-01-01

    The self-similarity of network traffic has great influence on network performance. But there are few analytical or even numerical solutions for such a model, so simulation becomes the most efficient method for research. Fractal Gaussian noise (FGN) is the most popularly used self-similar model. This paper presents an FGN generator based on the fast Fourier transform (FFT). The study indicates that this algorithm is fairly fast and accurate.
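    A hedged sketch of the FFT-based idea: shape white spectral noise by a power-law spectrum and invert it. For FGN with Hurst parameter H, the low-frequency power behaves like f^(1-2H); the snippet below uses this pure power-law approximation, so it is an illustration of spectral synthesis, not the paper's exact generator.

```python
# Hedged sketch of FFT-based spectral synthesis: shape random phases by a
# power-law amplitude (power ~ f^-beta, with beta = 2H - 1 for FGN) and
# invert. Approximately self-similar; the paper's generator is more exact.
import numpy as np

def spectral_noise(n, beta, seed=0):
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n)
    amplitude = np.zeros_like(freqs)
    amplitude[1:] = freqs[1:] ** (-beta / 2.0)       # skip the DC bin
    phases = rng.uniform(0, 2 * np.pi, len(freqs))
    spectrum = amplitude * np.exp(1j * phases)
    return np.fft.irfft(spectrum, n)

x = spectral_noise(4096, beta=0.6)                   # H = 0.8 -> beta = 0.6
print(x[:4])
```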

  16. Slowed Search in the Context of Unimpaired Grouping in Autism: Evidence from Multiple Conjunction Search.

    Science.gov (United States)

    Keehn, Brandon; Joseph, Robert M

    2016-03-01

    In multiple conjunction search, the target is not known in advance but is defined only with respect to the distractors in a given search array, thus reducing the contributions of bottom-up and top-down attentional and perceptual processes during search. This study investigated whether the superior visual search skills typically demonstrated by individuals with autism spectrum disorder (ASD) would be evident in multiple conjunction search. Thirty-two children with ASD and 32 age- and nonverbal IQ-matched typically developing (TD) children were administered a multiple conjunction search task. Contrary to findings from the large majority of studies on visual search in ASD, response times of individuals with ASD were significantly slower than those of their TD peers. Evidence of slowed performance in ASD suggests that the mechanisms responsible for superior ASD performance in other visual search paradigms are not available in multiple conjunction search. Although the ASD group failed to exhibit superior performance, they showed efficient search and intertrial priming levels similar to the TD group. Efficient search indicates that ASD participants were able to group distractors into distinct subsets. In summary, while demonstrating grouping and priming effects comparable to those exhibited by their TD peers, children with ASD were slowed in their performance on a multiple conjunction search task, suggesting that their usual superior performance in visual search tasks is specifically dependent on top-down and/or bottom-up attentional and perceptual processes.

  17. Enhancing Divergent Search through Extinction Events

    DEFF Research Database (Denmark)

    Lehman, Joel; Miikkulainen, Risto

    2015-01-01

    A challenge in evolutionary computation is to create representations as evolvable as those in natural evolution. This paper hypothesizes that extinction events, i.e. mass extinctions, can significantly increase evolvability, but only when combined with a divergent search algorithm, i.e. a search...... for the capacity to evolve. This hypothesis is tested through experiments in two evolutionary robotics domains. The results show that combining extinction events with divergent search increases evolvability, while combining them with convergent search offers no similar benefit. The conclusion is that extinction...

  18. Laboratory Building for Accurate Determination of Plutonium

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The accurate determination of plutonium is one of the most important assay techniques for nuclear fuel; it is also the key to chemical measurement transfer and the basis of the nuclear material balance. An

  19. Understanding the Code: keeping accurate records.

    Science.gov (United States)

    Griffith, Richard

    2015-10-01

    In his continuing series looking at the legal and professional implications of the Nursing and Midwifery Council's revised Code of Conduct, Richard Griffith discusses the elements of accurate record keeping under Standard 10 of the Code. This article considers the importance of accurate record keeping for the safety of patients and protection of district nurses. The legal implications of records are explained along with how district nurses should write records to ensure these legal requirements are met.

  20. Landscape similarity, retrieval, and machine mapping of physiographic units

    Science.gov (United States)

    Jasiewicz, Jaroslaw; Netzel, Pawel; Stepinski, Tomasz F.

    2014-09-01

    We introduce landscape similarity - a numerical measure that assesses affinity between two landscapes on the basis of similarity between the patterns of their constituent landform elements. Such a similarity function provides core technology for a landscape search engine - an algorithm that parses the topography of a study area and finds all places with landscapes broadly similar to a landscape template. A landscape search can yield answers to a query in real time, enabling a highly effective means to explore large topographic datasets. In turn, a landscape search facilitates auto-mapping of physiographic units within a study area. The country of Poland serves as a test bed for these novel concepts. The topography of Poland is given by a 30 m resolution DEM. The geomorphons method is applied to this DEM to classify the topography into ten common types of landform elements. A local landscape is represented by a square tile cut out of a map of landform elements. A histogram of cell-pair features is used to succinctly encode the composition and texture of a pattern within a local landscape. The affinity between two local landscapes is assessed using the Wave-Hedges similarity function applied to the two corresponding histograms. For a landscape search the study area is organized into a lattice of local landscapes. During the search the algorithm calculates the similarity between each local landscape and a given query. Our landscape search for Poland is implemented as a GeoWeb application called TerraEx-Pl and is available at http://sil.uc.edu/. Given a sample, or a number of samples, from a target physiographic unit the landscape search delineates this unit using the principles of supervised machine learning. Repeating this procedure for all units yields a complete physiographic map. The application of this methodology to topographic data of Poland results in the delineation of nine physiographic units. The resultant map bears a close resemblance to a conventional
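
    The record names the Wave-Hedges function but not its exact normalisation; below is a minimal sketch for two landform-element histograms, assuming the common per-bin form |a - b| / max(a, b), averaged and converted to a similarity in [0, 1].

```python
import numpy as np

def wave_hedges_similarity(h1, h2):
    # Per-bin Wave-Hedges distance |a - b| / max(a, b); bins empty in both
    # histograms contribute zero distance. The mean per-bin distance is
    # turned into a similarity in [0, 1].
    a, b = np.asarray(h1, float), np.asarray(h2, float)
    m = np.maximum(a, b)
    safe = np.where(m > 0, m, 1.0)               # avoid division by zero
    d = np.where(m > 0, np.abs(a - b) / safe, 0.0)
    return 1.0 - d.mean()

# Hypothetical histograms of landform-element cell-pair features:
query_hist = np.array([12, 0, 3, 7, 0], float)
tile_hist = np.array([10, 1, 4, 7, 0], float)
print(wave_hedges_similarity(query_hist, tile_hist))
```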

  1. Sound Search Engine Concept

    DEFF Research Database (Denmark)

    2006-01-01

    Sound search is provided by the major search engines; however, indexing is text based, not sound based. We will establish a dedicated sound search service based on sound feature indexing. The current demo shows the concept of the sound search engine. The first engine will be released June...

  3. Web Search Engines

    OpenAIRE

    Rajashekar, TB

    1998-01-01

    The World Wide Web is emerging as an all-in-one information source. Tools for searching Web-based information include search engines, subject directories and meta search tools. We take a look at key features of these tools and suggest practical hints for effective Web searching.

  4. Self-similarity Driven Demosaicking

    Directory of Open Access Journals (Sweden)

    Antoni Buades

    2011-06-01

    Full Text Available Digital cameras record only one color component per pixel, red, green or blue. Demosaicking is the process by which one can infer a whole color matrix from such a matrix of values, thus interpolating the two missing color values per pixel. In this article we propose a demosaicking method based on the property of non-local self-similarity of images.

  5. Sparse Similarity-Based Fisherfaces

    DEFF Research Database (Denmark)

    Fagertun, Jens; Gomez, David Delgado; Hansen, Mads Fogtmann;

    2011-01-01

    In this work, the effect of introducing Sparse Principal Component Analysis within the Similarity-based Fisherfaces algorithm is examined. The technique aims at mimicking the human ability to discriminate faces by projecting the faces in a highly discriminative and easy interpretative way. Pixel...... obtain the same recognition results as the technique in a dense version using only a fraction of the input data. Furthermore, the presented results suggest that using SPCA in the technique offers robustness to occlusions....

  6. Roget's Thesaurus and Semantic Similarity

    CERN Document Server

    Jarmasz, Mario

    2012-01-01

    We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have also studied. Our Roget's-based system gets correlations of .878 for the smaller and .818 for the larger list of noun pairs; this is quite close to the .885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment. We further evaluate our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest questions: the correct synonym must be selected amongst a group of four words. Our system gets 78.75%, 82.00% and 74.33% of ...
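
    The evaluation protocol described here — correlating a measure's scores with human judgments over a fixed list of word pairs — is simple to reproduce in outline. The sketch below uses placeholder numbers, not the Miller and Charles data.

```python
import numpy as np

# Hypothetical scores over the same word pairs: human ratings vs. a
# thesaurus-based measure (placeholders, not the published data).
human = np.array([3.92, 3.84, 0.42, 1.10, 2.97, 0.55])
system = np.array([0.91, 0.88, 0.15, 0.33, 0.70, 0.21])

# Pearson correlation, the statistic quoted above (.878, .818, .885).
r = np.corrcoef(human, system)[0, 1]
print(f"correlation with human judgments: {r:.3f}")
```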

  7. Self-Similar Collisionless Shocks

    CERN Document Server

    Katz, B; Waxman, E; Katz, Boaz; Keshet, Uri; Waxman, Eli

    2006-01-01

    Observations of gamma-ray burst afterglows suggest that the correlation length of magnetic field fluctuations downstream of relativistic non-magnetized collisionless shocks grows with distance from the shock to scales much larger than the plasma skin depth. We argue that this indicates that the plasma properties are described by a self-similar solution, and derive constraints on the scaling properties of the solution. For example, we find that the scaling of the characteristic magnetic field amplitude with distance from the shock is B \propto D^{s_B} with -1 <= s_B <= 0, and that the field correlation scales as \propto x^{2s_B} (for x >> D). We show that the plasma may be approximated as a combination of two self-similar components: a kinetic component of energetic particles and an MHD-like component representing "thermal" particles. We argue that the latter may be considered as infinitely conducting, in which case s_B=0 and the scalings are completely determined (e.g. dn/dE \propto E^{-2} and B \propto D^0). Similar claims apply to non-relativistic shocks such a...

  8. Gait Recognition Using Image Self-Similarity

    Directory of Open Access Journals (Sweden)

    Cutler Ross G

    2004-01-01

    Full Text Available Gait is one of the few biometrics that can be measured at a distance, and is hence useful for passive surveillance as well as biometric applications. Gait recognition research is still in its infancy, however, and we have yet to solve the fundamental issue of finding gait features which at once have sufficient discrimination power and can be extracted robustly and accurately from low-resolution video. This paper describes a novel gait recognition technique based on the image self-similarity of a walking person. We contend that the similarity plot encodes a projection of gait dynamics. It is also correspondence-free, robust to segmentation noise, and works well with low-resolution video. The method is tested on multiple data sets of varying sizes and degrees of difficulty. Performance is best for fronto-parallel viewpoints, whereby a recognition rate of 98% is achieved for a data set of 6 people, and 70% for a data set of 54 people.
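
    The full pipeline (silhouette extraction, tracking, scaling) is beyond the abstract, but its central object, the self-similarity plot, can be sketched: a T x T matrix of pairwise frame correlations in which a periodic gait appears as a regular diagonal pattern. The array shapes and the use of plain normalised correlation are assumptions here.

```python
import numpy as np

def self_similarity_plot(frames):
    # frames: (T, H, W) stack of aligned person images; returns a (T, T)
    # matrix of normalised cross-correlations between frame pairs.
    T = frames.shape[0]
    flat = frames.reshape(T, -1).astype(float)
    flat -= flat.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(flat, axis=1)
    norms[norms == 0] = 1.0                      # guard all-zero frames
    unit = flat / norms[:, None]
    return unit @ unit.T

# e.g. a hypothetical 60-frame sequence of 64x64 crops:
S = self_similarity_plot(np.random.rand(60, 64, 64))
```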

  9. Large Neighborhood Search

    DEFF Research Database (Denmark)

    Pisinger, David; Røpke, Stefan

    2010-01-01

    Heuristics based on large neighborhood search have recently shown outstanding results in solving various transportation and scheduling problems. Large neighborhood search methods explore a complex neighborhood by use of heuristics. Using large neighborhoods makes it possible to find better...... candidate solutions in each iteration and hence traverse a more promising search path. Starting from the large neighborhood search method, we give an overview of very large scale neighborhood search methods and discuss recent variants and extensions like variable depth search and adaptive large neighborhood...... search....

  10. Simulated Milky Way analogues: implications for dark matter indirect searches

    CERN Document Server

    Calore, F; Lovell, M; Bertone, G; Schaller, M; Frenk, C S; Crain, R A; Schaye, J; Theuns, T; Trayford, J W

    2015-01-01

    We study high-resolution hydrodynamic simulations of Milky Way type galaxies obtained within the "Evolution and Assembly of GaLaxies and their Environments" (EAGLE) project, and identify those that best satisfy observational constraints on the Milky Way total stellar mass, rotation curve, and galaxy shape. Contrary to mock galaxies selected on the basis of their total virial mass, the Milky Way analogues so identified consistently exhibit very similar dark matter profiles inside the solar circle, therefore enabling more accurate predictions for indirect dark matter searches. We find in particular that high resolution simulated haloes satisfying observational constraints exhibit, within the inner few kiloparsecs, dark matter profiles shallower than those required to explain the so-called Fermi GeV excess via dark matter annihilation.

  11. Rotational invariant similarity measurement for content-based image indexing

    Science.gov (United States)

    Ro, Yong M.; Yoo, Kiwon

    2000-04-01

    We propose a similarity matching technique for content-based image retrieval. The proposed technique is invariant to image rotation. Since the image content used for indexing and retrieval may be arbitrarily extracted from a still image or a video key frame, rotation invariance of the image feature description is important for general applications of content-based image indexing and retrieval. In this paper, we propose a rotation invariant similarity measurement incorporating texture features based on the human visual system (HVS). To reduce computational complexity, we employ hierarchical similarity distance searching. To verify the method, experiments with the MPEG-7 data set are performed.

  12. Fast and accurate determination of modularity and its effect size

    CERN Document Server

    Treviño, Santiago; Del Genio, Charo I; Bassler, Kevin E

    2014-01-01

    We present a fast spectral algorithm for community detection in complex networks. Our method searches for the partition with the maximum value of the modularity via the interplay of several refinement steps that include both agglomeration and division. We validate the accuracy of the algorithm by applying it to several real-world benchmark networks. On all these, our algorithm performs as well or better than any other known polynomial scheme. This allows us to extensively study the modularity distribution in ensembles of Erdős–Rényi networks, producing theoretical predictions for means and variances inclusive of finite-size corrections. Our work provides a way to accurately estimate the effect size of modularity, providing a z-score measure of it and enabling a more informative comparison of networks with different numbers of nodes and links.
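
    The spectral algorithm itself cannot be reconstructed from the abstract, but the effect-size idea can be sketched: standardize a network's modularity against an Erdős–Rényi ensemble with the same numbers of nodes and links. The sketch below estimates the null mean and variance by sampling (the paper derives them analytically, with finite-size corrections) and substitutes off-the-shelf NetworkX routines for the authors' algorithm.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

def modularity_zscore(G, n_null=50, seed=0):
    # Modularity of the best partition found for G, standardized against
    # a sampled Erdos-Renyi G(n, m) null ensemble.
    q = modularity(G, greedy_modularity_communities(G))
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_null):
        R = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(),
                                seed=int(rng.integers(1 << 31)))
        null.append(modularity(R, greedy_modularity_communities(R)))
    return (q - np.mean(null)) / np.std(null)

print(modularity_zscore(nx.karate_club_graph()))
```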

  13. Assessing protein kinase target similarity

    DEFF Research Database (Denmark)

    Gani, Osman A; Thakkar, Balmukund; Narayanan, Dilip

    2015-01-01

    : focussed chemical libraries, drug repurposing, polypharmacological design, to name a few. Protein kinase target similarity is easily quantified by sequence, and its relevance to ligand design includes broad classification by key binding sites, evaluation of resistance mutations, and the use of surrogate......" of sequence and crystal structure information, with statistical methods able to identify key correlates to activity but also here, "the devil is in the details." Examples from specific repurposing and polypharmacology applications illustrate these points. This article is part of a Special Issue entitled...

  15. The Search Performance Evaluation and Prediction in Exploratory Search

    OpenAIRE

    Liu, Fei

    2016-01-01

    The exploratory search for complex search tasks requires an effective search behavior model to evaluate and predict user search performance. Few studies have investigated the relationship between user search behavior and search performance in exploratory search. This research adopts a mixed approach combining search system development, user search experiment, search query log analysis, and multivariate regression analysis to resolve the knowledge gap. Through this study, it is shown that expl...

  16. Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2009-09-01

    Full Text Available Abstract Background The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. In the context of recent findings that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it is desirable to discriminate between coding and non-coding conserved sequences without recourse to sequence similarity searches of protein databases, as such approaches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by the presence in databases of sequences that are erroneously annotated as coding. Results Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine which designates the alignment as coding or non-coding with an associated probability score. Conclusion We show that our approach is both sensitive and accurate with respect to comparable methods and illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.
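
    The abstract names the components (alignment-derived evolutionary statistics, an SVM with a probability output) without the specifics. A minimal scikit-learn sketch of that setup follows; the feature set and all data are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder features per alignment, e.g. substitution rates at the three
# codon positions plus indel density (a hypothetical choice of statistics).
rng = np.random.default_rng(0)
X_train = rng.random((200, 4))
y_train = rng.integers(0, 2, 200)        # 1 = coding, 0 = non-coding

clf = SVC(probability=True).fit(X_train, y_train)
p_coding = clf.predict_proba(rng.random((5, 4)))[:, 1]  # probability scores
```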

  17. Mechanisms for similarity based cooperation

    Science.gov (United States)

    Traulsen, A.

    2008-06-01

    Cooperation based on similarity has been discussed since Richard Dawkins introduced the term "green beard" effect. In these models, individuals cooperate based on an arbitrary signal (or tag) such as the famous green beard. Here, two different models for such tag-based cooperation are analysed. As neutral drift is important in both models, a finite population framework is applied. The first model, which we term "cooperative tags", considers a situation in which groups of cooperators are formed by some joint signal. Defectors adopting the signal and exploiting the group can lead to a breakdown of cooperation. In this case, conditions are derived under which the average abundance of the more cooperative strategy exceeds 50%. The second model considers a situation in which individuals start defecting towards others that are not similar to them. This situation is termed "defective tags". It is shown that in this case, individuals using tags to cooperate exclusively with their own kind dominate over unconditional cooperators.

  18. A Short Survey of Document Structure Similarity Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Buttler, D

    2004-02-27

    This paper provides a brief survey of document structural similarity algorithms, including the optimal Tree Edit Distance algorithm and various approximation algorithms. The approximation algorithms include the simple weighted tag similarity algorithm, Fourier transforms of the structure, and a new application of the shingle technique to structural similarity. We show three surprising results. First, the Fourier transform technique proves to be the least accurate of any of approximation algorithms, while also being slowest. Second, optimal Tree Edit Distance algorithms may not be the best technique for clustering pages from different sites. Third, the simplest approximation to structure may be the most effective and efficient mechanism for many applications.
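
    As a concrete illustration of the shingle technique applied to structure, the sketch below shingles a document's tag sequence and compares the shingle sets with the Jaccard coefficient. The window length and input format are illustrative, not the paper's exact setup.

```python
def tag_shingles(tags, k=4):
    # All k-length windows over the tag sequence, collected as a set.
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}

def structural_similarity(tags_a, tags_b, k=4):
    # Jaccard coefficient between the two documents' shingle sets.
    sa, sb = tag_shingles(tags_a, k), tag_shingles(tags_b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical tag sequences from two HTML pages:
page1 = ["html", "body", "div", "table", "tr", "td", "td", "tr", "td", "td"]
page2 = ["html", "body", "div", "table", "tr", "td", "td", "tr", "td", "p"]
print(structural_similarity(page1, page2))
```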

  19. White Noise in Quantum Random Walk Search Algorithm

    Institute of Scientific and Technical Information of China (English)

    MA Lei; DU Jiang-Feng; LI Yun; LI Hui; KWEK L. C.; OH C. H.

    2006-01-01

    The quantum random walk is a possible approach to constructing new quantum search algorithms. It has been shown by Shenvi et al. [Phys. Rev. A 67 (2003) 52307] that such an algorithm can perform an oracle search on a database of N items with O(√N) calls to the oracle, yielding a speedup similar to other quantum search algorithms.

  1. Dynamic Search and Working Memory in Social Recall

    Science.gov (United States)

    Hills, Thomas T.; Pachur, Thorsten

    2012-01-01

    What are the mechanisms underlying search in social memory (e.g., remembering the people one knows)? Do the search mechanisms involve dynamic local-to-global transitions similar to semantic search, and are these transitions governed by the general control of attention, associated with working memory span? To find out, we asked participants to…

  2. Relativistic mergers of black hole binaries have large, similar masses, low spins and are circular

    Science.gov (United States)

    Amaro-Seoane, Pau; Chen, Xian

    2016-05-01

    Gravitational waves are a prediction of general relativity, and with ground-based detectors now running in their advanced configuration, we will soon be able to measure them directly for the first time. Binaries of stellar-mass black holes are among the most interesting sources for these detectors. Unfortunately, the many different parameters associated with the problem make it difficult to promptly produce a large set of waveforms for the search in the data stream. To reduce the number of templates to develop, one must restrict some of the physical parameters to a certain range of values predicted by either (electromagnetic) observations or theoretical modelling. In this work, we show that 'hyperstellar' black holes (HSBs) with masses 30 ≲ M_BH/M_⊙ ≲ 100, i.e. black holes significantly larger than the nominal 10 M_⊙, will have an associated low value for the spin, i.e. a < 0.5. We prove that this is true regardless of the formation channel, and that when two HSBs build a binary, each of the spin magnitudes is also low, and the binary members have similar masses. We also address the distribution of the eccentricities of HSB binaries in dense stellar systems using a large suite of three-body scattering experiments that include binary-single interactions and long-lived hierarchical systems with a highly accurate integrator, including relativistic corrections up to O(1/c^5). We find that most sources in the detector band will have nearly zero eccentricities. This correlation between large, similar masses, low spin and low eccentricity will help to accelerate the searches for gravitational-wave signals.

  3. The Search for Another Earth

    Indian Academy of Sciences (India)

    2016-07-01

    Is there life anywhere else in the vast cosmos? Are there planets similar to the Earth? For centuries, these questions baffled curious minds. Either a positive or negative answer, if found one day, would carry a deep philosophical significance for our very existence in the universe. Although the search for extra-terrestrial intelligence was initiated decades ago, a systematic scientific and global quest towards achieving a convincing answer began in 1995 with the discovery of the first confirmed planet orbiting around the solar-type star 51 Pegasi. Since then, astronomers have discovered many exoplanets using two main techniques, radial velocity and transit measurements. In the first part of this article, we shall describe the different astronomical methods through which the extrasolar planets of various kinds are discovered. In the second part of the article we shall discuss the various kinds of exoplanets, in particular about the habitable planets discovered till date and the present status of our search for a habitable planet similar to the Earth.

  4. Interneurons targeting similar layers receive synaptic inputs with similar kinetics.

    Science.gov (United States)

    Cossart, Rosa; Petanjek, Zdravko; Dumitriu, Dani; Hirsch, June C; Ben-Ari, Yehezkel; Esclapez, Monique; Bernard, Christophe

    2006-01-01

    GABAergic interneurons play diverse and important roles in controlling neuronal network dynamics. They are characterized by an extreme heterogeneity morphologically, neurochemically, and physiologically, but a functionally relevant classification is still lacking. Present taxonomy is essentially based on their postsynaptic targets, but a physiological counterpart to this classification has not yet been determined. Using a quantitative analysis based on multidimensional clustering of morphological and physiological variables, we now demonstrate a strong correlation between the kinetics of glutamate and GABA miniature synaptic currents received by CA1 hippocampal interneurons and the laminar distribution of their axons: neurons that project to the same layer(s) receive synaptic inputs with similar kinetics distributions. In contrast, the kinetics distributions of GABAergic and glutamatergic synaptic events received by a given interneuron do not depend upon its somatic location or dendritic arborization. Although the mechanisms responsible for this unexpected observation are still unclear, our results suggest that interneurons may be programmed to receive synaptic currents with specific temporal dynamics depending on their targets and the local networks in which they operate.

  5. Accurate tracking control in LOM application

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

    The fabrication of an accurate prototype directly from a CAD model in a short time depends on accurate tracking control and reference trajectory planning in Laminated Object Manufacturing (LOM) applications. An improvement in contour accuracy is achieved by the introduction of a tracking controller and a trajectory generation policy. A model of the X-Y positioning system of the LOM machine is developed as the design basis of the tracking controller. The ZPETC (Zero Phase Error Tracking Controller) is used to eliminate single-axis following error and thus reduce the contour error. The simulation is developed on a Matlab model based on a retrofitted LOM machine, and satisfactory results are acquired.

  6. SIMILARITIES AND DIFFERENCES BETWEEN COMPANIES

    Directory of Open Access Journals (Sweden)

    NAGY CRISTINA MIHAELA

    2015-05-01

    Full Text Available Similarities between the accounting of companies and the accounting of territorial administrative units are the following: both organize double-entry accounting, and the accounting method is shared both in terms of fundamental theoretical principles and of specific practical tools. The differences between the accounting of companies and that of territorial administrative units refer to: the accounting of territorial administrative units includes, besides general (financial) accounting, budgetary accounting as well, and the accounts system of budgetary accounting is completely different from that of companies; financial statements of territorial administrative units whose leaders are not main authorizing officers are submitted to the hierarchically superior body (not to the MPF); the accounts of territorial administrative units are opened at the treasury and financial institutions, accounts at commercial banks being prohibited; equity accounts in territorial administrative units are structured into groups of funds; long-term debts have a specific structure in territorial administrative units (internal local public debt and external local public debt).

  7. Performance Indexes: Similarities and Differences

    Directory of Open Access Journals (Sweden)

    André Machado Caldeira

    2013-06-01

    Full Text Available Today's investor is more rigorous in monitoring a financial asset portfolio. He no longer thinks only in terms of the expected return (one dimension) but in terms of risk-return (two dimensions). This new perception is more complex, since the risk measurement can vary according to one's perception; some use the standard deviation for it, while others disagree with this measure and propose alternatives. In addition to this difficulty, there is the problem of how to combine these two dimensions. The objective of this essay is to study the main performance indexes through an empirical study in order to verify their differences and similarities for some of the selected assets. One performance index proposed in Caldeira (2005) is included in this analysis.

  8. Features Based Text Similarity Detection

    CERN Document Server

    Kent, Chow Kok

    2010-01-01

    As the Internet helps us cross cultural borders by providing access to different information, the issue of plagiarism is bound to arise, and plagiarism detection becomes ever more necessary. Different plagiarism detection tools have been developed based on various detection techniques; nowadays, the fingerprint matching technique plays an important role in those tools. However, in handling large articles, fingerprint matching has some weaknesses, especially in space and time consumption. In this paper, we propose a new approach to detect plagiarism which integrates the fingerprint matching technique with four key features to assist the detection process. The proposed features are capable of choosing the main points, or key sentences, in the articles to be compared. The selected sentences then undergo the fingerprint matching process in order to detect the similarity between them. Hence, time and space usage for the comparison process is reduced.
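
    The four key-sentence features are not specified in the abstract, so the sketch below shows only the underlying fingerprint-matching step, using a winnowing-style selection (keep the minimum hash in each sliding window) over character k-grams. The parameters and hash choice are illustrative.

```python
import hashlib

def fingerprints(text, k=5, window=4):
    # Hash every character k-gram, then keep the minimum hash per sliding
    # window so the fingerprint set stays small but position-robust.
    grams = [text[i:i + k] for i in range(len(text) - k + 1)]
    hashes = [int(hashlib.md5(g.encode()).hexdigest(), 16) for g in grams]
    if not hashes:
        return set()
    return {min(hashes[i:i + window])
            for i in range(max(1, len(hashes) - window + 1))}

def fingerprint_similarity(doc_a, doc_b):
    # Jaccard coefficient between the two documents' fingerprint sets.
    fa, fb = fingerprints(doc_a), fingerprints(doc_b)
    return len(fa & fb) / len(fa | fb) if (fa | fb) else 1.0
```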

  9. Semantic Features for Classifying Referring Search Terms

    Energy Technology Data Exchange (ETDEWEB)

    May, Chandler J.; Henry, Michael J.; McGrath, Liam R.; Bell, Eric B.; Marshall, Eric J.; Gregory, Michelle L.

    2012-05-11

    When an internet user clicks on a result in a search engine, a request is submitted to the destination web server that includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors' needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests by country of origin. A system that can accurately predict the country of origin from query text may be a valuable complement to IP lookup methods, which are susceptible to the obfuscation of dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.

  10. Optimal directed searches for continuous gravitational waves

    CERN Document Server

    Ming, Jing; Papa, Maria Alessandra; Aulbert, Carsten; Fehrmann, Henning

    2015-01-01

    Wide parameter space searches for long-lived continuous gravitational wave signals are computationally limited. It is therefore critically important that available computational resources are used rationally. In this paper we consider directed searches, i.e. targets for which the sky position is known accurately but the frequency and spindown parameters are completely unknown. Given a list of such potential astrophysical targets, we therefore need to prioritize. On which target(s) should we spend scarce computing resources? What parameter space region in frequency and spindown should we search? Finally, what is the optimal search set-up that we should use? In this paper we present a general framework that allows one to solve all three of these problems. This framework is based on maximizing the probability of making a detection subject to a constraint on the maximum available computational cost. We illustrate the method for a simplified problem.

  11. Accurate source location from P waves scattered by surface topography

    Science.gov (United States)

    Wang, N.; Shen, Y.

    2015-12-01

    Accurate source locations of earthquakes and other seismic events are fundamental in seismology. The location accuracy is limited by several factors, including velocity models, which are often poorly known. In contrast, surface topography, the largest velocity contrast in the Earth, is often precisely mapped at the seismic wavelength (> 100 m). In this study, we explore the use of P-coda waves generated by scattering at surface topography to obtain high-resolution locations of near-surface seismic events. The Pacific Northwest region is chosen as an example. A grid search method is combined with the 3D strain Green's tensor database method to improve search efficiency as well as the quality of the hypocenter solution. The strain Green's tensor is calculated by the 3D collocated-grid finite difference method on curvilinear grids. Solutions in the search volume are then obtained based on the least-squares misfit between the 'observed' and predicted P and P-coda waves. A 95% confidence interval of the solution is also provided as an a posteriori error estimation. We find that the scattered waves are mainly due to topography in comparison with random velocity heterogeneity characterized by the von Kármán-type power spectral density function. When only P wave data are used, the 'best' solution is offset from the real source location, mostly in the vertical direction. The incorporation of P coda significantly improves solution accuracy and reduces its uncertainty. The solution remains robust with a range of random noises in data, un-modeled random velocity heterogeneities, and uncertainties in moment tensors that we tested.

  12. Accurate source location from waves scattered by surface topography

    Science.gov (United States)

    Wang, Nian; Shen, Yang; Flinders, Ashton; Zhang, Wei

    2016-06-01

    Accurate source locations of earthquakes and other seismic events are fundamental in seismology. The location accuracy is limited by several factors, including velocity models, which are often poorly known. In contrast, surface topography, the largest velocity contrast in the Earth, is often precisely mapped at the seismic wavelength (>100 m). In this study, we explore the use of P coda waves generated by scattering at surface topography to obtain high-resolution locations of near-surface seismic events. The Pacific Northwest region is chosen as an example to provide realistic topography. A grid search algorithm is combined with the 3-D strain Green's tensor database to improve search efficiency as well as the quality of hypocenter solutions. The strain Green's tensor is calculated using a 3-D collocated-grid finite difference method on curvilinear grids. Solutions in the search volume are obtained based on the least squares misfit between the "observed" and predicted P and P coda waves. The 95% confidence interval of the solution is provided as an a posteriori error estimation. For shallow events tested in the study, scattering is mainly due to topography in comparison with stochastic lateral velocity heterogeneity. The incorporation of P coda significantly improves solution accuracy and reduces solution uncertainty. The solution remains robust with wide ranges of random noises in data, unmodeled random velocity heterogeneities, and uncertainties in moment tensors. The method can be extended to locate pairs of sources in close proximity by differential waveforms using source-receiver reciprocity, further reducing errors caused by unmodeled velocity structures.
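
    Reduced to its skeleton, the location step in both versions of this work is a grid search that minimizes the least-squares misfit between observed and predicted waveforms (P plus P coda, precomputed from the strain Green's tensor database). A toy sketch follows, with hypothetical names and shapes.

```python
import numpy as np

def locate_source(candidates, predicted, observed):
    # candidates: list of (x, y, z) grid points (hypothetical layout)
    # predicted:  dict mapping each candidate to stacked station waveforms
    # observed:   array with the same shape as each predicted entry
    misfits = np.array([np.sum((predicted[c] - observed) ** 2)
                        for c in candidates])
    return candidates[int(np.argmin(misfits))], misfits

# Toy usage with synthetic waveforms (3 stations, 100 samples each):
cands = [(0, 0, 1), (0, 0, 2)]
pred = {c: np.random.rand(3, 100) for c in cands}
obs = pred[(0, 0, 2)] + 0.01 * np.random.rand(3, 100)
best, m = locate_source(cands, pred, obs)        # best -> (0, 0, 2)
```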

  13. Measurement and Improvement of Subject Specialists Performance Searching Chemical Abstracts Online as Available on SDC Search Systems.

    Science.gov (United States)

    Copeland, Richard F.; And Others

    The first phase of a project to design a prompting system to help semi-experienced end users to search Chemical Abstracts online, this study focused on the differences and similarities in the search approaches used by experienced users and those with less expertise. Four online searches on topics solicited from chemistry professors in small…

  14. MMDB and VAST+: tracking structural similarities between macromolecular complexes.

    Science.gov (United States)

    Madej, Thomas; Lanczycki, Christopher J; Zhang, Dachuan; Thiessen, Paul A; Geer, Renata C; Marchler-Bauer, Aron; Bryant, Stephen H

    2014-01-01

    The computational detection of similarities between protein 3D structures has become an indispensable tool for the detection of homologous relationships, the classification of protein families and functional inference. Consequently, numerous algorithms have been developed that facilitate structure comparison, including rapid searches against a steadily growing collection of protein structures. To this end, NCBI's Molecular Modeling Database (MMDB), which is based on the Protein Data Bank (PDB), maintains a comprehensive and up-to-date archive of protein structure similarities computed with the Vector Alignment Search Tool (VAST). These similarities have been recorded on the level of single proteins and protein domains, comprising in excess of 1.5 billion pairwise alignments. Here we present VAST+, an extension to the existing VAST service, which summarizes and presents structural similarity on the level of biological assemblies or macromolecular complexes. VAST+ simplifies structure neighboring results and shows, for macromolecular complexes tracked in MMDB, lists of similar complexes ranked by the extent of similarity. VAST+ replaces the previous VAST service as the default presentation of structure neighboring data in NCBI's Entrez query and retrieval system. MMDB and VAST+ can be accessed via http://www.ncbi.nlm.nih.gov/Structure.

  15. A similarity-based data warehousing environment for medical images.

    Science.gov (United States)

    Teixeira, Jefferson William; Annibal, Luana Peixoto; Felipe, Joaquim Cezar; Ciferri, Ricardo Rodrigues; Ciferri, Cristina Dutra de Aguiar

    2015-11-01

    A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To comply with this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing.

  16. Analysis of a librarian-mediated literature search service.

    Science.gov (United States)

    Friesen, Carol; Lê, Mê-Linh; Cooke, Carol; Raynard, Melissa

    2015-01-01

    Librarian-mediated literature searching is a key service provided at medical libraries. This analysis outlines ten years of data on 19,248 literature searches and describes information on the volume and frequency of search requests, time spent per search, databases used, and professional designations of the patron requestors. Combined with information on best practices for expert searching and evaluations of similar services, these findings were used to form recommendations on the improvement and standardization of a literature search service at a large health library system.

  17. Fast and accurate methods for phylogenomic analyses

    Directory of Open Access Journals (Sweden)

    Warnow Tandy

    2011-10-01

    Full Text Available Abstract Background Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets. However, true gene trees can differ from the true species tree (and hence from one another due to biological processes such as horizontal gene transfer, incomplete lineage sorting, and gene duplication and loss, so that no single gene tree is a reliable estimate of the species tree. Several methods have been developed to estimate species trees from estimated gene trees, differing according to the specific algorithmic technique used and the biological model used to explain differences between species and gene trees. Relatively little is known about the relative performance of these methods. Results We report on a study evaluating several different methods for estimating species trees from sequence datasets, simulating sequence evolution under a complex model including indels (insertions and deletions, substitutions, and incomplete lineage sorting. The most important finding of our study is that some fast and simple methods are nearly as accurate as the most accurate methods, which employ sophisticated statistical methods and are computationally quite intensive. We also observe that methods that explicitly consider errors in the estimated gene trees produce more accurate trees than methods that assume the estimated gene trees are correct. Conclusions Our study shows that highly accurate estimations of species trees are achievable, even when gene trees differ from each other and from the species tree, and that these estimations can be obtained using fairly simple and computationally tractable methods.

  18. Accurate Switched-Voltage voltage averaging circuit

    OpenAIRE

    金光, 一幸; 松本, 寛樹

    2006-01-01

    Abstract: This paper proposes an accurate Switched-Voltage (SV) voltage averaging circuit. It is presented to compensate for NMOS mismatch error in a MOS differential-type voltage averaging circuit. The proposed circuit consists of a voltage averaging and an SV sample/hold (S/H) circuit. It can operate using non-overlapping three-phase clocks. The performance of this circuit is verified by PSpice simulations.

  19. Accurate overlaying for mobile augmented reality

    NARCIS (Netherlands)

    Pasman, W; van der Schaaf, A; Lagendijk, RL; Jansen, F.W.

    1999-01-01

    Mobile augmented reality requires accurate alignment of virtual information with objects visible in the real world. We describe a system for mobile communications to be developed to meet these strict alignment criteria using a combination of computer vision, inertial tracking and low-latency rendering.

  1. Combination of visual and textual similarity retrieval from medical documents.

    Science.gov (United States)

    Eggel, Ivan; Müller, Henning

    2009-01-01

    Medical visual information retrieval has been an active research area over the past ten years as an increasing number of images are produced digitally and have become available in patient records, scientific literature, and other medical documents. Most visual retrieval systems concentrate on images only, but it has become apparent that the retrieval of similar images alone is of limited interest; rather, the retrieval of similar documents is an important domain. Most medical institutions as well as the World Health Organization (WHO) produce many complex documents. Searching them, including by visual search, can help find important information and also facilitates the reuse of document content and images. The work described in this paper is based on a proposal of the WHO, which produces large numbers of documents from studies but also for training. The majority of these documents are in complex formats such as PDF, Microsoft Word, Excel, or PowerPoint. The goal is to create an information retrieval system that allows easy addition of documents and search by keywords and visual content. For text retrieval, Lucene is used, and for image retrieval the GNU Image Finding Tool (GIFT). A Web 2.0 interface allows for easy upload as well as simple searching.
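
    The abstract does not say how the Lucene and GIFT scores are combined; a simple late-fusion scheme of the kind such systems commonly use is sketched below, with min-max normalisation and the mixing weight alpha as assumptions.

```python
def fuse_scores(text_scores, image_scores, alpha=0.5):
    # text_scores, image_scores: dicts mapping document id -> engine score.
    def norm(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}
    t, v = norm(text_scores), norm(image_scores)
    docs = set(t) | set(v)
    # Weighted sum per document, best first.
    return sorted(((alpha * t.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0), d)
                   for d in docs), reverse=True)

print(fuse_scores({"doc1": 3.2, "doc2": 1.1}, {"doc1": 0.4, "doc3": 0.9}))
```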

  2. Partial Recurrent Laryngeal Nerve Paralysis or Paresis? In Search for the Accurate Diagnosis

    Directory of Open Access Journals (Sweden)

    Alexander Delides

    2015-01-01

    Full Text Available “Partial paralysis” of the larynx is a term often used to describe a hypomobile vocal fold as is the term “paresis.” We present a case of a dysphonic patient with a mobility disorder of the vocal fold, for whom idiopathic “partial paralysis” was the diagnosis made after laryngeal electromyography, and discuss a proposition for a different implementation of the term.

  3. A New Generalized Similarity-Based Topic Distillation Algorithm

    Institute of Scientific and Technical Information of China (English)

    ZHOU Hongfang; DANG Xiaohui

    2007-01-01

    The procedure of hypertext induced topic search (HITS) based on a semantic relation model is analyzed, and the reason for the topic drift of the HITS algorithm is found to be that Web pages are projected onto a wrong latent semantic basis. A new concept, generalized similarity, is introduced and, based on this, a new topic distillation algorithm, GSTDA (generalized similarity based topic distillation algorithm), is presented to improve the quality of topic distillation. GSTDA is applied not only to avoid topic drift, but also to explore topics related to the user query. The experimental results on 10 queries show that GSTDA reduces the topic drift rate by 10% to 58% compared to that of the HITS algorithm, and discovers several related topics for queries that have multiple meanings.

  4. Modeling of Hysteresis in Piezoelectric Actuator Based on Segment Similarity

    Directory of Open Access Journals (Sweden)

    Rui Xiong

    2015-11-01

    Full Text Available To successfully exploit the full potential of piezoelectric actuators in micro/nano positioning systems, it is essential to model their hysteresis behavior accurately. A novel hysteresis model for piezoelectric actuators is proposed in this paper. Firstly, segment similarity, which describes the similarity relationship between hysteresis curve segments with different turning points, is proposed. Time-scale similarity, which describes the similarity relationship between hysteresis curves with different rates, is then used to address dynamic effects. The proposed model is formulated using these similarities. Finally, experiments are performed on a micro/nanometer movement platform system. The effectiveness of the proposed model is verified by comparison with the Preisach model. The experimental results show that the proposed model is able to precisely predict the hysteresis trajectories of piezoelectric actuators and performs better than the Preisach model.

  5. Cultural similarity, cultural competence, and nurse workforce diversity.

    Science.gov (United States)

    McGinnis, Sandra L; Brush, Barbara L; Moore, Jean

    2010-11-01

    Proponents of health workforce diversity argue that increasing the number of minority health care providers will enhance cultural similarity between patients and providers as well as the health system's capacity to provide culturally competent care. Measuring cultural similarity has been difficult, however, given that current benchmarks of workforce diversity categorize health workers by major racial/ethnic classifications rather than by cultural measures. This study examined the use of national racial/ethnic categories in both patient and registered nurse (RN) populations and found them to be a poor indicator of cultural similarity. Rather, we found that cultural similarity between RN and patient populations needs to be established at the level of local labor markets and broadened to include other cultural parameters such as country of origin, primary language, and self-identified ancestry. Only then can the relationship between cultural similarity and cultural competence be accurately determined and its outcomes measured.

  6. Predicting consumer behavior with Web search.

    Science.gov (United States)

    Goel, Sharad; Hofman, Jake M; Lahaie, Sébastien; Pennock, David M; Watts, Duncan J

    2010-10-12

    Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
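
    The modelling comparison described here — an autoregressive baseline versus the same regression augmented with search-query volume — can be sketched in a few lines. The single-lag setup, the use of in-sample R², and all variable names are illustrative.

```python
import numpy as np

def fit_and_score(y, search_volume, lags=1):
    # Fit an AR(lags) baseline and the same model plus a search-volume
    # regressor by least squares; return in-sample R^2 for both.
    y = np.asarray(y, float)
    s = np.asarray(search_volume, float)
    Y = y[lags:]
    X_ar = np.column_stack([np.ones(len(Y)), y[:-lags]])
    X_full = np.column_stack([X_ar, s[lags:]])

    def r2(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return 1.0 - (Y - X @ beta).var() / Y.var()

    return r2(X_ar), r2(X_full)

# Synthetic example: an outcome driven largely by search volume.
rng = np.random.default_rng(0)
s = rng.random(80)
y = 2.0 + 0.8 * s + rng.normal(0, 0.1, 80)
print(fit_and_score(y, s))    # the search-augmented model scores higher
```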

  7. Searching Databases with Keywords

    Institute of Scientific and Technical Information of China (English)

    Shan Wang; Kun-Long Zhang

    2005-01-01

    Traditionally, the SQL query language is used to search the data in databases. However, it is inappropriate for end-users, since it is complex and hard to learn. End-users need to search databases with keywords, as in web search engines. This paper presents a survey of work on keyword search in databases. It also includes a brief introduction to the SEEKER system, which has been developed.

  8. Integrated vs. Federated Search

    DEFF Research Database (Denmark)

    Løvschall, Kasper

    2009-01-01

    Presentation on the differences and similarities between integrated and federated search in a library context. Given at the theme day "Integrated Search - samsøgning i alle kilder" [integrated searching across all sources] at Danmarks Biblioteksskole on 22 January 2009.

  9. Semantic Web Based Efficient Search Using Ontology and Mathematical Model

    Directory of Open Access Journals (Sweden)

    K.Palaniammal

    2014-01-01

    Full Text Available The semantic web is the forthcoming technology in the world of search engines. It focuses on search that is meaningful rather than the syntactic search prevailing now. This work concerns semantic search in the educational domain. In this paper, we propose semantic web based efficient search using an ontology and a mathematical model that takes into account misleading or unmatched service information, lack of relevant domain knowledge, and wrong service queries. To solve these issues, the framework is designed to make three major contributions: an ontology knowledge base, Natural Language Processing (NLP) techniques, and a search model. The ontology knowledge base stores domain-specific service ontologies and service description entity (SDE) metadata. The search model, which incorporates the mathematical model, retrieves SDE metadata efficiently for education learners. The NLP techniques provide spell checking and synonym-based search. The results are retrieved and stored in an ontology, which in turn prevents data redundancy. The results are more accurate, sensitive to spelling, and aware of synonymous context. The approach reduces the user's time and the complexity of finding correct results for a search, and our model provides more accurate results. A series of experiments are conducted in order to evaluate both the mechanism and the employed mathematical model.
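
    As an illustration of the synonym-based part of the search model, the sketch below expands query terms with WordNet lemmas via NLTK; the paper uses its own education-domain ontology and SDE metadata, for which WordNet merely stands in here (the corpus must first be fetched with nltk.download('wordnet')).

```python
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

def expand_query(terms):
    # Add WordNet synonyms of every query term so that documents using
    # synonymous wording still match.
    expanded = set()
    for term in terms:
        expanded.add(term)
        for syn in wordnet.synsets(term):
            expanded.update(l.name().replace("_", " ") for l in syn.lemmas())
    return expanded

print(expand_query(["teacher"]))   # e.g. also yields "instructor"
```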

  10. Durham Zoo: Powering a Search-&-Innovation Engine with Collective Intelligence

    Directory of Open Access Journals (Sweden)

    Richard Absalom

    2015-02-01

    graphical representations has been used. Practical implications – Concept searching is seen as having two categories: prior art searching, which is searching for what already exists, and solution searching: a search for a novel solution to an existing problem. Prior art searching is not as efficient a process, as all-encompassing in scope, or as accurate in result, as it could and probably should be. The prior art includes library collections, journals, conference proceedings and everything else that has been written, drawn, spoken or made public in any way. Much technical information is only published in patents. There is a good reason to improve prior art searching: research, industry, and indeed humanity faces the spectre of patent thickets: an impenetrable legal space that effectively hinders innovation rather than promotes it. Improved prior-art searching would help with the gardening and result in fewer and higher-quality patents. Poor-quality patents can reward patenting activity per se, which is not what the system was designed for. Improved prior-art searching could also result in less duplication in research, and/or lead to improved collaboration. As regards solution search, the authors of the paper believe that much better use could be made of the existing literature to find solutions from non-obvious areas of science and technology. The so-called cross-industry innovation could be joined by biomimetics, the inspiration of solutions from nature. Crowdsourcing the concept shorthand could produce a system 'by the people, for the people', to quote Abraham Lincoln out of context. A Citizen Science and Technology initiative that developed a working search engine could generate revenue for academia. Any monies accruing could be invested in research for the common good, such as the development of climate change mitigation technologies, or the discovery of new antibiotics. Originality – The authors know of no similar systems in development

  11. How doctors search

    DEFF Research Database (Denmark)

    Lykke, Marianne; Price, Susan; Delcambre, Lois

    2012-01-01

    to context-specific aspects of the main topic of the documents. We have tested the model in an interactive searching study with family doctors with the purpose to explore doctors’ querying behaviour, how they applied the means for specifying a search, and how these features contributed to the search outcome...

  12. The Information Search

    Science.gov (United States)

    Doraiswamy, Uma

    2011-01-01

    This paper, in the form of a story, discusses a college student's information search process. In this story we see Kuhlthau's information search process: initiation, selection, exploration, formulation, collection, and presentation. Katie is a student who goes in search of information for her class research paper. Katie's class readings, her interest…

  13. Search and the city

    NARCIS (Netherlands)

    P.A. Gautier; C.N. Teulings

    2009-01-01

    We develop a model of an economy with several regions, which differ in scale. Within each region, workers have to search for a job-type that matches their skill. They face a trade-off between match quality and the cost of extended search. This trade-off differs between regions, because search is mor

  14. FACT: functional annotation transfer between proteins with similar feature architectures.

    Science.gov (United States)

    Koestler, Tina; von Haeseler, Arndt; Ebersberger, Ingo

    2010-08-09

    The increasing number of sequenced genomes provides the basis for exploring the genetic and functional diversity within the tree of life. Only a tiny fraction of the encoded proteins undergoes a thorough experimental characterization. For the remainder, bioinformatics annotation tools are the only means to infer their function. Exploiting significant sequence similarities to already characterized proteins, commonly taken as evidence for homology, is the prevalent method to deduce functional equivalence. Such methods fail when homologs are too diverged, or when they have assumed a different function. Finally, due to convergent evolution, functional equivalence is not necessarily linked to common ancestry. Therefore, complementary approaches are required to identify functional equivalents. We present the Feature Architecture Comparison Tool http://www.cibiv.at/FACT to search for functionally equivalent proteins. FACT uses the similarity between feature architectures of two proteins, i.e., the arrangements of functional domains, secondary structure elements and compositional properties, as a proxy for their functional equivalence. A scoring function measures feature architecture similarities, which enables searching for functional equivalents in entire proteomes. Our evaluation of 9,570 EC classified enzymes revealed that FACT, using the full feature set, outperformed the existing architecture-based approaches by identifying significantly more functional equivalents as highest scoring proteins. We show that FACT can identify functional equivalents that share no significant sequence similarity. However, when the highest scoring protein of FACT is also the protein with the highest local sequence similarity, it is in 99% of the cases functionally equivalent to the query. We demonstrate the versatility of FACT by identifying a missing link in the yeast glutathione metabolism and also by searching for the human GolgA5 equivalent in Trypanosoma brucei. FACT facilitates a
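
To make architecture-based scoring concrete, here is a hedged toy version: it blends set overlap of domain names (Jaccard) with an order-aware longest-common-subsequence term. FACT's real scoring function also weighs secondary structure elements and compositional properties, which this sketch omits; the domain names are illustrative.

```python
# Toy feature-architecture similarity: set overlap plus order agreement.

def lcs_length(a, b):
    """Length of the longest common subsequence (order-aware overlap)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def architecture_similarity(arch_a, arch_b):
    """Blend Jaccard overlap of domains with a normalized LCS order term."""
    set_a, set_b = set(arch_a), set(arch_b)
    jaccard = len(set_a & set_b) / len(set_a | set_b)
    order = lcs_length(arch_a, arch_b) / max(len(arch_a), len(arch_b))
    return 0.5 * jaccard + 0.5 * order

query   = ["Pkinase", "SH2", "SH3"]    # hypothetical domain arrangement
subject = ["SH3", "SH2", "Pkinase"]    # same domains, reversed order
print(architecture_similarity(query, subject))   # 0.67: full overlap, poor order
```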

  15. Stereotypes of age differences in personality traits: universal and accurate?

    Science.gov (United States)

    Chan, Wayne; McCrae, Robert R; De Fruyt, Filip; Jussim, Lee; Löckenhoff, Corinna E; De Bolle, Marleen; Costa, Paul T; Sutin, Angelina R; Realo, Anu; Allik, Jüri; Nakazato, Katsuharu; Shimonaka, Yoshiko; Hřebíčková, Martina; Graf, Sylvie; Yik, Michelle; Brunner-Sciarra, Marina; de Figueora, Nora Leibovich; Schmidt, Vanina; Ahn, Chang-Kyu; Ahn, Hyun-nie; Aguilar-Vafaie, Maria E; Siuta, Jerzy; Szmigielska, Barbara; Cain, Thomas R; Crawford, Jarret T; Mastor, Khairul Anwar; Rolland, Jean-Pierre; Nansubuga, Florence; Miramontez, Daniel R; Benet-Martínez, Veronica; Rossier, Jérôme; Bratko, Denis; Marušić, Iris; Halberstadt, Jamin; Yamaguchi, Mami; Knežević, Goran; Martin, Thomas A; Gheorghiu, Mirona; Smith, Peter B; Barbaranelli, Claudio; Wang, Lei; Shakespeare-Finch, Jane; Lima, Margarida P; Klinkosz, Waldemar; Sekowski, Andrzej; Alcalay, Lidia; Simonetti, Franco; Avdeyeva, Tatyana V; Pramila, V S; Terracciano, Antonio

    2012-12-01

    Age trajectories for personality traits are known to be similar across cultures. To address whether stereotypes of age groups reflect these age-related changes in personality, we asked participants in 26 countries (N = 3,323) to rate typical adolescents, adults, and old persons in their own country. Raters across nations tended to share similar beliefs about different age groups; adolescents were seen as impulsive, rebellious, undisciplined, preferring excitement and novelty, whereas old people were consistently considered lower on impulsivity, activity, antagonism, and Openness. These consensual age group stereotypes correlated strongly with published age differences on the five major dimensions of personality and most of 30 specific traits, using as criteria of accuracy both self-reports and observer ratings, different survey methodologies, and data from up to 50 nations. However, personal stereotypes were considerably less accurate, and consensual stereotypes tended to exaggerate differences across age groups. (PsycINFO Database Record (c) 2012 APA, all rights reserved).

  16. Faceted Semantic Search for Personalized Social Search

    CERN Document Server

    Mas, Massimiliano Dal

    2012-01-01

    Current social networks (like Facebook, Twitter, LinkedIn, ...) need to deal with vagueness and ontological indeterminacy. This paper analyzes the prototyping of a faceted semantic search for personalized social search using the "joint meaning" in a community environment. User searches in a "collaborative" environment defined by folksonomies can be supported by the most common features of faceted semantic search. A solution for context-aware personalized search is based on "joint meaning", understood as a joint construal of the creators of the contents and the user of the contents, using the faceted taxonomy with the Semantic Web. A proof-of-concept prototype shows how the proposed methodological approach can also be applied to existing presentation components, built with different languages and/or component technologies.

  17. Accurate estimation of indoor travel times

    DEFF Research Database (Denmark)

    Prentow, Thor Siiger; Blunck, Henrik; Stisen, Allan

    2014-01-01

    The ability to accurately estimate indoor travel times is crucial for enabling improvements within application areas such as indoor navigation, logistics for mobile workers, and facility management. In this paper, we study the challenges inherent in indoor travel time estimation, and we propose the InTraTime method for accurately estimating indoor travel times via mining of historical and real-time indoor position traces. The method learns during operation both travel routes, travel times and their respective likelihood, both for routes traveled as well as for sub-routes thereof. InTraTime allows the user to specify temporal and other query parameters, such as time-of-day, day-of-week or the identity of the traveling individual. As input the method is designed to take generic position traces and is thus interoperable with a variety of indoor positioning systems. The method's advantages include...

  18. Accurate colorimetric feedback for RGB LED clusters

    Science.gov (United States)

    Man, Kwong; Ashdown, Ian

    2006-08-01

    We present an empirical model of LED emission spectra that is applicable to both InGaN and AlInGaP high-flux LEDs, and which accurately predicts their relative spectral power distributions over a wide range of LED junction temperatures. We further demonstrate with laboratory measurements that changes in LED spectral power distribution with temperature can be accurately predicted with first- or second-order equations. This provides the basis for a real-time colorimetric feedback system for RGB LED clusters that can maintain the chromaticity of white light at constant intensity to within +/-0.003 Δuv over a range of 45 degrees Celsius, and to within 0.01 Δuv when dimmed over an intensity range of 10:1.

  19. Keep Searching and You’ll Find

    DEFF Research Database (Denmark)

    Laursen, Keld

    2012-01-01

    triggers for different kinds of search. It argues that the initial focus on local search was a consequence, in part, of the attention in evolutionary economics to path-dependent behavior, but that as localized behavior was increasingly accepted as the standard mode, studies began to question whether local search was the best solution in all cases. More recently, the literature has focused on the trade-offs created by firms having to balance local and non-local search. We account also for the apparent "variety paradox" in the stylized fact that organizations within the same industry tend to follow different search strategies, but end up with very similar technological profiles in fast-growing technologies. The article concludes by highlighting what we have learnt from the literature and suggesting some new avenues for research.

  1. Interactions of visual odometry and landmark guidance during food search in honeybees

    NARCIS (Netherlands)

    Vladusich, T; Hemmi, JM; Srinivasan, MV; Zeil, J

    2005-01-01

    How do honeybees use visual odometry and goal-defining landmarks to guide food search? In one experiment, bees were trained to forage in an optic-flow-rich tunnel with a landmark positioned directly above the feeder. Subsequent food-search tests indicated that bees searched much more accurately when

  2. Synthesizing Accurate Floating-Point Formulas

    OpenAIRE

    Ioualalen, Arnault; Martel, Matthieu

    2013-01-01

    Many critical embedded systems perform floating-point computations yet their accuracy is difficult to assert and strongly depends on how formulas are written in programs. In this article, we focus on the synthesis of accurate formulas mathematically equal to the original formulas occurring in source codes. In general, an expression may be rewritten in many ways. To avoid any combinatorial explosion, we use an intermediate representation, called APEG, enabling us to rep...

  3. Efficient Accurate Context-Sensitive Anomaly Detection

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    For program behavior-based anomaly detection, the only way to ensure accurate monitoring is to construct an efficient and precise program behavior model. A new program behavior-based anomaly detection model, called the combined pushdown automaton (CPDA) model, was proposed based on static binary executable analysis. The CPDA model incorporates the optimized call stack walk and code instrumentation technique to gain complete context information. Thereby the proposed method can detect more attacks, while retaining good performance.
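
The CPDA itself is a pushdown automaton derived from static binary analysis, but the core benefit of context sensitivity, distinguishing the same event in different calling contexts, can be illustrated with a much simpler profile of (call stack, event) pairs. The sketch below is a hedged illustration of that principle only; the function and event names are invented.

```python
# Minimal context-sensitive anomaly profile: an event is judged together
# with its full call-stack context, not in isolation.

class ContextProfile:
    def __init__(self):
        self.allowed = set()

    def train(self, call_stack, event):
        """Record an observed (context, event) pair as legitimate."""
        self.allowed.add((tuple(call_stack), event))

    def is_anomalous(self, call_stack, event):
        """Flag events never seen in this exact calling context."""
        return (tuple(call_stack), event) not in self.allowed

profile = ContextProfile()
profile.train(["main", "handle_request"], "open")
profile.train(["main", "handle_request", "log"], "write")

# Same syscall, different context: flagged, unlike a context-free model.
print(profile.is_anomalous(["main", "attacker_func"], "open"))   # True
print(profile.is_anomalous(["main", "handle_request"], "open"))  # False
```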

  4. Accurate Control of Josephson Phase Qubits

    Science.gov (United States)

    2016-04-14

    Accurate control of Josephson phase qubits. Matthias Steffen, John M. Martinis, and Isaac L. Chuang, Physical Review B 68, 224518 (2003). Center for Bits and Atoms and Department of Physics, MIT, Cambridge, Massachusetts 02139, USA; Solid State and Photonics Laboratory, Stanford University.

  5. On accurate determination of contact angle

    Science.gov (United States)

    Concus, P.; Finn, R.

    1992-01-01

    Methods are proposed that exploit a microgravity environment to obtain highly accurate measurement of contact angle. These methods, which are based on our earlier mathematical results, do not require detailed measurement of a liquid free-surface, as they incorporate discontinuous or nearly-discontinuous behavior of the liquid bulk in certain container geometries. Physical testing is planned in the forthcoming IML-2 space flight and in related preparatory ground-based experiments.

  6. Accurate guitar tuning by cochlear implant musicians.

    Science.gov (United States)

    Lu, Thomas; Huang, Juan; Zeng, Fan-Gang

    2014-01-01

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones. Analysis showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  7. Accurate integration of forced and damped oscillators

    OpenAIRE

    García Alonso, Fernando Luis; Cortés Molina, Mónica; Villacampa, Yolanda; Reyes Perales, José Antonio

    2016-01-01

    The new methods accurately integrate forced and damped oscillators. A family of analytical functions is introduced known as T-functions which are dependent on three parameters. The solution is expressed as a series of T-functions calculating their coefficients by means of recurrences which involve the perturbation function. In the T-functions series method the perturbation parameter is the factor in the local truncation error. Furthermore, this method is zero-stable and convergent. An applica...

  8. Keyword Search in Databases

    CERN Document Server

    Yu, Jeffrey Xu; Chang, Lijun

    2009-01-01

    It has become highly desirable to provide users with flexible ways to query/search information over databases as simple as keyword search like Google search. This book surveys the recent developments on keyword search over databases, and focuses on finding structural information among objects in a database using a set of keywords. Such structural information to be returned can be either trees or subgraphs representing how the objects, that contain the required keywords, are interconnected in a relational database or in an XML database. The structural keyword search is completely different from

  9. Accurate structural correlations from maximum likelihood superpositions.

    Directory of Open Access Journals (Sweden)

    Douglas L Theobald

    2008-02-01

    Full Text Available The cores of globular proteins are densely packed, resulting in complicated networks of structural interactions. These interactions in turn give rise to dynamic structural correlations over a wide range of time scales. Accurate analysis of these complex correlations is crucial for understanding biomolecular mechanisms and for relating structure to function. Here we report a highly accurate technique for inferring the major modes of structural correlation in macromolecules using likelihood-based statistical analysis of sets of structures. This method is generally applicable to any ensemble of related molecules, including families of nuclear magnetic resonance (NMR) models, different crystal forms of a protein, and structural alignments of homologous proteins, as well as molecular dynamics trajectories. Dominant modes of structural correlation are determined using principal components analysis (PCA) of the maximum likelihood estimate of the correlation matrix. The correlations we identify are inherently independent of the statistical uncertainty and dynamic heterogeneity associated with the structural coordinates. We additionally present an easily interpretable method ("PCA plots") for displaying these positional correlations by color-coding them onto a macromolecular structure. Maximum likelihood PCA of structural superpositions, and the structural PCA plots that illustrate the results, will facilitate the accurate determination of dynamic structural correlations analyzed in diverse fields of structural biology.
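
A minimal numpy sketch of the pipeline this abstract describes: estimate a correlation matrix from an ensemble of superposed structures and take its principal components. As a simplifying assumption, the sketch substitutes the ordinary sample correlation where the paper uses a maximum likelihood estimate, and the ensemble is random stand-in data.

```python
# Dominant modes of positional correlation from a structure ensemble.
import numpy as np

rng = np.random.default_rng(0)
ensemble = rng.normal(size=(50, 30, 3))    # 50 models, 30 atoms, xyz

# Center and flatten xyz so rows are models, columns are coordinates.
X = ensemble.reshape(50, -1)
X = X - X.mean(axis=0)

corr = np.corrcoef(X, rowvar=False)        # coordinate correlation matrix

# Principal components: eigenvectors of the correlation matrix.
evals, evecs = np.linalg.eigh(corr)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Per-atom weight on the first mode, e.g. for color-coding a structure.
mode1 = evecs[:, 0].reshape(30, 3)
atom_weight = np.linalg.norm(mode1, axis=1)
print(atom_weight.round(2))
```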

  10. Accurate finite element modeling of acoustic waves

    Science.gov (United States)

    Idesman, A.; Pham, D.

    2014-07-01

    In the paper we suggest an accurate finite element approach for the modeling of acoustic waves under a suddenly applied load. We consider the standard linear elements and the linear elements with reduced dispersion for the space discretization as well as the explicit central-difference method for time integration. The analytical study of the numerical dispersion shows that the most accurate results can be obtained with the time increments close to the stability limit. However, even in this case and the use of the linear elements with reduced dispersion, mesh refinement leads to divergent numerical results for acoustic waves under a suddenly applied load. This is explained by large spurious high-frequency oscillations. For the quantification and the suppression of spurious oscillations, we have modified and applied a two-stage time-integration technique that includes the stage of basic computations and the filtering stage. This technique allows accurate convergent results at mesh refinement as well as significantly reduces the numerical anisotropy of solutions. We should mention that the approach suggested is very general and can be equally applied to any loading as well as for any space-discretization technique and any explicit or implicit time-integration method.

  11. Scaling, similarity, and the fourth paradigm for hydrology

    Science.gov (United States)

    Peters-Lidard, Christa D.; Clark, Martyn; Samaniego, Luis; Verhoest, Niko E. C.; van Emmerik, Tim; Uijlenhoet, Remko; Achieng, Kevin; Franz, Trenton E.; Woods, Ross

    2017-07-01

    In this synthesis paper addressing hydrologic scaling and similarity, we posit that roadblocks in the search for universal laws of hydrology are hindered by our focus on computational simulation (the third paradigm) and assert that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Advances in information-based hydrologic science, coupled with an explosion of hydrologic data and advances in parameter estimation and modeling, have laid the foundation for a data-driven framework for scrutinizing hydrological scaling and similarity hypotheses. We summarize important scaling and similarity concepts (hypotheses) that require testing; describe a mutual information framework for testing these hypotheses; describe boundary condition, state, flux, and parameter data requirements across scales to support testing these hypotheses; and discuss some challenges to overcome while pursuing the fourth hydrological paradigm. We call upon the hydrologic sciences community to develop a focused effort towards adopting the fourth paradigm and apply this to outstanding challenges in scaling and similarity.

  12. A Survey of Meta Search Engines

    Institute of Scientific and Technical Information of China (English)

    张卫丰; 徐宝文; 周晓宇; 李东; 许蕾

    2001-01-01

    With the explosive increase of network information, it is more and more difficult for people to look up information. The occurrence of Web search engines overcomes this problem to some degree. However, because different search engines use different mechanisms, scopes and algorithms, the overlap of the search results for the same query is no more than 34%. To get relatively full-scale, accurate search results, multiple search engines should be used, and thus meta search engines emerged. In this paper, the meta search engines are surveyed. First, the history, the principles and the elements of the meta search engines are discussed. Then, the related criteria of the meta search engines are analyzed and several typical meta search engines are compared. Finally, on this basis, the trend of the meta search engines is introduced.
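
The result-merging step at the heart of a meta search engine can be sketched in a few lines. Below, ranked lists from three hypothetical engines are fused with a simple Borda count; production systems add URL deduplication, snippet merging and per-engine weighting, none of which is shown here.

```python
# Borda-count fusion of ranked result lists from several engines.

def borda_fuse(ranked_lists, depth=10):
    """Score each URL by summed (depth - rank) over all engines."""
    scores = {}
    for results in ranked_lists:
        for rank, url in enumerate(results[:depth]):
            scores[url] = scores.get(url, 0) + (depth - rank)
    return sorted(scores, key=scores.get, reverse=True)

engine_a = ["u1", "u2", "u3"]
engine_b = ["u3", "u1", "u4"]
engine_c = ["u2", "u3", "u5"]
print(borda_fuse([engine_a, engine_b, engine_c]))
# u3 rises to the top; u1 and u2 follow, ahead of single-engine hits.
```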

  13. Searching and Indexing Genomic Databases via Kernelization

    Directory of Open Access Journals (Sweden)

    Travis Gagie

    2015-02-01

    Full Text Available The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper we survey the twenty-year history of this idea and discuss its relation to kernelization in parameterized complexity.

  14. Evaluating search effectiveness of some selected search engines ...

    African Journals Online (AJOL)

    Evaluating search effectiveness of some selected search engines: how users seek information on the World Wide Web (WWW) using a variety of search engines.

  15. Judging the Capability of Search Engines and Search Terms

    National Research Council Canada - National Science Library

    Anna Kaushik

    2012-01-01

    The present study aims to judge the capability of five selected search engines and search terms on the basis of the first ten results and to identify the most appropriate search term and search engine...

  16. Cerebral fat embolism: Use of MR spectroscopy for accurate diagnosis

    Directory of Open Access Journals (Sweden)

    Laxmi Kokatnur

    2015-01-01

    Full Text Available Cerebral fat embolism (CFE) is an uncommon but serious complication following orthopedic procedures. It usually presents with altered mental status, and can be a part of fat embolism syndrome (FES) if associated with cutaneous and respiratory manifestations. Because of the presence of other common factors affecting the mental status, particularly in the postoperative period, the diagnosis of CFE can be challenging. Magnetic resonance imaging (MRI) of brain typically shows multiple lesions distributed predominantly in the subcortical region, which appear as hyperintense lesions on T2 and diffusion weighted images. Although the location offers a clue, the MRI findings are not specific for CFE. Watershed infarcts, hypoxic encephalopathy, disseminated infections, demyelinating disorders, diffuse axonal injury can also show similar changes on MRI of brain. The presence of fat in these hyperintense lesions, identified by MR spectroscopy as raised lipid peaks, will help in accurate diagnosis of CFE. Normal brain tissue or conditions producing similar MRI changes will not show any lipid peak on MR spectroscopy. We present a case of CFE initially misdiagnosed as brain stem stroke based on clinical presentation and cranial computed tomography (CT) scan, and later, MR spectroscopy elucidated the accurate diagnosis.

  17. Automatic Planning of External Search Engine Optimization

    Directory of Open Access Journals (Sweden)

    Vita Jasevičiūtė

    2015-07-01

    Full Text Available This paper describes an investigation of an external search engine optimization (SEO) action planning tool, dedicated to automatically extracting a small set of the most important keywords for each month during a whole year period. The keywords in the set are extracted according to externally measured parameters, such as the average number of searches during the year and for every month individually. Additionally, the position of the optimized web site for each keyword is taken into account. The generated optimization plan is similar to the optimization plans prepared manually by SEO professionals and can be successfully used as a support tool for web site search engine optimization.

  18. Efficient Proposed Framework for Semantic Search Engine using New Semantic Ranking Algorithm

    Directory of Open Access Journals (Sweden)

    M. M. El-gayar

    2015-08-01

    Full Text Available The amount of information grows by billions of database records every year, and there is an urgent need to search that information with a specialized tool called a search engine. Many search engines are available today, but their main challenge is that most of them cannot retrieve meaningful information intelligently. Semantic web technology is a solution that keeps data in a readable format, helping machines match this data smartly with related information based on meaning. In this paper, we introduce a proposed semantic framework that includes four phases: crawling, indexing, ranking and retrieval. This semantic framework operates over sorted RDF using an efficient proposed ranking algorithm and an enhanced crawling algorithm. The enhanced crawling algorithm crawls relevant forum content from the web with minimal overhead. The proposed ranking algorithm orders and evaluates similar meaningful data so that the retrieval process becomes faster, easier and more accurate. We applied our work to a standard database and achieved 99 percent semantic performance effectiveness in minimal time, with an error rate below 1 percent, compared with other semantic systems.

  19. A toolbox for representational similarity analysis.

    Directory of Open Access Journals (Sweden)

    Hamed Nili

    2014-04-01

    Full Text Available Neuronal population codes are increasingly being investigated with multivariate pattern-information analyses. A key challenge is to use measured brain-activity patterns to test computational models of brain information processing. One approach to this problem is representational similarity analysis (RSA), which characterizes a representation in a brain or computational model by the distance matrix of the response patterns elicited by a set of stimuli. The representational distance matrix encapsulates what distinctions between stimuli are emphasized and what distinctions are de-emphasized in the representation. A model is tested by comparing the representational distance matrix it predicts to that of a measured brain region. RSA also enables us to compare representations between stages of processing within a given brain or model, between brain and behavioral data, and between individuals and species. Here, we introduce a Matlab toolbox for RSA. The toolbox supports an analysis approach that is simultaneously data- and hypothesis-driven. It is designed to help integrate a wide range of computational models into the analysis of multichannel brain-activity measurements as provided by modern functional imaging and neuronal recording techniques. Tools for visualization and inference enable the user to relate sets of models to sets of brain regions and to statistically test and compare the models using nonparametric inference methods. The toolbox supports searchlight-based RSA, to continuously map a measured brain volume in search of a neuronal population code with a specific geometry. Finally, we introduce the linear-discriminant t value as a measure of representational discriminability that bridges the gap between linear decoding analyses and RSA. In order to demonstrate the capabilities of the toolbox, we apply it to both simulated and real fMRI data. The key functions are equally applicable to other modalities of brain-activity measurement. The
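
The basic RSA computation is compact enough to sketch: build a representational dissimilarity matrix (RDM) for each system from its response patterns, then compare RDMs with a rank correlation. This is a minimal illustration of the workflow on random stand-in data, not the toolbox's API or its inference machinery.

```python
# Core RSA step: RDMs from response patterns, compared by rank correlation.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
brain_patterns = rng.normal(size=(12, 100))   # 12 stimuli x 100 channels
model_patterns = rng.normal(size=(12, 20))    # same stimuli in a model space

# Condensed RDMs: pairwise correlation distance between stimulus patterns.
brain_rdm = pdist(brain_patterns, metric="correlation")
model_rdm = pdist(model_patterns, metric="correlation")

# Rank correlation between RDMs measures representational agreement.
rho, p = spearmanr(brain_rdm, model_rdm)
print(f"RDM similarity: rho={rho:.3f}, p={p:.3f}")
```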

  20. Scattering from Rough Surfaces with Extended Self-Similarity

    Institute of Scientific and Technical Information of China (English)

    张延冬; 吴振森

    2002-01-01

    An extended self-similarity (ESS) model is developed by extending the self-similarity condition in fractional Brownian motion (FBM); then the incremental Fourier synthesis algorithm is introduced to generate ESS rough surfaces, and an estimation algorithm is presented to extract the generalized multiscale Hurst parameter, which can also be modified to estimate the Hurst parameter for FBM more accurately. Finally, the scattering coefficient from ESS rough surfaces is calculated with the scalar Kirchhoff approximation, and its variation with the parameters in the ESS model is obtained. Compared with experimental measurements, it can be concluded that the ESS model provides a good tool to model natural rough surfaces.
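
The Fourier synthesis idea is easiest to see in one dimension: shape white noise with a power-law spectrum to obtain an fBm-like profile, then recover the Hurst parameter from the variance scaling of increments. The sketch below assumes the standard 1-D relation S(f) ∝ f^-(2H+1); the ESS model generalizes this by letting H vary across scales, which is not shown.

```python
# 1-D Fourier synthesis of an fBm-like profile and Hurst estimation.
import numpy as np

rng = np.random.default_rng(2)
n, H = 4096, 0.7

freqs = np.fft.rfftfreq(n, d=1.0)
amp = np.zeros_like(freqs)
amp[1:] = freqs[1:] ** (-(2 * H + 1) / 2)          # amplitude ~ sqrt(power)
phase = rng.uniform(0, 2 * np.pi, size=freqs.size)
profile = np.fft.irfft(amp * np.exp(1j * phase), n=n)

# Estimate H: Var[z(x+L) - z(x)] ~ L^(2H), so fit a line in log-log space.
lags = np.array([1, 2, 4, 8, 16, 32])
var = [np.var(profile[lag:] - profile[:-lag]) for lag in lags]
slope, _ = np.polyfit(np.log(lags), np.log(var), 1)
print(f"target H={H}, estimated H={slope / 2:.2f}")
```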

  1. University Students' Online Information Searching Strategies in Different Search Contexts

    Science.gov (United States)

    Tsai, Meng-Jung; Liang, Jyh-Chong; Hou, Huei-Tse; Tsai, Chin-Chung

    2012-01-01

    This study investigates the role of search context played in university students' online information searching strategies. A total of 304 university students in Taiwan were surveyed with questionnaires in which two search contexts were defined as searching for learning, and searching for daily life information. Students' online search strategies…

  2. [Advanced online search techniques and dedicated search engines for physicians].

    Science.gov (United States)

    Nahum, Yoav

    2008-02-01

    In recent years search engines have become an essential tool in the work of physicians. This article will review advanced search techniques from the world of information specialists, as well as some advanced search engine operators that may help physicians improve their online search capabilities, and maximize the yield of their searches. This article also reviews popular dedicated scientific and biomedical literature search engines.

  3. Similarity-based denoising of point-sampled surface

    Institute of Scientific and Technical Information of China (English)

    Ren-fang WANG; Wen-zhi CHEN; San-yuan ZHANG; Yin ZHANG; Xiu-zi YE

    2008-01-01

    A non-local denoising (NLD) algorithm for point-sampled surfaces (PSSs) is presented based on similarities, including geometry intensity and features of sample points. By using the trilateral filtering operator, the differential signal of each sample point is determined and called "geometry intensity". Based on covariance analysis, a regular grid of geometry intensity of a sample point is constructed, and the geometry-intensity similarity of two points is measured according to their grids. Based on mean shift clustering, the PSSs are clustered in terms of the local geometry-features similarity. The smoothed geometry intensity, i.e., offset distance, of the sample point is estimated according to the two similarities. Using the resulting intensity, the noise component from PSSs is finally removed by adjusting the position of each sample point along its own normal direction. Experimental results demonstrate that the algorithm is robust and can produce a more accurate denoising result while having better feature preservation.

  4. Personalized Predictive Modeling and Risk Factor Identification using Patient Similarity.

    Science.gov (United States)

    Ng, Kenney; Sun, Jimeng; Hu, Jianying; Wang, Fei

    2015-01-01

    Personalized predictive models are customized for an individual patient and trained using information from similar patients. Compared to global models trained on all patients, they have the potential to produce more accurate risk scores and capture more relevant risk factors for individual patients. This paper presents an approach for building personalized predictive models and generating personalized risk factor profiles. A locally supervised metric learning (LSML) similarity measure is trained for diabetes onset and used to find clinically similar patients. Personalized risk profiles are created by analyzing the parameters of the trained personalized logistic regression models. A 15,000 patient data set, derived from electronic health records, is used to evaluate the approach. The predictive results show that the personalized models can outperform the global model. Cluster analysis of the risk profiles shows groups of patients with similar risk factors, differences in the top risk factors for different groups of patients and differences between the individual and global risk factors.
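
The personalized-modeling recipe can be sketched briefly: select the k patients most similar to the index patient, then fit a local logistic model whose coefficients double as a personalized risk-factor profile. As stated assumptions, the sketch uses plain Euclidean distance in place of the learned LSML metric, and purely synthetic stand-in data.

```python
# Personalized risk model trained on the neighborhood of similar patients.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 8))                      # synthetic patient features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

def personalized_risk(x_new, X, y, k=200):
    """Train on the k nearest patients and score the index patient."""
    dist = np.linalg.norm(X - x_new, axis=1)        # stand-in for LSML metric
    idx = np.argsort(dist)[:k]
    model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    # The local coefficients double as a personalized risk-factor profile.
    return model.predict_proba([x_new])[0, 1], model.coef_[0]

risk, profile = personalized_risk(X[0], X, y)
print(f"risk={risk:.2f}", profile.round(2))
```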

  5. Similarity-Based Classification in Partially Labeled Networks

    Science.gov (United States)

    Zhang, Qian-Ming; Shang, Ming-Sheng; Lü, Linyuan

    Two main difficulties in the problem of classification in partially labeled networks are the sparsity of the known labeled nodes and inconsistency of label information. To address these two difficulties, we propose a similarity-based method, where the basic assumption is that two nodes are more likely to be categorized into the same class if they are more similar. In this paper, we introduce ten similarity indices defined based on the network structure. Empirical results on the co-purchase network of political books show that the similarity-based method can, to some extent, overcome these two difficulties and give more accurate classification than the relational neighbors method, especially when the labeled nodes are sparse. Furthermore, we find that when the information of known labeled nodes is sufficient, the indices considering only local information can perform as well as those global indices while having much lower computational complexity.
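
As a toy version of this approach, the sketch below classifies unlabeled nodes by summing a structural similarity toward each labeled node, here the common-neighbors index, one of the simplest local indices of the kind the paper surveys, and adopting the best-scoring label. The karate-club graph and the four seed labels are illustrative choices, not the paper's data set.

```python
# Similarity-based label assignment in a partially labeled network.
import networkx as nx
from collections import defaultdict

G = nx.karate_club_graph()
labels = {0: "A", 1: "A", 32: "B", 33: "B"}        # sparse known labels

def classify(G, labels, node):
    votes = defaultdict(float)
    neighbors = set(G[node])
    for labeled, lab in labels.items():
        if labeled == node:
            continue
        common = len(neighbors & set(G[labeled]))   # common-neighbors index
        votes[lab] += common
    return max(votes, key=votes.get) if votes else None

predicted = {v: classify(G, labels, v) for v in G if v not in labels}
print(predicted)
```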

  6. Assessment of metabolome annotation quality: a method for evaluating the false discovery rate of elemental composition searches.

    Directory of Open Access Journals (Sweden)

    Fumio Matsuda

    Full Text Available BACKGROUND: In metabolomics research using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation to determine elemental compositions possessing similar theoretical mass numbers. However, incorrect hits derived from errors in mass analyses will be included in the results of elemental composition searches. To assess the quality of peak annotation information, a novel methodology for false discovery rate (FDR) evaluation is presented in this study. Based on the FDR analyses, several aspects of an elemental composition search, including setting a threshold, estimating FDR, and the types of elemental composition databases most reliable for searching, are discussed. METHODOLOGY/PRINCIPAL FINDINGS: The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. The results indicate that relatively high FDR values (30-50%) were obtained when searching time-of-flight (TOF) MS data using the KNApSAcK and KEGG databases. In addition, searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR >70%). The estimated FDRs suggest that the quality of search results can be improved not only by performing more accurate mass analysis but also by modifying the properties of the compound database. A theoretical analysis indicates that FDR could be improved by using a compound database with fewer entries but higher completeness. CONCLUSIONS/SIGNIFICANCE: High accuracy mass analysis, such as Fourier transform (FT) MS, is needed for reliable annotation (FDR <10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data.
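
One way to grasp the FDR logic is a decoy experiment: query the database with random masses and measure how often a meaningless query still hits within tolerance. The sketch below does exactly that on a synthetic mass list; the tolerance, the database, and the final ratio-style estimate are illustrative assumptions, not the paper's Monte Carlo procedure.

```python
# Decoy-query estimate of the false hit rate in a mass database search.
import bisect, random

random.seed(4)
db_masses = sorted(random.uniform(100.0, 1000.0) for _ in range(20000))
tol = 0.01  # mass tolerance in Da, standing in for instrument accuracy

def hits(query, masses, tol):
    lo = bisect.bisect_left(masses, query - tol)
    hi = bisect.bisect_right(masses, query + tol)
    return hi - lo

# Decoy hit rate: probability a meaningless query still matches something.
decoys = [random.uniform(100.0, 1000.0) for _ in range(10000)]
p_false = sum(hits(q, db_masses, tol) > 0 for q in decoys) / len(decoys)

# If real queries hit at rate p_obs, roughly p_false / p_obs of those hits
# are spurious; p_obs here is an invented figure for illustration.
p_obs = 0.60
print(f"decoy hit rate={p_false:.3f}, estimated FDR={p_false / p_obs:.3f}")
```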

  7. Genetic algorithms as global random search methods

    Science.gov (United States)

    Peck, Charles C.; Dhawan, Atam P.

    1995-01-01

    Genetic algorithm behavior is described in terms of the construction and evolution of the sampling distributions over the space of candidate solutions. This novel perspective is motivated by analysis indicating that the schema theory is inadequate for completely and properly explaining genetic algorithm behavior. Based on the proposed theory, it is argued that the similarities of candidate solutions should be exploited directly, rather than encoding candidate solutions and then exploiting their similarities. Proportional selection is characterized as a global search operator, and recombination is characterized as the search process that exploits similarities. Sequential algorithms and many deletion methods are also analyzed. It is shown that by properly constraining the search breadth of recombination operators, convergence of genetic algorithms to a global optimum can be ensured.
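
A minimal genetic algorithm makes the two operators discussed above concrete: fitness-proportional selection acting as a global sampling step, and one-point recombination exploiting similarities between candidates. The bitstring objective below is a toy stand-in.

```python
# Minimal GA: proportional selection, one-point crossover, bit mutation.
import random

random.seed(5)
L, POP, GENS = 20, 40, 60
fitness = lambda s: sum(s)                  # toy objective: count of 1-bits

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for _ in range(GENS):
    weights = [fitness(s) + 1e-9 for s in pop]     # proportional selection
    parents = random.choices(pop, weights=weights, k=POP)
    nxt = []
    for a, b in zip(parents[::2], parents[1::2]):
        cut = random.randrange(1, L)               # one-point crossover
        nxt += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
    pop = [[bit ^ (random.random() < 0.01) for bit in s] for s in nxt]  # mutation

print(max(map(fitness, pop)))   # best fitness should approach L
```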

  8. Comparison of PubMed and Google Scholar literature searches.

    Science.gov (United States)

    Anders, Michael E; Evans, Dennis P

    2010-05-01

    Literature searches are essential to evidence-based respiratory care. To conduct literature searches, respiratory therapists rely on search engines to retrieve information, but there is a dearth of literature on the comparative efficiencies of search engines for researching clinical questions in respiratory care. We compared PubMed and Google Scholar search results for clinical topics in respiratory care against a benchmark. We performed literature searches with PubMed and Google Scholar on 3 clinical topics. In PubMed we used the Clinical Queries search filter. In Google Scholar we used the search filters in the Advanced Scholar Search option. We used the reference list of a related Cochrane Collaboration evidence-based systematic review as the benchmark for each of the search results. We calculated recall (sensitivity) and precision (positive predictive value) with 2 x 2 contingency tables. We compared the results with the chi-square test of independence and Fisher's exact test. PubMed and Google Scholar had similar recall for both overall search results (71% vs 69%) and full-text results (43% vs 51%). PubMed had better precision than Google Scholar for both overall search results (13% vs 0.07%, P < .001); the Clinical Queries filter in PubMed was more precise than the Advanced Scholar Search in Google Scholar for respiratory care topics. PubMed appears to be more practical for conducting efficient, valid searches to inform evidence-based patient-care protocols, to guide the care of individual patients, and for educational purposes.
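
The comparison rests on two quantities computed from a 2 x 2 contingency table against the Cochrane benchmark; the helper below just makes the definitions explicit. The counts are made up for illustration, not the study's data.

```python
# Recall (sensitivity) and precision (PPV) from a 2 x 2 contingency table.
def recall_precision(tp, fp, fn):
    recall = tp / (tp + fn)        # fraction of benchmark refs retrieved
    precision = tp / (tp + fp)     # fraction of retrieved results relevant
    return recall, precision

r, p = recall_precision(tp=25, fp=170, fn=10)
print(f"recall={r:.0%}, precision={p:.0%}")
```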

  9. Automatization and training in visual search.

    Science.gov (United States)

    Czerwinski, M; Lightfoot, N; Shiffrin, R M

    1992-01-01

    In several search tasks, the amount of practice on particular combinations of targets and distractors was equated in varied-mapping (VM) and consistent-mapping (CM) conditions. The results indicate the importance of distinguishing between memory and visual search tasks, and implicate a number of factors that play important roles in visual search and its learning. Visual search was studied in Experiment 1. VM and CM performance were almost equal, and slope reductions occurred during practice for both, suggesting the learning of efficient attentive search based on features, and no important role for automatic attention attraction. However, positive transfer effects occurred when previous CM targets were re-paired with previous CM distractors, even though these targets and distractors had not been trained together. Also, the introduction of a demanding simultaneous task produced advantages of CM over VM. These latter two results demonstrated the operation of automatic attention attraction. Visual search was further studied in Experiment 2, using novel characters for which feature overlap and similarity were controlled. The design and many of the findings paralleled Experiment 1. In addition, enormous search improvement was seen over 35 sessions of training, suggesting the operation of perceptual unitization for the novel characters. Experiment 3 showed a large, persistent advantage for CM over VM performance in memory search, even when practice on particular combinations of targets and distractors was equated in the two training conditions. A multifactor theory of automatization and attention is put forth to account for these findings and others in the literature.

  10. Cascade category-aware visual search.

    Science.gov (United States)

    Zhang, Shiliang; Tian, Qi; Huang, Qingming; Gao, Wen; Rui, Yong

    2014-06-01

    Incorporating image classification into image retrieval system brings many attractive advantages. For instance, the search space can be narrowed down by rejecting images in irrelevant categories of the query. The retrieved images can be more consistent in semantics by indexing and returning images in the relevant categories together. However, due to their different goals on recognition accuracy and retrieval scalability, it is hard to efficiently incorporate most image classification works into large-scale image search. To study this problem, we propose cascade category-aware visual search, which utilizes weak category clue to achieve better retrieval accuracy, efficiency, and memory consumption. To capture the category and visual clues of an image, we first learn category-visual words, which are discriminative and repeatable local features labeled with categories. By identifying category-visual words in database images, we are able to discard noisy local features and extract image visual and category clues, which are hence recorded in a hierarchical index structure. Our retrieval system narrows down the search space by: 1) filtering the noisy local features in query; 2) rejecting irrelevant categories in database; and 3) performing discriminative visual search in relevant categories. The proposed algorithm is tested on object search, landmark search, and large-scale similar image search on the large-scale LSVRC10 data set. Although the category clue introduced is weak, our algorithm still shows substantial advantages in retrieval accuracy, efficiency, and memory consumption over the state-of-the-art.

  11. When Gravity Fails: Local Search Topology

    Science.gov (United States)

    Frank, Jeremy; Cheeseman, Peter; Stutz, John; Lau, Sonie (Technical Monitor)

    1997-01-01

    Local search algorithms for combinatorial search problems frequently encounter a sequence of states in which it is impossible to improve the value of the objective function; moves through these regions, called plateau moves, dominate the time spent in local search. We analyze and characterize plateaus for three different classes of randomly generated Boolean Satisfiability problems. We identify several interesting features of plateaus that impact the performance of local search algorithms. We show that local minima tend to be small but occasionally may be very large. We also show that local minima can be escaped without unsatisfying a large number of clauses, but that systematically searching for an escape route may be computationally expensive if the local minimum is large. We show that plateaus with exits, called benches, tend to be much larger than minima, and that some benches have very few exit states which local search can use to escape. We show that the solutions (i.e. global minima) of randomly generated problem instances form clusters, which behave similarly to local minima. We revisit several enhancements of local search algorithms and explain their performance in light of our results. Finally we discuss strategies for creating the next generation of local search algorithms.

  12. Accurate measurement of unsteady state fluid temperature

    Science.gov (United States)

    Jaremkiewicz, Magdalena

    2017-03-01

    In this paper, two accurate methods for determining the transient fluid temperature were presented. Measurements were conducted for boiling water since its temperature is known. At the beginning the thermometers are at the ambient temperature and then they are immediately immersed into saturated water. The measurements were carried out with two thermometers of different construction but with the same housing outer diameter equal to 15 mm. One of them is a K-type industrial thermometer widely available commercially. The temperature indicated by the thermometer was corrected considering the thermometers as the first or second order inertia devices. The new design of a thermometer was proposed and also used to measure the temperature of boiling water. Its characteristic feature is a cylinder-shaped housing with the sheath thermocouple located in its center. The temperature of the fluid was determined based on measurements taken in the axis of the solid cylindrical element (housing) using the inverse space marching method. Measurements of the transient temperature of the air flowing through the wind tunnel using the same thermometers were also carried out. The proposed measurement technique provides more accurate results compared with measurements using industrial thermometers in conjunction with simple temperature correction using the inertial thermometer model of the first or second order. By comparing the results, it was demonstrated that the new thermometer allows obtaining the fluid temperature much faster and with higher accuracy in comparison to the industrial thermometer. Accurate measurements of the fast changing fluid temperature are possible due to the low inertia thermometer and fast space marching method applied for solving the inverse heat conduction problem.
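
The first-order correction mentioned above has a compact form: a thermometer with time constant τ obeys τ·dT/dt + T = T_fluid, so the fluid temperature can be recovered from the indicated temperature and its derivative. The sketch below applies this to a synthetic immersion step; the time constant and signal are illustrative, not the paper's measurements.

```python
# First-order inertia correction: T_fluid = T_indicated + tau * dT/dt.
import numpy as np

tau = 8.0                                   # thermometer time constant, s
t = np.linspace(0, 60, 601)
T_fluid = 100.0 * np.ones_like(t)           # step: immersion in boiling water
T_ind = 20.0 + (100.0 - 20.0) * (1 - np.exp(-t / tau))  # indicated response

dTdt = np.gradient(T_ind, t)                # numerical derivative
T_rec = T_ind + tau * dTdt                  # first-order correction

print(T_ind[50].round(1), T_rec[50].round(1))  # indicated lags, corrected ~100
```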

  13. New law requires 'medically accurate' lesson plans.

    Science.gov (United States)

    1999-09-17

    The California Legislature has passed a bill requiring all textbooks and materials used to teach about AIDS be medically accurate and objective. Statements made within the curriculum must be supported by research conducted in compliance with scientific methods, and published in peer-reviewed journals. Some of the current lesson plans were found to contain scientifically unsupported and biased information. In addition, the bill requires material to be "free of racial, ethnic, or gender biases." The legislation is supported by a wide range of interests, but opposed by the California Right to Life Education Fund, because they believe it discredits abstinence-only material.

  14. Accurate diagnosis is essential for amebiasis

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    Amebiasis is one of the three most common causes of death from parasitic disease, and Entamoeba histolytica is the most widely distributed parasite in the world. In particular, Entamoeba histolytica infection in the developing countries is a significant health problem in amebiasis-endemic areas, with a significant impact on infant mortality[1]. In recent years a worldwide increase in the number of patients with amebiasis has refocused attention on this important infection. On the other hand, improving the quality of parasitological methods and widespread use of accurate techniques have improved our knowledge about the disease.

  15. The first accurate description of an aurora

    Science.gov (United States)

    Schröder, Wilfried

    2006-12-01

    As technology has advanced, the scientific study of auroral phenomena has increased by leaps and bounds. A look back at the earliest descriptions of aurorae offers an interesting look into how medieval scholars viewed the subjects that we study.Although there are earlier fragmentary references in the literature, the first accurate description of the aurora borealis appears to be that published by the German Catholic scholar Konrad von Megenberg (1309-1374) in his book Das Buch der Natur (The Book of Nature). The book was written between 1349 and 1350.

  16. Universality: Accurate Checks in Dyson's Hierarchical Model

    Science.gov (United States)

    Godina, J. J.; Meurice, Y.; Oktay, M. B.

    2003-06-01

    In this talk we present high-accuracy calculations of the susceptibility near β_c for Dyson's hierarchical model in D = 3. Using linear fitting, we estimate the leading (γ) and subleading (Δ) exponents. Independent estimates are obtained by calculating the first two eigenvalues of the linearized renormalization group transformation. We found γ = 1.29914073 ± 10^-8 and Δ = 0.4259469 ± 10^-7, independently of the choice of local integration measure (Ising or Landau-Ginzburg). After a suitable rescaling, the approximate fixed points for a large class of local measures coincide accurately with a fixed point constructed by Koch and Wittwer.

  17. Updating collection representations for federated search

    OpenAIRE

    M. Shokouhi; Baillie, M; Azzopardi, L.

    2007-01-01

    To facilitate the search for relevant information across a set of online distributed collections, a federated information retrieval system typically represents each collection, centrally, by a set of vocabularies or sampled documents. Accurate retrieval is therefore related to how precise each representation reflects the underlying content stored in that collection. As collections evolve over time, collection representations should also be updated to reflect any change, however, a current sol...

  18. Supporting complex search tasks

    DEFF Research Database (Denmark)

    Gäde, Maria; Hall, Mark; Huurdeman, Hugo

    2015-01-01

    There is broad consensus in the field of IR that search is complex in many use cases and applications, both on the Web and in domain specific collections, and both professionally and in our daily life. Yet our understanding of complex search tasks, in comparison to simple look up tasks, is fragmented at best. ... and recommendations, and supporting exploratory search to sensemaking and analytics, UI and UX design pose an overconstrained challenge. How do we know that our approach is any good? Supporting complex search tasks requires new collaborations across the whole field of IR, and the proposed workshop will bring together...

  19. Adaptive Large Neighbourhood Search

    DEFF Research Database (Denmark)

    Røpke, Stefan

    Large neighborhood search is a metaheuristic that has gained popularity in recent years. The heuristic repeatedly moves from solution to solution by first partially destroying the solution and then repairing it. The best solution observed during this search is presented as the final solution. This tutorial introduces the large neighborhood search metaheuristic and the variant adaptive large neighborhood search that dynamically tunes parameters of the heuristic while it is running. Both heuristics belong to a broader class of heuristics that are searching a solution space using very large neighborhoods. The tutorial also presents applications of the adaptive large neighborhood search, mostly related to vehicle routing problems for which the heuristic has been extremely successful. We discuss how the heuristic can be parallelized and thereby take advantage of modern desktop computers...
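
The destroy-and-repair loop with adaptive operator weights fits in a short script. The sketch below applies it to a toy traveling-salesman instance: two hypothetical destroy operators compete, a cheapest-insertion repair rebuilds the tour, and whichever operator produced an improvement has its weight increased. Operator set, reward scheme and acceptance rule are all simplified assumptions relative to full ALNS.

```python
# Adaptive large neighborhood search skeleton on a random TSP instance.
import math, random

random.seed(6)
pts = [(random.random(), random.random()) for _ in range(30)]
dist = lambda a, b: math.dist(pts[a], pts[b])
cost = lambda tour: sum(dist(tour[i], tour[(i + 1) % len(tour)])
                        for i in range(len(tour)))

def destroy_random(tour, k=6):
    removed = random.sample(tour, k)                 # scatter removal
    return [c for c in tour if c not in removed], removed

def destroy_segment(tour, k=6):
    i = random.randrange(len(tour) - k)              # contiguous removal
    return tour[:i] + tour[i + k:], tour[i:i + k]

def repair_greedy(partial, removed):
    tour = partial[:]
    for c in removed:                                # cheapest insertion
        best = min(range(len(tour) + 1),
                   key=lambda i: cost(tour[:i] + [c] + tour[i:]))
        tour.insert(best, c)
    return tour

destroyers, weights = [destroy_random, destroy_segment], [1.0, 1.0]
best = cur = list(range(30))
for _ in range(500):
    d = random.choices(range(2), weights=weights)[0]  # adaptive choice
    cand = repair_greedy(*destroyers[d](cur))
    if cost(cand) < cost(cur):
        cur = cand
        weights[d] += 0.1                             # reward the operator
    if cost(cur) < cost(best):
        best = cur
print(round(cost(best), 3))
```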

  20. Search on Rugged Landscapes

    DEFF Research Database (Denmark)

    Billinger, Stephan; Stieglitz, Nils; Schumacher, Terry

    2014-01-01

    This paper presents findings from a laboratory experiment on human decision-making in a complex combinatorial task. We find strong evidence for a behavioral model of adaptive search. Success narrows down search to the neighborhood of the status quo, while failure promotes gradually more explorative search. Task complexity does not have a direct effect on behavior, but systematically affects the feedback conditions that guide success-induced exploitation and failure-induced exploration. The analysis also shows that human participants were prone to over-exploration, since they broke off the search for local improvements too early. We derive stylized decision rules that generate the search behavior observed in the experiment and discuss the implications of our findings for individual decision-making and organizational search.

  1. Supporting complex search tasks

    DEFF Research Database (Denmark)

    Gäde, Maria; Hall, Mark; Huurdeman, Hugo;

    2015-01-01

    There is broad consensus in the field of IR that search is complex in many use cases and applications, both on the Web and in domain specific collections, and both professionally and in our daily life. Yet our understanding of complex search tasks, in comparison to simple look up tasks, is fragmented at best. The workshop addressed the many open research questions: What are the obvious use cases and applications of complex search? What are essential features of work tasks and search tasks to take into account? And how do these evolve over time? With a multitude of information, varying from introductory to specialized, and from authoritative to speculative or opinionated, when to show what sources of information? How does the information seeking process evolve and what are relevant differences between different stages? With complex task and search process management, blending searching, browsing...

  2. Accurate Stellar Parameters for Exoplanet Host Stars

    Science.gov (United States)

    Brewer, John Michael; Fischer, Debra; Basu, Sarbani; Valenti, Jeff A.

    2015-01-01

    A large impediment to our understanding of planet formation is obtaining a clear picture of planet radii and densities. Although determining precise ratios between planet and stellar host is relatively easy, determining accurate stellar parameters is still a difficult and costly undertaking. High resolution spectral analysis has traditionally yielded precise values for some stellar parameters but stars in common between catalogs from different authors or analyzed using different techniques often show offsets far in excess of their uncertainties. Most analyses now use some external constraint, when available, to break observed degeneracies between surface gravity, effective temperature, and metallicity which can otherwise lead to correlated errors in results. However, these external constraints are impossible to obtain for all stars and can require more costly observations than the initial high resolution spectra. We demonstrate that these discrepancies can be mitigated by use of a larger line list that has carefully tuned atomic line data. We use an iterative modeling technique that does not require external constraints. We compare the surface gravity obtained with our spectral synthesis modeling to asteroseismically determined values for 42 Kepler stars. Our analysis agrees well with only a 0.048 dex offset and an rms scatter of 0.05 dex. Such accurate stellar gravities can reduce the primary source of uncertainty in radii by almost an order of magnitude over unconstrained spectral analysis.

  3. Accurate pattern registration for integrated circuit tomography

    Energy Technology Data Exchange (ETDEWEB)

    Levine, Zachary H.; Grantham, Steven; Neogi, Suneeta; Frigo, Sean P.; McNulty, Ian; Retsch, Cornelia C.; Wang, Yuxin; Lucatorto, Thomas B.

    2001-07-15

    As part of an effort to develop high resolution microtomography for engineered structures, a two-level copper integrated circuit interconnect was imaged using 1.83 keV x rays at 14 angles employing a full-field Fresnel zone plate microscope. A major requirement for high resolution microtomography is the accurate registration of the reference axes in each of the many views needed for a reconstruction. A reconstruction with 100 nm resolution would require registration accuracy of 30 nm or better. This work demonstrates that even images that have strong interference fringes can be used to obtain accurate fiducials through the use of Radon transforms. We show that we are able to locate the coordinates of the rectilinear circuit patterns to 28 nm. The procedure is validated by agreement between an x-ray parallax measurement of 1.41±0.17 µm and a measurement of 1.58±0.08 µm from a scanning electron microscope image of a cross section.

  4. Accurate basis set truncation for wavefunction embedding

    Science.gov (United States)

    Barnes, Taylor A.; Goodpaster, Jason D.; Manby, Frederick R.; Miller, Thomas F.

    2013-07-01

    Density functional theory (DFT) provides a formally exact framework for performing embedded subsystem electronic structure calculations, including DFT-in-DFT and wavefunction theory-in-DFT descriptions. In the interest of efficiency, it is desirable to truncate the atomic orbital basis set in which the subsystem calculation is performed, thus avoiding high-order scaling with respect to the size of the MO virtual space. In this study, we extend a recently introduced projection-based embedding method [F. R. Manby, M. Stella, J. D. Goodpaster, and T. F. Miller III, J. Chem. Theory Comput. 8, 2564 (2012); doi:10.1021/ct300544e] to allow for the systematic and accurate truncation of the embedded subsystem basis set. The approach is applied to both covalently and non-covalently bound test cases, including water clusters and polypeptide chains, and it is demonstrated that errors associated with basis set truncation are controllable to well within chemical accuracy. Furthermore, we show that this approach allows for switching between accurate projection-based embedding and DFT embedding with approximate kinetic energy (KE) functionals; in this sense, the approach provides a means of systematically improving upon the use of approximate KE functionals in DFT embedding.

  5. How Accurately can we Calculate Thermal Systems?

    Energy Technology Data Exchange (ETDEWEB)

    Cullen, D; Blomquist, R N; Dean, C; Heinrichs, D; Kalugin, M A; Lee, M; Lee, Y; MacFarlan, R; Nagaya, Y; Trkov, A

    2004-04-20

    I would like to determine how accurately a variety of neutron transport code packages (code and cross section libraries) can calculate simple integral parameters, such as K_eff, for systems that are sensitive to thermal neutron scattering. Since we will only consider theoretical systems, we cannot really determine absolute accuracy compared to any real system. Therefore rather than accuracy, it would be more precise to say that I would like to determine the spread in answers that we obtain from a variety of code packages. This spread should serve as an excellent indicator of how accurately we can really model and calculate such systems today. Hopefully, eventually this will lead to improvements in both our codes and the thermal scattering models that they use in the future. In order to accomplish this I propose a number of extremely simple systems that involve thermal neutron scattering that can be easily modeled and calculated by a variety of neutron transport codes. These are theoretical systems designed to emphasize the effects of thermal scattering, since that is what we are interested in studying. I have attempted to keep these systems very simple, and yet at the same time they include most, if not all, of the important thermal scattering effects encountered in a large, water-moderated, uranium fueled thermal system, i.e., our typical thermal reactors.

  6. Accurate taxonomic assignment of short pyrosequencing reads.

    Science.gov (United States)

    Clemente, José C; Jansson, Jesper; Valiente, Gabriel

    2010-01-01

    Ambiguities in the taxonomy-dependent assignment of pyrosequencing reads are usually resolved by mapping each read to the lowest common ancestor in a reference taxonomy of all those sequences that match the read. This conservative approach has the drawback of mapping a read to a possibly large clade that may also contain many sequences not matching the read. A more accurate taxonomic assignment of short reads can be made by mapping each read to the node in the reference taxonomy that provides the best precision and recall. We show that given a suffix array for the sequences in the reference taxonomy, a short read can be mapped to the node of the reference taxonomy with the best combined value of precision and recall in time linear in the size of the taxonomy subtree rooted at the lowest common ancestor of the matching sequences. An accurate taxonomic assignment of short reads can thus be made with about the same efficiency as when mapping each read to the lowest common ancestor of all matching sequences in a reference taxonomy. We demonstrate the effectiveness of our approach on several metagenomic datasets of marine and gut microbiota.
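
    As a rough illustration of the idea (not the authors' linear-time, suffix-array-based algorithm), the following Python sketch scores every candidate node by the harmonic mean of precision and recall (F1) and returns the best one; the toy taxonomy layout and the use of F1 as the combined score are assumptions for the example.

        def best_assignment(matches, taxonomy):
            # taxonomy: node -> set of reference sequences below that node
            # matches: set of reference sequences matching the read
            best_node, best_f1 = None, -1.0
            for node, leaves in taxonomy.items():
                hit = len(matches & leaves)
                if hit == 0:
                    continue
                precision = hit / len(leaves)   # fraction of the clade that matches
                recall = hit / len(matches)     # fraction of matches inside the clade
                f1 = 2 * precision * recall / (precision + recall)
                if f1 > best_f1:
                    best_node, best_f1 = node, f1
            return best_node

        # Toy taxonomy (hypothetical): the read matches s1, s2 and s3.
        taxonomy = {
            "root": {"s1", "s2", "s3", "s4"},
            "cladeA": {"s1", "s2"},
            "cladeB": {"s3", "s4"},
        }
        print(best_assignment({"s1", "s2", "s3"}, taxonomy))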

  7. Accurate determination of characteristic relative permeability curves

    Science.gov (United States)

    Krause, Michael H.; Benson, Sally M.

    2015-09-01

    A recently developed technique to accurately characterize sub-core scale heterogeneity is applied to investigate the factors responsible for flowrate-dependent effective relative permeability curves measured on core samples in the laboratory. The dependency of laboratory measured relative permeability on flowrate has long been both supported and challenged by a number of investigators. Studies have shown that this apparent flowrate dependency is a result of both sub-core scale heterogeneity and outlet boundary effects. However this has only been demonstrated numerically for highly simplified models of porous media. In this paper, flowrate dependency of effective relative permeability is demonstrated using two rock cores, a Berea Sandstone and a heterogeneous sandstone from the Otway Basin Pilot Project in Australia. Numerical simulations of steady-state coreflooding experiments are conducted at a number of injection rates using a single set of input characteristic relative permeability curves. Effective relative permeability is then calculated from the simulation data using standard interpretation methods for calculating relative permeability from steady-state tests. Results show that simplified approaches may be used to determine flowrate-independent characteristic relative permeability provided flow rate is sufficiently high, and the core heterogeneity is relatively low. It is also shown that characteristic relative permeability can be determined at any typical flowrate, and even for geologically complex models, when using accurate three-dimensional models.

  8. A New Efficient Method for Calculating Similarity Between Web Services

    Directory of Open Access Journals (Sweden)

    T. RACHAD

    2014-08-01

    Web services allow communication between heterogeneous systems in a distributed environment. Their enormous success and increased use mean that thousands of Web services are now present on the Internet. This ever-growing number of Web services has made locating and classifying them difficult, problems encountered mainly during the operations of web service discovery and substitution. Traditional keyword-based search is not successful in this context: its results do not reflect the structure of Web services, and it considers only the identifiers of the web service description language (WSDL) interface elements. Methods based on semantics (WSDL-S, OWL-S, SAWSDL...), which enrich the WSDL description of a Web service with a semantic description, partially address this problem, but their complexity and difficulty delay their adoption in real cases. Measuring the similarity between web service interfaces is the most suitable solution for this kind of problem: it classifies available web services so as to distinguish those that best match the searched profile from those that do not. Thus, the main goal of this work is to study the degree of similarity between any two web services by offering a new method that is more effective than existing works.
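
    The record does not detail the proposed method, so as a generic stand-in the following sketch scores two WSDL interfaces by the Jaccard overlap of the tokens in their operation names; the operation names are hypothetical.

        import re

        def tokens(names):
            # Split identifiers such as "getWeatherForecast" into lowercase tokens.
            return {t.lower() for n in names for t in re.findall(r"[A-Za-z][a-z]*", n)}

        def interface_similarity(ops_a, ops_b):
            # Jaccard overlap of the token sets of two WSDL operation lists.
            a, b = tokens(ops_a), tokens(ops_b)
            return len(a & b) / len(a | b) if a | b else 0.0

        # Hypothetical operation names from two services' interfaces:
        print(interface_similarity(["getWeatherForecast"], ["fetchWeather", "getForecast"]))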

  9. Personalized online information search and visualization

    Directory of Open Access Journals (Sweden)

    Orthner Helmuth F

    2005-03-01

    Background The rapid growth of online publications such as Medline and other sources raises the question of how to get relevant information efficiently. It is important for a bench scientist, e.g., to monitor related publications constantly; it is also important for a clinician, e.g., to access patient records anywhere and anytime. Although time-consuming, this kind of searching procedure is usually similar and simple: typically, it involves a search engine and a visualization interface, and different words or combinations reflect different research topics. The objective of this study is to automate this tedious procedure by recording those words/terms in a database and online sources, and using that information for automated search and retrieval. The retrieved information is then available anytime and anywhere through a secure web server. Results We developed such a database, which stores search terms, journals, etc., and implemented software for automatically searching medical subject heading-indexed sources such as Medline and other online sources. The returned information was stored locally, as is, on a server and made visible through a Web-based interface. The search was performed daily or as otherwise scheduled, and users log on to the website anytime without typing any words. The system has the potential to retrieve similarly from non-medical subject heading-indexed literature or a privileged information source such as a clinical information system. Issues such as security, presentation and visualization of the retrieved information were thus addressed, and one presentation issue, wireless access, was also experimented with. A user survey showed that the personalized online searches saved time and increased relevancy. Handheld devices could also be used to access the stored information, but less satisfactorily. Conclusion The Web-searching software or a similar system has potential to be an efficient
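
    A minimal sketch of the kind of scheduled, stored-term search such a system performs, here against PubMed via the public NCBI E-utilities endpoint; the stored terms and retention logic are placeholders.

        import requests

        # Saved search terms would come from the user's stored profile (hypothetical).
        saved_terms = ["protein structure similarity", "wavelet image classification"]

        for term in saved_terms:
            r = requests.get(
                "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
                params={"db": "pubmed", "term": term, "retmax": 20, "retmode": "json"},
                timeout=30,
            )
            ids = r.json()["esearchresult"]["idlist"]
            # Store the PubMed IDs locally for later display on the secure web server.
            print(term, ids[:5])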

  10. Federated Search Scalability

    OpenAIRE

    Txurruka Alberdi, Beñat

    2015-01-01

    Searching for images on the internet has become a natural process for the internet surfer. Most search engines use complex algorithms to look up images, but their metadata is mostly ignored, in part because many image hosting sites remove metadata when an image is uploaded. The JPSearch standard has been developed to handle interoperability in metadata-based searches, but it seems that the market is not interested in supporting it. The starting point of this proje...

  11. Mastering ElasticSearch

    CERN Document Server

    Kuc, Rafal

    2013-01-01

    A practical tutorial that covers the difficult design, implementation, and management of search solutions. Mastering ElasticSearch is aimed at intermediate users who want to extend their knowledge about ElasticSearch. The topics that are described in the book are detailed, but we assume that you already know the basics, like the query DSL or data indexing. Advanced users will also find this book useful, as the examples go deep into the internals where needed.
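
    For readers checking whether they know "the basics" the book assumes, this is roughly what a minimal query DSL request looks like, assuming a local cluster on port 9200 and a hypothetical index name:

        import requests

        # A basic match query against a local ElasticSearch cluster.
        query = {"query": {"match": {"title": "similarity search"}}}
        r = requests.post("http://localhost:9200/my_index/_search", json=query, timeout=10)
        for hit in r.json()["hits"]["hits"]:
            print(hit["_score"], hit["_source"].get("title"))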

  12. ElasticSearch cookbook

    CERN Document Server

    Paro, Alberto

    2013-01-01

    Written in an engaging, easy-to-follow style, the recipes will help you to extend the capabilities of ElasticSearch to manage your data effectively. If you are a developer who implements ElasticSearch in your web applications, manages data, or has decided to start using ElasticSearch, this book is ideal for you. This book assumes that you have a working knowledge of JSON and Java.

  13. Location-based Services using Image Search

    DEFF Research Database (Denmark)

    Vertongen, Pieter-Paulus; Hansen, Dan Witzner

    2008-01-01

    Recent developments in image search have made it sufficiently efficient to be used in real-time applications. GPS has become a popular navigation tool. While GPS information provides reasonably good accuracy, it is not always present in all handheld devices, nor is it accurate in all situations, for example in urban environments. We propose a system to provide location-based services using image searches without requiring GPS. The goal of this system is to assist tourists in cities with additional information using their mobile phones and built-in cameras. Based upon the result of the image search engine and database image location knowledge, the location of the query image is determined and associated data can be presented to the user.

  14. Google Power Search

    CERN Document Server

    Spencer, Stephan

    2011-01-01

    Behind Google's deceptively simple interface is immense power for both market and competitive research-if you know how to use it well. Sure, basic searches are easy, but complex searches require specialized skills. This concise book takes you through the full range of Google's powerful search-refinement features, so you can quickly find the specific information you need. Learn techniques ranging from simple Boolean logic to URL parameters and other advanced tools, and see how they're applied to real-world market research examples. Incorporate advanced search operators such as filetype:, intit

  15. Delaying information search

    Directory of Open Access Journals (Sweden)

    Yaniv Shani

    2012-11-01

    In three studies, we examined factors that may temporarily attenuate information search. People are generally curious and dislike uncertainty, which typically encourages them to look for relevant information. Despite these strong forces that promote information search, people sometimes deliberately delay obtaining valuable information. We find they may do so when they are concerned that the information might interfere with future pleasurable activities. Interestingly, the decision to search or to postpone searching for information is influenced not only by the value and importance of the information itself but also by well-being maintenance goals related to possible detrimental effects that negative knowledge may have on unrelated future plans.

  16. Semantic Search among Heterogeneous Biological Databases Based on Gene Ontology

    Institute of Scientific and Technical Information of China (English)

    Shun-Liang CAO; Lei QIN; Wei-Zhong HE; Yang ZHONG; Yang-Yong ZHU; Yi-Xue LI

    2004-01-01

    Semantic search is a key issue in integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table, to correlate Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table, to record similarity scores derived from any pair of GO terms. Based on the two tables, multifarious ways for semantic search are provided, and the corresponding entries in heterogeneous biological databases can be expediently searched in semantic terms.
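
    A toy sketch of how two such tables might support a semantic search through a simple join; the schema, GO identifiers and scores are invented for illustration and do not reproduce BioDW's actual layout.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE db2go (entry_id TEXT, source_db TEXT, go_term TEXT);
            CREATE TABLE go_sim (go_a TEXT, go_b TEXT, score REAL);
        """)
        con.executemany("INSERT INTO db2go VALUES (?,?,?)",
                        [("P12345", "SWISS-PROT", "GO:0012501"),
                         ("ENSG000001", "Ensembl", "GO:0008150")])
        con.executemany("INSERT INTO go_sim VALUES (?,?,?)",
                        [("GO:0006915", "GO:0012501", 0.92),
                         ("GO:0006915", "GO:0008150", 0.35)])
        # Entries whose GO annotation is semantically close to the query term:
        rows = con.execute("""
            SELECT e.entry_id, e.source_db, s.score
            FROM go_sim AS s JOIN db2go AS e ON e.go_term = s.go_b
            WHERE s.go_a = ? AND s.score >= ?
            ORDER BY s.score DESC
        """, ("GO:0006915", 0.7)).fetchall()
        print(rows)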

  17. Inequalities between similarities for numerical data

    NARCIS (Netherlands)

    Warrens, Matthijs J.

    2016-01-01

    Similarity measures quantify the similarity between two vectors of real numbers. We present inequalities between seven well-known similarities. The inequalities are valid if the vectors contain non-negative real numbers.
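
    The seven measures are not named in this record; as an illustration of the kind of result, the following numerically checks one classical chain that holds for non-negative vectors, generalized Jaccard <= Dice <= Ochiai (using one common generalization of each measure to real-valued data).

        import random

        def jaccard(x, y):  # Ruzicka / generalized Jaccard
            return sum(map(min, x, y)) / sum(map(max, x, y))

        def dice(x, y):     # Czekanowski / generalized Dice
            return 2 * sum(map(min, x, y)) / (sum(x) + sum(y))

        def ochiai(x, y):   # cosine-like Ochiai generalization
            return sum(map(min, x, y)) / (sum(x) * sum(y)) ** 0.5

        for _ in range(10_000):
            x = [random.random() for _ in range(8)]
            y = [random.random() for _ in range(8)]
            assert jaccard(x, y) <= dice(x, y) <= ochiai(x, y)
        print("inequality chain held on all random non-negative vectors")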

  18. Quantifying the Search Behaviour of Different Demographics Using Google Correlate.

    Directory of Open Access Journals (Sweden)

    Adrian Letchford

    Vast records of our everyday interests and concerns are being generated by our frequent interactions with the Internet. Here, we investigate how the searches of Google users vary across U.S. states with different birth rates and infant mortality rates. We find that users in states with higher birth rates search for more information about pregnancy, while those in states with lower birth rates search for more information about cats. Similarly, we find that users in states with higher infant mortality rates search for more information about credit, loans and diseases. Our results provide evidence that Internet search data could offer new insight into the concerns of different demographics.

  19. The Hofmethode: Computing Semantic Similarities between E-Learning Products

    Directory of Open Access Journals (Sweden)

    Oliver Michel

    2009-11-01

    The key task in building useful e-learning repositories is to develop a system with an algorithm allowing users to retrieve information that corresponds to their specific requirements. To achieve this, products (or their verbal descriptions, i.e., as presented in metadata) need to be compared and structured according to the results of this comparison. Such structuring is crucial insofar as there are many search results that correspond to the entered keyword. The Hofmethode is an algorithm (based on psychological considerations) to compute semantic similarities between texts, and therefore offers a way to compare e-learning products. The computed similarity values are used to build semantic maps in which the products are visually arranged according to their similarities. The paper describes how the Hofmethode is implemented in the online database edulap, and how it helps the user to explore the data in which he is interested.

  20. Adaptive Levy processes and area-restricted search in human foraging.

    Directory of Open Access Journals (Sweden)

    Thomas T Hills

    A considerable amount of research has claimed that animals' foraging behaviors display movement lengths with power-law distributed tails, characteristic of Lévy flights and Lévy walks. Though these claims have recently come into question, the proposal that many animals forage using Lévy processes nonetheless remains. A Lévy process does not consider when or where resources are encountered, and samples movement lengths independently of past experience. However, Lévy processes too have come into question based on the observation that in patchy resource environments resource-sensitive foraging strategies, like area-restricted search, perform better than Lévy flights yet can still generate heavy-tailed distributions of movement lengths. To investigate these questions further, we tracked humans as they searched for hidden resources in an open-field virtual environment, with either patchy or dispersed resource distributions. Supporting previous research, for both conditions logarithmic binning methods were consistent with Lévy flights, and rank-frequency methods (comparing alternative distributions using maximum likelihood) showed the strongest support for bounded power-law distributions (truncated Lévy flights). However, goodness-of-fit tests found that even bounded power-law distributions only accurately characterized movement behavior for 4 (out of 32) participants. Moreover, paths in the patchy environment (but not the dispersed environment) showed a transition to intensive search following resource encounters, characteristic of area-restricted search. Transferring paths between environments revealed that paths generated in the patchy environment were adapted to that environment. Our results suggest that though power-law distributions do not accurately reflect human search, Lévy processes may still describe movement in dispersed environments, but not in patchy environments, where search was area-restricted. Furthermore, our results
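
    One way to run this kind of maximum-likelihood model comparison is with the third-party powerlaw package (Alstott et al.); the synthetic step lengths below stand in for real tracking data.

        import random
        import powerlaw  # third-party package: pip install powerlaw

        # Synthetic movement lengths standing in for one participant's data.
        step_lengths = [random.paretovariate(1.5) for _ in range(500)]

        fit = powerlaw.Fit(step_lengths)
        # Positive R favors the first candidate distribution; p gauges significance.
        R, p = fit.distribution_compare("power_law", "truncated_power_law")
        print(fit.power_law.alpha, R, p)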

  1. SearchResultFinder: federated search made easy

    OpenAIRE

    Trieschnigg, Rudolf Berend; Tjin-Kam-Jet, Kien; Hiemstra, Djoerd

    2013-01-01

    Building a federated search engine based on a large number of existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, a browser plugin which speeds up determining reusable XPaths for extracting search result items from HTML search result pages. Based on a single search result page, the tool presents a ranked list of candidate extraction XPaths and al...

  2. Monad Transformers for Backtracking Search

    Directory of Open Access Journals (Sweden)

    Jules Hedges

    2014-06-01

    This paper extends Escardo and Oliva's selection monad to the selection monad transformer, a general monadic framework for expressing backtracking search algorithms in Haskell. The use of the closely related continuation monad transformer for similar purposes is also discussed, including an implementation of a DPLL-like SAT solver with no explicit recursion. Continuing a line of work exploring connections between selection functions and game theory, we use the selection monad transformer with the nondeterminism monad to obtain an intuitive notion of backward induction for a certain class of nondeterministic games.

  3. Accurate Classification of RNA Structures Using Topological Fingerprints

    Science.gov (United States)

    Li, Kejie; Gribskov, Michael

    2016-01-01

    While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity–an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Due to the inclusion of pseudoknots, the RNA fingerprint approach both covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint. PMID:27755571

  4. Accurate Telescope Mount Positioning with MEMS Accelerometers

    Science.gov (United States)

    Mészáros, L.; Jaskó, A.; Pál, A.; Csépány, G.

    2014-08-01

    This paper describes the advantages and challenges of applying microelectromechanical accelerometer systems (MEMS accelerometers) in order to attain precise, accurate and stateless positioning of telescope mounts. This provides a completely independent method from other forms of electronic, optical, mechanical or magnetic feedback or real-time astrometry. Our goal is to reach the sub-arcminute range, which is well below the field-of-view of conventional imaging telescope systems. Here we present how this sub-arcminute accuracy can be achieved with very cheap MEMS sensors, and we also detail how our procedures can be extended in order to attain even finer measurements. In addition, our paper discusses how a complete system design can be implemented in order to be part of a telescope control system.
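
    For orientation, static tilt can be read off a calibrated 3-axis accelerometer with the standard gravity-vector formulas below; this is a generic sketch, not the authors' calibration and averaging procedure for reaching sub-arcminute accuracy.

        import math

        def tilt_from_accel(ax, ay, az):
            # Static pitch/roll (radians) from a calibrated 3-axis accelerometer
            # reading of the gravity vector (in g units).
            pitch = math.atan2(-ax, math.hypot(ay, az))
            roll = math.atan2(ay, az)
            return pitch, roll

        # Example reading: nearly level mount with a slight forward tilt.
        print(tilt_from_accel(0.01, 0.00, 0.9999))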

  5. Apparatus for accurately measuring high temperatures

    Science.gov (United States)

    Smith, D.D.

    The present invention is a thermometer used for measuring furnace temperatures in the range of about 1800° to 2700°C. The thermometer comprises a broadband multicolor thermal radiation sensor positioned to be in optical alignment with the end of a blackbody sight tube extending into the furnace. A valve-shutter arrangement is positioned between the radiation sensor and the sight tube, and a chamber for containing a charge of high-pressure gas is positioned between the valve-shutter arrangement and the radiation sensor. A momentary opening of the valve-shutter arrangement allows a pulse of the high-pressure gas to purge the sight tube of air-borne thermal radiation contaminants, which permits the radiation sensor to accurately measure the thermal radiation emanating from the end of the sight tube.

  6. Toward Accurate and Quantitative Comparative Metagenomics

    Science.gov (United States)

    Nayfach, Stephen; Pollard, Katherine S.

    2016-01-01

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. PMID:27565341

  7. Accurate renormalization group analyses in neutrino sector

    Energy Technology Data Exchange (ETDEWEB)

    Haba, Naoyuki [Graduate School of Science and Engineering, Shimane University, Matsue 690-8504 (Japan); Kaneta, Kunio [Kavli IPMU (WPI), The University of Tokyo, Kashiwa, Chiba 277-8568 (Japan); Takahashi, Ryo [Graduate School of Science and Engineering, Shimane University, Matsue 690-8504 (Japan); Yamaguchi, Yuya [Department of Physics, Faculty of Science, Hokkaido University, Sapporo 060-0810 (Japan)

    2014-08-15

    We investigate accurate renormalization group analyses in the neutrino sector between the ν-oscillation and seesaw energy scales. We consider decoupling effects of the top quark and Higgs boson on the renormalization group equations of the light neutrino mass matrix. Since the decoupling effects are given at the standard model scale and are independent of high energy physics, our method can in principle be applied to any model beyond the standard model. We find that the decoupling effects of the Higgs boson are negligible, while those of the top quark are not. In particular, the decoupling effects of the top quark affect the neutrino mass eigenvalues, which are important for analyzing predictions such as mass squared differences and neutrinoless double beta decay in an underlying theory existing at a high energy scale.

  8. Classification of similar medical images in the lifting domain

    Science.gov (United States)

    Sallee, Chad W.; Tashakkori, Rahman

    2002-03-01

    In this paper lifting is used for similarity analysis and classification of sets of similar medical images. The lifting scheme is an invertible wavelet transform that maps integers to integers. Lifting provides efficient in-place calculation of transform coefficients and is widely used for analysis of similar image sets. Images of a similar set show high degrees of correlation with one another. The inter-set redundancy can be exploited for the purposes of prediction, compression, feature extraction, and classification. This research intends to show that there is a higher degree of correlation between images of a similar set in the lifting domain than in the pixel domain. Such a high correlation will result in more accurate classification and prediction of images in a similar set. Several lifting schemes from the Calderbank-Daubechies-Feauveau family were used in this research. The research shows that some of these lifting schemes decorrelate the images of similar sets more effectively than others. The research presents the statistical analysis of the data in scatter plots and regression models.
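
    The record does not say which family members were used; as a representative example, here is one level of the reversible integer-to-integer 5/3 lifting transform (the JPEG2000 member of this family), with crude edge clamping.

        def cdf53_forward(x):
            # One level of the reversible CDF 5/3 lifting transform on a list of
            # ints; assumes an even-length signal and clamps at the boundaries.
            n = len(x)
            assert n >= 2 and n % 2 == 0
            xe = lambda i: x[min(max(i, 0), n - 1)]          # clamp signal indices
            d = [x[2*i + 1] - (xe(2*i) + xe(2*i + 2)) // 2   # predict -> details
                 for i in range(n // 2)]
            de = lambda i: d[min(max(i, 0), n // 2 - 1)]     # clamp detail indices
            s = [x[2*i] + (de(i - 1) + de(i) + 2) // 4       # update -> approximation
                 for i in range(n // 2)]
            return s, d

        s, d = cdf53_forward([12, 14, 15, 15, 200, 201, 202, 204])
        print(s, d)  # small details within smooth regions, in-place integer math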

  9. Accurate Weather Forecasting for Radio Astronomy

    Science.gov (United States)

    Maddalena, Ronald J.

    2010-01-01

    The NRAO Green Bank Telescope routinely observes at wavelengths from 3 mm to 1 m. As with all mm-wave telescopes, observing conditions depend upon the variable atmospheric water content. The site provides over 100 days/yr when opacities are low enough for good observing at 3 mm, but winds on the open-air structure reduce the time suitable for 3-mm observing where pointing is critical. Thus, to maximize productivity, the observing wavelength needs to match weather conditions. For 6 years the telescope has used a dynamic scheduling system (recently upgraded; www.gb.nrao.edu/DSS) that requires accurate multi-day forecasts for winds and opacities. Since opacity forecasts are not provided by the National Weather Services (NWS), I have developed an automated system that takes available forecasts, derives forecasted opacities, and deploys the results on the web in user-friendly graphical overviews (www.gb.nrao.edu/ rmaddale/Weather). The system relies on the "North American Mesoscale" models, which are updated by the NWS every 6 hrs, have a 12 km horizontal resolution, 1 hr temporal resolution, run to 84 hrs, and have 60 vertical layers that extend to 20 km. Each forecast consists of a time series of ground conditions, cloud coverage, etc., and, most importantly, temperature, pressure, humidity as a function of height. I use Liebe's MPM model (Radio Science, 20, 1069, 1985) to determine the absorption in each layer for each hour for 30 observing wavelengths. Radiative transfer provides, for each hour and wavelength, the total opacity and the radio brightness of the atmosphere, which contributes substantially at some wavelengths to Tsys and the observational noise. Comparisons of measured and forecasted Tsys at 22.2 and 44 GHz imply that the forecasted opacities are good to about 0.01 Nepers, which is sufficient for forecasting and accurate calibration. Reliability is high out to 2 days and degrades slowly for longer-range forecasts.

  10. Citation Searching: Search Smarter & Find More

    Science.gov (United States)

    Hammond, Chelsea C.; Brown, Stephanie Willen

    2008-01-01

    The staff at University of Connecticut are participating in Elsevier's Student Ambassador Program (SAmP) in which graduate students train their peers on "citation searching" research using Scopus and Web of Science, two tremendous citation databases. They are in the fourth semester of these training programs, and they are wildly successful: They…

  12. Fast Structural Search in Phylogenetic Databases

    Directory of Open Access Journals (Sweden)

    William H. Piel

    2005-01-01

    As the size of phylogenetic databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The “closeness” is a measure of the topological relationships in P that are found to be the same or similar in a tree in D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees where the trees can be weighted or unweighted. Experimental results on comparing the similarity measure with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising.
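
    As a point of comparison with "existing tree metrics", the Robinson-Foulds symmetric difference between two topologies can be computed with the third-party DendroPy library; the trees below are toy data, and this is not the paper's filtering technique.

        import dendropy
        from dendropy.calculate import treecompare

        # Two small trees over the same taxa; a shared TaxonNamespace is required.
        tns = dendropy.TaxonNamespace()
        t1 = dendropy.Tree.get(data="((A,B),(C,D));", schema="newick", taxon_namespace=tns)
        t2 = dendropy.Tree.get(data="((A,C),(B,D));", schema="newick", taxon_namespace=tns)
        # Robinson-Foulds symmetric difference: 0 means identical topologies.
        print(treecompare.symmetric_difference(t1, t2))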

  13. A Distributed Weighted Voting Approach for Accurate Eye Center Estimation

    Directory of Open Access Journals (Sweden)

    Gagandeep Singh

    2013-05-01

    This paper proposes a novel approach for accurate estimation of the eye center in face images. A distributed voting based approach, in which every pixel votes, is adopted for potential eye center candidates. The votes are distributed over a subset of pixels which lie in a direction opposite to the gradient direction, and the weighting of votes is distributed according to a novel mechanism. First, the image is normalized to eliminate illumination variations and its edge map is generated using a Canny edge detector. Distributed voting is applied on the edge image to generate different eye center candidates. Morphological closing and local maxima search are used to reduce the number of candidates. A classifier based on spatial and intensity information is used to choose the correct candidates for the locations of the eye center. The proposed approach was tested on the BioID face database and resulted in a better iris detection rate than the state-of-the-art. The proposed approach is robust against illumination variation, small pose variations, presence of eye glasses and partial occlusion of the eyes. Defence Science Journal, 2013, 63(3), pp. 292-297, DOI: http://dx.doi.org/10.14429/dsj.63.2763

  14. The FLUKA Code: An Accurate Simulation Tool for Particle Therapy.

    Science.gov (United States)

    Battistoni, Giuseppe; Bauer, Julia; Boehlen, Till T; Cerutti, Francesco; Chin, Mary P W; Dos Santos Augusto, Ricardo; Ferrari, Alfredo; Ortega, Pablo G; Kozłowska, Wioletta; Magro, Giuseppe; Mairani, Andrea; Parodi, Katia; Sala, Paola R; Schoofs, Philippe; Tessonnier, Thomas; Vlachoudis, Vasilis

    2016-01-01

    Monte Carlo (MC) codes are increasingly spreading in the hadrontherapy community due to their detailed description of radiation transport and interaction with matter. The suitability of a MC code for application to hadrontherapy demands accurate and reliable physical models capable of handling all components of the expected radiation field. This becomes extremely important for correctly performing not only physical but also biologically based dose calculations, especially in cases where ions heavier than protons are involved. In addition, accurate prediction of emerging secondary radiation is of utmost importance in innovative areas of research aiming at in vivo treatment verification. This contribution will address the recent developments of the FLUKA MC code and its practical applications in this field. Refinements of the FLUKA nuclear models in the therapeutic energy interval lead to an improved description of the mixed radiation field as shown in the presented benchmarks against experimental data with both (4)He and (12)C ion beams. Accurate description of ionization energy losses and of particle scattering and interactions lead to the excellent agreement of calculated depth-dose profiles with those measured at leading European hadron therapy centers, both with proton and ion beams. In order to support the application of FLUKA in hospital-based environments, Flair, the FLUKA graphical interface, has been enhanced with the capability of translating CT DICOM images into voxel-based computational phantoms in a fast and well-structured way. The interface is capable of importing also radiotherapy treatment data described in DICOM RT standard. In addition, the interface is equipped with an intuitive PET scanner geometry generator and automatic recording of coincidence events. Clinically, similar cases will be presented both in terms of absorbed dose and biological dose calculations describing the various available features.

  15. GeoSearch: A lightweight broking middleware for geospatial resources discovery

    Science.gov (United States)

    Gui, Z.; Yang, C.; Liu, K.; Xia, J.

    2012-12-01

    With petabytes of geodata and thousands of geospatial web services available over the Internet, it is critical to support geoscience research and applications by finding the best-fit geospatial resources from the massive and heterogeneous resources. Past decades' developments witnessed the operation of many service components to facilitate geospatial resource management and discovery. However, efficient and accurate geospatial resource discovery is still a big challenge due to the following reasons: 1) The entry barriers (also called "learning curves") hinder the usability of discovery services to end users. Different portals and catalogues always adopt various access protocols, metadata formats and GUI styles to organize, present and publish metadata. It is hard for end users to learn all these technical details and differences. 2) The cost of federating heterogeneous services is high. To provide sufficient resources and facilitate data discovery, many registries adopt a periodic harvesting mechanism to retrieve metadata from other federated catalogues. These time-consuming processes lead to network and storage burdens, data redundancy, and also the overhead of maintaining data consistency. 3) The heterogeneous semantics issues in data discovery. Since keyword matching is still the primary search method in many operational discovery services, the search accuracy (precision and recall) is hard to guarantee. Semantic technologies (such as semantic reasoning and similarity evaluation) offer a solution to solve these issues. However, integrating semantic technologies with existing services is challenging due to the expandability limitations of the service frameworks and metadata templates. 4) The capabilities to help users make a final selection are inadequate. Most of the existing search portals lack intuitive and diverse information visualization methods and functions (sort, filter) to present, explore and analyze search results. Furthermore, the presentation of the value

  16. Dual Target Search is Neither Purely Simultaneous nor Purely Successive.

    Science.gov (United States)

    Cave, Kyle R; Menneer, Tamaryn; Nomani, Mohammad S; Stroud, Michael J; Donnelly, Nick

    2017-08-31

    Previous research shows that visual search for two different targets is less efficient than search for a single target. Stroud, Menneer, Cave and Donnelly (2012) concluded that two target colours are represented separately based on modeling the fixation patterns. Although those analyses provide evidence for two separate target representations, they do not show whether participants search simultaneously for both targets, or first search for one target and then the other. Some studies suggest that multiple target representations are simultaneously active, while others indicate that search can be voluntarily simultaneous, or switching, or a mixture of both. Stroud et al.'s participants were not explicitly instructed to use any particular strategy. These data were revisited to determine which strategy was employed. Each fixated item was categorised according to whether its colour was more similar to one target or the other. Once an item similar to one target is fixated, the next fixated item is more likely to be similar to that target than the other, showing that at a given moment during search, one target is generally favoured. However, the search for one target is not completed before search for the other begins. Instead, there are often short runs of one or two fixations to distractors similar to one target, with each run followed by a switch to the other target. Thus, the results suggest that one target is more highly weighted than the other at any given time, but not to the extent that search is purely successive.

  17. Human memory search

    NARCIS (Netherlands)

    Davelaar, E.J.; Raaijmakers, J.G.W.; Hills, T.T.; Robbins, T.W.; Todd, P.M.

    2012-01-01

    The importance of understanding human memory search is hard to exaggerate: we build and live our lives based on what we remember. This chapter explores the characteristics of memory search, with special emphasis on the use of retrieval cues. We introduce the dependent measures that are obtained

  18. Distributed Deep Web Search

    NARCIS (Netherlands)

    Tjin-Kam-Jet, Kien

    2013-01-01

    The World Wide Web contains billions of documents (and counting); hence, it is likely that some document will contain the answer or content you are searching for. While major search engines like Bing and Google often manage to return relevant results to your query, there are plenty of situations in

  19. With News Search Engines

    Science.gov (United States)

    Gunn, Holly

    2005-01-01

    Although there are many news search engines on the Web, finding the news items one wants can be challenging. Choosing appropriate search terms is one of the biggest challenges. Unless one has seen the article that one is seeking, it is often difficult to select words that were used in the headline or text of the article. The limited archives of…

  20. Towards Accessible Search Systems

    NARCIS (Netherlands)

    Serdyukov, Pavel; Hiemstra, Djoerd; Ruthven, Ian

    2010-01-01

    The SIGIR workshop Towards Accessible Search Systems was the first workshop in the field to raise the discussion on how to make search engines accessible for different types of users. We report on the results of the workshop that was held on 23 July 2010 in conjunction with the 33rd Annual ACM SIGIR

  1. Searches for Supersymmetry

    CERN Document Server

    Ventura, Andrea; The ATLAS collaboration

    2017-01-01

    New and recent results on Supersymmetry searches are shown for the ATLAS and CMS experiments. Analyses with about 36 fb^-1 are considered for searches concerning light squarks and gluinos, direct pair production of 3rd generation squarks, electroweak production of charginos, neutralinos and sleptons, R-parity violating scenarios, and long-lived particles.

  2. Distributed deep web search

    NARCIS (Netherlands)

    Tjin-Kam-Jet, Kien-Tsoi Theodorus Egbert

    2013-01-01

    The World Wide Web contains billions of documents (and counting); hence, it is likely that some document will contain the answer or content you are searching for. While major search engines like Bing and Google often manage to return relevant results to your query, there are plenty of situations in

  3. Fixing Dataset Search

    Science.gov (United States)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  4. ElasticSearch cookbook

    CERN Document Server

    Paro, Alberto

    2015-01-01

    If you are a developer who implements ElasticSearch in your web applications and want to sharpen your understanding of the core elements and applications, this is the book for you. It is assumed that you've got working knowledge of JSON and, if you want to extend ElasticSearch, of Java and related technologies.

  5. Search and Recommendation

    DEFF Research Database (Denmark)

    Bogers, Toine

    2014-01-01

    ...large-scale application by companies like Amazon, Facebook, and Netflix. But are search and recommendation really two different fields of research that address different problems with different sets of algorithms in papers published at distinct conferences? In my talk, I want to argue that search and recommendation...

  6. Approaching system equilibrium with accurate or not accurate feedback information in a two-route system

    Science.gov (United States)

    Zhao, Xiao-mei; Xie, Dong-fan; Li, Qi

    2015-02-01

    With the development of intelligent transport systems, advanced information feedback strategies have been developed to reduce traffic congestion and enhance capacity. However, previous strategies provide accurate information to travelers, and our simulation results show that accurate information brings negative effects, especially when the information is delayed. Travelers prefer the route reported as being in the best condition, and delayed information reflects past rather than current traffic conditions; travelers therefore make wrong routing decisions, decreasing the capacity, increasing oscillations, and driving the system away from equilibrium. To avoid this negative effect, bounded rationality is taken into account by introducing a boundedly rational threshold BR. When the difference between the two routes is less than BR, the routes have equal probability of being chosen. The bounded rationality is helpful for improving efficiency in terms of capacity, oscillation, and the gap from the system equilibrium.
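
    The decision rule described above is simple enough to state directly; a minimal sketch, assuming the feedback values are (possibly delayed) advertised travel times:

        import random

        def choose_route(feedback_a, feedback_b, BR):
            # Boundedly rational choice between two routes: if the advertised
            # travel times differ by less than the threshold BR, choose at random;
            # otherwise take the route reported as faster.
            if abs(feedback_a - feedback_b) < BR:
                return random.choice("AB")
            return "A" if feedback_a < feedback_b else "B"

        # With delayed feedback, a nonzero BR keeps travelers from all flocking
        # to the route that merely used to be faster.
        print(choose_route(31.2, 30.5, BR=2.0))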

  7. Accurate measurement of streamwise vortices using dual-plane PIV

    Science.gov (United States)

    Waldman, Rye M.; Breuer, Kenneth S.

    2012-11-01

    Low Reynolds number aerodynamic experiments with flapping animals (such as bats and small birds) are of particular interest due to their application to micro air vehicles which operate in a similar parameter space. Previous PIV wake measurements described the structures left by bats and birds and provided insight into the time history of their aerodynamic force generation; however, these studies have faced difficulty drawing quantitative conclusions based on said measurements. The highly three-dimensional and unsteady nature of the flows associated with flapping flight are major challenges for accurate measurements. The challenge of animal flight measurements is finding small flow features in a large field of view at high speed with limited laser energy and camera resolution. Cross-stream measurement is further complicated by the predominately out-of-plane flow that requires thick laser sheets and short inter-frame times, which increase noise and measurement uncertainty. Choosing appropriate experimental parameters requires compromise between the spatial and temporal resolution and the dynamic range of the measurement. To explore these challenges, we do a case study on the wake of a fixed wing. The fixed model simplifies the experiment and allows direct measurements of the aerodynamic forces via load cell. We present a detailed analysis of the wake measurements, discuss the criteria for making accurate measurements, and present a solution for making quantitative aerodynamic load measurements behind free-flyers.

  8. Tales from the Field: Search Strategies Applied in Web Searching

    Directory of Open Access Journals (Sweden)

    Soohyung Joo

    2010-08-01

    In their web search processes users apply multiple types of search strategies, which consist of different search tactics. This paper identifies eight types of information search strategies with associated cases based on sequences of search tactics during the information search process. Thirty-one participants representing the general public were recruited for this study. Search logs and verbal protocols offered rich data for the identification of different types of search strategies. Based on the findings, the authors further discuss how to enhance web-based information retrieval (IR) systems to support each type of search strategy.

  9. Self-similarity of complex networks and hidden metric spaces

    CERN Document Server

    Serrano, M Angeles; Boguna, Marian

    2007-01-01

    We demonstrate that the self-similarity of some scale-free networks with respect to a simple degree-thresholding renormalization scheme finds a natural interpretation in the assumption that network nodes exist in hidden metric spaces. Clustering, i.e., cycles of length three, plays a crucial role in this framework as a topological reflection of the triangle inequality in the hidden geometry. We prove that a class of hidden variable models with underlying metric spaces is able to accurately reproduce the self-similarity properties that we measured in the real networks. Our findings indicate that hidden geometries underlying these real networks are a plausible explanation for their observed topologies and, in particular, for their self-similarity with respect to the degree-based renormalization.

  10. JSC Search System Usability Case Study

    Science.gov (United States)

    Meza, David; Berndt, Sarah

    2014-01-01

    The advanced nature of "search" has facilitated the movement from keyword match to the delivery of every conceivable information topic from career, commerce, entertainment, learning... the list is infinite. At NASA Johnson Space Center (JSC) the Search interface is an important means of knowledge transfer. By indexing multiple sources between directorates and organizations, the system's potential is culture changing in that, through search, knowledge of the unique accomplishments in engineering and science can be seamlessly passed between generations. This paper reports the findings of an initial survey, the first of a four-part study to help determine user sentiment on the intranet, or local (JSC) enterprise search environment, as well as the larger NASA enterprise. The survey is a means through which end users provide direction on the development and transfer of knowledge by way of the search experience. The ideal is to identify what is working and what needs to be improved from the users' vantage point by documenting: (1) where users are satisfied/dissatisfied, (2) the perceived value of interface components, and (3) gaps which cause any disappointment in the search experience. The near-term goal is to inform JSC search in order to improve users' ability to utilize existing services and infrastructure to perform tasks with a shortened life cycle. Continuing steps include an agency-based focus with modified questions to accomplish a similar purpose.

  11. Interactive searching of facial image databases

    Science.gov (United States)

    Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

    1995-09-01

    A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databases of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currently being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that it requires a manual encoding of images. Research is being undertaken to automate the process; however, it will require an algorithm which can predict human descriptive values. Alternatives to human-derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human-derived descriptors, a search method which does not require the entry of human descriptors is required. A genetic search algorithm is being tested for such a purpose.
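
    A generic sketch of such a witness-in-the-loop genetic search; rate, crossover and mutate are hypothetical callbacks (the witness's similarity ratings act as the fitness function), and this does not reproduce the FACES encoding.

        import random

        def next_generation(population, rate, crossover, mutate, k=12):
            # Show the witness k candidate images; their similarity ratings act
            # as fitness. Breed the better-rated half to form the next generation.
            shown = random.sample(population, k)
            ranked = sorted(shown, key=rate, reverse=True)   # best-rated first
            parents = ranked[: k // 2]
            children = []
            while len(children) < len(population):
                a, b = random.sample(parents, 2)
                children.append(mutate(crossover(a, b)))
            return children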

  12. Search and Disrupt

    DEFF Research Database (Denmark)

    Ørding Olsen, Anders

    This paper analyzes how external search is affected by strategic interest alignment among knowledge sources. I focus on misalignment arising from the heterogeneous effects of disruptive technologies by analyzing the influence of incumbents on 2,855 non-incumbents' external knowledge search efforts. The efforts most likely to solve innovation problems obtained funding from the European Commission's 7th Framework Program (2007-2013). The results show that involving incumbents improves search in complementary technologies, while demoting it when strategic interests are misaligned in disruptive technologies. However, incumbent sources engaged in capability reconfiguration to accommodate disruption improve search efforts in disruptive technologies. The paper concludes that the value of external sources is contingent on more than their knowledge. Specifically, interdependence of sources in search gives rise...

  14. Accurate complex scaling of three dimensional numerical potentials.

    Science.gov (United States)

    Cerioni, Alessandro; Genovese, Luigi; Duchemin, Ivan; Deutsch, Thierry

    2013-05-28

    The complex scaling method, which consists in continuing spatial coordinates into the complex plane, is a well-established method that allows one to compute resonant eigenfunctions of the time-independent Schrödinger operator. Whenever it is desirable to apply the complex scaling to investigate resonances in physical systems defined on numerical discrete grids, the most direct approach relies on the application of a similarity transformation to the original, unscaled Hamiltonian. We show that such an approach can be conveniently implemented in the Daubechies wavelet basis set, featuring a very promising level of generality, high accuracy, and no need for artificial convergence parameters. Complex scaling of three-dimensional numerical potentials can be efficiently and accurately performed. By carrying out an illustrative resonant state computation in the case of a one-dimensional model potential, we then show that our wavelet-based approach may disclose new exciting opportunities in the field of computational non-Hermitian quantum mechanics.

  15. Further investigation on adaptive search

    Directory of Open Access Journals (Sweden)

    Ming Hong Pi

    2014-05-01

    Adaptive search is one of the fastest fractal compression algorithms and has gained great success in many industrial applications. By substituting the range block mean for the luminance offset, the authors create a completely new version of both the encoding and decoding algorithms. In this paper they prove, theoretically, that the proposed decoding algorithm converges at least as fast as the existing decoding algorithms using the luminance offset. In addition, they prove that the attractor of the decoding algorithm can be represented by a linear combination of range-averaged images. These theorems are very important contributions to the theory and applications of fractal image compression. As a result, the decoded image can be represented as the sum of DC and AC component images, which is similar to the discrete cosine transform or wavelet transform. To further speed up the algorithm and reduce the complexity of range and domain block matching, they propose two improvements in this paper: employing post-quantisation and geometric neighbouring local search to replace the currently used pre-quantisation and global search, respectively. The corresponding experimental results show the proposed encoding and decoding algorithms can provide better performance than the existing algorithms.

  16. Job Search, Search Intensity, and Labor Market Transitions

    Science.gov (United States)

    Bloemen, Hans G.

    2005-01-01

    Job search by both unemployed and employed jobseekers is studied through an empirical structural job search model with search intensity as a choice variable. The influence of search intensity on labor market transitions is analyzed, yielding estimates of the search process and of the impact of the benefit level on search…

  17. How Users Search the Library from a Single Search Box

    Science.gov (United States)

    Lown, Cory; Sierra, Tito; Boyer, Josh

    2013-01-01

    Academic libraries are turning increasingly to unified search solutions to simplify search and discovery of library resources. Unfortunately, very little research has been published on library user search behavior in single search box environments. This study examines how users search a large public university library using a prominent, single…

  18. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    Energy Technology Data Exchange (ETDEWEB)

    Dogget, N.; Myers, G. [Los Alamos National Lab., NM (United States); Wills, C.J. [Univ. of California, San Diego, CA (United States)

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to answer a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  19. NFFinder: an online bioinformatics tool for searching similar transcriptomics experiments in the context of drug repositioning.

    Science.gov (United States)

    Setoain, Javier; Franch, Mònica; Martínez, Marta; Tabas-Madrid, Daniel; Sorzano, Carlos O S; Bakker, Annette; Gonzalez-Couto, Eduardo; Elvira, Juan; Pascual-Montano, Alberto

    2015-07-01

    Drug repositioning, using known drugs for treating conditions different from those the drug was originally designed to treat, is an important drug discovery tool that allows for a faster and cheaper development process by using drugs that are already approved or in an advanced trial stage for another purpose. This is especially relevant for orphan diseases because they affect too few people to make drug research de novo economically viable. In this paper we present NFFinder, a bioinformatics tool for identifying potentially useful drugs in the context of orphan diseases. NFFinder uses transcriptomic data to find relationships between drugs, diseases and a phenotype of interest, as well as identifying experts having published on that domain. The application shows in a dashboard a series of graphics and tables designed to help researchers formulate repositioning hypotheses and identify potential biological relationships between drugs and diseases. NFFinder is freely available at http://nffinder.cnb.csic.es.

  20. Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction

    DEFF Research Database (Denmark)

    Wichterich, Marc; Assent, Ira; Philipp, Kranen

    2008-01-01

    dimensionality reduction techniques for the EMD in a filter-and-refine architecture for efficient lossless retrieval. Thorough experimental evaluation on real world data sets demonstrates a substantial reduction of the number of expensive high-dimensional EMD computations and thus remarkably faster response...
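
    The filter-and-refine pattern itself is generic enough to sketch (Python; lower_bound stands for a cheap bound computed in the reduced dimensionality and exact_emd for the full high-dimensional EMD, both assumed supplied; retrieval is lossless as long as the bound never overestimates the true distance):

    ```python
    import heapq
    from itertools import count

    def knn_filter_and_refine(query, candidates, lower_bound, exact_emd, k):
        """k-nearest-neighbour search that is lossless whenever
        lower_bound(q, c) <= exact_emd(q, c) for every candidate c."""
        # Filter step: rank all candidates by the cheap reduced-dimensional bound.
        ranked = sorted(((lower_bound(query, c), c) for c in candidates),
                        key=lambda pair: pair[0])
        tie = count()  # tie-breaker so the heap never compares candidates directly
        best = []      # max-heap of (-distance, tiebreak, candidate)
        for lb, c in ranked:
            if len(best) == k and lb >= -best[0][0]:
                break  # bound already exceeds the current k-th best exact distance
            d = exact_emd(query, c)  # refine step: the expensive EMD computation
            if len(best) < k:
                heapq.heappush(best, (-d, next(tie), c))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, next(tie), c))
        return sorted(((-nd, c) for nd, _, c in best), key=lambda pair: pair[0])
    ```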

  1. Searching for similarities : transfer-oriented learning in health education at secondary schools

    NARCIS (Netherlands)

    Peters, L.W.H.

    2012-01-01

    Secondary schools are inundated with teaching packages on healthy behaviour. These mostly address single behavioural domains (for example smoking, alcohol, nutrition, or safe sex). It would be more efficient if a single teaching package, with limited teaching time, could affect several behavioural domains at once…

  2. Content Based Retrieval Database Management System with Support for Similarity Searching and Query Refinement

    Science.gov (United States)

    2002-01-01

  3. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool

    National Research Council Canada - National Science Library

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-01-01

    ... large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is...

  4. Data mining technique for fast retrieval of similar waveforms in Fusion massive databases

    Energy Technology Data Exchange (ETDEWEB)

    Vega, J. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain)], E-mail: jesus.vega@ciemat.es; Pereira, A.; Portas, A. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain); Dormido-Canto, S.; Farias, G.; Dormido, R.; Sanchez, J.; Duro, N. [Departamento de Informatica y Automatica, UNED, Madrid (Spain); Santos, M. [Departamento de Arquitectura de Computadores y Automatica, UCM, Madrid (Spain); Sanchez, E. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain); Pajares, G. [Departamento de Arquitectura de Computadores y Automatica, UCM, Madrid (Spain)

    2008-01-15

    Fusion measurement systems generate similar waveforms for reproducible behavior. A major difficulty related to data analysis is the identification, in a rapid and automated way, of a set of discharges with comparable behaviour, i.e. discharges with 'similar' waveforms. Here we introduce a new technique for rapid searching and retrieval of 'similar' signals. The approach consists of building a classification system that avoids traversing the whole database looking for similarities. The classification system diminishes the problem dimensionality (by means of waveform feature extraction) and reduces the searching space to just the most probable 'similar' waveforms (clustering techniques). In the searching procedure, the input waveform is classified in any of the existing clusters. Then, a similarity measure is computed between the input signal and all cluster elements in order to identify the most similar waveforms. The inner product of normalized vectors is used as the similarity measure as it allows the searching process to be independent of signal gain and polarity. This development has been applied recently to TJ-II stellarator databases and has been integrated into its remote participation system.
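
    The gain- and polarity-independent similarity measure described here is straightforward to reproduce (a minimal NumPy sketch; waveforms are assumed to be resampled to a common length):

    ```python
    import numpy as np

    def waveform_similarity(x, y):
        """Inner product of normalized vectors; normalization removes the gain
        and the absolute value removes the polarity."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        return abs(np.dot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))
    ```

    Within the selected cluster, the input signal is scored against every member with this measure and the highest-scoring waveforms are returned.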

  5. A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet

    Science.gov (United States)

    Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas

    2014-01-01

    As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto answering systems are samples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…

  6. Chemical-text hybrid search engines.

    Science.gov (United States)

    Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

    2010-01-01

    As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy.
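
    As a rough illustration of the canonicalization idea (not the ECKI implementation itself; RDKit is assumed, and a real system would first resolve names, images or InChI strings to structures):

    ```python
    from rdkit import Chem  # assumption: RDKit is available

    def canonical_keyword(smiles):
        """Map a recognized chemical entity to a single canonical token that an
        ordinary text search engine can index and match exactly."""
        mol = Chem.MolFromSmiles(smiles)  # placeholder entity-recognition step
        if mol is None:
            return None
        return "CHEM_" + Chem.MolToSmiles(mol)  # canonical SMILES as the keyword
    ```

    All spellings that resolve to the same structure then share one index keyword, which is what lets a plain text engine answer structure-aware queries.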

  7. Generating Personalized Web Search Using Semantic Context

    Directory of Open Access Journals (Sweden)

    Zheng Xu

    2015-01-01

    Full Text Available The “one size fits all” criticism of search engines is that when queries are submitted, the same results are returned to different users. In order to solve this problem, personalized search is proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on long-term and independent user profiles, and thus reduce the effectiveness of personalized search. In this paper, the proposed method captures the user context to provide accurate preferences of users for effectively personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click-through data of users. Finally, a forgetting factor is introduced to merge the independent user contexts in a user session, which maintains the evolution of user preferences. Experimental results confirm that our approach can successfully represent user context according to individual user information needs.
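
    A forgetting-factor merge of this kind might look as follows (a sketch under assumed conventions: user contexts are concept-to-weight dictionaries and lam controls how quickly older preferences fade):

    ```python
    def merge_context(old_ctx, session_ctx, lam=0.8):
        """Blend the accumulated user context with the current session context:
        merged = lam * old + (1 - lam) * new, so stale interests decay."""
        merged = {concept: lam * w for concept, w in old_ctx.items()}
        for concept, w in session_ctx.items():
            merged[concept] = merged.get(concept, 0.0) + (1.0 - lam) * w
        return merged
    ```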

  8. Similarity of TIMSS Math and Science Achievement of Nations

    Directory of Open Access Journals (Sweden)

    Algirdas Zabulionis

    2001-09-01

    Full Text Available In 1991-97, the International Association for the Evaluation of Educational Achievement (IEA) undertook the Third International Mathematics and Science Study (TIMSS), in which data about the mathematics and science achievement of thirteen-year-old students in more than 40 countries were collected. These data provided the opportunity to search for patterns in students' answers to the test items: which group of items was relatively more difficult (or easier) for the students from a particular country (or group of countries). Using this massive data set, an attempt was made to measure the similarities among country profiles of how students responded to the test items.

  9. Accurate lineshape spectroscopy and the Boltzmann constant.

    Science.gov (United States)

    Truong, G-W; Anstie, J D; May, E F; Stace, T M; Luiten, A N

    2015-10-14

    Spectroscopy has an illustrious history delivering serendipitous discoveries and providing a stringent testbed for new physical predictions, including applications from trace materials detection, to understanding the atmospheres of stars and planets, and even constraining cosmological models. Reaching fundamental-noise limits permits optimal extraction of spectroscopic information from an absorption measurement. Here, we demonstrate a quantum-limited spectrometer that delivers high-precision measurements of the absorption lineshape. These measurements yield a very accurate value for the excited-state (6P1/2) hyperfine splitting in Cs and reveal a breakdown in the well-known Voigt spectral profile. We develop a theoretical model that accounts for this breakdown, explaining the observations to within the shot-noise limit. Our model enables us to infer the thermal velocity dispersion of the Cs vapour with an uncertainty of 35 p.p.m. within an hour. This allows us to determine a value for Boltzmann's constant with a precision of 6 p.p.m., and an uncertainty of 71 p.p.m.

  10. MEMS accelerometers in accurate mount positioning systems

    Science.gov (United States)

    Mészáros, László; Pál, András.; Jaskó, Attila

    2014-07-01

    In order to attain precise, accurate and stateless positioning of telescope mounts we apply microelectromechanical accelerometer systems (also known as MEMS accelerometers). In common practice, feedback from the mount position is provided by electronic, optical or magneto-mechanical systems or via real-time astrometric solutions based on the acquired images. Hence, MEMS-based systems are completely independent from these mechanisms. Our goal is to investigate the advantages and challenges of applying such devices and to reach the sub-arcminute range, which is well below the field-of-view of conventional imaging telescope systems. We present how this sub-arcminute accuracy can be achieved with very cheap MEMS sensors. Basically, these sensors yield raw output within an accuracy of a few degrees. We show what kinds of calibration procedures can exploit spherical and cylindrical constraints between accelerometer output channels in order to achieve the previously mentioned accuracy level. We also demonstrate how our implementation can be inserted into a telescope control system. Although this attainable precision is less than both the resolution of telescope mount drive mechanics and the accuracy of astrometric solutions, the independent nature of attitude determination could significantly increase the reliability of autonomous or remotely operated astronomical observations.

  11. Does a pneumotach accurately characterize voice function?

    Science.gov (United States)

    Walters, Gage; Krane, Michael

    2016-11-01

    A study is presented which addresses how a pneumotach might adversely affect clinical measurements of voice function. A pneumotach is a device, typically a mask, worn over the mouth, in order to measure time-varying glottal volume flow. By measuring the time-varying difference in pressure across a known aerodynamic resistance element in the mask, the glottal volume flow waveform is estimated. Because it adds aerodynamic resistance to the vocal system, there is some concern that using a pneumotach may not accurately portray the behavior of the voice. To test this hypothesis, experiments were performed in a simplified airway model with the principal dimensions of an adult human upper airway. A compliant constriction, fabricated from silicone rubber, modeled the vocal folds. Measurements of the variation of transglottal pressure, time-averaged volume flow, model vocal fold vibration amplitude, and radiated sound with subglottal pressure were performed, with and without the pneumotach in place, and differences were noted. The authors acknowledge support from NIH Grant 2R01DC005642-10A1.

  12. Towards Accurate Modeling of Moving Contact Lines

    CERN Document Server

    Holmgren, Hanna

    2015-01-01

    A main challenge in numerical simulations of moving contact line problems is that the adherence, or no-slip, boundary condition leads to a non-integrable stress singularity at the contact line. In this report we perform the first steps in developing the macroscopic part of an accurate multiscale model for a moving contact line problem in two space dimensions. We assume that a micro model has been used to determine a relation between the contact angle and the contact line velocity. An intermediate region is introduced where an analytical expression for the velocity exists. This expression is used to implement boundary conditions for the moving contact line at a macroscopic scale, along a fictitious boundary located a small distance away from the physical boundary. Model problems where the shape of the interface is constant throughout the simulation are introduced. For these problems, experiments show that the errors in the resulting contact line velocities converge with the grid size $h$ at a rate of convergence…

  13. Fast and accurate exhaled breath ammonia measurement.

    Science.gov (United States)

    Solga, Steven F; Mudalel, Matthew L; Spacek, Lisa A; Risby, Terence H

    2014-06-11

    This exhaled breath ammonia method uses a fast and highly sensitive spectroscopic method known as quartz enhanced photoacoustic spectroscopy (QEPAS) that uses a quantum cascade based laser. The monitor is coupled to a sampler that measures mouth pressure and carbon dioxide. The system is temperature controlled and specifically designed to address the reactivity of this compound. The sampler provides immediate feedback to the subject and the technician on the quality of the breath effort. Together with the quick response time of the monitor, this system is capable of accurately measuring exhaled breath ammonia representative of deep lung systemic levels. Because the system is easy to use and produces real time results, it has enabled experiments to identify factors that influence measurements. For example, mouth rinse and oral pH reproducibly and significantly affect results and therefore must be controlled. Temperature and mode of breathing are other examples. As our understanding of these factors evolves, error is reduced, and clinical studies become more meaningful. This system is very reliable and individual measurements are inexpensive. The sampler is relatively inexpensive and quite portable, but the monitor is neither. This limits options for some clinical studies and provides a rationale for future innovations.

  14. Noninvasive hemoglobin monitoring: how accurate is enough?

    Science.gov (United States)

    Rice, Mark J; Gravenstein, Nikolaus; Morey, Timothy E

    2013-10-01

    Evaluating the accuracy of medical devices has traditionally been a blend of statistical analyses, at times without contextualizing the clinical application. There have been a number of recent publications on the accuracy of a continuous noninvasive hemoglobin measurement device, the Masimo Radical-7 Pulse Co-oximeter, focusing on the traditional statistical metrics of bias and precision. In this review, which contains material presented at the Innovations and Applications of Monitoring Perfusion, Oxygenation, and Ventilation (IAMPOV) Symposium at Yale University in 2012, we critically investigated these metrics as applied to the new technology, exploring what is required of a noninvasive hemoglobin monitor and whether the conventional statistics adequately answer our questions about clinical accuracy. We discuss the glucose error grid, well known in the glucose monitoring literature, and describe an analogous version for hemoglobin monitoring. This hemoglobin error grid can be used to evaluate the required clinical accuracy (±g/dL) of a hemoglobin measurement device to provide more conclusive evidence on whether to transfuse an individual patient. The important decision to transfuse a patient usually requires both an accurate hemoglobin measurement and a physiologic reason to elect transfusion. It is our opinion that the published accuracy data of the Masimo Radical-7 is not good enough to make the transfusion decision.

  15. Accurate free energy calculation along optimized paths.

    Science.gov (United States)

    Chen, Changjun; Xiao, Yi

    2010-05-01

    The path-based methods of free energy calculation, such as thermodynamic integration and free energy perturbation, are simple in theory, but difficult in practice because in most cases smooth paths do not exist, especially for large molecules. In this article, we present a novel method to build the transition path of a peptide. We use harmonic potentials to restrain its nonhydrogen atom dihedrals in the initial state and set the equilibrium angles of the potentials as those in the final state. Through a series of steps of geometrical optimization, we can construct a smooth and short path from the initial state to the final state. This path can be used to calculate free energy difference. To validate this method, we apply it to a small 10-ALA peptide and find that the calculated free energy changes in helix-helix and helix-hairpin transitions are both self-convergent and cross-convergent. We also calculate the free energy differences between different stable states of beta-hairpin trpzip2, and the results show that this method is more efficient than the conventional molecular dynamics method in accurate free energy calculation.

  16. Accurate fission data for nuclear safety

    CERN Document Server

    Solders, A; Jokinen, A; Kolhinen, V S; Lantz, M; Mattera, A; Penttila, H; Pomp, S; Rakopoulos, V; Rinta-Antila, S

    2013-01-01

    The Accurate fission data for nuclear safety (AlFONS) project aims at high precision measurements of fission yields, using the renewed IGISOL mass separator facility in combination with a new high current light ion cyclotron at the University of Jyvaskyla. The 30 MeV proton beam will be used to create fast and thermal neutron spectra for the study of neutron induced fission yields. Thanks to a series of mass separating elements, culminating with the JYFLTRAP Penning trap, it is possible to achieve a mass resolving power on the order of a few hundred thousand. In this paper we present the experimental setup and the design of a neutron converter target for IGISOL. The goal is to have a flexible design. For studies of exotic nuclei far from stability a high neutron flux (10^12 neutrons/s) at energies 1 - 30 MeV is desired, while for reactor applications neutron spectra that resemble those of thermal and fast nuclear reactors are preferred. It is also desirable to be able to produce (semi-)monoenergetic neutrons...

  17. Fast and Provably Accurate Bilateral Filtering.

    Science.gov (United States)

    Chaudhury, Kunal N; Dabhade, Swapnil D

    2016-06-01

    The bilateral filter is a non-linear filter that uses a range filter along with a spatial filter to perform edge-preserving smoothing of images. A direct computation of the bilateral filter requires O(S) operations per pixel, where S is the size of the support of the spatial filter. In this paper, we present a fast and provably accurate algorithm for approximating the bilateral filter when the range kernel is Gaussian. In particular, for box and Gaussian spatial filters, the proposed algorithm can cut down the complexity to O(1) per pixel for any arbitrary S. The algorithm has a simple implementation involving N+1 spatial filterings, where N is the approximation order. We give a detailed analysis of the filtering accuracy that can be achieved by the proposed approximation in relation to the target bilateral filter. This allows us to estimate the order N required to obtain a given accuracy. We also present comprehensive numerical results to demonstrate that the proposed algorithm is competitive with the state-of-the-art methods in terms of speed and accuracy.
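
    For reference, the direct O(S)-per-pixel computation that such algorithms accelerate looks like this (a naive NumPy sketch for grayscale images with Gaussian spatial and range kernels, not the paper's fast method):

    ```python
    import numpy as np

    def bilateral_direct(img, sigma_s, sigma_r, radius):
        """Naive bilateral filter: each output pixel is a normalized sum over a
        (2*radius+1)^2 window, weighted by spatial and range Gaussians."""
        img = img.astype(float)
        ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))
        pad = np.pad(img, radius, mode="reflect")
        out = np.empty_like(img)
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
                rng = np.exp(-(win - img[i, j]) ** 2 / (2.0 * sigma_r ** 2))
                w = spatial * rng
                out[i, j] = (w * win).sum() / w.sum()
        return out
    ```

    The proposed approximation replaces this per-pixel window sum with N+1 ordinary spatial filterings, which is what brings the cost down to O(1) per pixel.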

  18. Accurate thermoplasmonic simulation of metallic nanoparticles

    Science.gov (United States)

    Yu, Da-Miao; Liu, Yan-Nan; Tian, Fa-Lin; Pan, Xiao-Min; Sheng, Xin-Qing

    2017-01-01

    Thermoplasmonics leads to enhanced heat generation due to the localized surface plasmon resonances. The measurement of heat generation is fundamentally a complicated task, which necessitates the development of theoretical simulation techniques. In this paper, an efficient and accurate numerical scheme is proposed for applications with complex metallic nanostructures. Light absorption and temperature increase are, respectively, obtained by solving the volume integral equation (VIE) and the steady-state heat diffusion equation through the method of moments (MoM). Previously, methods based on surface integral equations (SIEs) were utilized to obtain light absorption. However, computing light absorption from the equivalent current is as expensive as O(NsNv), where Ns and Nv, respectively, denote the number of surface and volumetric unknowns. Our approach reduces the cost to O(Nv) by using the VIE. The accuracy, efficiency and capability of the proposed scheme are validated by multiple simulations. The simulations show that our proposed method is more efficient than the approach based on SIEs under comparable accuracy, especially for the case where many incident fields are of interest. The simulations also indicate that the temperature profile can be tuned by several factors, such as the geometry configuration of the array, the beam direction, and the light wavelength.

  19. Learning-Based Video Superresolution Reconstruction Using Spatiotemporal Nonlocal Similarity

    Directory of Open Access Journals (Sweden)

    Meiyu Liang

    2015-01-01

    Full Text Available Aiming at improving the video visual resolution quality and details clarity, a novel learning-based video superresolution reconstruction algorithm using spatiotemporal nonlocal similarity is proposed in this paper. Objective high-resolution (HR estimations of low-resolution (LR video frames can be obtained by learning LR-HR correlation mapping and fusing spatiotemporal nonlocal similarities between video frames. With the objective of improving algorithm efficiency while guaranteeing superresolution quality, a novel visual saliency-based LR-HR correlation mapping strategy between LR and HR patches is proposed based on semicoupled dictionary learning. Moreover, aiming at improving performance and efficiency of spatiotemporal similarity matching and fusion, an improved spatiotemporal nonlocal fuzzy registration scheme is established using the similarity weighting strategy based on pseudo-Zernike moment feature similarity and structural similarity, and the self-adaptive regional correlation evaluation strategy. The proposed spatiotemporal fuzzy registration scheme does not rely on accurate estimation of subpixel motion, and therefore it can be adapted to complex motion patterns and is robust to noise and rotation. Experimental results demonstrate that the proposed algorithm achieves competitive superresolution quality compared to other state-of-the-art algorithms in terms of both subjective and objective evaluations.

  20. The Search for Another Earth - Part II

    Indian Academy of Sciences (India)

    2016-10-01

    In the first part, we discussed the various methods for the detection of planets outside the solar system, known as exoplanets. In this part, we will describe various kinds of exoplanets. The habitable planets discovered so far and the present status of our search for a habitable planet similar to the Earth will also be discussed.

  1. Insights: Talent Searches from Parents' Perspectives

    Science.gov (United States)

    Willis, Mariam

    2012-01-01

    Talent Searches offer an opportunity for gifted children to experience learning on prestigious college campuses around the nation, and as importantly, an opportunity to form relationships with like-minded, similar-age peers. Few opportunities open doors for intellectual, social, and emotional growth in gifted children as efficiently as…

  2. A literature search tool for intelligent extraction of disease-associated genes.

    Science.gov (United States)

    Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P

    2014-01-01

    To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.
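
    The retrieval step can be approximated with Biopython's Entrez interface (a minimal sketch; the query term and the e-mail address are placeholders, and the paper's actual query and rule-based extraction are far more elaborate):

    ```python
    from Bio import Entrez  # assumption: Biopython is installed

    Entrez.email = "you@example.org"  # NCBI requires a contact address (placeholder)

    def disorder_article_ids(disorder, retmax=100):
        """Return PubMed IDs of articles mentioning the disorder together with
        gene findings (a crude stand-in for the paper's query)."""
        term = f'"{disorder}"[Title/Abstract] AND gene[Title/Abstract]'
        handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
        record = Entrez.read(handle)
        handle.close()
        return record["IdList"]
    ```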

  3. Skewed Binary Search Trees

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Moruz, Gabriel

    2006-01-01

    It is well-known that to minimize the number of comparisons a binary search tree should be perfectly balanced. Previous work has shown that a dominating factor over the running time for a search is the number of cache faults performed, and that an appropriate memory layout of a binary search tree...... can reduce the number of cache faults by several hundred percent. Motivated by the fact that during a search branching to the left or right at a node does not necessarily have the same cost, e.g. because of branch prediction schemes, we in this paper study the class of skewed binary search trees....... For all nodes in a skewed binary search tree the ratio between the size of the left subtree and the size of the tree is a fixed constant (a ratio of 1/2 gives perfectly balanced trees). In this paper we present an experimental study of various memory layouts of static skewed binary search trees, where each...
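
    The skew parameter is easy to picture in code (a sketch; alpha is the fixed left-subtree ratio, with alpha = 1/2 reproducing the perfectly balanced layout):

    ```python
    def build_skewed(keys, alpha=0.5):
        """Build a skewed binary search tree from sorted keys: at every node the
        left subtree holds roughly an alpha fraction of the remaining keys."""
        if not keys:
            return None
        i = min(int(alpha * len(keys)), len(keys) - 1)  # root position
        return {"key": keys[i],
                "left": build_skewed(keys[:i], alpha),
                "right": build_skewed(keys[i + 1:], alpha)}
    ```

    Skewing trades a larger expected number of comparisons for a cheaper mix of branch directions, which can pay off when branching left and right have asymmetric costs, e.g. under branch prediction.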

  4. Myanmar Language Search Engine

    Directory of Open Access Journals (Sweden)

    Pann Yu Mon

    2011-03-01

    Full Text Available With the enormous growth of the World Wide Web, search engines play a critical role in retrieving information from the borderless Web. Although many search engines are available for the major languages, they are not as capable for less computerized languages, including Myanmar. The main reason is that those search engines do not consider the specific features of those languages. A search engine capable of searching Web documents written in those languages is highly needed, especially as more and more Web sites come up with localized content in multiple languages. In this study, the design and architecture of a language-specific search engine for the Myanmar language is proposed. The main features of the system are: (1) it can search multiple encodings of Myanmar Web pages, and (2) it is designed to comply with the specific features of the Myanmar language. Finally, an experiment was carried out to verify whether the system meets the design requirements.

  5. Search Engine Bias and the Demise of Search Engine Utopianism

    Science.gov (United States)

    Goldman, E.

    Due to search engines' automated operations, people often assume that search engines display search results neutrally and without bias. However, this perception is mistaken. Like any other media company, search engines affirmatively control their users' experiences, which has the consequence of skewing search results (a phenomenon called "search engine bias"). Some commentators believe that search engine bias is a defect requiring legislative correction. Instead, this chapter argues that search engine bias is the beneficial consequence of search engines optimizing content for their users. The chapter further argues that the most problematic aspect of search engine bias, the "winner-take-all" effect caused by top placement in search results, will be mooted by emerging personalized search technology.

  6. Anesthetizing animals: Similar to humans yet, peculiar?

    Science.gov (United States)

    Kurdi, Madhuri S; Ramaswamy, Ashwini H

    2015-01-01

    From time immemorial, animals have served as models for humans. Like humans, animals too have to undergo several types of elective and emergency surgeries. Several anesthetic techniques and drugs used in humans are also used in animals. However, unlike humans, the animal kingdom includes a wide variety of species, breeds, and sizes. Different species have variable pharmacological responses, anatomy, temperament, behavior, and lifestyles. The anesthetic techniques and drugs have to suit different species and breeds. Nevertheless, there are several drugs and many peculiar anesthetic techniques used in animals but not in human beings. Keeping this in mind, the literature was hand searched and electronically searched using the terms "veterinary anesthesia" and "anesthetic drugs and techniques in animals" in the Google search engine. The interesting information so collected is presented in this article, which highlights some challenging and amazing aspects of anesthetizing animals, including the preanesthetic assessment, preparation, premedication, monitoring, induction of general anesthesia, intubation, equipment, regional blocks, neuraxial block, and perioperative complications.

  7. User Oriented Trajectory Search for Trip Recommendation

    KAUST Repository

    Ding, Ruogu

    2012-09-08

    Trajectory sharing and searching have received significant attention in recent years. In this thesis, we propose and investigate the methods to find and recommend the best trajectory to the traveler, and mainly focus on a novel technique named User Oriented Trajectory Search (UOTS) query processing. In contrast to conventional trajectory search by locations (spatial domain only), we consider both spatial and textual domains in the new UOTS query. Given a trajectory data set, the query input contains a set of intended places given by the traveler and a set of textual attributes describing the traveler’s preference. If a trajectory is connecting/close to the specified query locations, and the textual attributes of the trajectory are similar to the traveler’s preference, it will be recommended to the traveler. This type of queries can enable many popular applications such as trip planning and recommendation. There are two challenges in UOTS query processing, (i) how to constrain the searching range in two domains and (ii) how to schedule multiple query sources effectively. To overcome the challenges and answer the UOTS query efficiently, a novel collaborative searching approach is developed. Conceptually, the UOTS query processing is conducted in the spatial and textual domains alternately. A pair of upper and lower bounds are devised to constrain the searching range in two domains. In the meantime, a heuristic searching strategy based on priority ranking is adopted for scheduling the multiple query sources, which can further reduce the searching range and enhance the query efficiency notably. Furthermore, the devised collaborative searching approach can be extended to situations where the query locations are ordered. Extensive experiments are conducted on both real and synthetic trajectory data in road networks. Our approach is verified to be effective in reducing both CPU time and disk I/O time.

  8. User oriented trajectory search for trip recommendation

    KAUST Repository

    Shang, Shuo

    2012-01-01

    Trajectory sharing and searching have received significant attention in recent years. In this paper, we propose and investigate a novel problem called User Oriented Trajectory Search (UOTS) for trip recommendation. In contrast to conventional trajectory search by locations (spatial domain only), we consider both spatial and textual domains in the new UOTS query. Given a trajectory data set, the query input contains a set of intended places given by the traveler and a set of textual attributes describing the traveler's preference. If a trajectory is connecting/close to the specified query locations, and the textual attributes of the trajectory are similar to the traveler's preference, it will be recommended to the traveler for reference. This type of queries can bring significant benefits to travelers in many popular applications such as trip planning and recommendation. There are two challenges in the UOTS problem, (i) how to constrain the searching range in two domains and (ii) how to schedule multiple query sources effectively. To overcome the challenges and answer the UOTS query efficiently, a novel collaborative searching approach is developed. Conceptually, the UOTS query processing is conducted in the spatial and textual domains alternately. A pair of upper and lower bounds are devised to constrain the searching range in two domains. In the meantime, a heuristic searching strategy based on priority ranking is adopted for scheduling the multiple query sources, which can further reduce the searching range and enhance the query efficiency notably. Furthermore, the devised collaborative searching approach can be extended to situations where the query locations are ordered. The performance of the proposed UOTS query is verified by extensive experiments based on real and synthetic trajectory data in road networks. © 2012 ACM.

  10. Accurate Holdup Calculations with Predictive Modeling & Data Integration

    Energy Technology Data Exchange (ETDEWEB)

    Azmy, Yousry [North Carolina State Univ., Raleigh, NC (United States). Dept. of Nuclear Engineering; Cacuci, Dan [Univ. of South Carolina, Columbia, SC (United States). Dept. of Mechanical Engineering

    2017-04-03

    In facilities that process special nuclear material (SNM) it is important to account accurately for the fissile material that enters and leaves the plant. Although there are many stages and processes through which materials must be traced and measured, the focus of this project is material that is “held-up” in equipment, pipes, and ducts during normal operation and that can accumulate over time into significant quantities. Accurately estimating the holdup is essential for proper SNM accounting (vis-à-vis nuclear non-proliferation), criticality and radiation safety, waste management, and efficient plant operation. Usually it is not possible to directly measure the holdup quantity and location, so these must be inferred from measured radiation fields, primarily gamma and less frequently neutrons. Current methods to quantify holdup, i.e. Generalized Geometry Holdup (GGH), primarily rely on simple source configurations and crude radiation transport models aided by ad hoc correction factors. This project seeks an alternate method of performing measurement-based holdup calculations using a predictive model that employs state-of-the-art radiation transport codes capable of accurately simulating such situations. Inverse and data assimilation methods use the forward transport model to search for a source configuration that best matches the measured data and simultaneously provide an estimate of the level of confidence in the correctness of such a configuration. In this work the holdup problem is re-interpreted as an inverse problem that is under-determined, hence may permit multiple solutions. A probabilistic approach is applied to solving the resulting inverse problem. This approach rates possible solutions according to their plausibility given the measurements and initial information. This is accomplished through the use of Bayes’ Theorem, which resolves the issue of multiple solutions by giving an estimate of the probability of observing each possible solution. To use…
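
    The probabilistic ranking amounts to a direct application of Bayes' theorem over a finite candidate set (a schematic sketch; prior and likelihood are assumed to be supplied by the initial information and the forward transport model, respectively):

    ```python
    def rank_configurations(candidates, prior, likelihood, measurements):
        """Score each candidate source configuration c by
        p(c | measurements) proportional to p(measurements | c) * p(c)."""
        weights = {c: likelihood(measurements, c) * prior(c) for c in candidates}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}
    ```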

  11. Improving Search Engine Reliability

    Science.gov (United States)

    Pruthi, Jyoti; Kumar, Ela

    2010-11-01

    Search engines on the Internet are used daily to access and find information. While these services provide an easy way to find information globally, they also suffer from artificially created false results. This paper describes two techniques that are used to manipulate search engines: spam pages (used to achieve higher rankings on the result page) and cloaking (used to feed falsified data into search engines). It also describes two proposed methods to fight this kind of misuse, with algorithms for both of the aforementioned cases of spamdexing.

  12. Modified harmony search

    Science.gov (United States)

    Mohamed, Najihah; Lutfi Amri Ramli, Ahmad; Majid, Ahmad Abd; Piah, Abd Rahni Mt

    2017-09-01

    A metaheuristic algorithm called Harmony Search (HS) is widely applied to parameter optimization in many areas. HS is a derivative-free real-parameter optimization algorithm that draws its inspiration from the musical improvisation process of searching for a perfect state of harmony. This paper proposes a Modified Harmony Search (MHS) for solving optimization problems, which employs concepts from the genetic algorithm and particle swarm optimization for generating new solution vectors, enhancing the performance of the HS algorithm. The performance of MHS and HS is investigated on ten benchmark optimization problems in order to make a comparison reflecting the efficiency of MHS in terms of final accuracy, convergence speed and robustness.
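
    For readers unfamiliar with the baseline, a bare-bones HS loop is sketched below (standard harmony search for minimization, not the authors' MHS variant; parameter names follow common usage):

    ```python
    import random

    def harmony_search(f, bounds, hms=20, hmcr=0.9, par=0.3, bw=0.05, iters=5000):
        """Minimize f over box bounds [(lo, hi), ...] with basic Harmony Search.
        hms: harmony memory size, hmcr: memory considering rate,
        par: pitch adjusting rate, bw: pitch bandwidth (fraction of range)."""
        memory = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
        scores = [f(h) for h in memory]
        for _ in range(iters):
            new = []
            for d, (lo, hi) in enumerate(bounds):
                if random.random() < hmcr:              # draw from harmony memory
                    x = random.choice(memory)[d]
                    if random.random() < par:           # pitch adjustment
                        x += random.uniform(-1, 1) * bw * (hi - lo)
                        x = min(max(x, lo), hi)
                else:                                   # random improvisation
                    x = random.uniform(lo, hi)
                new.append(x)
            s = f(new)
            worst = max(range(hms), key=scores.__getitem__)
            if s < scores[worst]:                       # replace the worst harmony
                memory[worst], scores[worst] = new, s
        best = min(range(hms), key=scores.__getitem__)
        return memory[best], scores[best]
    ```

    The modification described above targets exactly this new-vector generation step, borrowing operators from the genetic algorithm and particle swarm optimization.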

  13. ElasticSearch server

    CERN Document Server

    Rogozinski, Marek

    2014-01-01

    This book is a detailed, practical, hands-on guide packed with real-life scenarios and examples which will show you how to implement an ElasticSearch search engine on your own websites. If you are a web developer or a user who wants to learn more about ElasticSearch, then this is the book for you. You do not need to know anything about ElasticSearch, Java, or Apache Lucene in order to use this book, though basic knowledge about databases and queries is required.

  14. Reconstructing propagation networks with temporal similarity metrics

    CERN Document Server

    Liao, Hao

    2014-01-01

    Node similarity is a significant property driving the growth of real networks. In this paper, based on the observed spreading results we apply the node similarity metrics to reconstruct propagation networks. We find that the reconstruction accuracy of the similarity metrics is strongly influenced by the infection rate of the spreading process. Moreover, there is a range of infection rate in which the reconstruction accuracy of some similarity metrics drops to nearly zero. In order to improve the similarity-based reconstruction method, we finally propose a temporal similarity metric to take into account the time information of the spreading. The reconstruction results are remarkably improved with the new method.
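
    One simple instance of similarity-based reconstruction can be sketched as follows (Jaccard similarity over the spreading realizations that infected each node; the temporal metric proposed in the paper additionally weights infection-time information, which is not modeled here):

    ```python
    from itertools import combinations

    def reconstruct_links(infected_by, n_links):
        """infected_by: node -> set of spreading-realization ids in which the
        node was infected. Rank node pairs by Jaccard similarity of these sets
        and return the n_links most similar pairs as predicted edges."""
        scores = {}
        for u, v in combinations(infected_by, 2):
            a, b = infected_by[u], infected_by[v]
            union = len(a | b)
            scores[(u, v)] = len(a & b) / union if union else 0.0
        return sorted(scores, key=scores.get, reverse=True)[:n_links]
    ```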

  15. COMPARISON OF EXPLORATION STRATEGIES FOR MULTI-ROBOT SEARCH

    Directory of Open Access Journals (Sweden)

    Miroslav Kulich

    2015-06-01

    Full Text Available Searching for a stationary object in an unknown environment can be formulated as an iterative procedure consisting of map updating, selection of a next goal and navigation to this goal. It finishes when the object of interest is found. This formulation and the general search structure are similar to those of the related exploration problem. The only difference is in goal selection, as search and exploration objectives are not the same. Although search is a key task in many search and rescue scenarios, the robotics community has paid little attention to the problem. There is no goal-selection strategy that has been designed specifically for search. In this paper, we study four state-of-the-art strategies for multi-robot exploration, and we evaluate their performance in various environments with respect to the expected time needed to find an object, i.e. to achieve the objective of the search.

  16. An advanced search engine for patent analytics in medicinal chemistry.

    Science.gov (United States)

    Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnykova, Dina; Lovis, Christian; Ruch, Patrick

    2012-01-01

    Patent collections contain an important amount of medical-related knowledge, but existing tools were reported to lack useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve relevant patents given a short query, and a related-patent search to retrieve similar patents given a patent. Both search modes rely on tuning experiments performed during several patent retrieval competitions. Moreover, TWINC is enhanced with interactive modules, such as chemical query expansion, which is of prime importance to cope with the various ways of naming biomedical entities. While the related-patent search showed promising performance, the ad hoc search produced fairly mixed results. Nonetheless, TWINC performed well during the Chemathlon task of the PatOlympics competition and experts appreciated its usability.

  17. Accurate paleointensities - the multi-method approach

    Science.gov (United States)

    de Groot, Lennart

    2016-04-01

    The accuracy of models describing rapid changes in the geomagnetic field over the past millennia critically depends on the availability of reliable paleointensity estimates. Over the past decade methods to derive paleointensities from lavas (the only recorder of the geomagnetic field that is available all over the globe and through geologic times) have seen significant improvements and various alternative techniques were proposed. The 'classical' Thellier-style approach was optimized and selection criteria were defined in the 'Standard Paleointensity Definitions' (Paterson et al, 2014). The Multispecimen approach was validated and the importance of additional tests and criteria to assess Multispecimen results must be emphasized. Recently, a non-heating, relative paleointensity technique was proposed -the pseudo-Thellier protocol- which shows great potential in both accuracy and efficiency, but currently lacks a solid theoretical underpinning. Here I present work using all three of the aforementioned paleointensity methods on suites of young lavas taken from the volcanic islands of Hawaii, La Palma, Gran Canaria, Tenerife, and Terceira. Many of the sampled cooling units are <100 years old; the actual field strength at the time of cooling is therefore reasonably well known. Rather intuitively, flows that produce coherent results from two or more different paleointensity methods yield the most accurate estimates of the paleofield. Furthermore, the results for some flows pass the selection criteria for one method, but fail in other techniques. Scrutinizing and combining all acceptable results yielded reliable paleointensity estimates for 60-70% of all sampled cooling units - an exceptionally high success rate. This 'multi-method paleointensity approach' therefore has high potential to provide the much-needed paleointensities to improve geomagnetic field models for the Holocene.

  18. Towards Accurate Application Characterization for Exascale (APEX)

    Energy Technology Data Exchange (ETDEWEB)

    Hammond, Simon David [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

    2015-09-01

    Sandia National Laboratories has been engaged in hardware and software codesign activities for a number of years; indeed, it might be argued that prototyping of clusters as far back as the CPLANT machines and many large capability resources including ASCI Red and RedStorm were examples of codesigned solutions. As the research supporting our codesign activities has moved closer to investigating on-node runtime behavior, a natural hunger has grown for detailed analysis of both hardware and algorithm performance from the perspective of low-level operations. The Application Characterization for Exascale (APEX) LDRD was a project conceived to address some of these concerns. Primarily, the research was intended to focus on generating accurate and reproducible low-level performance metrics using tools that could scale to production-class code bases. Alongside this research was an advocacy and analysis role associated with evaluating tools for production use, working with leading industry vendors to develop and refine solutions required by our code teams, and directly engaging with production code developers to form a context for the application analysis and a bridge to the research community within Sandia. On each of these accounts significant progress has been made, particularly, as this report will cover, in the low-level analysis of operations for important classes of algorithms. This report summarizes the development of a collection of tools under the APEX research program and leaves to other SAND and L2 milestone reports the description of codesign progress with Sandia’s production users/developers.

  19. Optimizing cell arrays for accurate functional genomics

    Directory of Open Access Journals (Sweden)

    Fengler Sven

    2012-07-01

    Full Text Available Abstract Background Cellular responses emerge from a complex network of dynamic biochemical reactions. In order to investigate them, it is necessary to develop methods that allow perturbing a large number of gene products in a flexible and fast way. Cell arrays (CA) enable such experiments on microscope slides via reverse transfection of cellular colonies growing on spotted genetic material. In contrast to multi-well plates, CA are susceptible to contamination among neighboring spots, hindering accurate quantification in cell-based screening projects. Here we have developed a quality control protocol for quantifying and minimizing contamination in CA. Results We imaged checkered CA that express two distinct fluorescent proteins and segmented images into single cells to quantify the transfection efficiency and interspot contamination. Compared with standard procedures, we measured a 3-fold reduction of contaminants when arrays containing HeLa cells were washed shortly after cell seeding. We proved that nucleic acid uptake during cell seeding rather than migration among neighboring spots was the major source of contamination. Arrays of MCF7 cells developed without the washing step showed a 7-fold lower percentage of contaminant cells, demonstrating that contamination is dependent on specific cell properties. Conclusions Previously published methodological works have focused on achieving high transfection rates in densely packed CA. Here, we focused on an equally important parameter: the interspot contamination. The presented quality control is essential for estimating the rate of contamination, a major source of false positives and negatives in current microscopy-based functional genomics screenings. We have demonstrated that a washing step after seeding enhances CA quality for HeLa but is not necessary for MCF7. The described method provides a way to find optimal seeding protocols for cell lines intended to be used for the first time in CA.

  20. Important Nearby Galaxies without Accurate Distances

    Science.gov (United States)

    McQuinn, Kristen

    2014-10-01

    The Spitzer Infrared Nearby Galaxies Survey (SINGS) and its offspring programs (e.g., THINGS, HERACLES, KINGFISH) have resulted in a fundamental change in our view of star formation and the ISM in galaxies, and together they represent the most complete multi-wavelength data set yet assembled for a large sample of nearby galaxies. These great investments of observing time have been dedicated to the goal of understanding the interstellar medium, the star formation process, and, more generally, galactic evolution at the present epoch. Nearby galaxies provide the basis on which we interpret the distant universe, and the SINGS sample represents the best-studied nearby galaxies. Accurate distances are fundamental to interpreting observations of galaxies. Surprisingly, many of the SINGS spiral galaxies have numerous distance estimates, resulting in confusion. We can rectify this situation for 8 of the SINGS spiral galaxies within 10 Mpc at a very low cost through measurements of the tip of the red giant branch. The proposed observations will provide an accuracy of better than 0.1 in distance modulus. Our sample includes such well-known galaxies as M51 (the Whirlpool), M63 (the Sunflower), M104 (the Sombrero), and M74 (the archetypal grand-design spiral). We are also proposing coordinated parallel WFC3 UV observations of the central regions of the galaxies, rich with high-mass UV-bright stars. As a secondary science goal we will compare the resolved UV stellar populations with integrated UV emission measurements used in calibrating star formation rates. Our observations will complement the growing HST UV atlas of high-resolution images of nearby galaxies.

  1. How flatbed scanners upset accurate film dosimetry.

    Science.gov (United States)

    van Battum, L J; Huizenga, H; Verdaasdonk, R M; Heukelom, S

    2016-01-21

    Film is an excellent dosimeter for verification of dose distributions due to its high spatial resolution. Irradiated film can be digitized with low-cost, transmission, flatbed scanners. However, a disadvantage is their lateral scan effect (LSE): a scanner readout change over its lateral scan axis. Although anisotropic light scattering was presented as the origin of the LSE, this paper presents an alternative cause. Hereto, the LSE for two flatbed scanners (Epson 1680 Expression Pro and Epson 10000XL) and Gafchromic film (EBT, EBT2, EBT3) was investigated, focusing on three effects: cross talk, optical path length and polarization. Cross talk was examined using triangular sheets of various optical densities. The optical path length effect was studied using absorptive and reflective neutral density filters with well-defined optical characteristics (OD range 0.2-2.0). Linear polarizer sheets were used to investigate light polarization on the CCD signal in the absence and presence of (un)irradiated Gafchromic film. Film dose values ranged between 0.2 and 9 Gy, i.e. an optical density range between 0.25 and 1.1. Measurements were performed in the scanner's transmission mode, with red-green-blue channels. The LSE was found to depend on scanner construction and film type. Its magnitude depends on dose: for 9 Gy it increases up to 14% at the maximum lateral position. Cross talk was only significant in high contrast regions, up to 2% for very small fields. The optical path length effect introduced by film on the scanner causes a 3% effect for pixels at the extreme lateral position. Light polarization due to film and the scanner's optical mirror system is the main contributor, differing in magnitude for the red, green and blue channels. We concluded that any Gafchromic EBT-type film scanned with a flatbed scanner will face these optical effects. Accurate dosimetry requires correction of the LSE and therefore determination of the LSE per color channel and per dose delivered to the film.

  2. Quantum searching application in search based software engineering

    Science.gov (United States)

    Wu, Nan; Song, FangMin; Li, Xiangdong

    2013-05-01

    Search Based Software Engineering (SBSE) is widely used in software engineering for identifying optimal solutions. However, the traditional algorithms used in SBSE have no polynomial-time solutions, which makes their cost very high. In this paper, we analyze and compare several quantum search algorithms that could be applied to SBSE: the quantum adiabatic evolution search algorithm, fixed-point quantum search (FPQS), quantum walks, and a rapid modified Grover quantum search method. Grover's algorithm is regarded as the best choice for large-scale unstructured data search, and in theory it is applicable to any search-space structure and any type of search problem.
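
    To make the quadratic speed-up concrete, here is a small classical statevector simulation of Grover's algorithm (a toy illustration, not the paper's implementation): for N items it needs about (pi/4)*sqrt(N) oracle calls rather than O(N) inspections.

    ```python
    import numpy as np

    def grover_search(n_items, marked_index):
        state = np.full(n_items, 1.0 / np.sqrt(n_items))  # uniform superposition
        n_iter = int(np.pi / 4 * np.sqrt(n_items))
        for _ in range(n_iter):
            state[marked_index] *= -1.0            # oracle: phase flip on target
            state = 2.0 * state.mean() - state     # diffusion: invert about mean
        return n_iter, state[marked_index] ** 2    # iterations, success probability

    iters, p = grover_search(1024, marked_index=7)
    print(f"{iters} oracle calls, success probability {p:.4f}")  # 25 calls, p ~ 0.9995
    ```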

  3. A Quantum-Based Similarity Method in Virtual Screening.

    Science.gov (United States)

    Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Himmat, Mubarak; Ahmed, Ali; Saeed, Faisal

    2015-10-02

    One of the most widely used techniques for ligand-based virtual screening is similarity searching. This study adopts concepts from quantum mechanics to present a state-of-the-art molecular similarity method inspired by quantum theory. The representation of molecular compounds in a mathematical quantum space plays a vital role in the development of the quantum-based similarity approach. One of the key concepts of quantum theory is the use of complex numbers; hence, this study proposes three different techniques to embed and re-represent molecular compounds in complex-number format. The quantum-based similarity method developed in this study, which relies on a complex pure Hilbert space of molecules, is called Standard Quantum-Based (SQB). Recall of retrieved active molecules was measured at the top 1% and top 5%, and significance tests were used to evaluate the proposed methods. The MDL Drug Data Report (MDDR), Maximum Unbiased Validation (MUV), and Directory of Useful Decoys (DUD) data sets were used for the experiments and were represented by 2D fingerprints. Simulated virtual screening experiments show that the effectiveness of the SQB method increased significantly, owing to the representational power of molecular compounds in complex-number form, compared with the Tanimoto benchmark similarity measure.
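
    The Tanimoto baseline named in the abstract is standard and easy to state in code. The complex-valued similarity below is only a schematic of the general idea (pairing bits into real and imaginary parts and taking a normalized inner product); the pairing is a hypothetical encoding, not one of the paper's three SQB schemes:

    ```python
    import numpy as np

    def tanimoto(a, b):
        """Tanimoto similarity between two binary (0/1 integer) fingerprints."""
        both = np.sum(a & b)
        return both / (np.sum(a) + np.sum(b) - both)

    def complex_similarity(a, b):
        """Quantum-style similarity sketch: embed fingerprints as normalized
        complex vectors and take the squared inner product (fidelity)."""
        za = a[0::2] + 1j * a[1::2]   # hypothetical pairing of adjacent bits
        zb = b[0::2] + 1j * b[1::2]
        za = za / np.linalg.norm(za)
        zb = zb / np.linalg.norm(zb)
        return abs(np.vdot(za, zb)) ** 2

    fp1 = np.random.randint(0, 2, 1024)
    fp2 = np.random.randint(0, 2, 1024)
    print(tanimoto(fp1, fp2), complex_similarity(fp1, fp2))
    ```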

  4. A Quantum-Based Similarity Method in Virtual Screening

    Directory of Open Access Journals (Sweden)

    Mohammed Mumtaz Al-Dabbagh

    2015-10-01

    Full Text Available One of the most widely used techniques for ligand-based virtual screening is similarity searching. This study adopts concepts from quantum mechanics to present a state-of-the-art molecular similarity method inspired by quantum theory. The representation of molecular compounds in a mathematical quantum space plays a vital role in the development of the quantum-based similarity approach. One of the key concepts of quantum theory is the use of complex numbers; hence, this study proposes three different techniques to embed and re-represent molecular compounds in complex-number format. The quantum-based similarity method developed in this study, which relies on a complex pure Hilbert space of molecules, is called Standard Quantum-Based (SQB). Recall of retrieved active molecules was measured at the top 1% and top 5%, and significance tests were used to evaluate the proposed methods. The MDL Drug Data Report (MDDR), Maximum Unbiased Validation (MUV), and Directory of Useful Decoys (DUD) data sets were used for the experiments and were represented by 2D fingerprints. Simulated virtual screening experiments show that the effectiveness of the SQB method increased significantly, owing to the representational power of molecular compounds in complex-number form, compared with the Tanimoto benchmark similarity measure.

  5. 2-Jump DNA Search Multiple Pattern Matching Algorithm

    OpenAIRE

    Raju Bhukya; D. V. L. N. Somayajulu

    2011-01-01

    Pattern matching in a DNA sequence, or searching for a pattern in a large database, is a major research area in computational biology. Extracting pattern matches from a large sequence takes considerable time; to reduce the search time, we have proposed an approach that lowers it while accurately retrieving the matched patterns in the sequence. As performance plays a major role in extracting patterns from a given DNA sequence or from a database independent of the size of the sequence....
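
    The abstract is truncated before the algorithm details, so as a point of reference here is a generic multiple-pattern baseline (hash-set lookup over sliding windows), not the authors' 2-jump method:

    ```python
    def find_patterns(sequence, patterns):
        """Report (position, pattern) for every occurrence of any pattern."""
        by_length = {}
        for p in patterns:
            by_length.setdefault(len(p), set()).add(p)
        hits = []
        for length, pset in by_length.items():
            for i in range(len(sequence) - length + 1):
                window = sequence[i:i + length]
                if window in pset:
                    hits.append((i, window))
        return sorted(hits)

    print(find_patterns("ACGTACGTTAGC", ["ACGT", "TAG", "GTT"]))
    # [(0, 'ACGT'), (4, 'ACGT'), (6, 'GTT'), (8, 'TAG')]
    ```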

  6. Accurate ionization potential of semiconductors from efficient density functional calculations

    Science.gov (United States)

    Ye, Lin-Hui

    2016-07-01

    Despite its huge successes in total-energy-related applications, the Kohn-Sham scheme of density functional theory cannot obtain reliable single-particle excitation energies for solids. In particular, it has not been able to calculate the ionization potential (IP), one of the most important material parameters, for semiconductors. We illustrate that an approximate exact-exchange optimized effective potential (EXX-OEP), the Becke-Johnson exchange, can be used to largely solve this long-standing problem. For a group of 17 semiconductors, we have obtained IPs with an accuracy similar to that of the much more sophisticated GW approximation (GWA), at the computational cost of only the local-density approximation/generalized gradient approximation. The EXX-OEP, therefore, is likely as useful for solids as for finite systems. For solid surfaces, the asymptotic behavior of the exchange-correlation potential v_xc has effects similar to those in finite systems which, when neglected, typically cause the semiconductor IPs to be underestimated. This may partially explain why the standard GWA systematically underestimates the IPs and why the same GWA procedures have not been able to yield an accurate IP and band gap at the same time.
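
    For reference, the ionization potential discussed here is, in a surface (slab) calculation, the vacuum level minus the valence-band maximum; this definition is standard and not specific to the paper:

    ```latex
    \mathrm{IP} = E_{\mathrm{vac}} - \varepsilon_{\mathrm{VBM}}
    ```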

  7. Conditional Similarity Solutions of the Boussinesq Equation

    Institute of Scientific and Technical Information of China (English)

    TANG Xiao-Yan; LIN Ji; LOU Sen-Yue

    2001-01-01

    The direct method proposed by Clarkson and Kruskal is modified to obtain some conditional similarity solutions of a nonlinear physics model. Taking the (1+1)-dimensional Boussinesq equation as a simple example, six types of conditional similarity reductions are obtained.
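
    For orientation, one common form of the equation and of the direct-method ansatz is sketched below; coefficients and sign conventions vary between papers, so this is representative rather than the exact form used by the authors:

    ```latex
    % A common form of the (1+1)-dimensional Boussinesq equation
    u_{tt} = u_{xx} + a\,(u^{2})_{xx} + b\,u_{xxxx}
    % Clarkson-Kruskal direct-method ansatz: seek solutions
    u(x,t) = A(x,t) + B(x,t)\,W\!\big(z(x,t)\big)
    % where W satisfies an ODE in the similarity variable z.
    ```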

  8. Collaboration during visual search.

    Science.gov (United States)

    Malcolmson, Kelly A; Reynolds, Michael G; Smilek, Daniel

    2007-08-01

    Two experiments examine how collaboration influences visual search performance. Working with a partner or on their own, participants reported whether a target was present or absent in briefly presented search displays. We compared the search performance of individuals working together (collaborative pairs) with the pooled responses of the individuals working alone (nominal pairs). Collaborative pairs were less likely than nominal pairs to correctly detect a target and they were less likely to make false alarms. Signal detection analyses revealed that collaborative pairs were more sensitive to the presence of the target and had a more conservative response bias than the nominal pairs. This pattern was observed even when the presence of another individual was matched across pairs. The results are discussed in the context of task-sharing, social loafing and current theories of visual search.
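
    The signal detection quantities reported here, sensitivity (d') and response criterion (c), follow from hit and false-alarm rates by the standard formulas; the example rates below are made up for illustration:

    ```python
    from scipy.stats import norm

    def dprime_and_criterion(hit_rate, fa_rate):
        z_h, z_f = norm.ppf(hit_rate), norm.ppf(fa_rate)
        d_prime = z_h - z_f               # higher = more sensitive
        criterion = -0.5 * (z_h + z_f)    # positive = conservative bias
        return d_prime, criterion

    # Hypothetical rates: pairs hit slightly less but false-alarm far less.
    print(dprime_and_criterion(hit_rate=0.80, fa_rate=0.10))  # nominal-like
    print(dprime_and_criterion(hit_rate=0.75, fa_rate=0.04))  # collaborative-like
    ```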

  9. Chemical Search Web Utility

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Chemical Search Web Utility is an intuitive web application that allows the public to easily find the chemical that they are interested in using, and which...

  10. Transplant Center Search Form

    Science.gov (United States)

    Transplant Center Search Form Welcome to the Blood & Marrow ... transplant centers for patients with a particular disease. ...

  11. Diphoton searches (CMS)

    CERN Document Server

    Quittnat, Milena Eleonore

    2016-01-01

    Many physics scenarios beyond the standard model predict the existence of heavy resonances decaying to diphotons. This talk presents searches for BSM physics in the diphoton final state at CMS, focusing on the recent results.

  12. Automated search for supernovae

    Energy Technology Data Exchange (ETDEWEB)

    Kare, J.T.

    1984-11-15

    This thesis describes the design, development, and testing of a search system for supernovae, based on the use of current computer and detector technology. This search uses a computer-controlled telescope and charge coupled device (CCD) detector to collect images of hundreds of galaxies per night of observation, and a dedicated minicomputer to process these images in real time. The system is now collecting test images of up to several hundred fields per night, with a sensitivity corresponding to a limiting magnitude (visual) of 17. At full speed and sensitivity, the search will examine some 6000 galaxies every three nights, with a limiting magnitude of 18 or fainter, yielding roughly two supernovae per week (assuming one supernova per galaxy per 50 years) at 5 to 50 percent of maximum light. An additional 500 nearby galaxies will be searched every night, to locate about 10 supernovae per year at one or two percent of maximum light, within hours of the initial explosion.
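
    The quoted discovery rates follow directly from the abstract's own numbers, as a quick check shows:

    ```python
    # One supernova per galaxy per 50 years, 6000 galaxies every three nights.
    sn_per_galaxy_per_year = 1 / 50
    print(6000 * sn_per_galaxy_per_year / 52)   # ~2.3 supernovae per week
    # 500 nearby galaxies searched nightly, at the same rate:
    print(500 * sn_per_galaxy_per_year)         # 10 supernovae per year
    ```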

  13. A fingerprint based metric for measuring similarities of crystalline structures.

    Science.gov (United States)

    Zhu, Li; Amsler, Maximilian; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samare; Ghasemi, S Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Chris; Goedecker, Stefan

    2016-01-21

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.
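
    The abstract does not spell out the fingerprint itself, so the sketch below is only a generic illustration of the idea: describe each atom by its local environment, assemble a permutation-invariant per-cell fingerprint, and take a norm of the difference, which automatically satisfies the metric axioms on fingerprint space. The descriptor choice and function names are hypothetical:

    ```python
    import numpy as np

    def local_environment(positions, i, cutoff=6.0, n_neighbors=8):
        """Toy per-atom descriptor: sorted neighbor distances within a cutoff
        (periodic images are ignored for brevity)."""
        d = np.linalg.norm(positions - positions[i], axis=1)
        d = np.sort(d[(d > 1e-8) & (d < cutoff)])[:n_neighbors]
        return np.pad(d, (0, n_neighbors - len(d)))

    def fingerprint(positions):
        envs = [local_environment(positions, i) for i in range(len(positions))]
        # Sorting rows makes the fingerprint invariant to atom ordering.
        return np.array(sorted(envs, key=tuple))

    def config_distance(pos_a, pos_b):
        """Euclidean distance between fingerprints (assumes equal atom counts);
        as a norm of a difference it inherits the metric properties."""
        return np.linalg.norm(fingerprint(pos_a) - fingerprint(pos_b))
    ```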

  14. Initial Experiences with Retrieving Similar Objects in Simulation Data

    Energy Technology Data Exchange (ETDEWEB)

    Cheung, S-C S; Kamath, C

    2003-02-21

    Comparing the output of a physics simulation with an experiment, referred to as 'code validation', is often done by visually comparing the two outputs. In order to determine which simulation is a closer match to the experiment, more quantitative measures are needed. In this paper, we describe our early experiences with this problem by considering the slightly simpler problem of finding objects in an image that are similar to a given query object. Focusing on a dataset from a fluid mixing problem, we report on our experiments with different features used to represent the objects of interest in the data. These early results indicate that the features must be chosen carefully to correctly represent the query object and the goal of the similarity search.
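
    As a concrete picture of the workflow described, the sketch below represents each segmented object by a small feature vector and retrieves the nearest candidate; the specific features (area and second central moments) are illustrative only, and the paper's point is precisely that this choice must be made carefully:

    ```python
    import numpy as np

    def object_features(mask):
        """mask: 2D boolean array marking one object's pixels."""
        ys, xs = np.nonzero(mask)
        cy, cx = ys.mean(), xs.mean()
        return np.array([len(xs),                          # area
                         ((xs - cx) ** 2).mean(),          # horizontal spread
                         ((ys - cy) ** 2).mean(),          # vertical spread
                         ((xs - cx) * (ys - cy)).mean()])  # orientation/shear

    def most_similar(query_mask, candidate_masks):
        q = object_features(query_mask)
        dists = [np.linalg.norm(object_features(m) - q) for m in candidate_masks]
        return int(np.argmin(dists))
    ```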

  15. A fingerprint based metric for measuring similarities of crystalline structures

    Energy Technology Data Exchange (ETDEWEB)

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Grauzinyte, Migle; Goedecker, Stefan, E-mail: stefan.goedecker@unibas.ch [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Amsler, Maximilian [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States); Faraji, Somayeh; Rostami, Samare; Ghasemi, S. Alireza [Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan (Iran, Islamic Republic of); Sadeghi, Ali [Physics Department, Shahid Beheshti University, G. C., Evin, 19839 Tehran (Iran, Islamic Republic of); Wolverton, Chris [Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States)

    2016-01-21

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  16. A fingerprint based metric for measuring similarities of crystalline structures

    CERN Document Server

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samara; Ghasemi, S Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Christopher; Goedecker, Stefan

    2015-01-01

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and allow configurational distances between crystalline structures to be defined that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method is a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  17. The Sweet Spot of a Nonacademic Job Search

    Science.gov (United States)

    Lord, Alexandra M.

    2012-01-01

    Because academic culture frowns on Ph.D.'s who consider leaving the ivory tower, most of those who jump ship find themselves at a loss as to where and how to begin a job search. Yet a nonacademic job search is actually quite similar to a standard research project. Both require advance planning, substantial research, collating evidence for an…

  18. Shape Similarity Measures of Linear Entities

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The essence of feature matching technology lies in how to measure the similarity of spatial entities. Among all possible similarity measures, the shape similarity measure is one of the most important, because the necessary parameters are easy to collect and it matches human intuition well. In this paper a new shape similarity measure for linear entities, based on the differences of direction change along each line, is presented and its effectiveness is illustrated.
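
    In the spirit of the measure described (differences of direction change along each line), a minimal sketch might compare the turning-angle sequences of two polylines; equal vertex counts are assumed for brevity, and the similarity normalization is a hypothetical choice:

    ```python
    import numpy as np

    def turning_angles(points):
        """points: (n, 2) polyline vertices -> (n-2,) direction changes."""
        vecs = np.diff(points, axis=0)
        headings = np.arctan2(vecs[:, 1], vecs[:, 0])
        turns = np.diff(headings)
        return (turns + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)

    def shape_similarity(line_a, line_b):
        """1 / (1 + mean |difference of direction change|); 1 = identical."""
        da, db = turning_angles(line_a), turning_angles(line_b)
        return 1.0 / (1.0 + np.mean(np.abs(da - db)))

    square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]], float)
    zigzag = np.array([[0, 0], [1, .5], [2, 0], [3, .5], [4, 0]], float)
    print(shape_similarity(square, square))  # 1.0
    print(shape_similarity(square, zigzag))  # < 1.0
    ```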

  19. Supersymmetry searches in ATLAS

    CERN Document Server

    Torro Pastor, Emma; The ATLAS collaboration

    2016-01-01

    Weak scale supersymmetry remains one of the best motivated and studied Standard Model extensions. This talk summarises recent ATLAS results for searches for supersymmetric (SUSY) particles. Weak and strong production in both R-Parity conserving and R-Parity violating SUSY scenarios are considered. The searches involved final states including jets, missing transverse momentum, light leptons, taus or photons, as well as long-lived particle signatures.

  20. SUSY searches in ATLAS

    CERN Document Server

    ATLAS, C; The ATLAS collaboration

    2014-01-01

    Despite the absence of experimental evidence, weak scale supersymmetry remains one of the best motivated and studied Standard Model extensions. This talk summarises recent ATLAS results for searches for supersymmetric (SUSY) particles. Weak and strong production in both R-Parity conserving and R-Parity violating SUSY scenarios are considered. The searches involved final states including jets, missing transverse momentum, light leptons, taus or photons, as well as long-lived particle signatures.