WorldWideScience

Sample records for accurate similarity search

  1. Application of kernel functions for accurate similarity search in large chemical databases

    2010-01-01

    Background Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure based methods provide an efficient way to do the query. Recently various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions...

  2. Approximate similarity search

    Amato, Giuseppe

    2000-01-01

    Similarity searching is fundamental in various application areas. Recently it has attracted much attention in the database community because of the growing need to deal with large volume of data. Consequently, efficiency has become a matter of concern in design. Although much has been done to develop structures able to perform fast similarity search, results are still not satisfactory, and more research is needed. The performance of similarity search for complex features deteriorates and does...

  3. Protein structural similarity search by Ramachandran codes

    Chang Chih-Hung; Huang Po-Jung; Lo Wei-Cheng; Lyu Ping-Chiang

    2007-01-01

    Abstract Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we ai...

  4. Multivariate Time Series Similarity Searching

    Jimin Wang; Yuelong Zhu; Shijin Li; Dingsheng Wan; Pengcheng Zhang

    2014-01-01

    Multivariate time series (MTS) datasets are very common in various financial, multimedia, and hydrological fields. In this paper, a dimension-combination method is proposed to search similar sequences for MTS. Firstly, the similarity of single-dimension series is calculated; then the overall similarity of the MTS is obtained by synthesizing each of the single-dimension similarity based on weighted BORDA voting method. The dimension-combination method could use the existing similarity searchin...
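
    The fusion step described above can be sketched as follows. This is a minimal illustration of weighted Borda voting in general, not the authors' implementation: the function name `weighted_borda`, the point scheme (n-1 points for first place down to 0 for last), and the tie-breaking rule are assumptions, since the abstract is truncated before giving details.

```python
from collections import defaultdict


def weighted_borda(rankings, weights):
    """Fuse per-dimension rankings (best candidate first) into one ranking.

    rankings: one list of candidate ids per dimension of the MTS.
    weights:  one non-negative weight per dimension (hypothetical values).
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Classic Borda points: n-1 for first place, 0 for last place.
            scores[candidate] += weight * (n - 1 - position)
    # Highest fused score first; ties broken by candidate id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    per_dimension = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
    print(weighted_borda(per_dimension, [1, 1, 1]))
    print(weighted_borda(per_dimension, [4, 1, 1]))
```

    With equal weights the candidate ranked first in two of three dimensions wins; raising the weight of the first dimension lets its top candidate overtake.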

  5. Protein structural similarity search by Ramachandran codes

    Chang Chih-Hung

    2007-08-01

    Abstract Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.
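
    The idea of linear encoding can be illustrated with a deliberately simplified sketch. It is not SARST: the abstract says SARST organizes the Ramachandran map by nearest-neighbor clustering, whereas the code below swaps in a plain fixed grid over the (phi, psi) plane. The function name `encode_backbone`, the 90-degree bin size, and the letter assignment are all assumptions made for illustration.

```python
import string


def encode_backbone(torsions, bin_deg=90):
    """Map (phi, psi) pairs in degrees to one letter per residue.

    A fixed grid stands in for the clustered Ramachandran map; each grid
    cell gets its own letter, so similar local conformations share letters.
    """
    bins_per_axis = 360 // bin_deg
    letters = string.ascii_uppercase
    if bins_per_axis ** 2 > len(letters):
        raise ValueError("bin_deg too small for a 26-letter alphabet")
    encoded = []
    for phi, psi in torsions:
        # Shift angles from [-180, 180) into [0, 360) before binning.
        i = int((phi + 180) % 360) // bin_deg
        j = int((psi + 180) % 360) // bin_deg
        encoded.append(letters[i * bins_per_axis + j])
    return "".join(encoded)


if __name__ == "__main__":
    # Two helix-like residues followed by one strand-like residue.
    print(encode_backbone([(-60, -45), (-60, -45), (-120, 130)]))
```

    Once structures are strings, any ordinary sequence alignment or string search tool can compare them, which is the speed advantage the abstract describes.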

  6. Scaling Group Testing Similarity Search

    Iscen, Ahmet; Amsaleg, Laurent; Furon, Teddy

    2016-01-01

    The large dimensionality of modern image feature vectors, up to thousands of dimensions, is challenging the high dimensional indexing techniques. Traditional approaches fail at returning good quality results within a response time that is usable in practice. However, similarity search techniques inspired by the group testing framework have recently been proposed in an attempt to specifically defeat the curse of dimensionality. Yet, group testing does not scale and fails at indexing very large...

  7. Semantically enabled image similarity search

    Casterline, May V.; Emerick, Timothy; Sadeghi, Kolia; Gosse, C. A.; Bartlett, Brent; Casey, Jason

    2015-05-01

    Georeferenced data of various modalities are increasingly available for intelligence and commercial use, however effectively exploiting these sources demands a unified data space capable of capturing the unique contribution of each input. This work presents a suite of software tools for representing geospatial vector data and overhead imagery in a shared high-dimension vector or "embedding" space that supports fused learning and similarity search across dissimilar modalities. While the approach is suitable for fusing arbitrary input types, including free text, the present work exploits the obvious but computationally difficult relationship between GIS and overhead imagery. GIS is comprised of temporally-smoothed but information-limited content of a GIS, while overhead imagery provides an information-rich but temporally-limited perspective. This processing framework includes some important extensions of concepts in literature but, more critically, presents a means to accomplish them as a unified framework at scale on commodity cloud architectures.

  8. Biosequence Similarity Search on the Mercury System

    Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph

    2007-01-01

    Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high...

  9. Similarity Measures for Boolean Search Request Formulations.

    Radecki, Tadeusz

    1982-01-01

    Proposes a means for determining the similarity between search request formulations in online information retrieval systems, and discusses the use of similarity measures for clustering search formulations and document files in such systems. Experimental results using the proposed methods are presented in three tables. A reference list is provided.…

  10. Secure sketch search for document similarity

    Örencik, Cengiz; Alewiwi, Mahmoud Khaled; Savaş, Erkay

    2015-01-01

    Document similarity search is an important problem that has many applications especially in outsourced data. With the wide spread of cloud computing, users tend to outsource their data to remote servers which are not necessarily trusted. This leads to the problem of protecting the privacy of sensitive data. We design and implement two secure similarity search schemes for textual documents utilizing locality sensitive hashing techniques for cosine similarity. While the first one provides very ...
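
    The locality sensitive hashing primitive mentioned above can be sketched with random hyperplanes. This shows only the generic cosine LSH building block; it does not reproduce the secure schemes of the paper, whose details are cut off in the abstract. The helper names `make_hyperplanes`, `signature`, and `hamming` are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(
        int(sum(h * v for h, v in zip(plane, vec)) >= 0)
        for plane in hyperplanes
    )


def hamming(sig_a, sig_b):
    """Number of differing bits between two signatures."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16, seed=1)
    doc = [0.5, -1.0, 2.0, 0.25]
    near = [0.6, -0.9, 2.1, 0.2]
    print(hamming(signature(doc, planes), signature(near, planes)))
```

    For random hyperplanes, the expected fraction of differing bits equals the angle between the two vectors divided by pi, so a small Hamming distance indicates high cosine similarity.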

  11. Efficient Authentication of Outsourced String Similarity Search

    Dong, Boxiang; Wang, Hui

    2016-01-01

    Cloud computing enables the outsourcing of big data analytics, where a third party server is responsible for data storage and processing. In this paper, we consider the outsourcing model that provides string similarity search as the service. In particular, given a similarity search query, the service provider returns all strings from the outsourced dataset that are similar to the query string. A major security concern of the outsourcing paradigm is to authenticate whether the service provider...

  12. Mobile P2P Fast Similarity Search

    Bocek, T; Hecht, F. V.; Hausheer, D; Hunt, E; Stiller, B.

    2009-01-01

    In informal data sharing environments, misspellings cause problems for data indexing and retrieval. This is even more pronounced in mobile environments, in which devices with limited input devices are used. In a mobile environment, similarity search algorithms for finding misspelled data need to account for limited CPU and bandwidth. This demo shows P2P fast similarity search (P2PFastSS) running on mobile phones and laptops that is tailored to uncertain data entry and use...

  13. Multiresolution Similarity Search in Image Databases

    Heczko, Martin; Hinneburg, Alexander; Keim, Daniel A.; Wawryniuk, Markus

    2004-01-01

    Typically searching image collections is based on features of the images. In most cases the features are based on the color histogram of the images. Similarity search based on color histograms is very efficient, but the quality of the search results is often rather poor. One of the reasons is that histogram-based systems only support a specific form of global similarity using the whole histogram as one vector. But there is more information in a histogram than the distribution of colors. This ...

  14. Representation Independent Proximity and Similarity Search

    Chodpathumwan, Yodsawalai; Aleyasin, Amirhossein; Termehchy, Arash; Sun, Yizhou

    2015-01-01

    Finding similar or strongly related entities in a graph database is a fundamental problem in data management and analytics with applications in similarity query processing, entity resolution, and pattern matching. Similarity search algorithms usually leverage the structural properties of the data graph to quantify the degree of similarity or relevance between entities. Nevertheless, the same information can be represented in many different structures and the structural properties observed ove...

  15. Web Search Results Summarization Using Similarity Assessment

    Sawant V.V.

    2014-06-01

    Nowadays the internet has become part of our lives, and the WWW is its most important service because it allows information such as documents and images to be presented. The WWW grows rapidly and caters to diverse levels and categories of users. Web search results are extracted for user-specified queries. With millions of items of information pouring online, users have no time to browse the contents completely. Moreover, the available information is often repeated or duplicated. This has created the need to restructure search results so that they are presented in summarized form. The proposed approach extracts several different features of web pages. Web page visual similarity assessment has been employed to address problems in different fields, including phishing, web archiving, and web search engines. In this approach, the search results for a user query are first stored. The Earth Mover's Distance (EMD) is used to assess web page visual similarity: the web page is taken as a low-resolution image, a signature of that image is created from color and coordinate features, and the distance between web pages is calculated by applying the EMD method. The layout similarity value is computed using a tag comparison algorithm and a template comparison algorithm. Textual similarity is computed using cosine similarity, and hyperlink analysis is performed to compute outward links. The final similarity value is calculated by fusing the layout, text, hyperlink, and EMD values. Once the similarity matrix is found, clustering is performed with the help of connected components. Finally, groups of similar web pages, i.e., the summarized results, are displayed to the user. Experiments were conducted to demonstrate the effectiveness of the four methods in generating summarized results for different web pages and user queries.
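
    Two of the steps named in this abstract, cosine similarity for text and clustering by connected components, can be sketched as below. The sketch makes assumptions the abstract does not state: text is compared on raw whitespace-separated term counts, and two pages are linked when their similarity reaches a hypothetical `threshold`. The layout, hyperlink, and EMD components and the fusion rule are not reproduced.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def connected_components(sim, threshold):
    """Group item indices linked by similarity >= threshold."""
    n = len(sim)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    pages = ["cheap flights to rome", "cheap flights rome deals", "python tutorial"]
    sim = [[cosine(p, q) for q in pages] for p in pages]
    print(connected_components(sim, threshold=0.5))
```

    Each resulting group would correspond to one summarized entry shown to the user in place of several near-duplicate results.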

  16. SEAL: Spatio-Textual Similarity Search

    Fan, Ju; Zhou, Lizhu; Chen, Shanshan; Hu, Jun

    2012-01-01

    Location-based services (LBS) have become more and more ubiquitous recently. Existing methods focus on finding relevant points-of-interest (POIs) based on users' locations and query keywords. Nowadays, modern LBS applications generate a new kind of spatio-textual data, regions-of-interest (ROIs), containing region-based spatial information and textual description, e.g., mobile user profiles with active regions and interest tags. To satisfy search requirements on ROIs, we study a new research problem, called spatio-textual similarity search: Given a set of ROIs and a query ROI, we find the similar ROIs by considering spatial overlap and textual similarity. Spatio-textual similarity search has many important applications, e.g., social marketing in location-aware social networks. It calls for an efficient search method to support large scales of spatio-textual data in LBS systems. To this end, we introduce a filter-and-verification framework to compute the answers. In the filter step, we generate signatures for ...

  17. New similarity search based glioma grading

    MR-based differentiation between low- and high-grade gliomas is predominately based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information of CE-T1w and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian Mixture Models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (= probability-based classifier) was defined and used for grading. Biopsy was used as gold standard, and similarity based grading was compared to grading solely based on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integration of perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves the potential of MR-based glioma grading compared to CE-T1w data alone. (orig.)

  18. New similarity search based glioma grading

    Haegler, Katrin; Brueckmann, Hartmut; Linn, Jennifer [Ludwig-Maximilians-University of Munich, Department of Neuroradiology, Munich (Germany); Wiesmann, Martin; Freiherr, Jessica [RWTH Aachen University, Department of Neuroradiology, Aachen (Germany); Boehm, Christian [Ludwig-Maximilians-University of Munich, Department of Computer Science, Munich (Germany); Schnell, Oliver; Tonn, Joerg-Christian [Ludwig-Maximilians-University of Munich, Department of Neurosurgery, Munich (Germany)

    2012-08-15

    MR-based differentiation between low- and high-grade gliomas is predominately based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information of CE-T1w and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian Mixture Models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (= probability-based classifier) was defined and used for grading. Biopsy was used as gold standard, and similarity based grading was compared to grading solely based on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integration of perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves the potential of MR-based glioma grading compared to CE-T1w data alone. (orig.)
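
    The reported percentages can be checked against the cohort sizes with a short calculation. The confusion-matrix counts below are not given in the abstract; they are inferred as the only whole-number counts consistent with 14 high-grade and 23 low-grade patients and the reported rates, and the sketch assumes high-grade glioma is the positive class.

```python
def grading_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) in percent, one decimal."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return tuple(round(100 * x, 1) for x in (accuracy, sensitivity, specificity))


if __name__ == "__main__":
    # Inferred counts: 11 of 14 high-grade tumors detected by both methods.
    # Similarity search: 20 of 23 low-grade tumors correctly graded.
    print("similarity search:", grading_metrics(tp=11, fn=3, tn=20, fp=3))
    # CE-T1w alone: 13 of 23 low-grade tumors correctly graded.
    print("CE-T1w only:      ", grading_metrics(tp=11, fn=3, tn=13, fp=10))
```

    Under these inferred counts, both methods find the same number of high-grade tumors, and the accuracy gain comes entirely from fewer low-grade tumors being overgraded.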

  19. Comparison of Two "Document Similarity Search Engines"

    Poinçot, Phillipe; Lesteven, Soizick; Murtagh, Fionn

    We have developed and used the "CDS document map" based on neural networks (Kohonen maps; http://simbad.u-strasbg.fr/A+A/map.pl). In this self-organizing map, documents are gradually clustered by subject themes. The tool is based on keywords associated with the documents. For one selected document, we locate it on the CDS document map and retrieve articles clustered in the same area. The second search engine, used by the ADS (NASA Astrophysics Data System; http://cdsads.u-strasbg.fr, http://adswww.harvard.edu, http://ads.nao.ac.jp), has the capability to find all similar abstracts in the ADS database with a "keyword request". We have compared the results of the document similarity search engines, using the same set of documents. One example will be described and results will be discussed.

  20. Efficient Video Similarity Measurement and Search

    Cheung, S-C S

    2002-12-19

    The amount of information on the world wide web has grown enormously since its creation in 1990. Duplication of content is inevitable because there is no central management on the web. Studies have shown that many similar versions of the same text documents can be found throughout the web. This redundancy problem is more severe for multimedia content such as web video sequences, as they are often stored in multiple locations and different formats to facilitate downloading and streaming. Similar versions of the same video can also be found, unknown to content creators, when web users modify and republish original content using video editing tools. Identifying similar content can benefit many web applications and content owners. For example, it will reduce the number of similar answers to a web search and identify inappropriate use of copyright content. In this dissertation, they present a system architecture and corresponding algorithms to efficiently measure, search, and organize similar video sequences found on any large database such as the web.

  1. Outsourced similarity search on metric data assets

    Yiu, Man Lung

    2012-02-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low-initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

  2. Earthquake detection through computationally efficient similarity search.

    Yoon, Clara E; O'Reilly, Ossian; Bergen, Karianne J; Beroza, Gregory C

    2015-12-01

    Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms. Earthquake detection (identification of seismic events in continuous data) is a fundamental operation for observational seismology. We developed an efficient method to detect earthquakes using waveform similarity that overcomes the disadvantages of existing detection methods. Our method, called Fingerprint And Similarity Thresholding (FAST), can analyze a week of continuous seismic waveform data in less than 2 hours, or 140 times faster than autocorrelation. FAST adapts a data mining algorithm, originally designed to identify similar audio clips within large databases; it first creates compact "fingerprints" of waveforms by extracting key discriminative features, then groups similar fingerprints together within a database to facilitate fast, scalable search for similar fingerprint pairs, and finally generates a list of earthquake detections. FAST detected most (21 of 24) cataloged earthquakes and 68 uncataloged earthquakes in 1 week of continuous data from a station located near the Calaveras Fault in central California, achieving detection performance comparable to that of autocorrelation, with some additional false detections. FAST is expected to realize its full potential when applied to extremely long duration data sets over a distributed network of seismic stations. The widespread application of FAST has the potential to aid in the discovery of unexpected seismic signals, improve seismic monitoring, and promote a greater understanding of a variety of earthquake processes. PMID:26665176

  3. Performance Evaluation and Optimization of Math-Similarity Search

    Zhang, Qun; Youssef, Abdou

    2015-01-01

    Similarity search in math is to find mathematical expressions that are similar to a user's query. We conceptualized the similarity factors between mathematical expressions, and proposed an approach to math similarity search (MSS) by defining metrics based on those similarity factors [11]. Our preliminary implementation indicated the advantage of MSS compared to non-similarity based search. In order to more effectively and efficiently search similar math expressions, MSS is further optimized. ...

  4. Highly accurate recommendation algorithm based on high-order similarities

    Liu, Jian-Guo; Wang, Bing-Hong; Zhang, Yi-Cheng

    2008-01-01

    In this Letter, we introduce a modified collaborative filtering (MCF) algorithm, which has remarkably higher accuracy than the standard collaborative filtering. In the MCF, instead of the standard Pearson coefficient, the user-user similarities are obtained by a diffusion process. Furthermore, by considering the second order similarities, we design an effective algorithm that depresses the influence of mainstream preferences. The corresponding algorithmic accuracy, measured by the ranking score, is further improved by 24.9% in the optimal case. In addition, two significant criteria of algorithmic performance, diversity and popularity, are also taken into account. Numerical results show that the algorithm based on second order similarity can outperform the MCF simultaneously in all three criteria.

  5. Web Search Results Summarization Using Similarity Assessment

    Sawant V.V.; Takale S.A.

    2014-01-01

    Nowadays the internet has become part of our lives, and the WWW is its most important service because it allows information such as documents and images to be presented. The WWW grows rapidly and caters to diverse levels and categories of users. Web search results are extracted for user-specified queries. With millions of items of information pouring online, users have no time to browse the contents completely. Moreover, the available information is often repeated or duplicated. This issue has created th...

  6. Outsourced similarity search on metric data assets

    Yiu, Man Lung; Assent, Ira; Jensen, Christian Søndergaard

    2012-01-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example...

  7. A Similarity Search Using Molecular Topological Graphs

    Yoshifumi Fukunishi

    2009-01-01

    A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The combination of these two matrices gave a similarity measure. This method was applied to in silico drug screening, and the results showed that it was effective as a similarity measure.
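
    The two matrices named in this abstract can be built from a bond list as sketched below. Only the matrix construction is shown; how the paper combines the matrices with atomic partial charges into a similarity score is not stated in the abstract and is not attempted here. The function names and the zero-based atom indexing are assumptions.

```python
from collections import deque


def adjacency_matrix(n_atoms, bonds):
    """Symmetric 0/1 matrix: entry [a][b] is 1 if atoms a and b are bonded."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a][b] = adj[b][a] = 1
    return adj


def path_length_matrix(adj):
    """Minimum number of bonds between every atom pair (-1 if unreachable)."""
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        # Breadth-first search gives shortest paths in an unweighted graph.
        while queue:
            i = queue.popleft()
            for j in range(n):
                if adj[i][j] and dist[source][j] < 0:
                    dist[source][j] = dist[source][i] + 1
                    queue.append(j)
    return dist


if __name__ == "__main__":
    # Heavy-atom skeleton of propane: C0 - C1 - C2.
    adj = adjacency_matrix(3, [(0, 1), (1, 2)])
    print(adj)
    print(path_length_matrix(adj))
```

    The adjacency matrix only records direct neighbors, which is why it reflects local structure, while the path-length matrix relates every atom to every other and so reflects the overall shape of the molecule.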

  8. Fast similarity search in peer-to-peer networks

    Bocek, T; Hunt, E; Hausheer, D; Stiller, B.

    2008-01-01

    Peer-to-peer (P2P) systems show numerous advantages over centralized systems, such as load balancing, scalability, and fault tolerance, and they require certain functionality, such as search, repair, and message and data transfer. In particular, structured P2P networks perform an exact search in logarithmic time proportional to the number of peers. However, keyword similarity search in a structured P2P network remains a challenge. Similarity search for service discovery can significantly impr...

  9. The Time Course of Similarity Effects in Visual Search

    Guest, Duncan; Lamberts, Koen

    2011-01-01

    It is well established that visual search becomes harder when the similarity between target and distractors is increased and the similarity between distractors is decreased. However, in models of visual search, similarity is typically treated as a static, time-invariant property of the relation between objects. Data from other perceptual tasks…

  10. A Similarity Search Using Molecular Topological Graphs

    2009-01-01

    A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The comb...

  11. Learning Style Similarity for Searching Infographics

    Saleh, Babak; Dontcheva, Mira; Hertzmann, Aaron; Liu, Zhicheng

    2015-01-01

    Infographics are complex graphic designs integrating text, images, charts and sketches. Despite the increasing popularity of infographics and the rapid growth of online design portfolios, little research investigates how we can take advantage of these design resources. In this paper we present a method for measuring the style similarity between infographics. Based on human perception data collected from crowdsourced experiments, we use computer vision and machine learning algorithms to learn ...

  12. Visual similarity is stronger than semantic similarity in guiding visual search for numbers

    Godwin, H.J.; Hout, M.C.; Menneer, T.

    2014-01-01

    Using a visual search task, we explored how behavior is influenced by both visual and semantic information. We recorded participants’ eye movements as they searched for a single target number in a search array of single-digit numbers (0–9). We examined the probability of fixating the various distractors as a function of two key dimensions: the visual similarity between the target and each distractor, and the semantic similarity (i.e., the numerical distance) between the target and each distra...

  13. Fast and secure similarity search in high dimensional space

    Furon, Teddy; Jégou, Hervé; Amsaleg, Laurent; Mathon, Benjamin

    2013-01-01

    Similarity search in high dimensional space database is split into two worlds: i) fast, scalable, and approximate search algorithms which are not secure, and ii) search protocols based on secure computation which are not scalable. This paper presents a one-way privacy protocol that lies in between these two worlds. Approximate metrics for the cosine similarity allow speed. Elements of large random matrix theory provide security evidence if the size of the database is not too big with respe...

  14. Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks

    Khandakar Ahmed

    2015-03-01

    The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of similarity searching into the problem of an interval search in one dimension. In addition, a sector based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries.

  15. Distributed efficient similarity search mechanism in wireless sensor networks.

    Ahmed, Khandakar; Gregory, Mark A

    2015-01-01

    The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of similarity searching into the problem of an interval search in one dimension. In addition, a sector based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries. PMID:25751081
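
    The reduction to a one-dimensional interval search can be sketched with the generic iDistance mapping. This is not the DCSMSS scheme itself, whose storage and routing details are not given in the abstract. The reference points `refs`, the partition spacing constant `c`, and both function names are assumptions; `c` must exceed the largest distance from any point to its reference point.

```python
import math


def idistance_key(point, refs, c):
    """Map a point to a scalar: partition index times c plus distance to its
    nearest reference point."""
    dists = [math.dist(point, ref) for ref in refs]
    i = min(range(len(refs)), key=dists.__getitem__)
    return i * c + dists[i]


def query_intervals(query, radius, refs, c):
    """One key interval per partition that may hold points within radius.

    By the triangle inequality, a point p in partition i within radius of the
    query satisfies |dist(p, ref_i) - dist(query, ref_i)| <= radius.
    Points found in these intervals are candidates and still need a final
    distance check.
    """
    intervals = []
    for i, ref in enumerate(refs):
        d = math.dist(query, ref)
        intervals.append((i * c + max(0.0, d - radius), i * c + d + radius))
    return intervals


if __name__ == "__main__":
    refs = [(0, 0), (10, 10)]
    print(idistance_key((3, 4), refs, c=100))
    print(query_intervals((3, 4), 1.0, refs, c=100))
```

    Because every stored item has a single scalar key, a range query becomes a handful of interval lookups on an ordered index. This simplified version does not prune partitions by their maximum radius.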

  16. Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks

    Khandakar Ahmed; Gregory, Mark A.

    2015-01-01

    The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of s...

  17. Activity-relevant similarity values for fingerprints and implications for similarity searching

    Swarit Jasial; Ye Hu; Martin Vogt; Jürgen Bajorath

    2016-01-01

    A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various att...

  18. How Google Web Search copes with very similar documents

    Mettrop, W.; Nieuwenhuysen, P.; Smulders, H.

    2006-01-01

    A significant portion of the computer files that carry documents, multimedia, programs etc. on the Web are identical or very similar to other files on the Web. How do search engines cope with this? Do they perform some kind of “deduplication”? How should users take into account that web search resul

  19. Effective and Efficient Similarity Search in Scientific Workflow Repositories

    Starlinger, Johannes; Cohen-Boulakia, Sarah; Khanna, Sanjeev; Davidson, Susan; Leser, Ulf

    2015-01-01

    Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular for similarity search. Effective similarity search requires both high quality algorithms for the comparison of scientific workflows and efficient strategies for indexing,...

  20. Fast and accurate protein substructure searching with simulated annealing and GPUs

    Stivala Alex D

    2010-09-01

    Abstract Background Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy to, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.

  1. Indexing schemes for similarity search: an illustrated paradigm

    Pestov, Vladimir; Stojmirovic, Aleksandar

    2002-01-01

    We suggest a variation of the Hellerstein--Koutsoupias--Papadimitriou indexability model for datasets equipped with a similarity measure, with the aim of better understanding the structure of indexing schemes for similarity-based search and the geometry of similarity workloads. This in particular provides a unified approach to a great variety of schemes used to index into metric spaces and facilitates their transfer to more general similarity measures such as quasi-metrics. We discuss links b...

  2. Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Yuan, Ye; Chen, Lei; Wang, Haixun

    2012-01-01

    Many studies have been conducted on seeking the efficient solution for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works assuming that edges in an uncertain graph are independent of each other, we study the uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete, thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and u...

  3. SEARCH PROFILES BASED ON USER TO CLUSTER SIMILARITY

    Ilija Subasic

    2007-12-01

    Privacy of web users' query search logs has, since last year's AOL dataset release, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also attracted a lot of attention in different communities surrounding the search engines. The use of clustering methods for providing low-level contextual search while retaining high privacy/utility is examined in this paper. By using only the user's cluster membership, the search query terms need no longer be retained, thus reducing privacy concerns for both users and companies. The paper presents a lightweight framework for combining query words, user similarities and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous attempts at privacy preservation in that it attempts to anonymize the queries instead of the users.

  4. The breakfast effect: dogs (Canis familiaris) search more accurately when they are less hungry.

    Miller, Holly C; Bender, Charlotte

    2012-11-01

    We investigated whether the consumption of a morning meal (breakfast) by dogs (Canis familiaris) would affect search accuracy on a working memory task following the exertion of self-control. Dogs were tested either 30 or 90 min after consuming half of their daily resting energy requirements (RER). During testing dogs were initially required to sit still for 10 min before searching for hidden food in a visible displacement task. We found that 30 min following the consumption of breakfast, and 10 min after the behavioral inhibition task, dogs searched more accurately than they did in a fasted state. Similar differences were not observed when dogs were tested 90 min after meal consumption. This pattern of behavior suggests that breakfast enhanced search accuracy following a behavioral inhibition task by providing energy for cognitive processes, and that search accuracy decreased as a function of energy depletion. PMID:23032958

  5. Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision

    Holliday John D; Kanoulas Evangelos; Malim Nurul; Willett Peter

    2011-01-01

    Abstract Background Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. Results Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that...

  6. Improving spectral library search by redefining similarity measures.

    Garg, Ankita; Enright, Catherine G; Madden, Michael G

    2015-05-26

    Similarity plays a central role in spectral library search. The goal of spectral library search is to identify those spectra in a reference library of known materials that most closely match an unknown query spectrum, on the assumption that this will allow us to identify the main constituent(s) of the query spectrum. The similarity measures used for this task in software and the academic literature are almost exclusively metrics, meaning that the measures obey the three axioms of metrics: (1) minimality; (2) symmetry; (3) triangle inequality. Consequently, they implicitly assume that the query spectrum is drawn from the same distribution as that of the reference library. In this paper, we demonstrate that this assumption is not necessary in practical spectral library search and that in fact it is often violated in practice. Although the reference library may be constructed carefully, it is generally impossible to guarantee that all future query spectra will be drawn from the same distribution as the reference library. Before evaluating different similarity measures, we need to understand how they define the relationship between spectra. In spectral library search, we often aim to find the constituent(s) of a mixture. We propose that, rather than asking which reference library spectra are similar to the mixture, we should ask which of the reference library spectra are contained in the given query mixture. This question is inherently asymmetric. Therefore, we should adopt a nonmetric measure. To evaluate our hypothesis, we apply a nonmetric measure formulated by Tversky [Psychol. Rev. 1977, 84, 327-352] known as the Contrast Model and compare its performance to the well-known Jaccard similarity index metric on spectroscopic data sets. Our results show that the Tversky similarity measure yields better results than the Jaccard index. PMID:25902003
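
    The asymmetry argued for above can be seen in a small sketch that compares the Jaccard index with Tversky's Contrast Model, S(A, B) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A). The representation of a spectrum as a set of peak positions, the choice of f as set size, the weights and the example values are all assumptions made for illustration; the abstract does not give the authors' actual feature function or parameters.

```python
def jaccard(a, b):
    """Symmetric Jaccard index: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tversky_contrast(ref, query, theta=1.0, alpha=1.0, beta=0.0):
    """Tversky Contrast Model with f = set size.

    alpha penalises reference features missing from the query;
    beta penalises query features absent from the reference.
    With alpha != beta the measure is asymmetric.
    """
    ref, query = set(ref), set(query)
    return (theta * len(ref & query)
            - alpha * len(ref - query)
            - beta * len(query - ref))


# Hypothetical peak positions: the mixture contains every peak of ref_a.
ref_a = {100, 250, 400}
mixture = {100, 250, 400, 550, 600, 800}

print(jaccard(ref_a, mixture), jaccard(mixture, ref_a))                    # same value both ways
print(tversky_contrast(ref_a, mixture), tversky_contrast(mixture, ref_a))  # differs by direction
```

    With `beta=0.0`, extra peaks in the mixture are not held against a reference that is fully contained in it, which is the "contained in" question the abstract poses.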

  7. A Visual Similarity-Based 3D Search Engine

    Lmaati, Elmustapha Ait; Oirrak, Ahmed El; M.N. Kaddioui

    2009-01-01

    Retrieval systems for 3D objects are required because 3D databases used around the web are growing. In this paper, we propose a visual similarity based search engine for 3D objects. The system is based on a new representation of 3D objects given by a 3D closed curve that captures all information about the surface of the 3D object. We propose a new 3D descriptor, which is a combination of three signatures of this new representation, and we implement it in our interactive web based search engin...

  8. RAPSearch: a fast protein similarity search tool for short reads

    Choi Jeong-Hyeon

    2011-05-01

    Abstract Background Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the sheer size of the short read datasets. Results We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. Conclusions RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.

  9. Online multiple kernel similarity learning for visual search.

    Xia, Hao; Hoi, Steven C H; Jin, Rong; Zhao, Peilin

    2014-03-01

    Recent years have witnessed a number of studies on distance metric learning to improve visual similarity search in content-based image retrieval (CBIR). Despite their successes, most existing methods on distance metric learning are limited in two aspects. First, they usually assume the target proximity function follows the family of Mahalanobis distances, which limits their capacity of measuring similarity of complex patterns in real applications. Second, they often cannot effectively handle the similarity measure of multimodal data that may originate from multiple resources. To overcome these limitations, this paper investigates an online kernel similarity learning framework for learning kernel-based proximity functions which goes beyond the conventional linear distance metric learning approaches. Based on the framework, we propose a novel online multiple kernel similarity (OMKS) learning method which learns a flexible nonlinear proximity function with multiple kernels to improve visual similarity search in CBIR. We evaluate the proposed technique for CBIR on a variety of image data sets in which encouraging results show that OMKS outperforms the state-of-the-art techniques significantly. PMID:24457509

  10. Similarity preserving snippet-based visualization of web search results.

    Gomez-Nieto, Erick; San Roman, Frizzi; Pagliosa, Paulo; Casaca, Wallace; Helou, Elias S; de Oliveira, Maria Cristina F; Nonato, Luis Gustavo

    2014-03-01

    Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document (or webpage) and a link to it. This display has many advantages, for example, it affords easy navigation and is straightforward to interpret. Nonetheless, any user of search engines could possibly report some experience of disappointment with this metaphor. Indeed, it has limitations in particular situations, as it fails to provide an overview of the document collection retrieved. Moreover, depending on the nature of the query--for example, it may be too general, or ambiguous, or ill expressed--the desired information may be poorly ranked, or results may contemplate varied topics. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how related they are, content wise. We propose a visualization technique to display the results of web queries aimed at overcoming such limitations. It combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation by employing a multidimensional projection to derive two-dimensional layouts of the query search results that preserve text similarity relations, or neighborhoods. Similarity is computed by applying the cosine similarity over a "bag-of-words" vector representation of collection built from the snippets. If the snippets are displayed directly according to the derived layout, they will overlap considerably, producing a poor visualization. We overcome this problem by defining an energy functional that considers both the overlapping among snippets and the preservation of the neighborhood structure as given in the projected layout. Minimizing this energy functional provides a neighborhood preserving two-dimensional arrangement of the textual snippets with minimum overlap. The resulting visualization conveys both a global

  11. Self-Taught Hashing for Fast Similarity Search

    Zhang, Dell; Cai, Deng; Lu, Jinsong

    2010-01-01

    The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains to be a very challenging problem. In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal $l$-bit binary codes for all documents in the given corpus via unsupervised learning, and then train $l$ classifiers via supervised learning to predict the $l$-bit code for any query document unseen before. Our experiments on three real-world text datasets show that the proposed approach using binarised Laplaci...
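
    The lookup side of semantic hashing can be sketched in a few lines: once every document has a binary code, similar documents are those whose codes lie within a short Hamming distance of the query code. The sketch below covers only that lookup; it does not implement the Self-Taught Hashing learning steps, and the document names, 8-bit codes and radius are invented for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def search(codes, query_code, radius):
    """Return the ids of documents whose code is within `radius` bits of the query."""
    return sorted(doc for doc, code in codes.items()
                  if hamming(code, query_code) <= radius)


# Hypothetical 8-bit codes for three documents.
codes = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110011,
    "doc_c": 0b01001101,
}

print(search(codes, query_code=0b10110110, radius=2))
```

    A linear scan is used here for clarity; at scale the codes would normally be used as hash-table addresses so that only nearby buckets are probed.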

  12. On optimizing distance-based similarity search for biological databases.

    Mao, Rui; Xu, Weijia; Ramakrishnan, Smriti; Nuckolls, Glen; Miranker, Daniel P

    2005-01-01

    Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three important biological data types, protein k-mers with the metric PAM model, DNA k-mers with Hamming distance and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high dimensional feature vectors. We develop results showing that the character of these biological workloads is different from multimedia workloads. In particular, they are not intrinsically very high dimensional, and deserving different optimization heuristics. Based on MVP-trees, we develop a pivot selection heuristic seeking centers and show it outperforms the most widely used corner seeking heuristic. Similarly, we develop a data partitioning approach sensitive to the actual data distribution in lieu of median splits. PMID:16447992

  13. Computing Semantic Similarity Measure Between Words Using Web Search Engine

    Pushpa C N

    2013-05-01

    Semantic similarity measures between words play an important role in information retrieval, natural language processing and in various tasks on the web. In this paper, we have proposed a Modified Pattern Extraction Algorithm to compute the supervised semantic similarity measure between the words by combining both page count method and web snippets method. Four association measures are used to find semantic similarity between words in page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machines (SVM) to find the optimal combination of page counts-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and non-synonymous word-pairs. The proposed Modified Pattern Extraction Algorithm outperforms by 89.8 percent of correlation value.

  14. On Fuzzy vs. Metric Similarity Search in Complex Databases

    Eckhardt, Alan; Skopal, T.; Vojtáš, Peter

    Berlin: Springer, 2009 - (Andreasen, T.; Yager, R.; Bulskov, H.; Christiansen, H.; Larsen, H.), pp. 64-75. (Lecture Notes in Artificial Intelligence, 5822). ISBN 978-3-642-04956-9. ISSN 0302-9743. [FQAS 2009. International Conference on Flexible Query Answering Systems /8./. Roskilde (DK), 26.10.2009-28.10.2009] R&D Projects: GA AV ČR 1ET100300517; GA ČR GD201/09/H057 Other grants: GA ČR(CZ) GA201/09/0683 Institutional research plan: CEZ:AV0Z10300504 Keywords: fuzzy operators * non-metric search * similarity search * indexing Subject RIV: IN - Informatics, Computer Science

  15. SHOP: scaffold hopping by GRID-based similarity searches

    Bergmann, Rikke; Linusson, Anna; Zamora, Ismael

    2007-01-01

    A new GRID-based method for scaffold hopping (SHOP) is presented. In a fully automatic manner, scaffolds were identified in a database based on three types of 3D-descriptors. SHOP's ability to recover scaffolds was assessed and validated by searching a database spiked with fragments of known...... scaffolds were in the 31 top-ranked scaffolds. SHOP also identified new scaffolds with substantially different chemotypes from the queries. Docking analysis indicated that the new scaffolds would have similar binding modes to those of the respective query scaffolds observed in X-ray structures. The...

  16. Quick and easy implementation of approximate similarity search with Lucene

    Amato, Giuseppe; Bolettieri, Paolo; Gennaro, Claudio; Rabitti, Fausto

    2013-01-01

    Similarity search technique has been proved to be an effective way for retrieving multimedia content. However, as the amount of available multimedia data increases, the cost of developing from scratch a robust and scalable system with content-based image retrieval facilities is quite prohibitive. In this paper, we propose to exploit an approach that allows us to convert low level features into a textual form. In this way, we are able to easily set up a retrieval system on top of the Lucene se...

  17. An efficient similarity search based on indexing in large DNA databases.

    Jeong, In-Seon; Park, Kyoung-Wook; Kang, Seung-Ho; Lim, Hyeong-Seok

    2010-04-01

    Index-based search algorithms are an important part of a genomic search, and how to construct indices is the key to an index-based search algorithm to compute similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It uses small storage and it rapidly finds the similarity between two sequences in a DNA sequence database. At first, a sequence is partitioned into equal length windows. We select the likely subsequences by computing Hamming distance to query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. The result of our experiments shows that the algorithm has faster run time than other heuristic algorithms based on index structure. Also, the algorithm is as accurate as those heuristic algorithms. PMID:20418167
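
    The three steps named above (equal-length windows, Hamming-distance filtering, and a frequency-plus-position vector) can be sketched as follows. The abstract does not specify the actual transformation, so the feature vector used here (character counts followed by the mean position of each character) is an assumption, as are the function names and the example sequence.

```python
def windows(seq, w):
    """Split seq into consecutive non-overlapping windows of length w.

    A trailing remainder shorter than w is dropped.
    """
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, w)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidates(seq, query, max_mismatches):
    """Keep (window index, window) pairs within max_mismatches of the query."""
    return [(i, w) for i, w in enumerate(windows(seq, len(query)))
            if hamming(w, query) <= max_mismatches]


def feature_vector(sub, alphabet="ACGT"):
    """Character counts followed by each character's mean position (0.0 if absent)."""
    counts = [sub.count(ch) for ch in alphabet]
    mean_pos = [
        sum(i for i, x in enumerate(sub) if x == ch) / n if n else 0.0
        for ch, n in zip(alphabet, counts)
    ]
    return counts + mean_pos


seq = "ACGTACGAACGTTTTT"
print(candidates(seq, "ACGT", max_mismatches=1))
print(feature_vector("ACGA"))
```

    The vectors produced by `feature_vector` could then be stored in any multidimensional index; which index structure the authors use is not stated in the abstract.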

  18. Rank-Based Similarity Search: Reducing the Dimensional Dependence.

    Houle, Michael E; Nett, Michael

    2015-01-01

    This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values. PMID:26353214

  19. Query-dependent banding (QDB) for faster RNA similarity searches.

    Eric P Nawrocki

    2007-03-01

    When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.

  20. Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases

    Assent, Ira; Wichterich, Marc; Meisen, Tobias;

    2008-01-01

    Multimedia similarity search in large databases requires efficient query processing. The Earth mover's distance, introduced in computer vision, is successfully used as a similarity model in a number of small-scale applications. Its computational complexity hindered its adoption in large multimedia...... databases. We enable directly indexing the Earth mover's distance in structures such as the R-tree and the VA-file by providing the accurate 'MinDist' function to any bounding rectangle in the index. We exploit the computational structure of the new MinDist to derive a new lower bound for the EMD Min...

  1. Activity-relevant similarity values for fingerprints and implications for similarity searching [version 1; referees: 3 approved]

    Swarit Jasial

    2016-04-01

    A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various attempts have been made to establish relationships between calculated fingerprint similarity values and biological activities, none of these has yielded generally applicable rules for similarity searching. In this study, we have addressed the question of molecular versus activity similarity in a more fundamental way. First, we have evaluated if activity-relevant similarity value ranges could in principle be identified for standard fingerprints and distinguished from similarity resulting from random compound comparisons. Then, we have analyzed if activity-relevant similarity values could be used to guide typical similarity search calculations aiming to identify active compounds in databases. It was found that activity-relevant similarity values can be identified as a characteristic feature of fingerprints. However, it was also shown that such values cannot be reliably used as thresholds for practical similarity search calculations. In addition, the analysis presented herein helped to rationalize differences in fingerprint search performance.

  2. Fast and accurate database searches with MS-GF+Percolator.

    Granholm, Viktor; Kim, Sangtae; Navarro, José C F; Sjölund, Erik; Smith, Richard D; Käll, Lukas

    2014-02-01

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community. PMID:24344789

  3. An accurate algorithm to calculate the Hurst exponent of self-similar processes

    In this paper, we introduce a new approach which generalizes the GM2 algorithm (introduced in Sánchez-Granero et al. (2008) [52]) as well as fractal dimension algorithms (FD1, FD2 and FD3) (first appeared in Sánchez-Granero et al. (2012) [51]), providing an accurate algorithm to calculate the Hurst exponent of self-similar processes. We prove that this algorithm performs properly in the case of short time series when fractional Brownian motions and Lévy stable motions are considered. We conclude the paper with a dynamic study of the Hurst exponent evolution in the S&P 500 index stocks. - Highlights: • We provide a new approach to properly calculate the Hurst exponent. • This generalizes FD algorithms and GM2, introduced previously by the authors. • This method (FD4) is especially appropriate for short time series. • FD4 may be used in both unifractal and multifractal contexts. • As an empirical application, we show that S&P 500 stocks improved their efficiency

  4. Keyword Search over Data Service Integration for Accurate Results

    Zemleris, Vidmantas; Robert Gwadera

    2013-01-01

    Virtual data integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn the query language and to get acquainted with data organization, which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  5. Keyword search over data service integration for accurate results

    Virtual Data Integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn a new query language and to get acquainted with data organization which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  6. Activity-relevant similarity values for fingerprints and implications for similarity searching [version 2; referees: 3 approved]

    Swarit Jasial; Ye Hu; Martin Vogt; Jürgen Bajorath

    2016-01-01

    A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various att...

  7. Similarity between Grover's quantum search algorithm and classical two-body collisions

    Zhang, Jingfu; Lu, Zhiheng

    2001-01-01

    By studying the properties of the inversion-about-average operation in the quantum search algorithm, we find a similarity between quantum searching and the collision of two rigid bodies. Some related questions are discussed on the basis of this similarity.

  8. Gene expression module-based chemical function similarity search

    Li, Yun; Hao, Pei; Zheng, Siyuan; Tu, Kang; Fan, Haiwei; Zhu, Ruixin; Ding, Guohui; Dong, Changzheng; Wang, Chuan; Li, Xuan; Thiesen, H.-J.; Chen, Y. Eugene; Jiang, HuaLiang; Liu, Lei; Li, Yixue

    2008-01-01

    Investigation of biological processes using selective chemical interventions is generally applied in biomedical research and drug discovery. Many studies of this kind make use of gene expression experiments to explore cellular responses to chemical interventions. Recently, some research groups constructed libraries of chemical related expression profiles, and introduced similarity comparison into chemical induced transcriptome analysis. Resembling sequence similarity alignment, expression pat...

  9. Cognitive Residues of Similarity: 'After-Effects' of Similarity Computations in Visual Search

    O'Toole, Stephanie; Keane, Mark T.

    2013-01-01

    What are the 'cognitive after-effects' of making a similarity judgement? What, cognitively, is left behind and what effect might these residues have on subsequent processing? In this paper, we probe for such after-effects using a visual search task, performed after a task in which pictures of real-world objects were compared. So, target objects were first presented in a comparison task (e.g., rate the similarity of this object to another) thus, presumably, modifying some of their features bef...

  10. G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases

    Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H.

    2009-01-01

    Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML docum...

  11. Ranking and clustering of search results: Analysis of Similarity graph

    Shevchuk, Ksenia Alexander

    2008-01-01

    Evaluate the clustering of the similarity matrix and confirm that it is high. Compare the ranking results of the eigenvector ranking and the Link Popularity ranking and confirm for the high clustered graph the correlation between those is larger than for the low clustered graph.

  12. Density-based similarity measures for content based search

    Hush, Don R [Los Alamos National Laboratory]; Porter, Reid B [Los Alamos National Laboratory]; Ruggiero, Christy E [Los Alamos National Laboratory]

    2009-01-01

    We consider the query by multiple example problem where the goal is to identify database samples whose content is similar to a collection of query samples. To assess the similarity we use a relative content density which quantifies the relative concentration of the query distribution to the database distribution. If the database distribution is a mixture of the query distribution and a background distribution then it can be shown that database samples whose relative content density is greater than a particular threshold ρ are more likely to have been generated by the query distribution than the background distribution. We describe an algorithm for predicting samples with relative content density greater than ρ that is computationally efficient and possesses strong performance guarantees. We also show empirical results for applications in computer network monitoring and image segmentation.

  13. Perceptual Grouping in Haptic Search: The Influence of Proximity, Similarity, and Good Continuation

    Overvliet, Krista E.; Krampe, Ralf Th.; Wagemans, Johan

    2012-01-01

    We conducted a haptic search experiment to investigate the influence of the Gestalt principles of proximity, similarity, and good continuation. We expected faster search when the distractors could be grouped. We chose edges at different orientations as stimuli because they are processed similarly in the haptic and visual modality. We therefore…

  14. Improving image similarity search effectiveness in a multimedia content management system

    Amato, Giuseppe; Falchi, Fabrizio; Gennaro, Claudio; Rabitti, Fausto; Savino, Pasquale; Stanchev, Peter

    2004-01-01

    In this paper, a technique for making more effective the similarity search process of images in a Multimedia Content Management System is proposed. The content-based retrieval process integrates the search on different multimedia components, linked in XML structures. Depending on the specific characteristics of an image data set, some features can be more effective than others when performing similarity search. Starting from this observation, we propose a technique that predicts the effective...

  15. Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Caprez Adam

    2011-01-01

    Abstract Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" protein. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result.
The modification to the CPASS similarity function is effective in reducing CPASS similarity scores

  16. On a Probabilistic Approach to Determining the Similarity between Boolean Search Request Formulations.

    Radecki, Tadeusz

    1982-01-01

    Presents and discusses the results of research into similarity measures for search request formulations which employ Boolean combinations of index terms. The use of a weighting mechanism to indicate the importance of attributes in a search formulation is described. A 16-item reference list is included. (JL)

  17. Comparative study on Authenticated Sub Graph Similarity Search in Outsourced Graph Database

    N. D. Dhamale; Prof. S. R. Durugkar

    2015-01-01

    Today security is very important in the database system. Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. The Subgraph similarity search is used in graph databases to retrieve graphs whose subgraphs...

  18. Effects of Part-based Similarity on Visual Search: The Frankenbear Experiment

    Alexander, Robert G.; Zelinsky, Gregory J.

    2012-01-01

    Do the target-distractor and distractor-distractor similarity relationships known to exist for simple stimuli extend to real-world objects, and are these effects expressed in search guidance or target verification? Parts of photorealistic distractors were replaced with target parts to create four levels of target-distractor similarity under heterogenous and homogenous conditions. We found that increasing target-distractor similarity and decreasing distractor-distractor similarity impaired sea...

  19. A Theoretical Framework for Defining Similarity Measures for Boolean Search Request Formulations, Including Some Experimental Results.

    Radecki, Tadeusz

    1985-01-01

    Reports research results into a methodology for determining similarity between queries characterized by Boolean search request formulations and discusses similarity measures for Boolean combinations of index terms. Rationale behind these measures is outlined, and conditions ensuring their equivalence are identified. Results of an experiment…

  20. SAPIR - Executing complex similarity queries over multi layer P2P search structures

    Falchi, Fabrizio; Batko, Michal

    2009-01-01

    This deliverable reports the activities conducted within Task 5.4 "Executing complex similarity queries over multi layer P2P search structures" of the SAPIR project. In particular the deliverable discusses complex similarity queries issues and the implementation of the query processing over the P2P indexing. The document is accompanied by a zip file containing the javadoc for MUFIN.

  1. Development of an accurate 3D blood vessel searching system using NIR light

    Mizuno, Yoshifumi; Katayama, Tsutao; Nakamachi, Eiji

    2010-02-01

    Health monitoring systems (HMS) and drug delivery systems (DDS) require accurate needle puncture for automatic blood sampling. In this study, we develop a miniature, highly accurate automatic 3D blood vessel searching system. The detecting system measures 40x25x10 mm. Our searching system uses near-infrared (NIR) LEDs, CMOS camera modules and image processing units. We employ the stereo method to determine the 3D blood vessel location. The blood vessel visualization system exploits hemoglobin's absorption characteristics for NIR light. An NIR LED is set behind the finger and irradiates it with NIR light. CMOS camera modules are set in front of the finger and capture clear blood vessel images. The two-dimensional location of the blood vessel is detected from the luminance distribution of the image, and its depth is calculated by the stereo method. The 3D blood vessel location is automatically detected by our image processing system. To examine the accuracy of our detecting system, we carried out experiments using finger phantoms with blood vessel diameters of 0.5, 0.75 and 1.0 mm at depths of 0.5 to 2.0 mm under the artificial tissue surface. The depths obtained by our detecting system showed good agreement with the given depths, confirming the applicability of this system.

  2. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

    Lefkowitz Elliot J

    2004-10-01

    Abstract Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used.
We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into
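
    The query segmentation idea described above can be illustrated with a minimal sketch. This is not SS-Wrapper's code: the abstract does not say how load is balanced, so the greedy "longest query first, least-loaded node next" rule below, the function name segment_queries, and the use of sequence length as a cost estimate are all assumptions made for illustration.

```python
import heapq


def segment_queries(queries, n_workers):
    """Split named query sequences into one batch per worker.

    queries: dict mapping a query name to its sequence string.
    Cost model (an assumption): search time grows with sequence length.
    """
    # Min-heap of (accumulated load, worker index).
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    batches = [[] for _ in range(n_workers)]
    # Place the longest queries first, each on the least-loaded worker.
    for name, seq in sorted(queries.items(), key=lambda kv: len(kv[1]), reverse=True):
        load, w = heapq.heappop(heap)
        batches[w].append(name)
        heapq.heappush(heap, (load + len(seq), w))
    return batches


if __name__ == "__main__":
    demo = {"q1": "A" * 100, "q2": "A" * 60, "q3": "A" * 50, "q4": "A" * 10}
    print(segment_queries(demo, 2))
```

    Each batch would then be handed unchanged to the wrapped search tool on its node, which is why a wrapper of this kind does not need to modify the tool itself.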

  3. MEASURING THE PERFORMANCE OF SIMILARITY PROPAGATION IN AN SEMANTIC SEARCH ENGINE

    S. K. Jayanthi

    2013-10-01

    In the current scenario, web page result personalization plays a vital role. Nearly 80% of users expect the best results on the first page itself and lack the patience to browse further. This research work focuses on two main themes: online semantic web search and offline domain-based search. The first part is to find an effective method which allows grouping similar results together using a BookShelf Data Structure and organizing the various clusters. The second part is focused on offline search within the academic domain. This paper focuses on finding documents which are similar and on how the vector space model can be used to do so. More weight is therefore given to the principles and working methodology of similarity propagation. The cosine similarity measure is used for finding the relevancy among the documents.
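
    As a rough illustration of the cosine similarity measure named in this abstract, the sketch below compares documents as bag-of-words term-frequency vectors. The whitespace tokenizer and the function names term_vector and cosine_similarity are assumptions for the example; the paper's own weighting scheme is not given in the abstract.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = term_vector("semantic search engine")
    docs = ["semantic web search", "protein structure alignment"]
    for d in docs:
        print(d, round(cosine_similarity(query, term_vector(d)), 3))
```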

  4. Comparative study on Authenticated Sub Graph Similarity Search in Outsourced Graph Database

    N. D. Dhamale

    2015-11-01

    Today security is very important in the database system. Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. Subgraph similarity search is used in graph databases to retrieve graphs whose subgraphs are similar to a given query graph. It has been proven successful in a wide range of applications including bioinformatics and chem-informatics. Due to the cost of providing efficient similarity search services on ever-increasing graph data, database outsourcing is apparently an appealing solution to database owners. In this paper, we study authentication techniques that follow the popular filtering-and-verification framework. One such technique is an authentication-friendly metric index called GMTree. Specifically, we transform the similarity search into a search in a graph metric space and derive small verification objects (VOs) to be transmitted to query clients. To further optimize GMTree, we study a sampling-based pivot selection method and an authenticated version of MCS computation.

  5. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Viswa S S

    2013-06-01

    Nowadays, web scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in hope of finding a certain type of images. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of the returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft use no visual information when searching for images. The idea is to use visual information to re rank and improve text based image search results. This improves the precision of the text based image search ranking by incorporating the information conveyed by the visual modality. The typical assumption that the top- images in the text-based search result are equally relevant is relaxed by linking the relevance of the images to their initial rank positions. Then, a number of images from the initial search result are employed as the prototypes that serve to visually represent the query and that are subsequently used to construct meta re rankers, i.e., the most relevant images are found by visual similarity and the average scores are calculated. By applying different meta re rankers to an image from the initial result, re ranking scores are generated, which are then used to find the new rank position for an image in the re ranked search result. Human supervision is introduced to learn the model weights offline, prior to the online re ranking process. While model learning requires manual labelling of the results for a few queries, the resulting model is query independent and therefore applicable to any other query. The experimental results on a representative web image search dataset comprising 353 queries demonstrate that the proposed method outperforms the existing supervised and unsupervised re ranking approaches. Moreover, it improves the performance over the text-based image search engine by more than 25.48%.

  6. Improving protein structure similarity searches using domain boundaries based on conserved sequence information

    Madej Tom; Wang Yanli; Thompson Kenneth; Bryant Stephen H

    2009-01-01

    Abstract Background The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method employed for computing related protein structures in NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we have investigated how alternative definit...

  7. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space

    Tao, Yufei

    2010-07-01

    Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query cost, or abandon its theoretical guarantee on the quality of query results. Motivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms an LSB-forest that has strong quality guarantees, but improves dramatically the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space, supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (i) iDistance (a famous technique for exact NN search) by two orders of magnitude, and (ii) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results. As a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, at which most of the existing solutions fail. We show that, using an LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, yet still ensuring very good quality. In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial
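
    To make the general LSH idea concrete, here is a minimal sketch of a different and much simpler scheme than the LSB-tree: sign-random-projection hashing for angular similarity, with several hash tables and exact re-ranking of the colliding candidates. It does not reproduce the LSB-tree, its B-tree storage, or its quality guarantees; the class name HyperplaneLSH and all parameter defaults are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Exact cosine similarity, used to re-rank candidates."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class HyperplaneLSH:
    """Sign-random-projection LSH (illustrative, not the LSB-tree)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.points = []

    def _key(self, planes, v):
        # One bit per hyperplane: which side of the plane v lies on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in planes)

    def add(self, v):
        idx = len(self.points)
        self.points.append(v)
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append(idx)
        return idx

    def query(self, q):
        """Return the index of the best colliding point, or None."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, q), []))
        if not candidates:
            return None
        return max(candidates, key=lambda i: cosine(q, self.points[i]))


if __name__ == "__main__":
    index = HyperplaneLSH(dim=2)
    for p in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
        index.add(p)
    print(index.query([2.0, 0.0]))
```

    Only points that share a bucket with the query in at least one table are compared exactly, which is where the sublinear query cost of LSH-style methods comes from; more tables raise recall at the price of space.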

  8. A comparison of field-based similarity searching methods: CatShape, FBSS, and ROCS.

    Moffat, Kirstin; Gillet, Valerie J; Whittle, Martin; Bravi, Gianpaolo; Leach, Andrew R

    2008-04-01

    Three field-based similarity methods are compared in retrospective virtual screening experiments. The methods are the CatShape module of CATALYST, ROCS, and an in-house program developed at the University of Sheffield called FBSS. The programs are used in both rigid and flexible searches carried out in the MDL Drug Data Report. UNITY 2D fingerprints are also used to provide a comparison with a more traditional approach to similarity searching, and similarity based on simple whole-molecule properties is used to provide a baseline for the more sophisticated searches. Overall, UNITY 2D fingerprints and ROCS with the chemical force field option gave comparable performance and were superior to the shape-only 3D methods. When the flexible methods were compared with the rigid methods, it was generally found that the flexible methods gave slightly better results than their respective rigid methods; however, the increased performance did not justify the additional computational cost required. PMID:18351728

  9. Ligand scaffold hopping combining 3D maximal substructure search and molecular similarity

    Petitjean Michel

    2009-08-01

    Abstract Background Virtual screening methods are now well established as effective to identify hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural and energetics properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, the use of 3D-similarity search methods can be very valuable to many research projects as integration of "3D knowledge" can facilitate the identification of not only related molecules but also of chemicals possessing distant scaffolds as compared to the query and therefore be more inclined to scaffold hopping. To date, very few methods performing this task are easily available to the scientific community. Results We introduce a new approach (LigCSRre) to the 3D ligand similarity search of drug candidates. It combines a 3D maximum common substructure search algorithm independent on atom order with a tunable description of atomic compatibilities to prune the search and increase its physico-chemical relevance. We show, on 47 experimentally validated active compounds across five protein targets having different specificities, that for single compound search, the approach is able to recover on average 52% of the co-actives in the top 1% of the ranked list which is better than gold standards of the field. Moreover, the combination of several runs on a single protein target using different query active compounds shows a remarkable improvement in enrichment. Such results demonstrate LigCSRre as a valuable tool for ligand-based screening. Conclusion LigCSRre constitutes a new efficient and generic approach to the 3D similarity screening of small compounds, whose flexible design opens the door to many enhancements.
The program is freely available to the academics for non-profit research at: http://bioserv.rpbs.univ-paris-diderot.fr/LigCSRre.html.

  10. Similarity-based search of model organism, disease and drug effect phenotypes

    Hoehndorf, Robert

    2015-02-19

    Background: Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity. Results: We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet. Conclusions: Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.

  11. Twin Similarities in Holland Types as Shown by Scores on the Self-Directed Search

    Chauvin, Ida; McDaniel, Janelle R.; Miller, Mark J.; King, James M.; Eddlemon, Ondie L. M.

    2012-01-01

    This study examined the degree of similarity between scores on the Self-Directed Search from one set of identical twins. Predictably, a high congruence score was found. Results from a biographical sheet are discussed as well as implications of the results for career counselors.

  12. Accurate corresponding point search using sphere-attribute-image for statistical bone model generation

    Statistical deformable model based two-dimensional/three-dimensional (2-D/3-D) registration is a promising method for estimating the position and shape of patient bone in the surgical space. Since its accuracy depends on the statistical model capacity, we propose a method for accurately generating a statistical bone model from a CT volume. Our method employs the Sphere-Attribute-Image (SAI) and has improved the accuracy of corresponding point search in statistical model generation. At first, target bone surfaces are extracted as SAIs from the CT volume. Then the textures of SAIs are classified to some regions using Maximally-stable-extremal-regions methods. Next, corresponding regions are determined using Normalized cross-correlation (NCC). Finally, corresponding points in each corresponding region are determined using NCC. The application of our method to femur bone models was performed, and worked well in the experiments. (author)
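
    The normalized cross-correlation (NCC) step mentioned above can be sketched as follows. This is a minimal zero-mean NCC over flat lists of intensity values, with a helper that picks the best-matching candidate; how the authors sample SAI textures into patches is not stated in the abstract, so the flat-list patch representation and the names ncc and best_match are assumptions.

```python
import math


def ncc(x, y):
    """Zero-mean normalized cross-correlation of two equal-length patches."""
    if len(x) != len(y) or not x:
        raise ValueError("patches must be non-empty and of equal length")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    # A constant patch has no texture to correlate; report 0.
    return num / den if den else 0.0


def best_match(patch, candidates):
    """Index of the candidate patch with the highest NCC score."""
    return max(range(len(candidates)), key=lambda i: ncc(patch, candidates[i]))


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 2.0]
    others = [[3.0, 2.0, 1.0, 2.0], [10.0, 20.0, 30.0, 20.0]]
    print(best_match(reference, others))
```

    Because NCC subtracts the mean and divides by the spread, a candidate that differs only in brightness or contrast still scores close to 1.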

  13. Similarity and heterogeneity effects in visual search are mediated by "segmentability".

    Utochkin, Igor S; Yurevich, Maria A

    2016-07-01

    The heterogeneity of our visual environment typically reduces the speed with which a singleton target can be found. Visual search theories explain this phenomenon via nontarget similarities and dissimilarities that affect grouping, perceptual noise, and so forth. In this study, we show that increasing the heterogeneity of a display can facilitate rather than inhibit visual search for size and orientation singletons when heterogeneous features smoothly fill the transition between highly distinguishable nontargets. We suggest that this smooth transition reduces the "segmentability" of dissimilar items to otherwise separate subsets, causing the visual system to treat them as a near-homogenous set standing apart from a singleton. (PsycINFO Database Record) PMID:26784002

  14. Manifold Learning for Multivariate Variable-Length Sequences With an Application to Similarity Search.

    Ho, Shen-Shyang; Dai, Peng; Rudzicz, Frank

    2016-06-01

    Multivariate variable-length sequence data are becoming ubiquitous with the technological advancement in mobile devices and sensor networks. Such data are difficult to compare, visualize, and analyze due to the nonmetric nature of data sequence similarity measures. In this paper, we propose a general manifold learning framework for arbitrary-length multivariate data sequences driven by similarity/distance (parameter) learning in both the original data sequence space and the learned manifold. Our proposed algorithm transforms the data sequences in a nonmetric data sequence space into feature vectors in a manifold that preserves the data sequence space structure. In particular, the feature vectors in the manifold representing similar data sequences remain close to one another and far from the feature points corresponding to dissimilar data sequences. To achieve this objective, we assume a semisupervised setting where we have knowledge about whether some of data sequences are similar or dissimilar, called the instance-level constraints. Using this information, one learns the similarity measure for the data sequence space and the distance measures for the manifold. Moreover, we describe an approach to handle the similarity search problem given user-defined instance level constraints in the learned manifold using a consensus voting scheme. Experimental results on both synthetic data and real tropical cyclone sequence data are presented to demonstrate the feasibility of our manifold learning framework and the robustness of performing similarity search in the learned manifold. PMID:25781959

  15. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    Matija Korpar

    In recent years we have witnessed a growth in sequencing yield, the number of samples sequenced, and, as a result, the growth of publicly maintained sequence databases. The increase of data present all around has put high requirements on protein similarity search algorithms with two ever-opposite goals: how to keep the running times acceptable while maintaining a high-enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignments of a query to the whole database are usually too slow. Therefore, the majority of the protein similarity search methods prior to doing the exact local alignment apply heuristics to reduce the number of possible candidate sequences in the database. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-prot and Uniref90 databases.
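
    For readers unfamiliar with the Smith-Waterman algorithm referenced here, the sketch below computes the optimal local alignment score with the standard quadratic-time dynamic program. It is a plain CPU illustration, not SW#db's GPU implementation, and its scoring is an assumption: a fixed match/mismatch score and a linear gap penalty stand in for the substitution matrices and affine gaps that protein search tools normally use.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b.

    Uses O(len(a) * len(b)) time and O(len(b)) memory by keeping
    only the previous row of the dynamic-programming table.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTXX", "ACGT"))
```

    The nested loop over both sequences is the quadratic cost the abstract refers to, and it is the part that parallel implementations spread across GPU and CPU threads.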

  16. Wikipedia Chemical Structure Explorer: substructure and similarity searching of molecules from Wikipedia

    Ertl, Peter; Patiny, Luc; Sander, Thomas; Rufener, Christian; Zasso, Michaël

    2015-01-01

    Background Wikipedia, the world’s largest and most popular encyclopedia, is an indispensable source of chemistry information. It also contains entries for over 15,000 chemicals, including metabolites, drugs, agrochemicals and industrial chemicals. To provide easy access to this wealth of information we decided to develop a substructure and similarity search tool for chemical structures referenced in Wikipedia. Results We extracted chemical structures from entries in Wikipedia an...

  17. Target enhanced 2D similarity search by using explicit biological activity annotations and profiles

    Yu, Xiang; Geer, Lewis Y.; Han, Lianyi; Bryant, Stephen H

    2015-01-01

    Background The enriched biological activity information of compounds in large and freely-accessible chemical databases like the PubChem Bioassay Database has become a powerful research resource for the scientific research community. Currently, 2D fingerprint based conventional similarity search (CSS) is the most widely used approach for database screening, but it does not typically incorporate the relative importance of fingerprint bits to biological activity. Results In this study, a ...

  18. Protein similarity search with subset seeds on a dedicated reconfigurable hardware

    Peterlongo, Pierre; Noé, Laurent; Lavenier, Dominique; Georges, Gilles; Jacques, Julien; Kucherov, Gregory; Giraud, Mathieu

    2007-01-01

    Genome sequencing of numerous species raises the need of complete genome comparison with precise and fast similarity searches. Today, advanced seed-based techniques (spaced seeds, multiple seeds, subset seeds) provide better sensitivity/specificity ratios. We present an implementation of such a seed-based technique onto parallel specialized hardware embedding reconfigurable architecture (FPGA), where the FPGA is tightly connected to large capacity Flash memories. This parallel system allows l...

  19. Database searching for compounds with similar biological activity using short binary bit string representations of molecules.

    Xue, L; Godden, J W; Bajorath, J

    1999-01-01

    In an effort to identify biologically active molecules in compound databases, we have investigated similarity searching using short binary bit strings with a maximum of 54 bit positions. These "minifingerprints" (MFPs) were designed to account for the presence or absence of structural fragments and/or aromatic character, flexibility, and hydrogen-bonding capacity of molecules. MFP design was based on an analysis of distributions of molecular descriptors and structural fragments in two large compound collections. The performance of different MFPs and a reference fingerprint was tested by systematic "one-against-all" similarity searches of molecules in a database containing 364 compounds with different biological activities. For each fingerprint, the most effective similarity cutoff value was determined. An MFP accounting for only 32 structural fragments showed less than 2% false positive similarity matches and correctly assigned on average approximately 40% of the compounds with the same biological activity to a query molecule. Inclusion of three numerical two-dimensional (2D) molecular descriptors increased the performance by 15%. This MFP performed better than a complex 2D fingerprint. At a similarity cutoff value of 0.85, the 2D fingerprint totally eliminated false positives but recognized less than 10% of the compounds within the same activity class. PMID:10529986
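
    A "one-against-all" search over short bit strings with a similarity cutoff can be sketched as follows. The abstract does not name the similarity coefficient, so the Tanimoto coefficient used here is an assumption, as are the function names and the encoding of each fingerprint as a Python integer. The default cutoff of 0.85 is the value quoted in the abstract.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two bit-string fingerprints stored as integers."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")   # bits set in at least one fingerprint
    return common / total if total else 0.0


def similarity_search(query_fp, database, cutoff=0.85):
    """Return (name, score) pairs at or above the cutoff, best match first.

    `database` is a hypothetical dict mapping compound names to fingerprints.
    """
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

    Raising the cutoff trades recall for fewer false positives, which is the trade-off the abstract reports for the 2D fingerprint at 0.85.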

  20. WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTERN RETRIEVAL ALGORITHM

    Pushpa C N

    2013-02-01

    Full Text Available Semantic similarity measures play an important role in information retrieval, natural language processing and various tasks on the web such as relation extraction, community mining, document clustering, and automatic meta-data extraction. In this paper, we propose a Pattern Retrieval Algorithm [PRA] to compute the semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page counts-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and nonsynonymous word-pairs. The proposed approach aims to improve the correlation values, precision, recall, and F-measures, compared to the existing methods. The proposed algorithm outperforms existing methods, with a correlation value of 89.8%.

  1. Semantic similarity measures in the biomedical domain by leveraging a web search engine.

    Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching

    2013-07-01

    Much research on web-related semantic similarity measures has been carried out. However, measuring semantic similarity between two terms remains a challenging task. The traditional ontology-based methodologies have a limitation that both concepts must reside in the same ontology tree(s). Unfortunately, in practice, the assumption is not always applicable. On the other hand, if the corpus is sufficiently adequate, the corpus-based methodologies can overcome the limitation. Now, the web is a continuously and enormously growing corpus. Therefore, a method of estimating semantic similarity is proposed via exploiting the page counts of two biomedical concepts returned by Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, as well as P AND Q, and the web search hit counts of the defined lexico-syntactic patterns. These similarity scores of different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of semantic similarity measures. Experimental results validated against two datasets (dataset 1 provided by A. Hliaoutakis; dataset 2 provided by T. Pedersen) are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficient (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores compared with measures of other methods. However, the correlation coefficients (SNOMED-CT: 0.496; MeSH: 0.539) with coder scores showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to those of physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database. PMID:25055314
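
    Co-occurrence features derived from the hit counts for P, Q and P AND Q can be illustrated with two common set-overlap coefficients. The abstract does not give its feature formulas, so the Jaccard and Dice coefficients below are assumptions chosen for illustration, and the function name and counts are hypothetical rather than real search engine results.

```python
def cooccurrence_features(count_p, count_q, count_pq):
    """Compute overlap scores from page counts for P, Q and "P AND Q".

    Assumed measures: Jaccard and Dice coefficients on hit counts.
    """
    union = count_p + count_q - count_pq
    jaccard = count_pq / union if union > 0 else 0.0
    total = count_p + count_q
    dice = 2 * count_pq / total if total > 0 else 0.0
    return {"jaccard": jaccard, "dice": dice}


# Hypothetical hit counts for two terms and their conjunction.
features = cooccurrence_features(count_p=100, count_q=50, count_pq=25)
```

    In a pipeline like the one described, scores of this kind would form part of the feature vector passed to the support vector machine.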

  2. Gene network homology in prokaryotes using a similarity search approach: queries of quorum sensing signal transduction.

    David N Quan

    Full Text Available Bacterial cell-cell communication is mediated by small signaling molecules known as autoinducers. Importantly, autoinducer-2 (AI-2) is synthesized via the enzyme LuxS in over 80 species, some of which mediate their pathogenicity by recognizing and transducing this signal in a cell density dependent manner. AI-2 mediated phenotypes are not well understood, however, as the means for signal transduction appears varied among species, while AI-2 synthesis processes appear conserved. Approaches to reveal the recognition pathways of AI-2 will shed light on pathogenicity as we believe recognition of the signal is likely as important, if not more, than the signal synthesis. LMNAST (Local Modular Network Alignment Similarity Tool) uses a local similarity search heuristic to study gene order, generating homology hits for the genomic arrangement of a query gene sequence. We develop and apply this tool for the E. coli lac and LuxS regulated (Lsr) systems. Lsr is of great interest as it mediates AI-2 uptake and processing. Both test searches generated results that were subsequently analyzed through a number of different lenses, each with its own level of granularity, from a binary phylogenetic representation down to trackback plots that preserve genomic organizational information. Through a survey of these results, we demonstrate the identification of orthologs, paralogs, hitchhiking genes, gene loss, gene rearrangement within an operon context, and also horizontal gene transfer (HGT). We found a variety of operon structures that are consistent with our hypothesis that the signal can be perceived and transduced by homologous protein complexes, while their regulation may be key to defining subsequent phenotypic behavior.

  3. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Viswa S S

    2013-01-01

    Nowadays, web scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in hope of finding a certain type of images. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft have no Visual Information for searching of images. Using visual information...

  4. PHOG-BLAST – a new generation tool for fast similarity search of protein families

    Mironov Andrey A

    2006-06-01

    Full Text Available Abstract Background The need to compare protein profiles frequently arises in various protein research areas: comparison of protein families, domain searches, resolution of orthology and paralogy. The existing fast algorithms can only compare a protein sequence with a protein sequence and a profile with a sequence. Algorithms to compare profiles use dynamic programming and complex scoring functions. Results We developed a new algorithm called PHOG-BLAST for fast similarity search of profiles. This algorithm uses profile discretization to convert a profile to a finite alphabet and utilizes hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best if its parameters are chosen so that 20 profile clusters are obtained which can be interpreted as ancestral amino acid residues. With these clusters, only less than 2% of columns in multiple alignments are out of clusters. We tested the performance of PHOG-BLAST vs. PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance, on PFAM and BALIBASE PHOG-BLAST was much superior to PSI-BLAST. PHOG-BLAST required 10–20 times less computer memory and computation time than PSI-BLAST. Conclusion Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing protein complements of different species and orthologous groups of different taxons on a personal computer in reasonable time. When it is applied to detect weak similarity between protein families, PHOG-BLAST is less

  5. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. PMID:25625550

  6. Breast cancer stories on the internet: improving search facilities to help patients find stories of similar others

    Overberg, Regina Ingrid

    2013-01-01

    The primary aim of this thesis is to gain insight into which search facilities for spontaneously published stories facilitate breast cancer patients in finding stories by other patients in a similar situation. According to the narrative approach, social comparison theory, and social cognitive theory, reading stories about similar others may have the most positive impact. The research followed a user-centred design: users of search facilities (i.e., patients who want to read stories written by...

  7. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site

    Kinoshita, Kengo; Nakamura, Haruki

    2003-01-01

    The identification of protein biochemical functions based on their three-dimensional structures is strongly required in the post-genome-sequencing era. We have developed a new method to identify and predict protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials on the surfaces. Our prediction system consists of a similarity search method based on a clique search algorithm and the molecular surface database eF-site (electrost...

  8. PSimScan: algorithm and utility for fast protein similarity search.

    Anna Kaznadzey

    Full Text Available In the era of metagenomics and diagnostics sequencing, the importance of protein comparison methods of boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that can be further used by the community to improve the speed and lower hardware requirements of bioinformatics software. The optimization starts at the lookup table construction, then the initial lookup table-based hits are passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects 'similarity zones' aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan performs 5 to 100 times faster than the standard NCBI BLASTP, depending on chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to the NCBI BLASTP's and decrease with the increase of speed, yet stay at the levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences. Comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI's non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with the NCBI BLASTP takes over 66 hours. We describe innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.

  9. The Cost of Search for Multiple Targets: Effects of Practice and Target Similarity

    Menneer, Tamaryn; Cave, Kyle R.; Donnelly, Nick

    2009-01-01

    With the use of X-ray images, performance in the simultaneous search for two target categories was compared with performance in two independent searches, one for each category. In all cases, displays contained one target at most. Dual-target search, for both categories simultaneously, produced a cost in accuracy, although the magnitude of this…

  10. Early Visual Tagging: Effects of Target-Distractor Similarity and Old Age on Search, Subitization, and Counting

    Watson, Derrick G.; Maylor, Elizabeth A.; Allen, Gareth E. J.; Bruce, Lucy A. M.

    2007-01-01

    Three experiments examined the effects of target-distractor (T-D) similarity and old age on the efficiency of searching for single targets and enumerating multiple targets. Experiment 1 showed that increasing T-D similarity selectively reduced the efficiency of enumerating small (less than 4) numerosities (subitizing) but had little effect on…

  11. Application of 3D Zernike descriptors to shape-based ligand similarity searching

    Venkatraman Vishwesh

    2009-12-01

    Full Text Available Abstract Background The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. Results In this study, we have examined the efficacy of the alignment independent three-dimensional Zernike descriptor (3DZD) for fast shape based similarity searching. Performance of this approach was compared with several other methods including the statistical moments based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance under the situation that simulates actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparable to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape based methods for specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability. Conclusion The 3DZD has unique ability for fast comparison of three-dimensional shape of compounds. Examples analyzed illustrate the advantages and the room for improvements for the 3DZD.

  12. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Viswa S S

    2013-06-01

    Full Text Available Nowadays, web scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in hope of finding a certain type of images. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft have no Visual Information for searching of images. Using visual information to re rank and improve text based image search results is the idea. This improves the precision of the text based image search ranking by incorporating the information conveyed by the visual modality. The typical assumption that the top-images in the text-based search result are equally relevant is relaxed by linking the relevance of the images to their initial rank positions. Then, a number of images from the initial search result are employed as the prototypes that serve to visually represent the query and that are subsequently used to construct meta re rankers, i.e. the most relevant images are found by visual similarity and the average scores are calculated. By applying different meta re rankers to an image from the initial result, re ranking scores are generated, which are then used to find the new rank position for an image in the re ranked search result. Human supervision is introduced to learn the model weights offline, prior to the online re ranking process. While model learning requires manual labelling of the results for a few queries, the resulting model is query independent and therefore applicable to any other query. The experimental results on a representative web image search dataset comprising 353 queries demonstrate that the proposed method outperforms the existing supervised and unsupervised re ranking approaches. Moreover, it improves the performance over the text-based image search engine by more than 25.48%.

  13. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data

    Zhao, Yongan; Tang, Haixu; Ye, Yuzhen

    2011-01-01

    Summary: With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a ~20–90-fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify align...

  14. Accurate Image Search using Local Descriptors into a Compact Image Representation

    Soumia Benkrama

    2013-01-01

    Full Text Available Despite progress in image retrieval using low-level features such as colors, textures and shapes, performance is still unsatisfactory because gaps exist between low-level features and high-level semantic concepts. In this work, we present an improved implementation of the bag of visual words approach. We propose an image retrieval system based on the bag-of-features (BoF) model using the scale invariant feature transform (SIFT) and speeded up robust features (SURF). In the literature, SIFT and SURF give good results. Based on this observation, we decide to use a bag-of-features approach over quaternion Zernike moments (QZM). We compare the results of SIFT and SURF with those of QZM. We propose an indexing method for the content-based search task that aims to retrieve a collection of images and returns a ranked list of objects in response to a query image. Experimental results with the Coil-100 and Corel-1000 image databases demonstrate that QZM produces better performance than known representations (SIFT and SURF).

  15. In Search of an Accurate Evaluation of Intrahepatic Cholestasis of Pregnancy

    Manuela Martinefski

    2012-01-01

    Full Text Available Until now, the biochemical parameter most used for diagnosis of intrahepatic cholestasis of pregnancy (ICP) is the rise of total serum bile acids (TSBA) above the upper normal limit of 11 μM. However, differential diagnosis is very difficult since overlapping values, calculated on bile acid determinations, are observed in different conditions of pregnancy, including the benign condition of pruritus gravidarum. The aim of this work was to determine better markers of ICP for a precise diagnosis, together with parameters associated with severity of symptoms and treatment evaluation. Serum bile acid profiles were evaluated using capillary electrophoresis in 38 healthy pregnant women and 32 ICP patients, and the sensitivity, specificity, accuracy, predictive values and the relationships of certain individual bile acids were calculated in pregnant women in order to replace TSBA determinations. The evaluation of the results shows that LCA and the UDCA/LCA ratio provided information for a more complete and accurate diagnosis and evaluation of ICP than calculation of TSBA levels alone in pregnant women.

  16. Breast cancer stories on the internet : improving search facilities to help patients find stories of similar others

    Overberg, Regina Ingrid

    2013-01-01

    The primary aim of this thesis is to gain insight into which search facilities for spontaneously published stories facilitate breast cancer patients in finding stories by other patients in a similar situation. According to the narrative approach, social comparison theory, and social cognitive theory

  17. BlastMultAl, a Blast Extension for Similarity Searching with Alignment Graphs

    Nicodème, Pierre

    1996-01-01

    We describe a new method of processing similarity queries of a proteic multiple alignment with a set (database) of protein sequences, or similarity queries of a protein sequence with a set of protein alignments. We use a representation of multiple alignments as alignment-graphs. Comparisons with different classical methods is made. This new method allows the detection of subtle similarities which are not found by the other methods. It has direct applications for similarities querying with the...

  18. SHOP: receptor-based scaffold hopping by GRID-based similarity searches

    Bergmann, Rikke; Liljefors, Tommy; Sørensen, Morten D;

    2009-01-01

    find known active CDK2 scaffolds in a database. Additionally, SHOP was used for suggesting new inhibitors of p38 MAP kinase. Four p38 complexes were used to perform six scaffold searches. Several new scaffolds were suggested, and the resulting compounds were successfully docked into the query proteins....

  19. Finding and Reusing Learning Materials with Multimedia Similarity Search and Social Networks

    Little, Suzanne; Ferguson, Rebecca; Ruger, Stefan

    2012-01-01

    The authors describe how content-based multimedia search technologies can be used to help learners find new materials and learning pathways by identifying semantic relationships between educational resources in a social learning network. This helps users--both learners and educators--to explore and find material to support their learning aims.…

  20. FSim: A Novel Functional Similarity Search Algorithm and Tool for Discovering Functionally Related Gene Products

    2014-01-01

    Background. During the analysis of genomics data, it is often required to quantify the functional similarity of genes and their products based on the annotation information from gene ontology (GO) with hierarchical structure. A flexible and user-friendly way to estimate the functional similarity of genes utilizing GO annotation is therefore highly desired. Results. We proposed a novel algorithm using a level coefficient-weighted model to measure the functional similarity of gene products base...

  1. Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction

    Wichterich, Marc; Assent, Ira; Philipp, Kranen;

    2008-01-01

    The Earth Mover's Distance (EMD) was developed in computer vision as a flexible similarity model that utilizes similarities in feature space to define a high quality similarity measure in feature representation space. It has been successfully adopted in a multitude of applications with low to...... dimensionality reduction techniques for the EMD in a filter-and-refine architecture for efficient lossless retrieval. Thorough experimental evaluation on real world data sets demonstrates a substantial reduction of the number of expensive high-dimensional EMD computations and thus remarkably faster response...

  2. A Commodity Information Search Model of E-Commerce Search Engine Based on Semantic Similarity and Multi-Attribute Decision Method

    Ziming Zeng

    2010-01-01

    The paper presented an intelligent commodity information search model, which integrates semantic retrieval and multi-attribute decision method. First, semantic similarity is computed by constructing semantic vector-space, in order to realize the semantic consistency between retrieved result and customer’s query. Besides, TOPSIS method is also utilized to construct the comparison mechanism of commodity by calculating the utility value of each retrieved commodity. Finally, the experiment is conduct...

  3. Proposal for a Similar Question Search System on a Q&A Site

    Katsutoshi Kanamori

    2014-06-01

    Full Text Available There is a service to help Internet users obtain answers to specific questions when they visit a Q&A site. A Q&A site is very useful for the Internet user, but posted questions are often not answered immediately. This delay in answering occurs because in most cases another site user is answering the question manually. In this study, we propose a system that can present a question that is similar to a question posted by a user. An advantage of this system is that a user can refer to an answer to a similar question. This research measures the similarity of a candidate question based on word and dependency parsing. In an experiment, we examined the effectiveness of the proposed system for questions actually posted on the Q&A site. The result indicates that the system can show the questioner the answer to a similar question. However, the system still has a number of aspects that should be improved.

  4. Managing Biomedical Image Metadata for Search and Retrieval of Similar Images

    Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Cesar RODRIGUEZ; Beaulieu, Chris

    2010-01-01

    Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations (“semantic” metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM) to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and ...

  5. Integrating structure- and ligand-based virtual screening: comparison of individual, parallel, and fused molecular docking and similarity search calculations on multiple targets.

    Tan, Lu; Geppert, Hanna; Sisay, Mihiret T; Gütschow, Michael; Bajorath, Jürgen

    2008-10-01

    Similarity searching is often used to preselect compounds for docking, thereby decreasing the size of screening databases. However, integrated structure- and ligand-based screening schemes are rare at present. Docking and similarity search calculations using 2D fingerprints were carried out in a comparative manner on nine target enzymes, for which significant numbers of diverse inhibitors could be obtained. In the absence of knowledge-based docking constraints and target-directed parameter optimisation, fingerprint searching displayed a clear preference over docking calculations. Alternative combinations of docking and similarity search results were investigated and found to further increase compound recall of individual methods in a number of instances. When the results of similarity searching and docking were combined, parallel selection of candidate compounds from individual rankings was generally superior to rank fusion. We suggest that complementary results from docking and similarity searching can be captured by integrated compound selection schemes. PMID:18651695
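
    The two ways of combining a docking ranking with a fingerprint ranking can be contrasted in a short sketch. The abstract does not define either scheme in detail, so both functions are assumptions: rank fusion is taken to mean summing each compound's rank positions, and parallel selection is taken to mean drawing alternately from the top of each ranking. The compound names are hypothetical.

```python
from itertools import zip_longest


def rank_fusion(ranking_a, ranking_b, k):
    """Assumed rank fusion: order compounds by the sum of their two rank positions."""
    def position(ranking, compound):
        # Compounds missing from a ranking are placed just after its last entry.
        return ranking.index(compound) + 1 if compound in ranking else len(ranking) + 1

    compounds = set(ranking_a) | set(ranking_b)
    fused = sorted(
        compounds,
        key=lambda c: (position(ranking_a, c) + position(ranking_b, c), c),
    )
    return fused[:k]


def parallel_selection(ranking_a, ranking_b, k):
    """Assumed parallel selection: take compounds alternately from each ranking."""
    selected = []
    for pair in zip_longest(ranking_a, ranking_b):
        for compound in pair:
            if len(selected) >= k:
                return selected
            if compound is not None and compound not in selected:
                selected.append(compound)
    return selected


docking = ["c1", "c2", "c3", "c4"]      # hypothetical docking ranking
fingerprint = ["c5", "c3", "c2", "c6"]  # hypothetical fingerprint ranking
```

    With these rankings, parallel selection keeps the top compound of each method, while rank fusion favours compounds that sit in the middle of both lists. That difference is one way complementary hits can be lost under fusion.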

  6. Web Similarity

    Cohen, Andrew; Vitányi, Paul

    2015-01-01

    Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a similarity on a scale from 0 (identical) to 1 (completely different). The NWD approximates the similarity according to all (upper semi)computable properties. We develop the theory and give applications. The derivation of ...

  7. SimSearch : a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences

    Sérgio DEUSDADO; Carvalho, Paulo

    2009-01-01

    In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to filling the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is then applied once the maximal extension of the pattern has been analysed. Employing bit parallelism to analyse the global similar...

  8. Improving performance of content-based image retrieval schemes in searching for similar breast mass regions: an assessment

    This study aims to assess three methods commonly used in content-based image retrieval (CBIR) schemes and investigate the approaches to improve scheme performance. A reference database involving 3000 regions of interest (ROIs) was established. Among them, 400 ROIs were randomly selected to form a testing dataset. Three methods, namely mutual information, Pearson's correlation and a multi-feature-based k-nearest neighbor (KNN) algorithm, were applied to search for the 15 most similar reference ROIs to each testing ROI. The clinical relevance and visual similarity of searching results were evaluated using the areas under receiver operating characteristic (ROC) curves (AZ) and average mean square difference (MSD) of the mass boundary spiculation level ratings between testing and selected ROIs, respectively. The results showed that the AZ values were 0.893 ± 0.009, 0.606 ± 0.021 and 0.699 ± 0.026 for the use of KNN, mutual information and Pearson's correlation, respectively. The AZ values increased to 0.724 ± 0.017 and 0.787 ± 0.016 for mutual information and Pearson's correlation when using ROIs with the size adaptively adjusted based on actual mass size. The corresponding MSD values were 2.107 ± 0.718, 2.301 ± 0.733 and 2.298 ± 0.743. The study demonstrates that due to the diversity of medical images, CBIR schemes using multiple image features and mass size-based ROIs can achieve significantly improved performance.

  9. Developing Molecular Interaction Database and Searching for Similar Pathways (MOLECULAR BIOLOGY AND INFORMATION-Biological Information Science)

    Kawashima, Shuichi; Katayama, Toshiaki; Kanehisa, Minoru

    1998-01-01

    We have developed a database named BRITE, which contains knowledge of interacting molecules and/or genes concerning cell cycle and early development. Here, we report an overview of the database and the method of automatic search for functionally common sub-pathways between two biological pathways in BRITE.

  10. Topology-based document similarity search algorithm

    杨艳; 朱戈; 范文彬

    2011-01-01

    Searching for similar documents quickly and efficiently in a large document collection is an important and time-consuming problem. Existing algorithms first find a candidate document set and then rank the candidates with a document relevance measure to identify the most relevant ones. A topology-based document similarity search algorithm, Hub-N, is put forward. It transforms the document similarity search problem into a graph search problem and applies pruning techniques to reduce the scope of scanned documents, significantly improving retrieval efficiency. Experiments show that the algorithm is effective and feasible.

  11. Novel DOCK clique driven 3D similarity database search tools for molecule shape matching and beyond: adding flexibility to the search for ligand kin.

    Good, Andrew C

    2007-10-01

    With readily available CPU power and copious disk storage, it is now possible to undertake rapid comparison of 3D properties derived from explicit ligand overlay experiments. With this in mind, shape software tools originally devised in the 1990s are revisited, modified and applied to the problem of ligand database shape comparison. The utility of Connolly surface data is highlighted using the program MAKESITE, which leverages surface normal data to create a ligand shape cast. This cast is applied directly within DOCK, allowing the program to be used unmodified as a shape searching tool. In addition, DOCK has undergone multiple modifications to create a dedicated ligand shape comparison tool KIN. Scoring has been altered to incorporate the original incarnation of Gaussian function derived shape description based on STO-3G atomic electron density. In addition, a tabu-like search refinement has been added to increase search speed by removing redundant starting orientations produced during clique matching. The ability to use exclusion regions, again based on Gaussian shape overlap, has also been integrated into the scoring function. The use of both DOCK with MAKESITE and KIN in database screening mode is illustrated using a published ligand shape virtual screening template. The advantages of using a clique-driven search paradigm are highlighted, including shape optimization within a pharmacophore constrained framework, and easy incorporation of additional scoring function modifications. The potential for further development of such methods is also discussed. PMID:17482856

  12. PHASE-RESOLVED INFRARED SPECTROSCOPY AND PHOTOMETRY OF V1500 CYGNI, AND A SEARCH FOR SIMILAR OLD CLASSICAL NOVAE

    We present phase-resolved near-infrared photometry and spectroscopy of the classical nova (CN) V1500 Cyg to explore whether cyclotron emission is present in this system. While the spectroscopy does not indicate the presence of discrete cyclotron harmonic emission, the light curves suggest that a sizable fraction of its near-infrared fluxes are due to this component. The light curves of V1500 Cyg appear to remain dominated by emission from the heated face of the secondary star in this system. We have used infrared spectroscopy and photometry to search for other potential magnetic systems among old CNe. We have found that the infrared light curves of V1974 Cyg superficially resemble those of V1500 Cyg, suggesting a highly irradiated companion. The old novae V446 Her and QV Vul have light curves with large amplitude variations like those seen in polars, suggesting they might have magnetic primaries. We extract photometry for 79 old novae from the Two Micron All Sky Survey Point Source Catalog and use those data to derive the mean, un-reddened infrared colors of quiescent novae. We also extract WISE data for these objects and find that 45 of them were detected. Surprisingly, a number of these systems were detected in the WISE 22 μm band. While two of those objects produced significant dust shells (V705 Cas and V445 Pup), the others did not. It appears that line emission from their ionized ejected shells is the most likely explanation for those detections.

  13. Improving gene expression similarity measurement using pathway-based analytic dimension

    2009-01-01

    Background Gene expression similarity measuring methods were developed and applied to search rapidly growing public microarray databases. However, current expression similarity measuring methods need to be improved to accurately measure similarity between gene expression profiles from different platforms or different experiments. Results We devised a new gene expression similarity measuring method based on pathway information. In short, the newly devised method measures similarity between gene expre...

  14. Similarity search for multidimensional QAR data subsequence

    杨慧; 张国振

    2013-01-01

    The high dimensionality of QAR data and the uncertain correlations among its dimensions mean that methods designed for similarity search on low-dimensional time series no longer apply. Moreover, given the specific requirements of the civil aviation industry, using QAR similarity search to identify aircraft faults requires a special definition of similarity. In this paper, expert knowledge is combined with the analytic hierarchy process to calculate the weight of each dimension for a given aircraft fault. The QAR data are converted to a symbolic representation, and a k-d tree index is then built, which makes similarity search on multidimensional QAR data subsequences possible. Similarity is defined using both shape and distance. Experiments show high precision and low cost.

  15. SPOT-Ligand: Fast and effective structure-based virtual screening by binding homology search according to ligand and receptor similarity.

    Yang, Yuedong; Zhan, Jian; Zhou, Yaoqi

    2016-07-01

    Structure-based virtual screening usually involves docking of a library of chemical compounds onto the functional pocket of the target receptor so as to discover novel classes of ligands. However, the overall success rate remains low and screening a large library is computationally intensive. An alternative to this "ab initio" approach is virtual screening by binding homology search. In this approach, potential ligands are predicted based on similar interaction pairs (similarity in receptors and ligands). SPOT-Ligand is an approach that integrates ligand similarity by Tanimoto coefficient and receptor similarity by protein structure alignment program SPalign. The method was found to yield a consistent performance in DUD and DUD-E docking benchmarks even if model structures were employed. It improves over docking methods (DOCK6 and AUTODOCK Vina) and has a performance comparable to or better than other binding-homology methods (FINDsite and PoLi) with higher computational efficiency. The server is available at http://sparks-lab.org. © 2016 Wiley Periodicals, Inc. PMID:27074979
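
    The Tanimoto coefficient used here for ligand similarity can be illustrated on fingerprints represented as sets of "on" bit positions. This is a minimal sketch of the general coefficient only; the fingerprints are made up, and nothing below reflects how SPOT-Ligand itself encodes molecules.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints given as the indices of the bits that are set.
query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}
score = tanimoto(query, candidate)  # 3 shared bits out of 6 distinct bits
```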

  16. Similarity Search in Document Collections

    Jordanov, Dimitar Dimitrov

    2009-01-01

    The main goal of this thesis is to evaluate the performance of the freely distributed Semantic Vectors package and the MoreLikeThis class from the Apache Lucene package. The thesis offers a comparison of these two approaches and introduces methods that can lead to improved search quality.

  17. Design of a bioactive small molecule that targets the myotonic dystrophy type 1 RNA via an RNA motif-ligand database and chemical similarity searching.

    Parkesh, Raman; Childs-Disney, Jessica L; Nakamori, Masayuki; Kumar, Amit; Wang, Eric; Wang, Thomas; Hoskins, Jason; Tran, Tuan; Housman, David; Thornton, Charles A; Disney, Matthew D

    2012-03-14

    Myotonic dystrophy type 1 (DM1) is a triplet repeating disorder caused by expanded CTG repeats in the 3'-untranslated region of the dystrophia myotonica protein kinase (DMPK) gene. The transcribed repeats fold into an RNA hairpin with multiple copies of a 5'CUG/3'GUC motif that binds the RNA splicing regulator muscleblind-like 1 protein (MBNL1). Sequestration of MBNL1 by expanded r(CUG) repeats causes splicing defects in a subset of pre-mRNAs including the insulin receptor, the muscle-specific chloride ion channel, sarco(endo)plasmic reticulum Ca(2+) ATPase 1, and cardiac troponin T. Based on these observations, the development of small-molecule ligands that target specifically expanded DM1 repeats could be of use as therapeutics. In the present study, chemical similarity searching was employed to improve the efficacy of pentamidine and Hoechst 33258 ligands that have been shown previously to target the DM1 triplet repeat. A series of in vitro inhibitors of the RNA-protein complex were identified with low micromolar IC(50)'s, which are >20-fold more potent than the query compounds. Importantly, a bis-benzimidazole identified from the Hoechst query improves DM1-associated pre-mRNA splicing defects in cell and mouse models of DM1 (when dosed with 1 mM and 100 mg/kg, respectively). Since Hoechst 33258 was identified as a DM1 binder through analysis of an RNA motif-ligand database, these studies suggest that lead ligands targeting RNA with improved biological activity can be identified by using a synergistic approach that combines analysis of known RNA-ligand interactions with chemical similarity searching. PMID:22300544

  18. Compression-based similarity

    Vitányi, Paul

    2011-01-01

    First we consider pair-wise distances for literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned, normalized, and can be viewed as similarity distances. Second, we consider pair-wise distances between names of objects, like "red" or "christianity." In this case the distances are based on searches of the Internet. Such a search can be performed by any search...
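
    A compression-based, normalized distance between two files can be sketched as below. The formula is the commonly cited normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)); the abstract does not spell it out, so treat it as an assumption here, and zlib is only a stand-in for whichever compressor is actually used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy inputs: a repetitive text and an unrelated byte pattern.
text = b"the quick brown fox jumps over the lazy dog " * 20
noise = bytes(range(256)) * 4

same = ncd(text, text)        # close to 0: the second copy adds little
different = ncd(text, noise)  # much larger: the inputs share no structure
```

    With a real compressor the distance of an object to itself is small but not exactly zero, which is why the comments describe the values only qualitatively.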

  19. Neural circuits of eye movements during performance of the visual exploration task, which is similar to the responsive search score task, in schizophrenia patients and normal subjects

    Abnormal exploratory eye movements have been studied as a biological marker for schizophrenia. Using functional MRI (fMRI), we investigated brain activations of 12 healthy and 8 schizophrenic subjects during performance of a visual exploration task that is similar to the responsive search score task to clarify the neural basis of the abnormal exploratory eye movement. Performance data, such as the number of eye movements, the reaction time, and the percentage of correct answers showed no significant differences between the two groups. Only the normal subjects showed activations at the bilateral thalamus and the left anterior medial frontal cortex during the visual exploration tasks. In contrast, only the schizophrenic subjects showed activations at the right anterior cingulate gyrus during the same tasks. The activation at the different locations between the two groups, the left anterior medial frontal cortex in normal subjects and the right anterior cingulate gyrus in schizophrenia subjects, was explained by the feature of the visual tasks. Hypoactivation at the bilateral thalamus supports a dysfunctional filtering theory of schizophrenia.

  20. Textual Spatial Cosine Similarity

    Crocetti, Giancarlo

    2015-01-01

    When dealing with document similarity many methods exist today, like cosine similarity. More complex methods are also available based on the semantic analysis of textual information, which are computationally expensive and rarely used in the real time feeding of content as in enterprise-wide search environments. To address these real-time constraints, we developed a new measure of document similarity called Textual Spatial Cosine Similarity, which is able to detect similitude at the semantic ...
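
    For reference, the plain cosine similarity that this abstract takes as its starting point can be sketched on bag-of-words term counts. This shows the baseline measure only, under the assumption of whitespace tokenisation; it is not the Textual Spatial Cosine Similarity, whose definition is cut off above.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two documents."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an empty document is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


identical = cosine_similarity("enterprise search engine", "enterprise search engine")
reordered = cosine_similarity("search engine", "engine search")
unrelated = cosine_similarity("search engine", "protein structure")
```

    Because the vectors hold only term counts, the baseline ignores word order: the reordered pair scores the same as an identical pair.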

  1. Similarity Search in Data Stream with Adaptive Segmental Approximations

    吴枫; 仲妍; 吴泉源; 贾焰; 杨树强

    2009-01-01

    Similarity search has attracted many researchers from various communities (real-time stock quotes, network security, sensor networks). Because the data from these communities are infinite, continuous, fast and real-time, a method is needed for online similarity search in data streams. This paper first proposes the lower bound function LB_seg_WF_(global) for DTW (dynamic time warping) in the presence of global warping constraints and LB_seg_WF for DTW without global warping constraints, neither of which relies on an index structure. They are segmented DTW techniques and can be applied to sequences and queries of varying lengths in a data stream. Next, several tighter lower bounds are proposed to improve the degree of approximation of LB_seg_WF_(global) and LB_seg_WF. Finally, to deal with the possibility that LB_seg_WF_(global) or LB_seg_WF is continuously ineffective on a data stream, it is proposed that the lower bound LB_WF_(global) (in the presence of global warping constraints) and the lower bound LB_WF and upper bound UB_WF (without global warping constraints) can quickly estimate DTW and, by computing incrementally, avoid many redundant computations. The theoretical analysis and statistical experiments confirm the validity of the proposed methods.
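
    The quantity these bounds approximate, the DTW distance with or without a global warping constraint, can be sketched as follows. This is textbook DTW with an optional band around the diagonal, offered as an assumption-laden illustration: it does not implement LB_seg_WF, LB_WF, UB_WF, or any other bound from the paper, and the absolute difference is just one possible point-wise cost.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences.

    window=None allows any warping; an integer limits alignments to |i - j| <= window
    (widened to the length difference so that a path always exists).
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


x = [0, 0, 0, 1]
y = [0, 1, 1, 1]
free = dtw_distance(x, y)             # warping absorbs the shift
banded = dtw_distance(x, y, window=0) # diagonal only, no warping allowed
```

    The constrained distance can never be smaller than the unconstrained one, which is why the two cases call for separate bounds.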

  2. Concept Search

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Zaihrayeu, Ilya

    2008-01-01

    In this paper we present a novel approach, called Concept Search, which extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between concepts. The key idea of Concept Search is to operate on complex concepts and to maximally exploit the semantic information available, reducing to syntactic search only when necessary, i.e., when no semantic information is available. ...

  3. Modal Similarity

    Vigo, Dr. Ronaldo

    2009-01-01

    Just as Boolean rules define Boolean categories, the Boolean operators define higher-order Boolean categories referred to as modal categories. We examine the similarity order between these categories and the standard category of logical identity (i.e. the modal category defined by the biconditional or equivalence operator). Our goal is 4-fold: first, to introduce a similarity measure for determining this similarity order; second, to show that such a measure is a good predictor of the similari...

  4. Combination of 2D/3D Ligand-Based Similarity Search in Rapid Virtual Screening from Multimillion Compound Repositories. Selection and Biological Evaluation of Potential PDE4 and PDE5 Inhibitors

    Krisztina Dobi

    2014-05-01

    Rapid in silico selection of target focused libraries from commercial repositories is an attractive and cost effective approach. If structures of active compounds are available, rapid 2D similarity search can be performed on multimillion compound databases, but the generated library requires further focusing by various 2D/3D chemoinformatics tools. We report here a combination of the 2D approach with a ligand-based 3D method (Screen3D) which applies flexible matching to align reference and target compounds in a dynamic manner and thus to assess their structural and conformational similarity. In the first case study we compared the 2D and 3D similarity scores on an existing dataset derived from the biological evaluation of a PDE5 focused library. Based on the obtained similarity metrics a fusion score was proposed. The fusion score was applied to refine the 2D similarity search in a second case study where we aimed at selecting and evaluating a PDE4B focused library. The application of this fused 2D/3D similarity measure led to an increase of the hit rate from 8.5% (1st round, 47% inhibition at 10 µM) to 28.5% (2nd round, 50% inhibition at 10 µM), and the best two hits had 53 nM inhibitory activities.

  5. Cognitive residues of similarity

    OToole, Stephanie; Keane, Mark T.

    2013-01-01

    What are the cognitive after-effects of making a similarity judgement? What, cognitively, is left behind and what effect might these residues have on subsequent processing? In this paper, we probe for such after-effects using a visual search task, performed after a task in which pictures of real-world objects were compared. So, target objects were first presented in a comparison task (e.g., rate the similarity of this object to another) thus, presumably, modifying some of their features befor...

  6. Including Biological Literature Improves Homology Search

    Chang, Jeffrey T.; Raychaudhuri, Soumya; Altman, Russ B

    2001-01-01

    Annotating the tremendous amount of sequence information being generated requires accurate automated methods for recognizing homology. Although sequence similarity is only one of many indicators of evolutionary homology, it is often the only one used. Here we find that supplementing sequence similarity with information from biomedical literature is successful in increasing the accuracy of homology search results. We modified the PSI-BLAST algorithm to use literature similarity in each iterati...

  7. Gene functional similarity search tool (GFSST)

    Russo James J; Sheng Huitao; Zhang Jinghui; Zhang Peisen; Osborne Brian; Buetow Kenneth

    2006-01-01

    Abstract Background With the completion of the genome sequences of human, mouse, and other species and the advent of high throughput functional genomic research technologies such as biomicroarray chips, more and more genes and their products have been discovered and their functions have begun to be understood. Increasing amounts of data about genes, gene products and their functions have been stored in databases. To facilitate selection of candidate genes for gene-disease research, genetic as...

  8. Personalized Search

    2015-01-01

    As the volume of electronically available information grows, relevant items become harder to find. This work presents an approach to personalizing search results in scientific publication databases. This work focuses on re-ranking search results from existing search engines like Solr or ElasticSearch. This work also includes the development of Obelix, a new recommendation system used to re-rank search results. The project was proposed and performed at CERN, using the scientific publications available on the CERN Document Server (CDS). This work experiments with re-ranking using offline and online evaluation of users and documents in CDS. The experiments conclude that the personalized search results outperform both latest first and word similarity in terms of click position in the search result for global search in CDS.

  9. Applying ligands profiling using multiple extended electron distribution based field templates and feature trees similarity searching in the discovery of new generation of urea-based antineoplastic kinase inhibitors.

    Eman M Dokla

    This study provides a comprehensive computational procedure for the discovery of novel urea-based antineoplastic kinase inhibitors while focusing on diversification of both chemotype and selectivity pattern. It presents a systematic structural analysis of the different binding motifs of urea-based kinase inhibitors and the corresponding configurations of the kinase enzymes. The computational model depends on simultaneous application of two protocols. The first protocol applies multiple consecutive validated virtual screening filters including SMARTS, a support vector machine model (ROC = 0.98), a Bayesian model (ROC = 0.86) and structure-based pharmacophore filters based on urea-based kinase inhibitor complexes retrieved from the literature. This is followed by hit profiling against different extended electron distribution (XED) based field templates representing different kinase targets. The second protocol enables cancericidal activity verification by using the algorithm of feature trees (Ftrees) similarity searching against the NCI database. Being a proof-of-concept study, this combined procedure was experimentally validated by its utilization in developing a novel series of urea-based derivatives of strong anticancer activity. This new series is based on the 3-benzylbenzo[d]thiazol-2(3H)-one scaffold, which has interesting chemical feasibility and wide diversification capability. Antineoplastic activity of this series was assayed in vitro against NCI 60 tumor-cell lines showing very strong inhibition of GI(50) as low as 0.9 µM. Additionally, its mechanism was revealed using the KINEX™ protein kinase microarray-based small molecule inhibitor profiling platform and cell cycle analysis, showing a peculiar selectivity pattern against Zap70, c-src, Mink1, csk and MeKK2 kinases. Interestingly, it showed activity on syk kinase, confirming the finding of recent studies of the high activity of diphenyl urea containing compounds against this kinase. Overall, the new series

  10. A cross-species analysis method to analyze animal models' similarity to human's disease state

    Yu Shuhao; Zheng Lulu; Li Yun; Li Chunyan; Ma Chenchen; Li Yixue; Li Xuan; Hao Pei

    2012-01-01

    Abstract Background Animal models are indispensable tools in studying the cause of human diseases and searching for the treatments. The scientific value of an animal model depends on the accurate mimicry of human diseases. The primary goal of the current study was to develop a cross-species method by using the animal models' expression data to evaluate the similarity to human diseases' and assess drug molecules' efficiency in drug research. Therefore, we hoped to reveal that it is feasible an...

  11. Memory support for desktop search

    Chen, Yi; Kelly, Liadh; Jones, Gareth J.F.

    2010-01-01

    The user's memory plays a very important role in desktop search. A search query with insufficiently or inaccurately recalled information may make the search dramatically less effective. In this paper, we discuss three approaches to support user’s memory during desktop search. These include extended types of well remembered search options, the use of past search queries and results, and search from similar items. We will also introduce our search system which incorporates these featur...

  12. Accurate Finite Difference Algorithms

    Goodrich, John W.

    1996-01-01

    Two families of finite difference algorithms for computational aeroacoustics are presented and compared. All of the algorithms are single step explicit methods, they have the same order of accuracy in both space and time, with examples up to eleventh order, and they have multidimensional extensions. One of the algorithm families has spectral like high resolution. Propagation with high order and high resolution algorithms can produce accurate results after O(10^6) periods of propagation with eight grid points per wavelength.

  13. Accurate backgrounds to Higgs production at the LHC

    Kauer, N

    2007-01-01

    Corrections of 10-30% for backgrounds to the $H \to WW \to \ell^+\ell^- \not{p}_T$ search in vector boson and gluon fusion at the LHC are reviewed to make the case for precise and accurate theoretical background predictions.

  14. Custom Search Engines: Tools & Tips

    Notess, Greg R.

    2008-01-01

    Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…

  15. Persistent Homology and Partial Similarity of Shapes

    Di Fabio, Barbara; Landi, Claudia

    2011-01-01

    The ability to perform shape retrieval based not only on full similarity, but also partial similarity is a key property for any content-based search engine. We prove that persistence diagrams can reveal a partial similarity between two shapes by showing a common subset of points. This can be explained using the Mayer-Vietoris formulas that we develop for ordinary, relative and extended persistent homology. An experiment outlines the potential of persistence diagrams as shape descriptors in re...

  16. Are Defect Profile Similarity Criteria Different Than Velocity Profile Similarity Criteria for the Turbulent Boundary Layer?

    Weyburne, David

    2015-01-01

    The use of the defect profile instead of the experimentally observed velocity profile for the search for similarity parameters has become firmly imbedded in the turbulent boundary layer literature. However, a search of the literature reveals that there are no theoretical reasons for this defect profile preference over the more traditional velocity profile. In the report herein, we use the flow governing equation approach to develop similarity criteria for the two profiles. Results show that t...

  17. Image Tracking for the High Similarity Drug Tablets Based on Light Intensity Reflective Energy and Artificial Neural Network

    Zhongwei Liang; Liang Zhou; Xiaochu Liu; Xiaogang Wang

    2014-01-01

    It is obvious that tablet image tracking exerts a notable influence on the efficiency and reliability of high-speed drug mass production. In recent years it has also emerged as a difficult problem and a focus of production monitoring, owing to the highly similar shapes and random positions of the objects to be searched for. For the purpose of tracking randomly distributed tablets accurately, through using surface fitting approach and transition...

  18. Finding Protein and Nucleotide Similarities with FASTA.

    Pearson, William R

    2016-01-01

    The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local, and global similarity searches (ssearch36, ggsearch36), and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity. The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including MySQL and PostgreSQL databases. The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. © 2016 by John Wiley & Sons, Inc. PMID:27010337

  19. Clustering by Pattern Similarity

    Hai-xun Wang; Jian Pei

    2008-01-01

    The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. However, in most of these models the concept of similarity is often based on such metrics as Manhattan distance, Euclidean distance or other Lp distances. In other words, similar objects must have close values in at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we proposed, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it is able to capture not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. In addition to the novel similarity model, this paper also introduces an effective and efficient algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its performance.
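
    The idea that two objects are similar when they rise and fall together on a subset of dimensions, even if their magnitudes differ, can be sketched as a pairwise coherence check. The score used below, |(x_a - x_b) - (y_a - y_b)| for every pair of dimensions a and b, is how the pCluster criterion is usually stated; it is not given in this abstract, so it is an assumption here, as are the threshold `delta` and the example expression levels.

```python
from itertools import combinations


def p_score(x, y, a, b):
    """How much the change of x between dimensions a and b differs from that of y."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def coherent(x, y, dims, delta):
    """True if x and y follow the same pattern, within delta, on every pair of dims."""
    return all(p_score(x, y, a, b) <= delta for a, b in combinations(dims, 2))


# Hypothetical expression levels of two genes under four conditions.
gene_1 = [10, 50, 30, 80]
gene_2 = [110, 150, 130, 95]

on_first_three = coherent(gene_1, gene_2, [0, 1, 2], delta=0)
on_all_four = coherent(gene_1, gene_2, [0, 1, 2, 3], delta=5)
```

    The two genes are far apart in value on the first three conditions, so a distance-based measure would separate them, yet their pattern there is identical; including the fourth condition breaks the coherence.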

  20. Automatic face alignment by maximizing similarity score

    Boom, Bas; Spreeuwers, Luuk; Veldhuis, Raymond; Fred, A.; Jain, A. K.

    2007-01-01

    Accurate face registration is of vital importance to the performance of a face recognition algorithm. We propose a face registration method which searches for the optimal alignment by maximizing the score of a face recognition algorithm. In this paper we investigate the practical usability of our face registration method. Experiments show that our registration method achieves better results in face verification than the landmark based registration method. We even obtain face verification resu...

  1. The semantic similarity ensemble

    Andrea Ballatore

    2013-12-01

    Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble's member. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.

  2. Niche Genetic Algorithm with Accurate Optimization Performance

    LIU Jian-hua; YAN De-kun

    2005-01-01

    Based on a crowding mechanism, a novel niche genetic algorithm is proposed that can record the evolutionary direction dynamically during evolution. After evolution, the solutions' precision can be greatly improved by means of local searching along the recorded direction. Simulation shows that this algorithm can not only keep population diversity but also find accurate solutions. Although this method takes more time than the standard GA, it is worth applying to cases that demand high solution precision.

  3. Gender similarities and differences.

    Hyde, Janet Shibley

    2014-01-01

    Whether men and women are fundamentally different or similar has been debated for more than a century. This review summarizes major theories designed to explain gender differences: evolutionary theories, cognitive social learning theory, sociocultural theory, and expectancy-value theory. The gender similarities hypothesis raises the possibility of theorizing gender similarities. Statistical methods for the analysis of gender differences and similarities are reviewed, including effect sizes, meta-analysis, taxometric analysis, and equivalence testing. Then, relying mainly on evidence from meta-analyses, gender differences are reviewed in cognitive performance (e.g., math performance), personality and social behaviors (e.g., temperament, emotions, aggression, and leadership), and psychological well-being. The evidence on gender differences in variance is summarized. The final sections explore applications of intersectionality and directions for future research. PMID:23808917

  4. Cluster Tree Based Hybrid Document Similarity Measure

    M. Varshana Devi

    2015-10-01

    A similarity measure is established to measure the hybrid similarity. In a cluster tree, the hybrid similarity measure can be calculated for random data even when the data do not co-occur, and it generates different views. Different views of the tree can be combined, and the one that is significant in cost is chosen. A method is proposed to combine the multiple views. Multiple views, represented by different distance measures, are combined into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity offers better feasibility for intelligent search. It helps improve dimensionality reduction and semantic analysis.

  5. Constructive Similarity of Soils

    Koudelka, Petr

    Singapore : Design, CRC a iTEK CMS Web solutions, 2012 - (Phoon, K.; Beer, M.; Quek, S.; Pang, S.), s. 206-211 ISBN 978-981-07-2218-0. [APS on Structural Reliability and Its Application – Sustainable Civil Infrastructures /5./. Singapore (SG), 23.05.2012-25.05.2012] Grant ostatní: GA ČR(CZ) GAP105/11/1160 Institutional support: RVO:68378297 Keywords : model similarity * database of soil properties * soil similarity characteristic * statistical analysis * ultimate limit states Subject RIV: JM - Building Engineering

  6. Music Retrieval based on Melodic Similarity

    Typke, R.

    2007-01-01

    This thesis introduces a method for measuring melodic similarity for notated music such as MIDI files. This music search algorithm views music as sets of notes that are represented as weighted points in the two-dimensional space of time and pitch. Two point sets can be compared by calculating how mu

  7. Information Extraction Using Distant Supervision and Semantic Similarities

    PARK, Y.

    2016-02-01

    Information extraction is one of the main research tasks in natural language processing and text mining that extracts useful information from unstructured sentences. Information extraction techniques include named entity recognition, relation extraction, and co-reference resolution. Among them, relation extraction refers to a task that extracts semantic relations between entities such as personal and geographic names in documents. This is an important research area, which is used in knowledge base construction and question and answering systems. This study presents relation extraction using a distant supervision learning technique among semi-supervised learning methods, which have been spotlighted in recent years to reduce human manual work and costs required for supervised learning. That is, this study proposes a method that can improve relation extraction by improving a distant supervision learning technique by applying a clustering method to create a learning corpus and semantic analysis for relation extraction that is difficult to identify using existing distant supervision. Through comparison experiments of various semantic similarity comparison methods, similarity calculation methods that are useful to relation extraction using distant supervision are searched, and a large number of accurate relation triples can be extracted using the proposed structural advantages and semantic similarity comparison.

  8. Similarity of molecular shape.

    Meyer, A Y; Richards, W G

    1991-10-01

    The similarity of one molecule to another has usually been defined in terms of electron densities or electrostatic potentials or fields. Here it is expressed as a function of the molecular shape. Formulations of similarity (S) reduce to very simple forms, thus rendering the computerised calculation straightforward and fast. 'Elements of similarity' are identified, in the same spirit as 'elements of chirality', except that the former are understood to be variable rather than present-or-absent. Methods are presented which bypass the time-consuming mathematical optimisation of the relative orientation of the molecules. Numerical results are presented and examined, with emphasis on the similarity of isomers. At the extreme, enantiomeric pairs are considered, where it is the dissimilarity (D = 1 - S) that is of consequence. We argue that chiral molecules can be graded by dissimilarity, and show that D is the shape-analog of the 'chirality coefficient', with the simple form of the former opening up numerical access to the latter. PMID:1770379

  9. The Qualitative Similarity Hypothesis

    Paul, Peter V.; Lee, Chongmin

    2010-01-01

    Evidence is presented for the qualitative similarity hypothesis (QSH) with respect to children and adolescents who are d/Deaf or hard of hearing. The primary focus is on the development of English language and literacy skills, and some information is provided on the acquisition of English as a second language. The QSH is briefly discussed within…

  10. Limiting Similarity Revisited

    Szabo, P; Meszena, G.

    2005-01-01

    We reinvestigate the validity of the limiting similarity principle via numerical simulations of the Lotka-Volterra model. A Gaussian competition kernel is employed to describe decreasing competition with increasing difference in a one-dimensional phenotype variable. The simulations are initiated by a large number of species, evenly distributed along the phenotype axis. Exceptionally, the Gaussian carrying capacity supports coexistence of all species, initially present. In case of any other, d...

  11. The Application of Similar Image Retrieval in Electronic Commerce

    YuPing Hu

    2014-01-01

    Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers’ experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system.

  12. The application of similar image retrieval in electronic commerce.

    Hu, YuPing; Yin, Hua; Han, Dezhi; Yu, Fei

    2014-01-01

    Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers' experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system. PMID:24883411

  13. A new approach for finding semantic similar scientific articles

    Masumeh Islami Nasab; Reza Javidan

    2015-01-01

    Calculating article similarities enables users to find similar articles and documents in a collection of articles. Two similar documents are extremely helpful for text applications such as document-to-document similarity search, plagiarism checker, text mining for repetition, and text filtering. This paper proposes a new method for calculating the semantic similarities of articles. WordNet is used to find word semantic associations. The proposed technique first compares the similarity of each...

  14. An efficient and accurate 3D displacements tracking strategy for digital volume correlation

    Pan, Bing

    2014-07-01

    Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost. © 2014 Elsevier Ltd.

  15. The qualitative similarity hypothesis.

    Paul, Peter V; Lee, Chongmin

    2010-01-01

    Evidence is presented for the qualitative similarity hypothesis (QSH) with respect to children and adolescents who are d/Deaf or hard of hearing. The primary focus is on the development of English language and literacy skills, and some information is provided on the acquisition of English as a second language. The QSH is briefly discussed within the purview of two groups of cognitive models: those that emphasize the cognitive development of individuals and those that pertain to disciplinary or knowledge structures. It is argued that the QSH has scientific merit with implications for classroom instruction. Future research should examine the validity of the QSH in other disciplines such as mathematics and science and should include perspectives from social as well as cognitive models. PMID:20415280

  16. Self Similar Optical Fiber

    Lai, Zheng-Xuan

    This research proposes Self Similar optical fiber (SSF) as a new type of optical fiber. It has a special core that consists of self similar structure. Such a structure is obtained by following the formula for generating iterated function systems (IFS) in Fractal Theory. The resulting SSF can be viewed as a true fractal object in optical fibers. In addition, the method of fabricating SSF makes it possible to generate desired structures exponentially in numbers, whereas it also allows lower scale units in the structure to be reduced in size exponentially. The invention of SSF is expected to greatly ease the production of optical fiber when a large number of small hollow structures are needed in the core of the optical fiber. This dissertation will analyze the core structure of SSF based on fractal theory. Possible properties from the structural characteristics and the corresponding applications are explained. Four SSF samples were obtained through actual fabrication in a laboratory environment. Unlike a traditional conductive heating fabrication system, I used an in-house designed furnace that incorporated a radiation heating method and was equipped with an automated temperature control system. The obtained samples were examined through spectrum tests. Results from the tests showed that SSF does have the optical property of delivering light in a certain wavelength range. However, SSF as a new type of optical fiber requires systematic research to find the theory that explains its structure and the associated optical properties. The fabrication and quality of SSF also need to be improved for product deployment. As a start of this extensive research, this dissertation work opens the door to a very promising new area in optical fiber research.

  17. The place of highly accurate methods by RNAA in metrology

    With the introduction of physical metrological concepts to chemical analysis, which require that a result be accompanied by an uncertainty statement expressed in terms of SI units, several researchers started to consider ID-MS as the only method fulfilling this requirement. However, recent publications revealed that in certain cases even some expert laboratories using ID-MS and analyzing the same material produced results whose uncertainty statements did not overlap, which theoretically should not have happened. This shows that no monopoly is good in science and that it would be desirable to widen the set of methods acknowledged as primary in inorganic trace analysis. Moreover, ID-MS cannot be used for monoisotopic elements. The need to search for other methods with metrological quality similar to that of ID-MS seems obvious. In this paper, our long-time experience in devising highly accurate ('definitive') methods by RNAA for the determination of selected trace elements in biological materials is reviewed. The general idea of definitive methods, based on the combination of neutron activation with highly selective and quantitative isolation of the indicator radionuclide by column chromatography followed by gamma spectrometric measurement, is recalled and illustrated by examples of the performance of such methods when determining Cd, Co, Mo, etc. It is demonstrated that such methods are able to provide very reliable results with very low levels of uncertainty traceable to SI units

  18. Concept Search: Semantics Enabled Information Retrieval

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Zaihrayeu, Ilya

    2010-01-01

    In this paper we present a novel approach, called Concept Search, which extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between concepts. The key idea of Concept Search is to operate on complex concepts and to maximally exploit the semantic information available, reducing to syntactic search only when necessary, i.e., when no semantic information is available. ...

  19. Accurate guitar tuning by cochlear implant musicians.

    Thomas Lu

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  20. Accurate guitar tuning by cochlear implant musicians.

    Lu, Thomas; Huang, Juan; Zeng, Fan-Gang

    2014-01-01

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task. PMID:24651081
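
    The beat mechanism mentioned in this abstract can be illustrated with a short Python sketch. It relies only on the standard identity sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2): two simultaneous tones behave like one tone at the mean frequency whose loudness rises and falls at the difference frequency. The 440 Hz and 440.5 Hz values and the function names are illustrative assumptions, not data from the study.

```python
import math


def beat_frequency(f1, f2):
    """Rate (Hz) at which the loudness of two summed tones rises and falls."""
    return abs(f1 - f2)


def two_tone(f1, f2, t):
    """Sum of two unit-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def carrier_times_envelope(f1, f2, t):
    """Same signal written as a mean-frequency carrier times a slow envelope."""
    mean = (f1 + f2) / 2.0
    half_diff = (f1 - f2) / 2.0
    return 2 * math.sin(2 * math.pi * mean * t) * math.cos(2 * math.pi * half_diff * t)


# Hypothetical tuning situation: reference string vs. slightly sharp string.
reference_hz = 440.0
string_hz = 440.5
beats_per_second = beat_frequency(reference_hz, string_hz)
```

    A listener who can follow slow loudness fluctuations can therefore detect a mistuning of a fraction of a hertz without resolving the two pitches, which is consistent with the temporal strategy the abstract reports.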

  1. Towards accurate emergency response behavior

    Nuclear reactor operator emergency response behavior has persisted as a training problem through lack of information. The industry needs an accurate definition of operator behavior in adverse stress conditions, and training methods which will produce the desired behavior. Newly assembled information from fifty years of research into human behavior in both high and low stress provides a more accurate definition of appropriate operator response, and supports training methods which will produce the needed control room behavior. The research indicates that operator response in emergencies is divided into two modes, conditioned behavior and knowledge based behavior. Methods which assure accurate conditioned behavior, and provide for the recovery of knowledge based behavior, are described in detail

  2. Similarity search and data mining techniques for advanced database systems.

    Pryakhin, Alexey

    2006-01-01

    Modern automated methods for measurement, collection, and analysis of data in industry and science are providing more and more data with drastically increasing structure complexity. On the one hand, this growing complexity is justified by the need for a richer and more precise description of real-world objects, on the other hand it is justified by the rapid progress in measurement and analysis techniques that allow the user a versatile exploration of objects. In order to manage the huge volum...

  3. Time Searching for Similar Binary Vectors in Associative Memory

    Frolov, A. A.; Húsek, Dušan; Rachkovskij, D.

    2006-01-01

    Roč. 42, č. 5 (2006), s. 615-623. ISSN 1060-0396 R&D Projects: GA MŠk(CZ) 1M0567 Institutional research plan: CEZ:AV0Z10300504 Keywords : associative memory * neural network * Hopfield network * binary vector * indexing * hashing Subject RIV: BB - Applied Statistics, Operational Research

  4. Efficient Similarity Retrieval in Music Databases

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard

    2006-01-01

    Audio music is increasingly becoming available in digital form, and the digital music collections of individuals continue to grow. Addressing the need for effective means of retrieving music from such collections, this paper proposes new techniques for content-based similarity search. Each music...... object is modeled as a time sequence of high-dimensional feature vectors, and dynamic time warping (DTW) is used as the similarity measure. To accomplish this, the paper extends techniques for time-series-length reduction and lower bounding of DTW distance to the multi-dimensional case. Further, the...
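
    A minimal Python sketch of dynamic time warping (DTW) over sequences of feature vectors, the similarity measure named in this abstract. It is the textbook quadratic-time recurrence with a Euclidean local cost, and it does not include the length-reduction or lower-bounding techniques the paper proposes. The function names and toy sequences are illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b element repeats
                cost[i][j - 1],      # seq_b advances, seq_a element repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy two-dimensional feature sequences (hypothetical values).
melody = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
melody_slower = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
melody_altered = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]

same_tune_cost = dtw_distance(melody, melody_slower)
altered_tune_cost = dtw_distance(melody, melody_altered)
```

    The stretched copy aligns at zero cost because DTW may repeat an element, while the altered copy pays for its one changed vector.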

  5. Professional Microsoft search fast search, Sharepoint search, and search server

    Bennett, Mark; Kehoe, Miles; Voskresenskaya, Natalya

    2010-01-01

    Use Microsoft's latest search-based technology, FAST search, to plan, customize, and deploy your search solution. FAST is Microsoft's latest intelligent search-based technology that boasts robustness and an ability to integrate business intelligence with Search. This in-depth guide provides you with advanced coverage on FAST search and shows you how to use it to plan, customize, and deploy your search solution, with an emphasis on SharePoint 2010 and Internet-based search solutions. With a particular appeal for anyone responsible for implementing and managing enterprise search, this book presents t

  6. Accurate determination of antenna directivity

    Dich, Mikael

    1997-01-01

    The derivation of a formula for accurate estimation of the total radiated power from a transmitting antenna for which the radiated power density is known in a finite number of points on the far-field sphere is presented. The main application of the formula is determination of directivity from power...

  7. Integrated Semantic Similarity Model Based on Ontology

    LIU Ya-Jun; ZHAO Yun

    2004-01-01

    To address the inadequacy of semantic processing in intelligent question answering systems, this paper presents an integrated semantic similarity model that calculates semantic similarity using geometric distance and information content. With the help of the interrelationships between concepts, the information content of concepts, and the strength of the edges in the ontology network, we can calculate the semantic similarity between two concepts and provide information for the further calculation of the semantic similarity between a user's question and the answers in the knowledge base. The results of experiments on the prototype show that the semantic problem in natural language processing can also be solved with the help of the knowledge and the abundant semantic information in the ontology. The ontology-based intelligent question answering prototype system reached more than 90% accuracy with an average search time of less than 50 ms. The result is very satisfactory.

  8. The similarity principle - on using models correctly

    Landberg, L.; Mortensen, N.G.; Rathmann, O.;

    2003-01-01

    This paper will present some guiding principles on the most accurate use of the WAsP program in particular, but the principle can be applied to the use of any linear model which predicts some quantity at one location based on another. We have felt a need to lay these principles out explicitly......, due to the many, many users and the uses (and misuses) of the WAsP program. Put simply, the similarity principle states that one should choose a predictor site which – in as many ways as possible – is similar to the predicted site....

  9. Search Cloud

    ... this page: https://medlineplus.gov/cloud.html Search Cloud To use the sharing features on this page, ... Top 110 zoster vaccine Share the MedlinePlus search cloud with your users by embedding our search cloud ...

  10. Search Tips

    ... do not need to use AND because the search engine automatically finds resources containing all of your search ... Use as a wildcard when you want the search engine to fill in the blank for you; you ...

  11. Search Cloud

    ... https://www.nlm.nih.gov/medlineplus/cloud.html Search Cloud To use the sharing features on this page, please enable JavaScript. Share the MedlinePlus search cloud with your users by embedding our search ...

  12. Accurate pose estimation for forensic identification

    Merckx, Gert; Hermans, Jeroen; Vandermeulen, Dirk

    2010-04-01

    In forensic authentication, one aims to identify the perpetrator among a series of suspects or distractors. A fundamental problem in any recognition system that aims for identification of subjects in a natural scene is the lack of constraints on viewing and imaging conditions. In forensic applications, identification proves even more challenging, since most surveillance footage is of abysmal quality. In this context, robust methods for pose estimation are paramount. In this paper we will therefore present a new pose estimation strategy for very low quality footage. Our approach uses 3D-2D registration of a textured 3D face model with the surveillance image to obtain accurate far field pose alignment. Starting from an inaccurate initial estimate, the technique uses novel similarity measures based on the monogenic signal to guide a pose optimization process. We will illustrate the descriptive strength of the introduced similarity measures by using them directly as a recognition metric. Through validation, using both real and synthetic surveillance footage, our pose estimation method is shown to be accurate, and robust to lighting changes and image degradation.

  13. Notions of similarity for computational biology models

    Waltemath, Dagmar

    2016-03-21

    Computational models used in biology are rapidly increasing in complexity, size, and numbers. To build such large models, researchers need to rely on software tools for model retrieval, model combination, and version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of similarity may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here, we introduce a general notion of quantitative model similarities, survey the use of existing model comparison methods in model building and management, and discuss potential applications of model comparison. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on different model aspects. Potentially relevant aspects of a model comprise its references to biological entities, network structure, mathematical equations and parameters, and dynamic behaviour. Future similarity measures could combine these model aspects in flexible, problem-specific ways in order to mimic users' intuition about model similarity, and to support complex model searches in databases.

  14. SProt: sphere-based protein structure similarity algorithm

    2011-01-01

    Background Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus shifted from sequences to structures. The area of structure similarity forms a big challenge since even no standard definition of optimal structure similarity exists in the field. Results We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of lo...

  15. Measuring Personalization of Web Search

    Hannak, Aniko; Sapiezynski, Piotr; Kakhki, Arash Molavi;

    2013-01-01

    Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing personalization is leading to concerns about Filter Bubble effects, where certain users...... are simply unable to access information that the search engines’ algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions....... First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users...

  16. Fast and accurate marker-based projective registration method for uncalibrated transmission electron microscope tilt series

    This paper presents a fast and accurate marker-based automatic registration technique for aligning uncalibrated projections taken from a transmission electron microscope (TEM) with different tilt angles and orientations. Most of the existing TEM image alignment methods estimate the similarity between images using the projection model with least-squares metric and guess alignment parameters by computationally expensive nonlinear optimization schemes. Approaches based on the least-squares metric which is sensitive to outliers may cause misalignment since automatic tracking methods, though reliable, can produce a few incorrect trajectories due to a large number of marker points. To decrease the influence of outliers, we propose a robust similarity measure using the projection model with a Gaussian weighting function. This function is very effective in suppressing outliers that are far from correct trajectories and thus provides a more robust metric. In addition, we suggest a fast search strategy based on the non-gradient Powell's multidimensional optimization scheme to speed up optimization as only meaningful parameters are considered during iterative projection model estimation. Experimental results show that our method brings more accurate alignment with less computational cost compared to conventional automatic alignment methods.
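
    The contrast drawn in the abstract above, between a least-squares metric and a Gaussian-weighted similarity, can be illustrated with a minimal sketch. The function names, the residual values, and the choice sigma = 1.0 are assumptions made for illustration; this is not the authors' projection model, and the Powell search step is not covered.

```python
import math

def least_squares_cost(residuals):
    """Sum of squared residuals; one large outlier dominates the total."""
    return sum(r * r for r in residuals)

def gaussian_weighted_similarity(residuals, sigma=1.0):
    """Sum of Gaussian weights exp(-r^2 / (2 sigma^2)).

    Residuals far from zero contribute almost nothing, so outliers
    barely change the score (higher score = better alignment).
    """
    return sum(math.exp(-(r * r) / (2.0 * sigma * sigma)) for r in residuals)

# Hypothetical marker residuals: four good tracks, plus one bad trajectory.
inliers = [0.1, -0.2, 0.05, 0.15]
with_outlier = inliers + [50.0]

ls_change = least_squares_cost(with_outlier) - least_squares_cost(inliers)
gw_change = (gaussian_weighted_similarity(with_outlier)
             - gaussian_weighted_similarity(inliers))
print("least-squares cost change:", ls_change)
print("Gaussian-weighted score change:", gw_change)
```

    In this toy case the single bad trajectory swamps the least-squares cost while leaving the Gaussian-weighted score essentially unchanged.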

  17. Search Patterns

    Morville, Peter

    2010-01-01

    What people are saying about Search Patterns "Search Patterns is a delight to read -- very thoughtful and thought provoking. It's the most comprehensive survey of designing effective search experiences I've seen." --Irene Au, Director of User Experience, Google "I love this book! Thanks to Peter and Jeffery, I now know that search (yes, boring old yucky who cares search) is one of the coolest ways around of looking at the world." --Dan Roam, author, The Back of the Napkin (Portfolio Hardcover) "Search Patterns is a playful guide to the practical concerns of search interface design. It cont

  18. Stochastic Self-Similar and Fractal Universe

    Iovane, G; Tortoriello, F S

    2004-01-01

    The structures formation of the Universe appears as if it were a classically self-similar random process at all astrophysical scales. An agreement is demonstrated for the present hypotheses of segregation with a size of astrophysical structures by using a comparison between quantum quantities and astrophysical ones. We present the observed segregated Universe as the result of a fundamental self-similar law, which generalizes the Compton wavelength relation. It appears that the Universe has a memory of its quantum origin as suggested by R.Penrose with respect to quasi-crystal. A more accurate analysis shows that the present theory can be extended from the astrophysical to the nuclear scale by using generalized (stochastically) self-similar random process. This transition is connected to the relevant presence of the electromagnetic and nuclear interactions inside the matter. In this sense, the presented rule is correct from a subatomic scale to an astrophysical one. We discuss the near full agreement at organic...

  19. Accurate Modeling of Advanced Reflectarrays

    Zhou, Min

    of the incident field, the choice of basis functions, and the technique to calculate the far-field. Based on accurate reference measurements of two offset reflectarrays carried out at the DTU-ESA Spherical Near-Field Antenna Test Facility, it was concluded that the three latter factors are particularly important...... to the conventional phase-only optimization technique (POT), the geometrical parameters of the array elements are directly optimized to fulfill the far-field requirements, thus maintaining a direct relation between optimization goals and optimization variables. As a result, better designs can be obtained compared...... using the GDOT to demonstrate its capabilities. To verify the accuracy of the GDOT, two offset contoured beam reflectarrays that radiate a high-gain beam on a European coverage have been designed and manufactured, and subsequently measured at the DTU-ESA Spherical Near-Field Antenna Test Facility...

  20. Accurate ab initio spin densities

    Boguslawski, Katharina; Legeza, Örs; Reiher, Markus

    2012-01-01

    We present an approach for the calculation of spin density distributions for molecules that require very large active spaces for a qualitatively correct description of their electronic structure. Our approach is based on the density-matrix renormalization group (DMRG) algorithm to calculate the spin density matrix elements as basic quantity for the spatially resolved spin density distribution. The spin density matrix elements are directly determined from the second-quantized elementary operators optimized by the DMRG algorithm. As an analytic convergence criterion for the spin density distribution, we employ our recently developed sampling-reconstruction scheme [J. Chem. Phys. 2011, 134, 224101] to build an accurate complete-active-space configuration-interaction (CASCI) wave function from the optimized matrix product states. The spin density matrix elements can then also be determined as an expectation value employing the reconstructed wave function expansion. Furthermore, the explicit reconstruction of a CA...

  1. Accurate thickness measurement of graphene

    Shearer, Cameron J.; Slattery, Ashley D.; Stapleton, Andrew J.; Shapter, Joseph G.; Gibson, Christopher T.

    2016-03-01

    Graphene has emerged as a material with a vast variety of applications. The electronic, optical and mechanical properties of graphene are strongly influenced by the number of layers present in a sample. As a result, the dimensional characterization of graphene films is crucial, especially with the continued development of new synthesis methods and applications. A number of techniques exist to determine the thickness of graphene films including optical contrast, Raman scattering and scanning probe microscopy techniques. Atomic force microscopy (AFM), in particular, is used extensively since it provides three-dimensional images that enable the measurement of the lateral dimensions of graphene films as well as the thickness, and by extension the number of layers present. However, in the literature AFM has proven to be inaccurate with a wide range of measured values for single layer graphene thickness reported (between 0.4 and 1.7 nm). This discrepancy has been attributed to tip-surface interactions, image feedback settings and surface chemistry. In this work, we use standard and carbon nanotube modified AFM probes and a relatively new AFM imaging mode known as PeakForce tapping mode to establish a protocol that will allow users to accurately determine the thickness of graphene films. In particular, the error in measuring the first layer is reduced from 0.1-1.3 nm to 0.1-0.3 nm. Furthermore, in the process we establish that the graphene-substrate adsorbate layer and imaging force, in particular the pressure the tip exerts on the surface, are crucial components in the accurate measurement of graphene using AFM. These findings can be applied to other 2D materials.

  2. P2P Concept Search: Some Preliminary Results

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Noori, S.R.H

    2009-01-01

    Concept Search extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between complex concepts. It allows us to deal with ambiguity of natural language. P2P Concept Search extends Concept Search by allowing distributed semantic search over structured P2P network. The key idea is to exploit distributed, rather than centralized, background knowledge and indices.

  3. Predicting user click behaviour in search engine advertisements

    Daryaie Zanjani, Mohammad; Khadivi, Shahram

    2015-10-01

    According to the specific requirements and interests of users, search engines select and display advertisements that match user needs and have higher probability of attracting users' attention based on their previous search history. New objects such as user, advertisement or query cause a deterioration of precision in targeted advertising due to their lack of history. This article surveys this challenge. In the case of new objects, we first extract similar observed objects to the new object and then we use their history as the history of new object. Similarity between objects is measured based on correlation, which is a relation between user and advertisement when the advertisement is displayed to the user. This method is used for all objects, so it has helped us to accurately select relevant advertisements for users' queries. In our proposed model, we assume that similar users behave in a similar manner. We find that users with few queries are similar to new users. We will show that correlation between users and advertisements' keywords is high. Thus, users who pay attention to advertisements' keywords, click similar advertisements. In addition, users who pay attention to specific brand names might have similar behaviours too.

  4. Functional Similarity and Interpersonal Attraction.

    Neimeyer, Greg J.; Neimeyer, Robert A.

    1981-01-01

    Students participated in dyadic disclosure exercises over a five-week period. Results indicated members of high functional similarity dyads evidenced greater attraction to one another than did members of low functional similarity dyads. "Friendship" pairs of male undergraduates displayed greater functional similarity than did "nominal" pairs from…

  5. Contextual Factors for Finding Similar Experts

    Hofmann, Katja; Balog, Krisztian; Bogers, Toine;

    2010-01-01

    -seeking models, are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts......-centered perspective. The main focus has been on developing content-based algorithms similar to document search. These algorithms identify matching experts primarily on the basis of the textual content of documents with which experts are associated. Other factors, such as the ones identified by expertise......, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content-based...

  6. A More Accurate Fourier Transform

    Courtney, Elya

    2015-01-01

    Fourier transform methods are used to analyze functions and data sets to provide frequencies, amplitudes, and phases of underlying oscillatory components. Fast Fourier transform (FFT) methods offer speed advantages over evaluation of explicit integrals (EI) that define Fourier transforms. This paper compares frequency, amplitude, and phase accuracy of the two methods for well resolved peaks over a wide array of data sets including cosine series with and without random noise and a variety of physical data sets, including atmospheric $\mathrm{CO_2}$ concentrations, tides, temperatures, sound waveforms, and atomic spectra. The FFT uses MIT's FFTW3 library. The EI method uses the rectangle method to compute the areas under the curve via complex math. Results support the hypothesis that EI methods are more accurate than FFT methods. Errors range from 5 to 10 times higher when determining peak frequency by FFT, 1.4 to 60 times higher for peak amplitude, and 6 to 10 times higher for phase under a peak. The ability t...
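
    The explicit-integral idea described above can be sketched as a rectangle-rule sum evaluated at arbitrary frequencies, which is what lets it resolve a peak lying between FFT bins. The sketch below is an illustration only: the function names, the test signal (a 1.23 Hz cosine), and the scan range are assumptions, and it does not reproduce the paper's code or its FFTW3 comparison.

```python
import cmath
import math

def explicit_fourier(samples, dt, freq):
    """Rectangle-rule estimate of the integral of x(t) * exp(-2*pi*i*freq*t) dt."""
    total = 0j
    for k, x in enumerate(samples):
        total += x * cmath.exp(-2j * math.pi * freq * k * dt)
    return total * dt

def peak_frequency(samples, dt, f_lo, f_hi, steps):
    """Scan an evenly spaced frequency grid; return the frequency of largest magnitude."""
    best_f, best_mag = f_lo, -1.0
    for i in range(steps + 1):
        f = f_lo + (f_hi - f_lo) * i / steps
        mag = abs(explicit_fourier(samples, dt, f))
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f

# Illustrative data (assumed): 10 s of a cosine sampled every 0.01 s.
dt = 0.01
n = 1000
true_freq = 1.23
samples = [math.cos(2 * math.pi * true_freq * k * dt) for k in range(n)]

# An n-point FFT only evaluates frequencies spaced 1 / (n * dt) apart.
fft_bin_spacing = 1.0 / (n * dt)

# The explicit integral can be evaluated on a much finer grid.
estimate = peak_frequency(samples, dt, 1.0, 1.5, 250)
print("FFT bin spacing (Hz):", fft_bin_spacing)
print("explicit-integral peak estimate (Hz):", estimate)
```

    The trade-off is cost: each frequency evaluated this way takes a full pass over the samples, whereas the FFT computes all of its bins at once.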

  7. Improved Search Techniques

    Albornoz, Caleb Ronald

    2012-01-01

    Thousands of millions of documents are stored and updated daily on the World Wide Web. Most of the information is not efficiently organized to build knowledge from the stored data. Nowadays, search engines are mainly used by users who rely on their skills to look for the information needed. This paper presents different techniques search engine users can apply in Google Search to improve the relevancy of search results. According to the Pew Research Center, the average person spends eight hours a month searching for the right information. For instance, a company that employs 1000 employees wastes $2.5 million on looking for nonexistent and/or not found information. The cost is very high because decisions are made based on the information that is readily available to use. Whenever the information necessary to formulate an argument is not available or found, poor decisions may be made and mistakes will be more likely to occur. Also, the survey indicates that only 56% of Google users feel confident with their current search skills. Moreover, just 76% of the information that is available on the Internet is accurate.

  8. Analytical Searching.

    Pappas, Marjorie L.

    1995-01-01

    Discusses analytical searching, a process that enables searchers of electronic resources to develop a planned strategy by combining words or phrases with Boolean operators. Defines simple and complex searching, and describes search strategies developed with Boolean logic and truncation. Provides guidelines for teaching students analytical…

  9. Relativistic mergers of black hole binaries have large, similar masses, low spins and are circular

    Amaro-Seoane, Pau; Chen, Xian

    2016-05-01

    Gravitational waves are a prediction of general relativity, and with ground-based detectors now running in their advanced configuration, we will soon be able to measure them directly for the first time. Binaries of stellar-mass black holes are among the most interesting sources for these detectors. Unfortunately, the many different parameters associated with the problem make it difficult to promptly produce a large set of waveforms for the search in the data stream. To reduce the number of templates to develop, one must restrict some of the physical parameters to a certain range of values predicted by either (electromagnetic) observations or theoretical modelling. In this work, we show that `hyperstellar' black holes (HSBs) with masses 30 ≲ MBH/M⊙ ≲ 100, i.e. black holes significantly larger than the nominal 10 M⊙, will have an associated low value for the spin, i.e. a similar masses. We also address the distribution of the eccentricities of HSB binaries in dense stellar systems using a large suite of three-body scattering experiments that include binary-single interactions and long-lived hierarchical systems with a highly accurate integrator, including relativistic corrections up to O(1/c^5). We find that most sources in the detector band will have nearly zero eccentricities. This correlation between large, similar masses, low spin and low eccentricity will help to accelerate the searches for gravitational-wave signals.

  10. A new adaptive fast motion estimation algorithm based on local motion similarity degree (LMSD)

    LIU Long; HAN Chongzhao; BAI Yan

    2005-01-01

    In the motion vector field adaptive search technique (MVFAST) and the predictive motion vector field adaptive search technique (PMVFAST), the size of the largest motion vector from the three adjacent blocks (left, top, top-right) is compared with a threshold to select among different search schemes. However, a suitable search center and search pattern will not be selected by the adaptive search technique when the adjacent motion vectors are not coherent in the local region. This paper presents an efficient adaptive search algorithm. The motion vector variation degree (MVVD) is considered a reasonable factor for adaptive search selection. Based on the relationship between the local motion similarity degree (LMSD) and the MVVD, the motion vectors are classified into three categories according to their LMSD; different proposed search schemes are then adopted for motion estimation. The experimental results show that the proposed algorithm achieves a significant computational speedup compared with the MVFAST and PMVFAST algorithms, and offers similar or even better performance.
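
    The point made above, that the largest neighbouring motion vector says little about whether the neighbours agree with each other, can be illustrated with a small sketch. Everything here is a hypothetical stand-in: the thresholds, the city-block magnitude, and the `variation_degree` definition (total deviation from the mean vector) are assumptions for illustration, not the MVVD or LMSD formulas from the paper.

```python
def city_block(mv):
    """City-block (L1) length of a motion vector given as (dx, dy)."""
    return abs(mv[0]) + abs(mv[1])

def magnitude_based_activity(neighbours, t_low=1, t_high=2):
    """Magnitude-only rule: classify by the largest neighbouring vector.

    The thresholds t_low and t_high are illustrative assumptions.
    """
    largest = max(city_block(mv) for mv in neighbours)
    if largest <= t_low:
        return "low"
    if largest <= t_high:
        return "medium"
    return "high"

def variation_degree(neighbours):
    """Hypothetical variation measure: total L1 deviation from the mean vector."""
    n = len(neighbours)
    mean_x = sum(mv[0] for mv in neighbours) / n
    mean_y = sum(mv[1] for mv in neighbours) / n
    return sum(abs(mv[0] - mean_x) + abs(mv[1] - mean_y) for mv in neighbours)

# Left, top, and top-right neighbours (made-up values).
coherent = [(8, 0), (8, 1), (8, 0)]       # large but consistent motion
incoherent = [(1, 0), (-1, 0), (0, 1)]    # small but inconsistent motion

for name, nbrs in (("coherent", coherent), ("incoherent", incoherent)):
    print(name, magnitude_based_activity(nbrs), round(variation_degree(nbrs), 3))
```

    In this toy data the magnitude rule rates the coherent neighbourhood as "high" activity and the incoherent one as "low", while the variation measure ranks them the other way around, which is the kind of case a coherence-aware scheme is meant to handle.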

  11. A COMPARISON OF SEMANTIC SIMILARITY MODELS IN EVALUATING CONCEPT SIMILARITY

    Q. X. Xu

    2012-08-01

    Semantic similarities are important in concept definition, recognition, categorization, interpretation, and integration. Many semantic similarity models have been established to evaluate semantic similarities of objects and/or concepts. To find out the suitability and performance of different models in evaluating concept similarities, we make a comparison of four main types of models in this paper: the geometric model, the feature model, the network model, and the transformational model. Fundamental principles and main characteristics of these models are first introduced and compared. Land use and land cover concepts of NLCD92 are employed as examples in the case study. The results demonstrate that correlations between these models are very high, possibly because all these models are designed to simulate the similarity judgement of the human mind.

  12. Learning Multi-modal Similarity

    McFee, Brian

    2010-01-01

    In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multi-media similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.

  13. Roget's Thesaurus and Semantic Similarity

    Jarmasz, Mario; Szpakowicz, Stan

    2012-01-01

    We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein ...
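
    Correlating human similarity ratings with system scores, as described above, is commonly done with the Pearson coefficient; the abstract does not say which coefficient was used, so that choice is an assumption here. The sketch below uses made-up ratings, not the Miller and Charles data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up example: human ratings vs. scores from a hypothetical system
# for four noun pairs (values are illustrative only).
human_ratings = [3.9, 3.1, 1.2, 0.4]
system_scores = [16.0, 12.0, 6.0, 2.0]
print("correlation:", round(pearson(human_ratings, system_scores), 3))
```

    A value close to 1 would indicate that the system orders and spaces the pairs much as the human judges did.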

  14. Similarity-Based Prediction of Travel Times for Vehicles Traveling on Known Routes

    Tiesyte, Dalia; Jensen, Christian Søndergaard

    2008-01-01

    , historical data in combination with real-time data may be used to predict the future travel times of vehicles more accurately, thus improving the experience of the users who rely on such information. We propose a Nearest-Neighbor Trajectory (NNT) technique that identifies the historical trajectory that is......The use of centralized, real-time position tracking is proliferating in the areas of logistics and public transportation. Real-time positions can be used to provide up-to-date information to a variety of users, and they can also be accumulated for uses in subsequent data analyses. In particular...... trajectories of vehicles that travel along known routes. In empirical studies with real data from buses, we evaluate how well the proposed distance functions are capable of predicting future vehicle movements. Second, we propose a main-memory index structure that enables incremental similarity search and that...

  15. Aggregated search: a new information retrieval paradigm

    Kopliku, Arlind; Pinel-Sauvagnat, Karen; Boughanem, Mohand

    2014-01-01

    Traditional search engines return ranked lists of search results. It is up to the user to scroll this list, scan within different documents and assemble information that fulfill his/her information need. Aggregated search represents a new class of approaches where the information is not only retrieved but also assembled. This is the current evolution in Web search, where diverse content (images, videos, ...) and relational content (similar entities, features) are included in search results. I...

  16. Towards a more accurate concept of fuels

    The introduction of LEU in Atucha and the approval of CARA show an advancement of the Argentine power stations' fuels, which stimulates and shows a direction to follow. In the first case, the use of enriched U fuel relaxes an important restriction related to neutronic economy; that means that it is possible to design less penalized fuels using more Zry. The second case allows a decrease in the linear power of the rods, enabling a better performance of the fuel in normal and also in accident conditions. In this work we wish to emphasize this last point, trying to find a design in which the surface power of the rod is diminished. Hence, in accident conditions owing to lack of coolant, the cladding tube will not reach temperatures that produce oxidation (with the corresponding H2 formation), enough plasticity to form blisters that obstruct reflooding, and hydration that produces fragility and rupture of the cladding tube, with the corresponding dispersion of radioactive material. This work is oriented to finding rod designs with quasi-rectangular geometry to lower the surface power of the rods, in order to obtain a lower central temperature of the rod. Thus, critical temperatures will not be reached in case of lack of coolant. This design is becoming a reality after PPFAE's efforts in search of cladding tube fabrication with different circumferential values, rectangular in particular. This geometry, with an appropriate pellet design, can minimize the pellet-cladding interaction and, through accurate selection of the width, non-rectified pellets could be used. This means an important economy in pellet production, as well as an advance in the fabrication of fuels in glove boxes and hot cells in the future. The sequence to determine critical geometrical parameters is described and some rod dispositions are explored.

  17. Search and Recommendation

    Bogers, Toine

    2014-01-01

    In just a little over half a century, the field of information retrieval has experienced spectacular growth and success, with IR applications such as search engines becoming a billion-dollar industry in the past decades. Recommender systems have seen an even more meteoric rise to success with wide-scale application by companies like Amazon, Facebook, and Netflix. But are search and recommendation really two different fields of research that address different problems with different sets of algorithms in papers published at distinct conferences? In my talk, I want to argue that search and recommendation are more similar than they have been treated in the past decade. By looking more closely at the tasks and problems that search and recommendation try to solve, at the algorithms used to solve these problems and at the way their performance is evaluated, I want to show that there is no clear black and white...

  18. Personalized recommendation with corrected similarity

    Personalized recommendation has attracted a surge of interdisciplinary research. Especially, similarity-based methods in applications of real recommendation systems have achieved great success. However, the computations of similarities are overestimated or underestimated, in particular because of the defective strategy of unidirectional similarity estimation. In this paper, we solve this drawback by leveraging mutual correction of forward and backward similarity estimations, and propose a new personalized recommendation index, i.e., corrected similarity based inference (CSI). Through extensive experiments on four benchmark datasets, the results show a greater improvement of CSI in comparison with these mainstream baselines. And a detailed analysis is presented to unveil and understand the origin of such difference between CSI and mainstream indices. (paper)

  19. Quantifying the similarities within fold space.

    Harrison, Andrew; Pearl, Frances; Mott, Richard; Thornton, Janet; Orengo, Christine

    2002-11-01

    We have used GRATH, a graph-based structure comparison algorithm, to map the similarities between the different folds observed in the CATH domain structure database. Statistical analysis of the distributions of the fold similarities has allowed us to assess the significance for any similarity. Therefore we have examined whether it is best to represent folds as discrete entities or whether, in fact, a more accurate model would be a continuum wherein folds overlap via common motifs. To do this we have introduced a new statistical measure of fold similarity, termed gregariousness. For a particular fold, gregariousness measures how many other folds have a significant structural overlap with that fold, typically comprising 40% or more of the larger structure. Gregarious folds often contain commonly occurring super-secondary structural motifs, such as beta-meanders, Greek keys, alpha-beta plait motifs or alpha-hairpins, which are matching similar motifs in other folds. Apart from one example, all the most gregarious folds matching 20% or more of the other folds in the database, are alpha-beta proteins. They also occur in highly populated architectural regions of fold space, adopting sandwich-like arrangements containing two or more layers of alpha-helices and beta-strands. Domains that exhibit a low gregariousness are those that have very distinctive folds, with few common motifs or motifs that are packed in unusual arrangements. Most of the superhelices exhibit low gregariousness despite containing some commonly occurring super-secondary structural motifs. In these folds, these common motifs are combined in an unusual way and represent a small proportion of the fold (<10%). Our results suggest that fold space may be considered as continuous for some architectural arrangements (e.g. alpha-beta sandwiches), in that super-secondary motifs can be used to link neighbouring fold groups. However, in other regions of fold space much more discrete topologies are observed with
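
    Read as a count, the gregariousness measure described above amounts to the number of other folds with which a given fold shares a significant overlap. The sketch below assumes the significance testing has already been done and that its result is available as a list of fold pairs; the fold labels and pairs are invented for illustration and are not GRATH or CATH output.

```python
def gregariousness(fold, significant_pairs):
    """Count the distinct other folds that significantly overlap with `fold`.

    `significant_pairs` is an iterable of (fold_a, fold_b) tuples that have
    already passed some significance test (assumed to be done elsewhere).
    """
    partners = set()
    for a, b in significant_pairs:
        if a == fold and b != fold:
            partners.add(b)
        elif b == fold and a != fold:
            partners.add(a)
    return len(partners)

# Invented example pairs; labels are placeholders, not CATH identifiers.
pairs = [
    ("sandwich", "plait"),
    ("sandwich", "meander"),
    ("plait", "sandwich"),      # same pair in reverse order, counted once
    ("superhelix", "plait"),
]

for fold in ("sandwich", "plait", "superhelix"):
    print(fold, gregariousness(fold, pairs))
```

    Dividing the count by the number of other folds in the database would give the fraction referred to in the abstract ("matching 20% or more of the other folds").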

  20. 38 CFR 4.46 - Accurate measurement.

    2010-07-01

    ... 38 Pensions, Bonuses, and Veterans' Relief 1 2010-07-01 2010-07-01 false Accurate measurement. 4... RATING DISABILITIES Disability Ratings The Musculoskeletal System § 4.46 Accurate measurement. Accurate measurement of the length of stumps, excursion of joints, dimensions and location of scars with respect...

  1. Self-similar aftershock rates

    Davidsen, Jörn

    2016-01-01

    In many important systems exhibiting crackling noise --- intermittent avalanche-like relaxation response with power-law and, thus, self-similar distributed event sizes --- the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is in particular true for the case of seismicity and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high resolution earthquake data from Southern California we find excellent agreement, providing in particular clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved way of time-dependent seismic hazard assessment and earthquake forecasting.

  2. Supervised Learning with Similarity Functions

    Kar, Purushottam; Jain, Prateek

    2012-01-01

    We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given superv...

  3. Similarity measures for protein ensembles

    Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper

    2009-01-01

    Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations...... synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single...

  4. Learning Multi-modal Similarity

    McFee, Brian; Lanckriet, Gert

    2010-01-01

    In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning t...

  5. Method of similarity for cavitation

    The knowledge of possible cavitation in subassembly nozzles of the fast reactor core implies the realization of a fluid dynamic model test. We propose a method of similarity based on the non-dimensionalization of the equation of motion for viscous capillarity fluid issued from the Cahn and Hilliard model. Taking into account the dissolved gas effect, a condition of compatibility is determined. This condition must be respected by the fluid in experiment, along with the scaling between the two similar flows. (author)

  6. SELF-SIMILAR TRAFFIC GENERATOR

    Linawati Linawati; I Made Suartika

    2009-01-01

    A network traffic generator can be built using OPNET. OPNET generates the traffic as explicit traffic or background traffic. This paper demonstrates generating traffic in OPNET 7.0 as background traffic. The traffic generator that was simulated produces self-similar traffic with different Hurst parameters. The simulation results showed that OPNET with its background traffic function can serve as a qualified self-similar traffic generator. These results can help in investigating and analysing network perfor...

  7. HOW DISSIMILARLY SIMILAR ARE BIOSIMILARS?

    Ramshankar Vijayalakshmi; Kesavan Sabitha; Krishnamurthy Arvind

    2012-01-01

    Recently Biopharmaceuticals are the new chemotherapeutical agents that are called as “Biosimilars” or “follow on protein products” by the European Medicines Agency (EMA) and the American regulatory agencies (Food and Drug Administration) respectively. Biosimilars are extremely similar to the reference molecule but not identical, however close their similarities may be. A regulatory framework is therefore in place to assess the application for marketing authorisation of biosimilars. When a bi...

  8. Molecular similarity of MDR inhibitors

    Simon Gibbons; Mire Zloh

    2004-01-01

    Abstract: The molecular similarity of multidrug resistance (MDR) inhibitors was evaluated using the point centred atom charge approach in an attempt to find some common features of structurally unrelated inhibitors. A series of inhibitors of bacterial MDR were studied and there is a high similarity between these in terms of their shape, presence and orientation of aromatic ring moieties. A comparison of the lipophilic properties of these molecules has also been conducted suggesting that this ...

  9. Faceted Search

    Tunkelang, Daniel

    2009-01-01

    We live in an information age that requires us, more than ever, to represent, access, and use information. Over the last several decades, we have developed a modern science and technology for information retrieval, relentlessly pursuing the vision of a "memex" that Vannevar Bush proposed in his seminal article, "As We May Think." Faceted search plays a key role in this program. Faceted search addresses weaknesses of conventional search approaches and has emerged as a foundation for interactive information retrieval. User studies demonstrate that faceted search provides more

  10. Are search committees really searching?

    Hoffmeir, Patricia A

    2003-02-01

    Academic chair searches are admittedly a labor-intensive process, but they are made more difficult and often lead to less-than-optimal outcomes because search committees spend their time "advertising," "looking," but not truly searching for academic chairs. At the onset, certain "realities" must be acknowledged, including (1) understanding that unless your organization is renowned in the specialty for which you are conducting the search, candidates won't be pounding at your door for a job, (2) searches that fail to include an overall assessment of the department in question are likely to miss the mark, (3) chairs must have demonstrated not only clinical expertise but also business savvy, (4) the best candidate is not necessarily someone who is already a department chair, (5) when it comes to chair searches, it's a buyer's market, and (6) the search process is inextricably linked to the success of the search. Key to the process of conducting an academic chair search are the judicious formation of the search committee; committee members' willingness to do their homework, attend all committee meetings, and keep the committee's activities confidential; crafting, not revising, the current job description for the open chair position; interviewing viable candidates rather than all candidates and adhering to a coordinated interviewing process; and evaluating internal and external candidates according to the same parameters. PMID:12584089

  11. Representation is representation of similarities.

    Edelman, S

    1998-08-01

    Advanced perceptual systems are faced with the problem of securing a principled (ideally, veridical) relationship between the world and its internal representation. I propose a unified approach to visual representation, addressing the need for superordinate and basic-level categorization and for the identification of specific instances of familiar categories. According to the proposed theory, a shape is represented internally by the responses of a small number of tuned modules, each broadly selective for some reference shape, whose similarity to the stimulus it measures. This amounts to embedding the stimulus in a low-dimensional proximal shape space spanned by the outputs of the active modules. This shape space supports representations of distal shape similarities that are veridical as Shepard's (1968) second-order isomorphisms (i.e., correspondence between distal and proximal similarities among shapes, rather than between distal shapes and their proximal representations). Representation in terms of similarities to reference shapes supports processing (e.g., discrimination) of shapes that are radically different from the reference ones, without the need for the computationally problematic decomposition into parts required by other theories. Furthermore, a general expression for similarity between two stimuli, based on comparisons to reference shapes, can be used to derive models of perceived similarity ranging from continuous, symmetric, and hierarchical ones, as in multidimensional scaling (Shepard 1980), to discrete and nonhierarchical ones, as in the general contrast models (Shepard & Arabie 1979; Tversky 1977). PMID:10097019

  12. Hash: a program to accurately predict protein Hα shifts from neighboring backbone shifts

    Zeng Jianyang, E-mail: zengjy@gmail.com [Tsinghua University, Institute for Interdisciplinary Information Sciences (China); Zhou Pei [Duke University Medical Center, Department of Biochemistry (United States); Donald, Bruce Randall [Duke University, Department of Computer Science (United States)

    2013-01-15

    Chemical shifts provide not only peak identities for analyzing nuclear magnetic resonance (NMR) data, but also an important source of conformational information for studying protein structures. Current structural studies requiring Hα chemical shifts suffer from the following limitations. (1) For large proteins, the Hα chemical shifts can be difficult to assign using conventional NMR triple-resonance experiments, mainly due to the fast transverse relaxation rate of Cα that restricts the signal sensitivity. (2) Previous chemical shift prediction approaches either require homologous models with high sequence similarity or rely heavily on accurate backbone and side-chain structural coordinates. When neither sequence homologues nor structural coordinates are available, we must resort to other information to predict Hα chemical shifts. Predicting accurate Hα chemical shifts using other obtainable information, such as the chemical shifts of nearby backbone atoms (i.e., adjacent atoms in the sequence), can remedy the above dilemmas, and hence advance NMR-based structural studies of proteins. By specifically exploiting the dependencies on chemical shifts of nearby backbone atoms, we propose a novel machine learning algorithm, called Hash, to predict Hα chemical shifts. Hash combines a new fragment-based chemical shift search approach with a non-parametric regression model, called the generalized additive model, to effectively solve the prediction problem. We demonstrate that the chemical shifts of nearby backbone atoms provide a reliable source of information for predicting accurate Hα chemical shifts. Our testing results on different possible combinations of input data indicate that Hash has a wide range of potential NMR applications in structural and biological studies of proteins.

  13. Capacity Planning for Vertical Search Engines

    Badue, Claudine; Almeida, Virgilio; Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier; Ziviani, Artur; Ziviani, Nivio

    2010-01-01

    Vertical search engines focus on specific slices of content, such as the Web of a single country or the document collection of a large corporation. Despite this, like general open web search engines, they are expensive to maintain, expensive to operate, and hard to design. Because of this, predicting the response time of a vertical search engine is usually done empirically through experimentation, requiring a costly setup. An alternative is to develop a model of the search engine for predicting performance. However, this alternative is of interest only if its predictions are accurate. In this paper we propose a methodology for analyzing the performance of vertical search engines. Applying the proposed methodology, we present a capacity planning model based on a queueing network for search engines with a scale typically suitable for the needs of large corporations. The model is simple and yet reasonably accurate and, in contrast to previous work, considers the imbalance in query service times among homogeneous...

  14. Similarity measures for face recognition

    Vezzetti, Enrico

    2015-01-01

    Face recognition has several applications, including security (authentication and identification of device users and criminal suspects) and medicine (corrective surgery and diagnosis). Facial recognition programs rely on algorithms that can compare and compute the similarity between two sets of images. This eBook explains some of the similarity measures used in facial recognition systems in a single volume. Readers will learn about various measures including Minkowski distances, Mahalanobis distances, Hausdorff distances, and cosine-based distances, among other methods. The book also summarizes errors that may occur in face recognition methods. Computer scientists "facing face" and looking to select and test different methods of computing similarities will benefit from this book. The book is also a useful tool for students undertaking computer vision courses.
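
Two of the measures named in the abstract above can be illustrated on plain feature vectors. This is a minimal sketch, not taken from the book: the function names `minkowski_distance` and `cosine_distance` are hypothetical, and real face recognition systems would apply such measures to extracted face descriptors rather than to toy lists.

```python
import math


def minkowski_distance(x, y, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # The angle is undefined for a zero vector.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_x * norm_y)
```

The Minkowski family depends on vector magnitudes, while the cosine-based distance depends only on direction, which is one reason the two can rank the same candidates differently.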

  15. A Novel Personalized Web Search Model

    ZHU Zhengyu; XU Jingqiu; TIAN Yunyan; REN Xiang

    2007-01-01

    A novel personalized Web search model is proposed. The new system, as a middleware between a user and a Web search engine, is set up on the client machine. It can learn a user's preference implicitly and then generate the user profile automatically. When the user inputs query keywords, the system can automatically generate a few personalized expansion words by computing the term-term associations according to the current user profile, and then these words together with the query keywords are submitted to a popular search engine such as Yahoo or Google. These expansion words help to express accurately the user's search intention. The new Web search model can make a common search engine personalized, that is, the search engine can return different search results to different users who input the same keywords. The experimental results show the feasibility and applicability of the presented work.

  16. Similarity Measures for Comparing Biclusterings.

    Horta, Danilo; Campello, Ricardo J G B

    2014-01-01

    The comparison of ordinary partitions of a set of objects is well established in the clustering literature, which comprehends several studies on the analysis of the properties of similarity measures for comparing partitions. However, similarity measures for clusterings are not readily applicable to biclusterings, since each bicluster is a tuple of two sets (of rows and columns), whereas a cluster is only a single set (of rows). Some biclustering similarity measures have been defined as minor contributions in papers which primarily report on proposals and evaluation of biclustering algorithms or comparative analyses of biclustering algorithms. The consequence is that some desirable properties of such measures have been overlooked in the literature. We review 14 biclustering similarity measures. We define eight desirable properties of a biclustering measure, discuss their importance, and prove which properties each of the reviewed measures has. We show examples drawn and inspired from important studies in which several biclustering measures convey misleading evaluations due to the absence of one or more of the discussed properties. We also advocate the use of a more general comparison approach that is based on the idea of transforming the original problem of comparing biclusterings into an equivalent problem of comparing clustering partitions with overlapping clusters. PMID:26356865

  17. A square from similar rectangles

    Dorichenko, Sergey; Skopenkov, Mikhail

    2013-01-01

    In the present popular science paper we determine when a square can be dissected into rectangles similar to a given rectangle. The approach to the question is based on a physical interpretation using electrical networks. Only secondary school background is assumed in the paper.

  18. Approaches to Sequence Similarity Representation

    Sokolov, Artem; Rachkovskij, Dmitri

    2006-01-01

    We discuss several approaches to similarity preserving coding of symbol sequences and possible connections of their distributed versions to metric embeddings. Interpreting sequence representation methods with embeddings can help develop an approach to their analysis and may lead to discovering useful properties.

  19. HOW DISSIMILARLY SIMILAR ARE BIOSIMILARS?

    Ramshankar Vijayalakshmi

    2012-05-01

    Biopharmaceuticals are a new class of chemotherapeutic agents, referred to as “biosimilars” by the European Medicines Agency (EMA) and as “follow-on protein products” by the American regulatory agency (Food and Drug Administration). Biosimilars are extremely similar to the reference molecule but not identical, however close their similarities may be. A regulatory framework is therefore in place to assess applications for marketing authorisation of biosimilars. When a biosimilar is similar to the reference biopharmaceutical in terms of safety, quality, and efficacy, it can be registered. It is important to document data from clinical trials with a view to demonstrating similar safety and efficacy. Whereas the development time for a generic medicine is around 3 years, a biosimilar takes about 6-9 years. Generic medicines need to demonstrate bioequivalence only, unlike biosimilars, which need to undergo Phase I and Phase III clinical trials. In this review, different biosimilars that are already being used successfully in the field of oncology are discussed. Their similarities, differences, and the guidelines to be followed before a clinically informed decision is taken are discussed. More importantly, the regulatory guidelines that are operational in India, with a workflow for making a biosimilar and the relevant dos and don'ts, are discussed. In a large, populous country like India, improved treatments in all sectors, including oncology, mean that the ageing population is increasing. For the health care of this sector, newer, cheaper, and effective biosimilars are needed in the market. It is therefore important to understand the regulatory guidelines and the steps needed to bring more biosimilars to the existing population; more information is also needed for practicing clinicians to translate these effectively into clinical practice.

  20. Practical fulltext search in medical records

    Vít Volšička

    2015-09-01

    Searching through existing documents, including medical reports, is an integral part of acquiring new information and of educational processes. Unfortunately, finding relevant information is not always easy, since many documents are saved in free text formats, which makes them difficult to search. A full-text search is a viable solution for searching through documents. It makes it possible to efficiently search through large numbers of documents and to find those that contain specific search phrases in a short time. All leading database systems currently offer full-text search, but some do not support the complex morphology of the Czech language. Apache Solr provides full support options and some full-text libraries. This programme provides good support for the Czech language in the basic installation, and a wide range of settings and options for its deployment on any platform. The library was satisfactorily tested using real data from hospitals. Solr provided useful, fast, and accurate searches. However, adjustments are still needed to obtain effective search results, particularly by correcting typographical errors made not only in the text but also when entering words in the search box, and by creating a list of frequently used abbreviations and synonyms for more accurate results.

  1. Search Engines Selection Based on Relevance Terms

    欧洁

    2003-01-01

    Metasearch can effectively search distributed immense electronic resources. It is built on top of several search engines, providing users with uniform access to these engines. Metasearch first passes the user's query to underlying useful search engines, and then collects and reorganizes the results from the search engines used. It is called search engines selection when metasearch selects underlying useful search engines. In this paper, we present a statistical method based on relevance terms to estimate the usefulness of a search engine for any given query, which is suitable for both Boolean query and vector query. Experimental results indicate that the proposed estimation method is quite accurate, especially when the critical similarity is high between the query and the results.

  2. Implementation Of ROCK Clustering Algorithm For The Optimization Of Query Searching Time

    Ashwina Tyagi

    2012-05-01

    Clustering is a data mining technique that groups similar data or queries together, which helps in identifying similar subject areas. The major problem is to identify heterogeneous subject areas where frequent queries are asked. A number of agglomerative clustering algorithms are used to cluster data. The problem with these algorithms is that they rely on distance measures to calculate similarity. The best-suited algorithm for clustering categorical data is therefore the Robust Clustering Using Links (ROCK) [1] algorithm, because it uses the Jaccard coefficient instead of distance measures to find the similarity between the data or documents when forming clusters. The mechanism for classifying the clusters based on the similarity measure shall be used over a given set of data. This method will make clusters of the data corresponding to different subject areas so that prior knowledge about similarity can be maintained, which in turn will help to discover accurate and consistent clusters and will reduce the query response time. The main objective of our work is to implement ROCK [1] and to decrease the query response time by searching the documents in the resulting clusters instead of searching the whole database. This technique reduces the time needed to search for documents in the database.
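
The Jaccard coefficient and the link idea mentioned in the abstract above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `jaccard`, `neighbors`, and `link_count` are hypothetical, a record is not counted as its own neighbor, and the cluster-merging step of ROCK is omitted.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def neighbors(records, theta):
    """For each record, the indices of other records with Jaccard >= theta."""
    result = []
    for i, rec_i in enumerate(records):
        result.append({
            j for j, rec_j in enumerate(records)
            if j != i and jaccard(rec_i, rec_j) >= theta
        })
    return result


def link_count(neighbor_sets, i, j):
    """Number of neighbors that records i and j have in common."""
    return len(neighbor_sets[i] & neighbor_sets[j])
```

Counting shared neighbors rather than comparing two records directly is what lets a link-based method group categorical records that are only moderately similar pairwise but sit in the same neighborhood.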

  3. Accurate hydrocarbon estimates attained with radioactive isotope

    To make accurate economic evaluations of new discoveries, an oil company needs to know how much gas and oil a reservoir contains. The porous rocks of these reservoirs are not completely filled with gas or oil, but contain a mixture of gas, oil and water. It is extremely important to know what volume percentage of this water--called connate water--is contained in the reservoir rock. The percentage of connate water can be calculated from electrical resistivity measurements made downhole. The accuracy of this method can be improved if a pure sample of connate water can be analyzed or if the chemistry of the water can be determined by conventional logging methods. Because of the similarity of the mud filtrate--the water in a water-based drilling fluid--and the connate water, this is not always possible. If the oil company cannot distinguish between connate water and mud filtrate, its oil-in-place calculations could be incorrect by ten percent or more. It is clear that unless an oil company can be sure that a sample of connate water is pure, or at the very least knows exactly how much mud filtrate it contains, its assessment of the reservoir's water content--and consequently its oil or gas content--will be distorted. The oil companies have opted for the Repeat Formation Tester (RFT) method. Label the drilling fluid with small doses of tritium--a radioactive isotope of hydrogen--and it will be easy to detect and quantify in the sample

  4. Combinatorial Approaches to Accurate Identification of Orthologous Genes

    Shi, Guanqun

    2011-01-01

    The accurate identification of orthologous genes across different species is a critical and challenging problem in comparative genomics and has a wide spectrum of biological applications including gene function inference, evolutionary studies and systems biology. During the past several years, many methods have been proposed for ortholog assignment based on sequence similarity, phylogenetic approaches, synteny information, and genome rearrangement. Although these methods share many commonly a...

  5. Retrieval of similar chess positions

    Ganguly, Debasis; LEVELING, JOHANNES; Jones, Gareth J.F.

    2014-01-01

    We address the problem of retrieving chess game positions similar to a given query position from a collection of archived chess games. We investigate this problem from an information retrieval (IR) perspective. The advantage of our proposed IR-based approach is that it allows using the standard inverted organization of stored chess positions, leading to an efficient retrieval. Moreover, in contrast to retrieving exactly identical board positions, the IR-based approach is able to provide approxim...

  6. SPATIO-TEXTUAL SIMILARITY JOIN

    Ch Shylaja and Supreethi K.P

    2015-07-01

    Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. Spatial databases store large amounts of space-related data, such as maps and preprocessed remote sensing or medical imaging data. Modern mobile phones and mobile devices are equipped with GPS, which is why location-based services have gained significant attention. These location-based services generate large amounts of spatio-textual data, which contain both a spatial location and a textual description. Spatio-textual objects have different representations because of deviations in GPS or differing user descriptions. This calls for efficient methods to integrate spatio-textual data. Spatio-textual similarity join meets this need: given two sets of spatio-textual data, it finds all similar pairs. A filter-and-refine framework will be developed to devise the algorithms. The prefix filter technique will be extended to generate spatial and textual signatures, and inverted indexes will be built on top of these signatures. Candidate pairs will be found using these indexes. Finally, the candidate pairs will be refined to get the result. An MBR-prefix based signature will be used to prune dissimilar objects. A hybrid signature will be used to support spatial and textual pruning simultaneously.
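
The filter-and-refine idea described above can be sketched for the textual side alone. This is a minimal illustration of the standard prefix filter for a Jaccard threshold, not the authors' algorithm: the spatial (MBR) signatures are left out, it is written as a self-join over one collection, and the names `prefix_length` and `similarity_join` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def prefix_length(size, threshold):
    """Prefix length that guarantees a shared token for Jaccard >= threshold."""
    # The small epsilon guards against floating-point error before rounding up.
    return size - math.ceil(threshold * size - 1e-9) + 1


def similarity_join(records, threshold):
    """Return index pairs (i, j), i < j, whose token sets have Jaccard >= threshold."""
    # Global token order: rare tokens first, so prefixes are selective.
    freq = Counter(tok for rec in records for tok in set(rec))
    ordered = [sorted(set(rec), key=lambda t: (freq[t], t)) for rec in records]

    index = defaultdict(list)  # inverted index: prefix token -> record ids
    candidates = set()
    # Filter step: records sharing a prefix token become candidate pairs.
    for rid, tokens in enumerate(ordered):
        for tok in tokens[:prefix_length(len(tokens), threshold)]:
            for other in index[tok]:
                candidates.add((other, rid))
            index[tok].append(rid)

    # Refine step: verify each candidate with the exact Jaccard coefficient.
    result = []
    for i, j in sorted(candidates):
        a, b = set(records[i]), set(records[j])
        if len(a & b) / len(a | b) >= threshold:
            result.append((i, j))
    return result
```

Only pairs that share at least one prefix token are verified exactly, so most dissimilar pairs are never compared; a spatial signature would add a second pruning condition before the refine step.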

  7. Roget's Thesaurus and Semantic Similarity

    Jarmasz, Mario

    2012-01-01

    We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have also studied. Our Roget's-based system gets correlations of .878 for the smaller and .818 for the larger list of noun pairs; this is quite close to the .885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment. We further evaluate our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest questions: the correct synonym must be selected amongst a group of four words. Our system gets 78.75%, 82.00% and 74.33% of ...

  8. Interfacial Molecular Searching Using Forager Dynamics

    Monserud, Jon H.; Schwartz, Daniel K.

    2016-03-01

    Many biological and technological systems employ efficient non-Brownian intermittent search strategies where localized searches alternate with long flights. Coincidentally, molecular species exhibit intermittent behavior at the solid-liquid interface, where periods of slow motion are punctuated by fast flights through the liquid phase. Single-molecule tracking was used here to observe the interfacial search process of DNA for complementary DNA. Measured search times were qualitatively consistent with an intermittent-flight model, and ~10 times faster than equivalent Brownian searches, suggesting that molecular searches for reactive sites benefit from similar efficiencies as biological organisms.

  9. Landscape similarity, retrieval, and machine mapping of physiographic units

    Jasiewicz, Jaroslaw; Netzel, Pawel; Stepinski, Tomasz F.

    2014-09-01

    We introduce landscape similarity - a numerical measure that assesses affinity between two landscapes on the basis of similarity between the patterns of their constituent landform elements. Such a similarity function provides core technology for a landscape search engine - an algorithm that parses the topography of a study area and finds all places with landscapes broadly similar to a landscape template. A landscape search can yield answers to a query in real time, enabling a highly effective means to explore large topographic datasets. In turn, a landscape search facilitates auto-mapping of physiographic units within a study area. The country of Poland serves as a test bed for these novel concepts. The topography of Poland is given by a 30 m resolution DEM. The geomorphons method is applied to this DEM to classify the topography into ten common types of landform elements. A local landscape is represented by a square tile cut out of a map of landform elements. A histogram of cell-pair features is used to succinctly encode the composition and texture of a pattern within a local landscape. The affinity between two local landscapes is assessed using the Wave-Hedges similarity function applied to the two corresponding histograms. For a landscape search the study area is organized into a lattice of local landscapes. During the search the algorithm calculates the similarity between each local landscape and a given query. Our landscape search for Poland is implemented as a GeoWeb application called TerraEx-Pl and is available at http://sil.uc.edu/. Given a sample, or a number of samples, from a target physiographic unit the landscape search delineates this unit using the principles of supervised machine learning. Repeating this procedure for all units yields a complete physiographic map. The application of this methodology to topographic data of Poland results in the delineation of nine physiographic units. The resultant map bears a close resemblance to a conventional
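
The histogram comparison described above can be sketched as follows. This is an illustration only: it uses the common Wave-Hedges distance, sum over bins of 1 - min/max, and converts it to a similarity by normalizing with the number of bins, which is an assumption rather than the exact formula used by the authors. The function names are hypothetical.

```python
def wave_hedges_distance(p, q):
    """Wave-Hedges distance between two non-negative histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(p, q):
        largest = max(a, b)
        if largest > 0:  # bins that are empty in both histograms contribute 0
            total += 1.0 - min(a, b) / largest
    return total


def wave_hedges_similarity(p, q):
    """Similarity in [0, 1]; the normalization by bin count is an assumption."""
    if not p:
        raise ValueError("histograms must have at least one bin")
    return 1.0 - wave_hedges_distance(p, q) / len(p)
```

A landscape search in this style would compute such a similarity between the query histogram and the histogram of every tile in the lattice, then rank or threshold the tiles.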

  10. Turning Search into Knowledge Management.

    Kaufman, David

    2002-01-01

    Discussion of knowledge management for electronic data focuses on creating a high quality similarity ranking algorithm. Topics include similarity ranking and unstructured data management; searching, categorization, and summarization of documents; query evaluation; considering sentences in addition to keywords; and vector models. (LRW)