WorldWideScience

Sample records for accurate similarity search

  1. Application of kernel functions for accurate similarity search in large chemical databases

    2010-01-01

    Background: Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure based methods provide an efficient way to do the query. Recently various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions...

  2. Approximate similarity search

    Amato, Giuseppe

    2000-01-01

    Similarity searching is fundamental in various application areas. Recently it has attracted much attention in the database community because of the growing need to deal with large volumes of data. Consequently, efficiency has become a matter of concern in design. Although much has been done to develop structures able to perform fast similarity search, results are still not satisfactory, and more research is needed. The performance of similarity search for complex features deteriorates and does...

  3. Protein structural similarity search by Ramachandran codes

    Chang Chih-Hung; Huang Po-Jung; Lo Wei-Cheng; Lyu Ping-Chiang

    2007-01-01

    Background: Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we ai...

  4. Multivariate Time Series Similarity Searching

    Jimin Wang; Yuelong Zhu; Shijin Li; Dingsheng Wan; Pengcheng Zhang

    2014-01-01

    Multivariate time series (MTS) datasets are very common in various financial, multimedia, and hydrological fields. In this paper, a dimension-combination method is proposed to search similar sequences for MTS. Firstly, the similarity of single-dimension series is calculated; then the overall similarity of the MTS is obtained by synthesizing each of the single-dimension similarity based on weighted BORDA voting method. The dimension-combination method could use the existing similarity searchin...
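The dimension-combination step described above can be sketched as a weighted Borda count. This is only an illustration under stated assumptions: the function name `weighted_borda`, the candidate labels, and the weights are hypothetical, and the abstract does not say how the paper chooses its weights or breaks ties.

```python
def weighted_borda(rankings, weights):
    """Fuse per-dimension rankings into one overall ranking.

    rankings: one list per dimension, most similar candidate first.
    weights:  one non-negative weight per dimension (illustrative values only).
    """
    scores = {}
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Classic Borda points: n - 1 for first place, 0 for last place.
            points = n - 1 - position
            scores[candidate] = scores.get(candidate, 0.0) + weight * points
    # Highest fused score first; ties are broken by name to stay deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical similarity rankings of three candidate series, one per dimension.
rankings = [
    ["s2", "s1", "s3"],  # dimension 0
    ["s1", "s2", "s3"],  # dimension 1
    ["s1", "s3", "s2"],  # dimension 2
]
weights = [0.5, 0.3, 0.2]
fused = weighted_borda(rankings, weights)
print(fused)
```

Because each dimension is ranked independently, any existing single-dimension similarity search method could supply the input rankings.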

  5. Protein structural similarity search by Ramachandran codes

    Chang Chih-Hung

    2007-08-01

    Background: Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.

  6. Scaling Group Testing Similarity Search

    Iscen, Ahmet; Amsaleg, Laurent; Furon, Teddy

    2016-01-01

    The large dimensionality of modern image feature vectors, up to thousands of dimensions, is challenging the high dimensional indexing techniques. Traditional approaches fail at returning good quality results within a response time that is usable in practice. However, similarity search techniques inspired by the group testing framework have recently been proposed in an attempt to specifically defeat the curse of dimensionality. Yet, group testing does not scale and fails at indexing very large...

  7. Semantically enabled image similarity search

    Casterline, May V.; Emerick, Timothy; Sadeghi, Kolia; Gosse, C. A.; Bartlett, Brent; Casey, Jason

    2015-05-01

    Georeferenced data of various modalities are increasingly available for intelligence and commercial use; however, effectively exploiting these sources demands a unified data space capable of capturing the unique contribution of each input. This work presents a suite of software tools for representing geospatial vector data and overhead imagery in a shared high-dimension vector or "embedding" space that supports fused learning and similarity search across dissimilar modalities. While the approach is suitable for fusing arbitrary input types, including free text, the present work exploits the obvious but computationally difficult relationship between GIS and overhead imagery. GIS is comprised of temporally-smoothed but information-limited content of a GIS, while overhead imagery provides an information-rich but temporally-limited perspective. This processing framework includes some important extensions of concepts in literature but, more critically, presents a means to accomplish them as a unified framework at scale on commodity cloud architectures.

  8. Biosequence Similarity Search on the Mercury System

    Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph

    2007-01-01

    Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high...

  9. Similarity Measures for Boolean Search Request Formulations.

    Radecki, Tadeusz

    1982-01-01

    Proposes a means for determining the similarity between search request formulations in online information retrieval systems, and discusses the use of similarity measures for clustering search formulations and document files in such systems. Experimental results using the proposed methods are presented in three tables. A reference list is provided.…

  10. Secure sketch search for document similarity

    Örencik, Cengiz; Alewiwi, Mahmoud Khaled; Savaş, Erkay

    2015-01-01

    Document similarity search is an important problem that has many applications especially in outsourced data. With the wide spread of cloud computing, users tend to outsource their data to remote servers which are not necessarily trusted. This leads to the problem of protecting the privacy of sensitive data. We design and implement two secure similarity search schemes for textual documents utilizing locality sensitive hashing techniques for cosine similarity. While the first one provides very ...
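Locality sensitive hashing for cosine similarity is commonly realized with random hyperplanes, and a minimal sketch of that building block follows. It covers only the hashing idea, not the security layer of the schemes in this record, and every name in it (`make_hyperplanes`, `signature`, `estimated_cosine`, the seed and bit count) is a hypothetical choice made for illustration.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def estimated_cosine(sig_a, sig_b):
    """Estimate cosine similarity from the fraction of differing bits."""
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    angle = math.pi * hamming / len(sig_a)
    return math.cos(angle)


planes = make_hyperplanes(num_bits=64, dim=3)
doc_a = [1.0, 2.0, 3.0]
doc_b = [2.0, 4.0, 6.0]   # same direction as doc_a, so cosine similarity 1
sig_a = signature(doc_a, planes)
sig_b = signature(doc_b, planes)
print(estimated_cosine(sig_a, sig_b))
```

Two vectors separated by angle θ disagree on each bit with probability θ/π, which is why the Hamming distance between signatures gives an estimate of the angle.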

  11. Efficient Authentication of Outsourced String Similarity Search

    Dong, Boxiang; Wang, Hui

    2016-01-01

    Cloud computing enables the outsourcing of big data analytics, where a third party server is responsible for data storage and processing. In this paper, we consider the outsourcing model that provides string similarity search as the service. In particular, given a similarity search query, the service provider returns all strings from the outsourced dataset that are similar to the query string. A major security concern of the outsourcing paradigm is to authenticate whether the service provider...

  12. Mobile P2P Fast Similarity Search

    Bocek, T; Hecht, F. V.; Hausheer, D; Hunt, E; Stiller, B.

    2009-01-01

    In informal data sharing environments, misspellings cause problems for data indexing and retrieval. This is even more pronounced in mobile environments, in which devices with limited input devices are used. In a mobile environment, similarity search algorithms for finding misspelled data need to account for limited CPU and bandwidth. This demo shows P2P fast similarity search (P2PFastSS) running on mobile phones and laptops that is tailored to uncertain data entry and use...

  13. Multiresolution Similarity Search in Image Databases

    Heczko, Martin; Hinneburg, Alexander; Keim, Daniel A.; Wawryniuk, Markus

    2004-01-01

    Typically searching image collections is based on features of the images. In most cases the features are based on the color histogram of the images. Similarity search based on color histograms is very efficient, but the quality of the search results is often rather poor. One of the reasons is that histogram-based systems only support a specific form of global similarity using the whole histogram as one vector. But there is more information in a histogram than the distribution of colors. This ...

  14. Representation Independent Proximity and Similarity Search

    Chodpathumwan, Yodsawalai; Aleyasin, Amirhossein; Termehchy, Arash; Sun, Yizhou

    2015-01-01

    Finding similar or strongly related entities in a graph database is a fundamental problem in data management and analytics with applications in similarity query processing, entity resolution, and pattern matching. Similarity search algorithms usually leverage the structural properties of the data graph to quantify the degree of similarity or relevance between entities. Nevertheless, the same information can be represented in many different structures and the structural properties observed ove...

  15. Web Search Results Summarization Using Similarity Assessment

    Sawant V.V.

    2014-06-01

    Nowadays the internet has become part of our lives, and the WWW is its most important service because it allows information such as documents and images to be presented. The WWW grows rapidly and caters to diverse levels and categories of users. Web search results are extracted for user-specified queries. With millions of items of information pouring online, users have no time to surf the contents completely. Moreover, the information available is often repeated or duplicated. This has created the need to restructure search results so that they are summarized. The proposed approach extracts several different features of web pages. Web page visual similarity assessment has been employed to address problems in different fields, including phishing, web archiving, and web search engines. In this approach, the user first enters a query and the resulting search results are stored. The Earth Mover's Distance (EMD) is used to assess web page visual similarity: each web page is treated as a low-resolution image, a signature of that image is created from color and coordinate features, and the distance between web pages is calculated with the EMD method. The layout similarity value is computed using a tag comparison algorithm and a template comparison algorithm. Textual similarity is computed using cosine similarity, and hyperlink analysis is performed to compute outward links. The final similarity value is calculated by fusing the layout, text, hyperlink, and EMD values. Once the similarity matrix is found, clustering is performed with the help of connected components. Finally, groups of similar web pages, i.e. the summarized results, are displayed to the user. Experiments were conducted to demonstrate the effectiveness of the four methods in generating summarized results for different web pages and user queries.
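The final clustering step can be sketched as follows, assuming the fused similarity matrix is already available. The threshold, the example matrix, and the function name `cluster_by_connected_components` are hypothetical; the abstract does not state how the four similarity values are weighted or which cutoff is used.

```python
def cluster_by_connected_components(similarity, threshold):
    """Group pages whose pairwise similarity reaches the threshold.

    Pages are nodes, an edge joins two pages with similarity >= threshold,
    and every connected component of that graph becomes one cluster.
    """
    n = len(similarity)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [start]
        component = []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for other in range(n):
                if not seen[other] and similarity[node][other] >= threshold:
                    seen[other] = True
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


# Hypothetical fused similarity values for four search results.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
clusters = cluster_by_connected_components(similarity, threshold=0.5)
print(clusters)
```

Each returned cluster could then be shown to the user as one summarized result.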

  16. SEAL: Spatio-Textual Similarity Search

    Fan, Ju; Zhou, Lizhu; Chen, Shanshan; Hu, Jun

    2012-01-01

    Location-based services (LBS) have become more and more ubiquitous recently. Existing methods focus on finding relevant points-of-interest (POIs) based on users' locations and query keywords. Nowadays, modern LBS applications generate a new kind of spatio-textual data, regions-of-interest (ROIs), containing region-based spatial information and textual description, e.g., mobile user profiles with active regions and interest tags. To satisfy search requirements on ROIs, we study a new research problem, called spatio-textual similarity search: Given a set of ROIs and a query ROI, we find the similar ROIs by considering spatial overlap and textual similarity. Spatio-textual similarity search has many important applications, e.g., social marketing in location-aware social networks. It calls for an efficient search method to support large scales of spatio-textual data in LBS systems. To this end, we introduce a filter-and-verification framework to compute the answers. In the filter step, we generate signatures for ...

  17. New similarity search based glioma grading

    MR-based differentiation between low- and high-grade gliomas is predominately based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information of CE-T1w and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian Mixture Models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (= probability-based classifier) was defined and used for grading. Biopsy was used as gold standard, and similarity based grading was compared to grading solely based on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integration of perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves the potential of MR-based glioma grading compared to CE-T1w data alone. (orig.)

  18. New similarity search based glioma grading

    Haegler, Katrin; Brueckmann, Hartmut; Linn, Jennifer [Ludwig-Maximilians-University of Munich, Department of Neuroradiology, Munich (Germany); Wiesmann, Martin; Freiherr, Jessica [RWTH Aachen University, Department of Neuroradiology, Aachen (Germany); Boehm, Christian [Ludwig-Maximilians-University of Munich, Department of Computer Science, Munich (Germany); Schnell, Oliver; Tonn, Joerg-Christian [Ludwig-Maximilians-University of Munich, Department of Neurosurgery, Munich (Germany)

    2012-08-15

    MR-based differentiation between low- and high-grade gliomas is predominately based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information of CE-T1w and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian Mixture Models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (= probability-based classifier) was defined and used for grading. Biopsy was used as gold standard, and similarity based grading was compared to grading solely based on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integration of perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves the potential of MR-based glioma grading compared to CE-T1w data alone. (orig.)
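The reported percentages can be reproduced from a confusion matrix. The counts below are not given in the abstract: they are inferred on the assumption that high-grade (III) glioma is the positive class, with 14 high-grade and 23 low-grade patients, and they are the integer counts consistent with the published figures.

```python
def grading_metrics(true_pos, false_neg, true_neg, false_pos):
    """Return (accuracy, sensitivity, specificity) in percent, one decimal."""
    total = true_pos + false_neg + true_neg + false_pos
    accuracy = (true_pos + true_neg) / total
    sensitivity = true_pos / (true_pos + false_neg)   # high-grade found
    specificity = true_neg / (true_neg + false_pos)   # low-grade found
    return tuple(round(100 * value, 1) for value in (accuracy, sensitivity, specificity))


# Inferred counts (assumption): 11 of 14 high-grade tumors detected by both methods.
similarity_search = grading_metrics(true_pos=11, false_neg=3, true_neg=20, false_pos=3)
ce_t1w_only = grading_metrics(true_pos=11, false_neg=3, true_neg=13, false_pos=10)

print(similarity_search)  # (83.8, 78.6, 87.0)
print(ce_t1w_only)        # (64.9, 78.6, 56.5)
```

Under these inferred counts the two methods detect the same number of high-grade tumors, and the gain in accuracy comes entirely from fewer low-grade tumors being misclassified as high-grade.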

  19. Comparison of Two "Document Similarity Search Engines"

    Poinçot, Phillipe; Lesteven, Soizick; Murtagh, Fionn

    We have developed and used the "CDS document map" based on neural networks (Kohonen maps; http://simbad.u-strasbg.fr/A+A/map.pl). In this self-organizing map, documents are gradually clustered by subject themes. The tool is based on keywords associated with the documents. For one selected document, we locate it on the CDS document map and retrieve articles clustered in the same area. The second search engine, used by the ADS (NASA Astrophysics Data System; http://cdsads.u-strasbg.fr http://adswww.harvard.edu http://ads.nao.ac.jp), has the capability to find all similar abstracts in the ADS database with a "keyword request". We have compared the results of the document similarity search engines, using the same set of documents. One example will be described and results will be discussed.

  20. Efficient Video Similarity Measurement and Search

    Cheung, S-C S

    2002-12-19

    The amount of information on the world wide web has grown enormously since its creation in 1990. Duplication of content is inevitable because there is no central management on the web. Studies have shown that many similar versions of the same text documents can be found throughout the web. This redundancy problem is more severe for multimedia content such as web video sequences, as they are often stored in multiple locations and different formats to facilitate downloading and streaming. Similar versions of the same video can also be found, unknown to content creators, when web users modify and republish original content using video editing tools. Identifying similar content can benefit many web applications and content owners. For example, it will reduce the number of similar answers to a web search and identify inappropriate use of copyright content. In this dissertation, they present a system architecture and corresponding algorithms to efficiently measure, search, and organize similar video sequences found on any large database such as the web.

  1. Outsourced similarity search on metric data assets

    Yiu, Man Lung

    2012-02-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low-initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

  2. Earthquake detection through computationally efficient similarity search.

    Yoon, Clara E; O'Reilly, Ossian; Bergen, Karianne J; Beroza, Gregory C

    2015-12-01

    Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms. Earthquake detection (identification of seismic events in continuous data) is a fundamental operation for observational seismology. We developed an efficient method to detect earthquakes using waveform similarity that overcomes the disadvantages of existing detection methods. Our method, called Fingerprint And Similarity Thresholding (FAST), can analyze a week of continuous seismic waveform data in less than 2 hours, or 140 times faster than autocorrelation. FAST adapts a data mining algorithm, originally designed to identify similar audio clips within large databases; it first creates compact "fingerprints" of waveforms by extracting key discriminative features, then groups similar fingerprints together within a database to facilitate fast, scalable search for similar fingerprint pairs, and finally generates a list of earthquake detections. FAST detected most (21 of 24) cataloged earthquakes and 68 uncataloged earthquakes in 1 week of continuous data from a station located near the Calaveras Fault in central California, achieving detection performance comparable to that of autocorrelation, with some additional false detections. FAST is expected to realize its full potential when applied to extremely long duration data sets over a distributed network of seismic stations. The widespread application of FAST has the potential to aid in the discovery of unexpected seismic signals, improve seismic monitoring, and promote a greater understanding of a variety of earthquake processes. PMID:26665176
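The "group similar fingerprints together" step can be illustrated with MinHash signatures and banded bucketing. This is an assumption: the abstract does not name the hashing scheme, and the fingerprints, window names, and parameters below are hypothetical. Each fingerprint is represented as the non-empty set of indices of its nonzero bits.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # modulus for the illustrative hash family


def minhash_signature(fingerprint, hash_params):
    """Smallest hash value of the set under each hash function."""
    return [min((a * x + b) % PRIME for x in fingerprint) for a, b in hash_params]


def candidate_pairs(fingerprints, num_hashes=12, bands=4, seed=0):
    """Return pairs of fingerprints that share at least one signature band."""
    rng = random.Random(seed)
    hash_params = [
        (rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)
    ]
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for name, fingerprint in fingerprints.items():
        sig = minhash_signature(fingerprint, hash_params)
        for band in range(bands):
            # Fingerprints with an identical band land in the same bucket.
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].append(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


fingerprints = {
    "window_a": {1, 4, 7, 9, 15},
    "window_b": {1, 4, 7, 9, 15},    # repeat of window_a's fingerprint
    "window_c": {2, 3, 20, 31, 40},
}
pairs = candidate_pairs(fingerprints)
print(("window_a", "window_b") in pairs)
```

Only fingerprints that collide in some bucket are compared further, which avoids the all-pairs comparison that makes autocorrelation slow.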

  3. Performance Evaluation and Optimization of Math-Similarity Search

    Zhang, Qun; Youssef, Abdou

    2015-01-01

    Similarity search in math is to find mathematical expressions that are similar to a user's query. We conceptualized the similarity factors between mathematical expressions, and proposed an approach to math similarity search (MSS) by defining metrics based on those similarity factors [11]. Our preliminary implementation indicated the advantage of MSS compared to non-similarity based search. In order to more effectively and efficiently search similar math expressions, MSS is further optimized. ...

  4. Highly accurate recommendation algorithm based on high-order similarities

    Liu, Jian-Guo; Wang, Bing-Hong; Zhang, Yi-Cheng

    2008-01-01

    In this Letter, we introduce a modified collaborative filtering (MCF) algorithm, which has remarkably higher accuracy than the standard collaborative filtering. In the MCF, instead of the standard Pearson coefficient, the user-user similarities are obtained by a diffusion process. Furthermore, by considering the second order similarities, we design an effective algorithm that depresses the influence of mainstream preferences. The corresponding algorithmic accuracy, measured by the ranking score, is further improved by 24.9% in the optimal case. In addition, two significant criteria of algorithmic performance, diversity and popularity, are also taken into account. Numerical results show that the algorithm based on second order similarity can outperform the MCF simultaneously in all three criteria.

  5. Web Search Results Summarization Using Similarity Assessment

    Sawant V.V.; Takale S.A.

    2014-01-01

    Nowadays the internet has become part of our lives, and the WWW is its most important service because it allows information such as documents and images to be presented. The WWW grows rapidly and caters to diverse levels and categories of users. Web search results are extracted for user-specified queries. With millions of items of information pouring online, users have no time to surf the contents completely. Moreover, the information available is often repeated or duplicated. This issue has created th...

  6. Outsourced similarity search on metric data assets

    Yiu, Man Lung; Assent, Ira; Jensen, Christian Søndergaard

    2012-01-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example...

  7. A Similarity Search Using Molecular Topological Graphs

    Yoshifumi Fukunishi

    2009-01-01

    A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The combination of these two matrices gave a similarity measure. This method was applied to in silico drug screening, and the results showed that it was effective as a similarity measure.
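The two topological matrices named above can be built as in the following sketch, using the heavy-atom skeleton of ethanol as a small example. The function names are hypothetical, and the sketch omits the atomic partial charges and the way the paper combines the matrices into a similarity value, since the abstract does not specify them.

```python
from collections import deque


def adjacency_matrix(num_atoms, bonds):
    """1 where two atoms are bonded, 0 elsewhere (local structure)."""
    matrix = [[0] * num_atoms for _ in range(num_atoms)]
    for i, j in bonds:
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def path_length_matrix(adjacency):
    """Minimum number of bonds between every atom pair (global structure).

    Uses breadth-first search from each atom; None marks unreachable atoms.
    """
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in range(n):
                if adjacency[node][neighbor] and dist[source][neighbor] is None:
                    dist[source][neighbor] = dist[source][node] + 1
                    queue.append(neighbor)
    return dist


# Ethanol without hydrogens: C0 - C1 - O2
bonds = [(0, 1), (1, 2)]
adjacency = adjacency_matrix(3, bonds)
paths = path_length_matrix(adjacency)
print(adjacency)
print(paths)
```

The adjacency matrix only sees direct neighbors, while the path-length matrix also records that C0 and O2 are two bonds apart.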

  8. Fast similarity search in peer-to-peer networks

    Bocek, T; Hunt, E; Hausheer, D; Stiller, B.

    2008-01-01

    Peer-to-peer (P2P) systems show numerous advantages over centralized systems, such as load balancing, scalability, and fault tolerance, and they require certain functionality, such as search, repair, and message and data transfer. In particular, structured P2P networks perform an exact search in logarithmic time proportional to the number of peers. However, keyword similarity search in a structured P2P network remains a challenge. Similarity search for service discovery can significantly impr...

  9. The Time Course of Similarity Effects in Visual Search

    Guest, Duncan; Lamberts, Koen

    2011-01-01

    It is well established that visual search becomes harder when the similarity between target and distractors is increased and the similarity between distractors is decreased. However, in models of visual search, similarity is typically treated as a static, time-invariant property of the relation between objects. Data from other perceptual tasks…

  10. Learning Style Similarity for Searching Infographics

    Saleh, Babak; Dontcheva, Mira; Hertzmann, Aaron; Liu, Zhicheng

    2015-01-01

    Infographics are complex graphic designs integrating text, images, charts and sketches. Despite the increasing popularity of infographics and the rapid growth of online design portfolios, little research investigates how we can take advantage of these design resources. In this paper we present a method for measuring the style similarity between infographics. Based on human perception data collected from crowdsourced experiments, we use computer vision and machine learning algorithms to learn ...

  11. A Similarity Search Using Molecular Topological Graphs

    2009-01-01

    A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The comb...

  12. Visual similarity is stronger than semantic similarity in guiding visual search for numbers

    Godwin, H.J.; Hout, M.C.; Menneer, T.

    2014-01-01

    Using a visual search task, we explored how behavior is influenced by both visual and semantic information. We recorded participants’ eye movements as they searched for a single target number in a search array of single-digit numbers (0–9). We examined the probability of fixating the various distractors as a function of two key dimensions: the visual similarity between the target and each distractor, and the semantic similarity (i.e., the numerical distance) between the target and each distra...

  13. Fast and secure similarity search in high dimensional space

    Furon, Teddy; Jégou, Hervé; Amsaleg, Laurent; Mathon, Benjamin

    2013-01-01

    Similarity search in high dimensional space database is split into two worlds: i) fast, scalable, and approximate search algorithms which are not secure, and ii) search protocols based on secure computation which are not scalable. This paper presents a one-way privacy protocol that lies in between these two worlds. Approximate metrics for the cosine similarity allow speed. Elements of large random matrix theory provide security evidence if the size of the database is not too big with respe...

  14. Distributed efficient similarity search mechanism in wireless sensor networks.

    Ahmed, Khandakar; Gregory, Mark A

    2015-01-01

    The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of similarity searching into the problem of an interval search in one dimension. In addition, a sector based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries. PMID:25751081
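The iDistance idea mentioned above, turning a similarity query into one-dimensional interval searches, can be sketched as follows. This is a generic, centralized illustration and not the DCSMSS scheme itself: the reference points, the spacing constant `c`, and the function names are assumptions, and the sector-based routing is not modeled.

```python
import bisect
import math


def build_index(points, references):
    """Map each point to the key i * c + dist(point, reference_i)."""
    assigned = []
    for point in points:
        dists = [math.dist(point, ref) for ref in references]
        nearest = min(range(len(references)), key=dists.__getitem__)
        assigned.append((nearest, dists[nearest], point))
    radii = [0.0] * len(references)
    for i, d, _ in assigned:
        radii[i] = max(radii[i], d)
    c = max(radii) + 1.0  # spacing larger than any radius keeps key ranges disjoint
    entries = sorted((i * c + d, point) for i, d, point in assigned)
    keys = [key for key, _ in entries]
    return {"keys": keys, "entries": entries, "radii": radii, "c": c,
            "references": references}


def range_query(index, query, radius):
    """Return all indexed points within `radius` of `query`."""
    results = []
    for i, ref in enumerate(index["references"]):
        d = math.dist(query, ref)
        if d - radius > index["radii"][i]:
            continue  # the query ball cannot reach this partition
        low = i * index["c"] + max(0.0, d - radius)
        high = i * index["c"] + min(index["radii"][i], d + radius)
        start = bisect.bisect_left(index["keys"], low)
        stop = bisect.bisect_right(index["keys"], high)
        for _, point in index["entries"][start:stop]:
            if math.dist(point, query) <= radius:  # discard false positives
                results.append(point)
    return sorted(results)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0), (5.0, 7.0)]
index = build_index(points, references=[(0.0, 0.0), (5.0, 5.0)])
print(range_query(index, (0.5, 0.0), 1.0))
```

By the triangle inequality, any point within the query radius has a key inside the scanned interval of its partition, so the interval scan plus the final distance check returns the same answer as a full scan.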

  15. Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks

    Khandakar Ahmed

    2015-03-01

    The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of similarity searching into the problem of an interval search in one dimension. In addition, a sector based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries.

  16. Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks

    Khandakar Ahmed; Gregory, Mark A.

    2015-01-01

    The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of s...

  17. Activity-relevant similarity values for fingerprints and implications for similarity searching

    Swarit Jasial; Ye Hu; Martin Vogt; Jürgen Bajorath

    2016-01-01

    A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various att...

  18. How Google Web Search copes with very similar documents

    Mettrop, W.; Nieuwenhuysen, P.; Smulders, H.

    2006-01-01

    A significant portion of the computer files that carry documents, multimedia, programs etc. on the Web are identical or very similar to other files on the Web. How do search engines cope with this? Do they perform some kind of “deduplication”? How should users take into account that web search resul

  19. Effective and Efficient Similarity Search in Scientific Workflow Repositories

    Starlinger, Johannes; Cohen-Boulakia, Sarah; Khanna, Sanjeev; Davidson, Susan; Leser, Ulf

    2015-01-01

    Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular for similarity search. Effective similarity search requires both high quality algorithms for the comparison of scientific workflows and efficient strategies for indexing,...

  20. Fast and accurate protein substructure searching with simulated annealing and GPUs

    Stivala Alex D

    2010-09-01

    Full Text Available Background: Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results: We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy to, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions: The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.

  1. Indexing schemes for similarity search: an illustrated paradigm

    Pestov, Vladimir; Stojmirovic, Aleksandar

    2002-01-01

    We suggest a variation of the Hellerstein--Koutsoupias--Papadimitriou indexability model for datasets equipped with a similarity measure, with the aim of better understanding the structure of indexing schemes for similarity-based search and the geometry of similarity workloads. This in particular provides a unified approach to a great variety of schemes used to index into metric spaces and facilitates their transfer to more general similarity measures such as quasi-metrics. We discuss links b...

  2. Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Yuan, Ye; Chen, Lei; Wang, Haixun

    2012-01-01

    Many studies have been conducted on seeking efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and u...

  3. SEARCH PROFILES BASED ON USER TO CLUSTER SIMILARITY

    Ilija Subasic

    2007-12-01

    Full Text Available Privacy of web users' query search logs has, since last year's AOL dataset release, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also attracted a lot of attention in different communities surrounding the search engines. The use of clustering methods for providing low-level contextual search while retaining high privacy/utility is examined in this paper. By using only the user's cluster membership, the search query terms need no longer be retained, thus raising fewer privacy concerns for both users and companies. The paper presents a lightweight framework for combining query words, user similarities and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous attempts at privacy preservation in that it attempts to anonymize the queries instead of the users.

  4. The breakfast effect: dogs (Canis familiaris) search more accurately when they are less hungry.

    Miller, Holly C; Bender, Charlotte

    2012-11-01

    We investigated whether the consumption of a morning meal (breakfast) by dogs (Canis familiaris) would affect search accuracy on a working memory task following the exertion of self-control. Dogs were tested either 30 or 90 min after consuming half of their daily resting energy requirements (RER). During testing dogs were initially required to sit still for 10 min before searching for hidden food in a visible displacement task. We found that 30 min following the consumption of breakfast, and 10 min after the behavioral inhibition task, dogs searched more accurately than they did in a fasted state. Similar differences were not observed when dogs were tested 90 min after meal consumption. This pattern of behavior suggests that breakfast enhanced search accuracy following a behavioral inhibition task by providing energy for cognitive processes, and that search accuracy decreased as a function of energy depletion. PMID:23032958

  5. Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision

    Holliday John D; Kanoulas Evangelos; Malim Nurul; Willett Peter

    2011-01-01

    Background: Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. Results: Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that...

  6. Improving spectral library search by redefining similarity measures.

    Garg, Ankita; Enright, Catherine G; Madden, Michael G

    2015-05-26

    Similarity plays a central role in spectral library search. The goal of spectral library search is to identify those spectra in a reference library of known materials that most closely match an unknown query spectrum, on the assumption that this will allow us to identify the main constituent(s) of the query spectrum. The similarity measures used for this task in software and the academic literature are almost exclusively metrics, meaning that the measures obey the three axioms of metrics: (1) minimality; (2) symmetry; (3) triangle inequality. Consequently, they implicitly assume that the query spectrum is drawn from the same distribution as that of the reference library. In this paper, we demonstrate that this assumption is not necessary in practical spectral library search and that in fact it is often violated in practice. Although the reference library may be constructed carefully, it is generally impossible to guarantee that all future query spectra will be drawn from the same distribution as the reference library. Before evaluating different similarity measures, we need to understand how they define the relationship between spectra. In spectral library search, we often aim to find the constituent(s) of a mixture. We propose that, rather than asking which reference library spectra are similar to the mixture, we should ask which of the reference library spectra are contained in the given query mixture. This question is inherently asymmetric. Therefore, we should adopt a nonmetric measure. To evaluate our hypothesis, we apply a nonmetric measure formulated by Tversky [Psychol. Rev. 1977, 84, 327-352] known as the Contrast Model and compare its performance to the well-known Jaccard similarity index metric on spectroscopic data sets. Our results show that the Tversky similarity measure yields better results than the Jaccard index. PMID:25902003
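
    The asymmetry argued for above can be illustrated with a small sketch. It assumes, purely for illustration, that each spectrum is reduced to a set of peak positions and that the contrast model takes its simplest linear form with set sizes as the salience function; the weights `theta`, `alpha` and `beta` and the toy peak sets are hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Symmetric Jaccard index: shared features divided by all features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tversky_contrast(query, ref, theta=1.0, alpha=0.0, beta=1.0):
    """Linear contrast model: reward shared features, penalise distinctive ones.

    With alpha = 0 and beta > 0, peaks that the query has but the reference
    lacks cost nothing, while reference peaks missing from the query are
    penalised. This encodes "is the reference contained in the query?".
    """
    query, ref = set(query), set(ref)
    common = len(query & ref)
    only_query = len(query - ref)
    only_ref = len(ref - query)
    return theta * common - alpha * only_query - beta * only_ref


# Toy peak sets (hypothetical positions).
mixture = {100, 200, 300, 400, 500}      # query: a mixture spectrum
pure_a = {100, 200}                      # fully contained in the mixture
partial_b = {100, 200, 300, 600, 700}    # shares 3 peaks, has 2 foreign peaks

jaccard_scores = (jaccard(mixture, pure_a), jaccard(mixture, partial_b))
contrast_scores = (tversky_contrast(mixture, pure_a),
                   tversky_contrast(mixture, partial_b))
```

    On this toy data the Jaccard index ranks `partial_b` above the fully contained `pure_a`, while the asymmetric contrast score ranks `pure_a` first. Swapping the arguments of `tversky_contrast` changes the score, which is exactly the property a metric cannot have.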

  7. A Visual Similarity-Based 3D Search Engine

    Lmaati, Elmustapha Ait; Oirrak, Ahmed El; M.N. Kaddioui

    2009-01-01

    Retrieval systems for 3D objects are required because 3D databases used around the web are growing. In this paper, we propose a visual similarity based search engine for 3D objects. The system is based on a new representation of 3D objects given by a 3D closed curve that captures all information about the surface of the 3D object. We propose a new 3D descriptor, which is a combination of three signatures of this new representation, and we implement it in our interactive web based search engin...

  8. RAPSearch: a fast protein similarity search tool for short reads

    Choi Jeong-Hyeon

    2011-05-01

    Full Text Available Background: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the very sizes of the short read datasets. Results: We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. Conclusions: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.

  9. Online multiple kernel similarity learning for visual search.

    Xia, Hao; Hoi, Steven C H; Jin, Rong; Zhao, Peilin

    2014-03-01

    Recent years have witnessed a number of studies on distance metric learning to improve visual similarity search in content-based image retrieval (CBIR). Despite their successes, most existing methods on distance metric learning are limited in two aspects. First, they usually assume the target proximity function follows the family of Mahalanobis distances, which limits their capacity of measuring similarity of complex patterns in real applications. Second, they often cannot effectively handle the similarity measure of multimodal data that may originate from multiple resources. To overcome these limitations, this paper investigates an online kernel similarity learning framework for learning kernel-based proximity functions which goes beyond the conventional linear distance metric learning approaches. Based on the framework, we propose a novel online multiple kernel similarity (OMKS) learning method which learns a flexible nonlinear proximity function with multiple kernels to improve visual similarity search in CBIR. We evaluate the proposed technique for CBIR on a variety of image data sets in which encouraging results show that OMKS outperforms the state-of-the-art techniques significantly. PMID:24457509

  10. Similarity preserving snippet-based visualization of web search results.

    Gomez-Nieto, Erick; San Roman, Frizzi; Pagliosa, Paulo; Casaca, Wallace; Helou, Elias S; de Oliveira, Maria Cristina F; Nonato, Luis Gustavo

    2014-03-01

    Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document (or webpage) and a link to it. This display has many advantages, for example, it affords easy navigation and is straightforward to interpret. Nonetheless, any user of search engines could possibly report some experience of disappointment with this metaphor. Indeed, it has limitations in particular situations, as it fails to provide an overview of the document collection retrieved. Moreover, depending on the nature of the query--for example, it may be too general, or ambiguous, or ill expressed--the desired information may be poorly ranked, or results may contemplate varied topics. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how related they are, content wise. We propose a visualization technique to display the results of web queries aimed at overcoming such limitations. It combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation by employing a multidimensional projection to derive two-dimensional layouts of the query search results that preserve text similarity relations, or neighborhoods. Similarity is computed by applying the cosine similarity over a "bag-of-words" vector representation of collection built from the snippets. If the snippets are displayed directly according to the derived layout, they will overlap considerably, producing a poor visualization. We overcome this problem by defining an energy functional that considers both the overlapping among snippets and the preservation of the neighborhood structure as given in the projected layout. Minimizing this energy functional provides a neighborhood preserving two-dimensional arrangement of the textual snippets with minimum overlap. The resulting visualization conveys both a global

  11. Self-Taught Hashing for Fast Similarity Search

    Zhang, Dell; Cai, Deng; Lu, Jinsong

    2010-01-01

    The ability to perform fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing, which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains a very challenging problem. In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal $l$-bit binary codes for all documents in the given corpus via unsupervised learning, and then train $l$ classifiers via supervised learning to predict the $l$-bit code for any query document unseen before. Our experiments on three real-world text datasets show that the proposed approach using binarised Laplaci...

  12. Computing Semantic Similarity Measure Between Words Using Web Search Engine

    Pushpa C N

    2013-05-01

    Full Text Available Semantic similarity measures between words play an important role in information retrieval, natural language processing and in various tasks on the web. In this paper, we have proposed a Modified Pattern Extraction Algorithm to compute the supervised semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page counts-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and non-synonymous word-pairs. The proposed Modified Pattern Extraction Algorithm achieves a correlation value of 89.8 percent.

  13. On optimizing distance-based similarity search for biological databases.

    Mao, Rui; Xu, Weijia; Ramakrishnan, Smriti; Nuckolls, Glen; Miranker, Daniel P

    2005-01-01

    Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three important biological data types: protein k-mers with the metric PAM model, DNA k-mers with Hamming distance, and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high dimensional feature vectors. We develop results showing that the character of these biological workloads is different from multimedia workloads. In particular, they are not intrinsically very high dimensional, and deserve different optimization heuristics. Based on MVP-trees, we develop a pivot selection heuristic seeking centers and show it outperforms the most widely used corner seeking heuristic. Similarly, we develop a data partitioning approach sensitive to the actual data distribution in lieu of median splits. PMID:16447992

  14. On Fuzzy vs. Metric Similarity Search in Complex Databases

    Eckhardt, Alan; Skopal, T.; Vojtáš, Peter

    Berlin: Springer, 2009 - (Andreasen, T.; Yager, R.; Bulskov, H.; Christiansen, H.; Larsen, H.), pp. 64-75. (Lecture Notes in Artificial Intelligence. 5822). ISBN 978-3-642-04956-9. ISSN 0302-9743. [FQAS 2009. International Conference on Flexible Query Answering Systems /8./. Roskilde (DK), 26.10.2009-28.10.2009] R&D Projects: GA AV ČR 1ET100300517; GA ČR GD201/09/H057 Other grants: GA ČR(CZ) GA201/09/0683 Institutional research plan: CEZ:AV0Z10300504 Keywords: fuzzy operators * non-metric search * similarity search * indexing Subject RIV: IN - Informatics, Computer Science

  15. SHOP: scaffold hopping by GRID-based similarity searches

    Bergmann, Rikke; Linusson, Anna; Zamora, Ismael

    2007-01-01

    A new GRID-based method for scaffold hopping (SHOP) is presented. In a fully automatic manner, scaffolds were identified in a database based on three types of 3D-descriptors. SHOP's ability to recover scaffolds was assessed and validated by searching a database spiked with fragments of known...... scaffolds were in the 31 top-ranked scaffolds. SHOP also identified new scaffolds with substantially different chemotypes from the queries. Docking analysis indicated that the new scaffolds would have similar binding modes to those of the respective query scaffolds observed in X-ray structures. The...

  16. Quick and easy implementation of approximate similarity search with Lucene

    Amato, Giuseppe; Bolettieri, Paolo; Gennaro, Claudio; Rabitti, Fausto

    2013-01-01

    Similarity search has proved to be an effective way of retrieving multimedia content. However, as the amount of available multimedia data increases, the cost of developing from scratch a robust and scalable system with content-based image retrieval facilities is quite prohibitive. In this paper, we propose to exploit an approach that allows us to convert low level features into a textual form. In this way, we are able to easily set up a retrieval system on top of the Lucene se...

  17. An efficient similarity search based on indexing in large DNA databases.

    Jeong, In-Seon; Park, Kyoung-Wook; Kang, Seung-Ho; Lim, Hyeong-Seok

    2010-04-01

    Index-based search algorithms are an important part of a genomic search, and how to construct indices is the key to an index-based search algorithm to compute similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It uses little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal length windows. We select the likely subsequences by computing the Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. The results of our experiments show that the algorithm has a faster run time than other heuristic algorithms based on index structure. Also, the algorithm is as accurate as those heuristic algorithms. PMID:20418167
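
    The three steps named in the abstract (equal-length windows, a Hamming-distance filter, and a character-frequency vector) can be sketched roughly as below. This is only an illustration under simplifying assumptions: windows are non-overlapping and as long as the query, the positional information mentioned in the abstract is omitted, and all function names and the `max_mismatches` threshold are hypothetical.

```python
def windows(seq, size):
    """Split seq into consecutive, non-overlapping windows of equal length."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def freq_vector(s, alphabet="ACGT"):
    """Map a subsequence to a point whose coordinates are character counts."""
    return tuple(s.count(ch) for ch in alphabet)


def candidate_windows(seq, query, max_mismatches):
    """Keep windows close to the query and attach their frequency vectors."""
    size = len(query)
    return [
        (i * size, w, freq_vector(w))
        for i, w in enumerate(windows(seq, size))
        if hamming(w, query) <= max_mismatches
    ]


candidates = candidate_windows("ACGTACGAAAAATTTT", "ACGT", max_mismatches=1)
```

    Each surviving window is returned with its start offset and its count vector in the order A, C, G, T, which is the kind of point a multidimensional index could then store.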

  18. Rank-Based Similarity Search: Reducing the Dimensional Dependence.

    Houle, Michael E; Nett, Michael

    2015-01-01

    This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values. PMID:26353214

  19. Query-dependent banding (QDB) for faster RNA similarity searches.

    Eric P Nawrocki

    2007-03-01

    Full Text Available When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from $LN^{2.4}$ to $LN^{1.3}$ for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.

  20. Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases

    Assent, Ira; Wichterich, Marc; Meisen, Tobias;

    2008-01-01

    Multimedia similarity search in large databases requires efficient query processing. The Earth mover's distance, introduced in computer vision, is successfully used as a similarity model in a number of small-scale applications. Its computational complexity hindered its adoption in large multimedia...... databases. We enable directly indexing the Earth mover's distance in structures such as the R-tree and the VA-file by providing the accurate 'MinDist' function to any bounding rectangle in the index. We exploit the computational structure of the new MinDist to derive a new lower bound for the EMD Min...

  1. Activity-relevant similarity values for fingerprints and implications for similarity searching [version 1; referees: 3 approved]

    Swarit Jasial

    2016-04-01

    Full Text Available A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various attempts have been made to establish relationships between calculated fingerprint similarity values and biological activities, none of these has yielded generally applicable rules for similarity searching. In this study, we have addressed the question of molecular versus activity similarity in a more fundamental way. First, we have evaluated if activity-relevant similarity value ranges could in principle be identified for standard fingerprints and distinguished from similarity resulting from random compound comparisons. Then, we have analyzed if activity-relevant similarity values could be used to guide typical similarity search calculations aiming to identify active compounds in databases. It was found that activity-relevant similarity values can be identified as a characteristic feature of fingerprints. However, it was also shown that such values cannot be reliably used as thresholds for practical similarity search calculations. In addition, the analysis presented herein helped to rationalize differences in fingerprint search performance.

  2. Fast and accurate database searches with MS-GF+Percolator.

    Granholm, Viktor; Kim, Sangtae; Navarro, José C F; Sjölund, Erik; Smith, Richard D; Käll, Lukas

    2014-02-01

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community. PMID:24344789

  3. An accurate algorithm to calculate the Hurst exponent of self-similar processes

    In this paper, we introduce a new approach which generalizes the GM2 algorithm (introduced in Sánchez-Granero et al. (2008) [52]) as well as fractal dimension algorithms (FD1, FD2 and FD3) (first appeared in Sánchez-Granero et al. (2012) [51]), providing an accurate algorithm to calculate the Hurst exponent of self-similar processes. We prove that this algorithm performs properly in the case of short time series when fractional Brownian motions and Lévy stable motions are considered. We conclude the paper with a dynamic study of the Hurst exponent evolution in the S&P 500 index stocks. - Highlights: • We provide a new approach to properly calculate the Hurst exponent. • This generalizes FD algorithms and GM2, introduced previously by the authors. • This method (FD4) is especially appropriate for short time series. • FD4 may be used in both unifractal and multifractal contexts. • As an empirical application, we show that S&P 500 stocks improved their efficiency

  4. Keyword Search over Data Service Integration for Accurate Results

    Zemleris, Vidmantas; Robert Gwadera

    2013-01-01

    Virtual data integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn the query language and to get acquainted with data organization, which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  5. Keyword search over data service integration for accurate results

    Virtual Data Integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn a new query language and to get acquainted with data organization which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  6. Activity-relevant similarity values for fingerprints and implications for similarity searching [version 2; referees: 3 approved]

    Swarit Jasial; Ye Hu; Martin Vogt; Jürgen Bajorath

    2016-01-01

    A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various att...

  7. Similarity between Grover's quantum search algorithm and classical two-body collisions

    Zhang, Jingfu; Lu, Zhiheng

    2001-01-01

    By studying the properties of the inversion-about-average operation in the quantum searching algorithm, we find a similarity between quantum searching and the collision of two rigid bodies. Some related questions are discussed in light of this similarity.
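    For orientation, the inversion-about-average step named in this abstract can be written in a few lines. The Python sketch below is an illustration under stated assumptions, not code from the paper: `grover_iteration` and its arguments are hypothetical names, and the amplitudes are kept as plain real numbers.

```python
def grover_iteration(amplitudes, target):
    """One Grover step on real amplitudes (illustrative sketch only).

    1. Oracle: flip the sign of the marked (target) amplitude.
    2. Diffusion: reflect every amplitude about the average, a -> 2*mean - a.
    """
    flipped = list(amplitudes)
    flipped[target] = -flipped[target]
    mean = sum(flipped) / len(flipped)
    return [2 * mean - a for a in flipped]


# Uniform superposition over four items, with item 2 marked (assumed example).
state = grover_iteration([0.5, 0.5, 0.5, 0.5], target=2)
```

    With four items, a single step moves all of the amplitude onto the marked item. The reflection `2*mean - a` is the operation that the abstract compares to a two-body collision.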

  8. Gene expression module-based chemical function similarity search

    Li, Yun; Hao, Pei; Zheng, Siyuan; Tu, Kang; Fan, Haiwei; Zhu, Ruixin; Ding, Guohui; Dong, Changzheng; Wang, Chuan; Li, Xuan; Thiesen, H.-J.; Chen, Y. Eugene; Jiang, HuaLiang; Liu, Lei; Li, Yixue

    2008-01-01

    Investigation of biological processes using selective chemical interventions is generally applied in biomedical research and drug discovery. Many studies of this kind make use of gene expression experiments to explore cellular responses to chemical interventions. Recently, some research groups constructed libraries of chemical related expression profiles, and introduced similarity comparison into chemical induced transcriptome analysis. Resembling sequence similarity alignment, expression pat...

  9. Cognitive Residues of Similarity: 'After-Effects' of Similarity Computations in Visual Search

    O'Toole, Stephanie; Keane, Mark T.

    2013-01-01

    What are the 'cognitive after-effects' of making a similarity judgement? What, cognitively, is left behind and what effect might these residues have on subsequent processing? In this paper, we probe for such after-effects using a visual search task, performed after a task in which pictures of real-world objects were compared. So, target objects were first presented in a comparison task (e.g., rate the similarity of this object to another) thus, presumably, modifying some of their features bef...

  10. G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases

    Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H.

    2009-01-01

    Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML docum...

  11. Ranking and clustering of search results: Analysis of Similarity graph

    Shevchuk, Ksenia Alexander

    2008-01-01

    Evaluate the clustering of the similarity matrix and confirm that it is high. Compare the ranking results of the eigenvector ranking and the Link Popularity ranking and confirm for the high clustered graph the correlation between those is larger than for the low clustered graph.

  12. Density-based similarity measures for content based search

    Hush, Don R [Los Alamos National Laboratory]; Porter, Reid B [Los Alamos National Laboratory]; Ruggiero, Christy E [Los Alamos National Laboratory]

    2009-01-01

    We consider the query-by-multiple-example problem, where the goal is to identify database samples whose content is similar to a collection of query samples. To assess the similarity we use a relative content density, which quantifies the relative concentration of the query distribution to the database distribution. If the database distribution is a mixture of the query distribution and a background distribution, then it can be shown that database samples whose relative content density is greater than a particular threshold ρ are more likely to have been generated by the query distribution than by the background distribution. We describe an algorithm for predicting samples with relative content density greater than ρ that is computationally efficient and possesses strong performance guarantees. We also show empirical results for applications in computer network monitoring and image segmentation.

  13. Perceptual Grouping in Haptic Search: The Influence of Proximity, Similarity, and Good Continuation

    Overvliet, Krista E.; Krampe, Ralf Th.; Wagemans, Johan

    2012-01-01

    We conducted a haptic search experiment to investigate the influence of the Gestalt principles of proximity, similarity, and good continuation. We expected faster search when the distractors could be grouped. We chose edges at different orientations as stimuli because they are processed similarly in the haptic and visual modality. We therefore…

  14. Improving image similarity search effectiveness in a multimedia content management system

    Amato, Giuseppe; Falchi, Fabrizio; Gennaro, Claudio; Rabitti, Fausto; Savino, Pasquale; Stanchev, Peter

    2004-01-01

    In this paper, a technique for making more effective the similarity search process of images in a Multimedia Content Management System is proposed. The content-based retrieval process integrates the search on different multimedia components, linked in XML structures. Depending on the specific characteristics of an image data set, some features can be more effective than others when performing similarity search. Starting from this observation, we propose a technique that predicts the effective...

  15. Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Caprez Adam

    2011-01-01

    Full Text Available Abstract Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" protein. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability that also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one versus all, one versus list, or list versus list comparisons. Solvent accessible surface area, ligand root-mean square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. 
The modification to the CPASS similarity function is effective in reducing CPASS similarity scores

  16. On a Probabilistic Approach to Determining the Similarity between Boolean Search Request Formulations.

    Radecki, Tadeusz

    1982-01-01

    Presents and discusses the results of research into similarity measures for search request formulations which employ Boolean combinations of index terms. The use of a weighting mechanism to indicate the importance of attributes in a search formulation is described. A 16-item reference list is included. (JL)

  17. Comparative study on Authenticated Sub Graph Similarity Search in Outsourced Graph Database

    N. D. Dhamale; Prof. S. R. Durugkar

    2015-01-01

    Today security is very important in the database system. Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. The Subgraph similarity search is used in graph databases to retrieve graphs whose subgraphs...

  18. Effects of Part-based Similarity on Visual Search: The Frankenbear Experiment

    Alexander, Robert G.; Zelinsky, Gregory J.

    2012-01-01

    Do the target-distractor and distractor-distractor similarity relationships known to exist for simple stimuli extend to real-world objects, and are these effects expressed in search guidance or target verification? Parts of photorealistic distractors were replaced with target parts to create four levels of target-distractor similarity under heterogenous and homogenous conditions. We found that increasing target-distractor similarity and decreasing distractor-distractor similarity impaired sea...

  19. A Theoretical Framework for Defining Similarity Measures for Boolean Search Request Formulations, Including Some Experimental Results.

    Radecki, Tadeusz

    1985-01-01

    Reports research results into a methodology for determining similarity between queries characterized by Boolean search request formulations and discusses similarity measures for Boolean combinations of index terms. Rationale behind these measures is outlined, and conditions ensuring their equivalence are identified. Results of an experiment…

  20. SAPIR - Executing complex similarity queries over multi layer P2P search structures

    Falchi, Fabrizio; Batko, Michal

    2009-01-01

    This deliverable reports the activities conducted within Task 5.4 "Executing complex similarity queries over multi layer P2P search structures" of the SAPIR project. In particular the deliverable discusses complex similarity queries issues and the implementation of the query processing over the P2P indexing. The document is accompanied by a zip file containing the javadoc for MUFIN.

  1. Development of an accurate 3D blood vessel searching system using NIR light

    Mizuno, Yoshifumi; Katayama, Tsutao; Nakamachi, Eiji

    2010-02-01

    Health monitoring systems (HMS) and drug delivery systems (DDS) require accurate needle puncture for automatic blood sampling. In this study, we develop a miniature, highly accurate automatic 3D blood vessel searching system. The detecting system measures 40x25x10 mm. Our searching system uses Near-Infrared (NIR) LEDs, CMOS camera modules and image processing units. We employ the stereo method to determine the 3D blood vessel location. The blood vessel visualization system exploits hemoglobin's absorption characteristics for NIR light. An NIR LED is set behind the finger and irradiates it with near-infrared light. CMOS camera modules are set in front of the finger and capture clear blood vessel images. The two-dimensional location of the blood vessel is detected from the luminance distribution of the image, and its depth is calculated by the stereo method. The 3D blood vessel location is automatically detected by our image processing system. To examine the accuracy of our detecting system, we carried out experiments using finger phantoms with blood vessel diameters of 0.5, 0.75 and 1.0 mm at depths of 0.5 ~ 2.0 mm under the artificial tissue surface. The depths obtained by our detecting system showed good agreement with the given depths, and the availability of this system is confirmed.

  2. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

    Lefkowitz Elliot J

    2004-10-01

    Full Text Available Abstract Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. 
We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into

  3. MEASURING THE PERFORMANCE OF SIMILARITY PROPAGATION IN AN SEMANTIC SEARCH ENGINE

    S. K. Jayanthi

    2013-10-01

    Full Text Available In the current scenario, web page result personalization plays a vital role. Nearly 80% of users expect the best results on the first page itself, without the persistence to browse longer in URL mode. This research work focuses on two main themes: online semantic web search and offline domain-based search. The first part seeks an effective method for grouping similar results together using the BookShelf Data Structure and organizing the various clusters. The second focuses on offline search within the academic domain. This paper focuses on finding similar documents and on how a vector space can be used to do so. More weight is therefore given to the principles and working methodology of similarity propagation. The cosine similarity measure is used to determine relevance among the documents.
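    As a minimal illustration of the cosine similarity measure over a vector space that this abstract mentions, the Python sketch below ranks documents against a query. It is not the paper's system: the bag-of-words term-frequency weighting and the names `term_vector`, `cosine_similarity` and `rank_documents` are assumptions made for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


ranked = rank_documents(
    "similarity search",
    ["protein folding", "similarity search in graphs"],
)
```

    Raw term counts keep the sketch short. A real engine would more likely weight terms (for example with TF-IDF) before taking the cosine.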

  4. Comparative study on Authenticated Sub Graph Similarity Search in Outsourced Graph Database

    N. D. Dhamale

    2015-11-01

    Full Text Available Today security is very important in the database system. Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. The Subgraph similarity search is used in graph databases to retrieve graphs whose subgraphs are similar to a given query graph. It has been proven successful in a wide range of applications including bioinformatics and chem-informatics, etc. Due to the cost of providing efficient similarity search services on ever-increasing graph data, database outsourcing is apparently an appealing solution to database owners. In this paper, we study authentication techniques that follow the popular filtering-and-verification framework, together with an authentication-friendly metric index called GMTree. Specifically, we transform the similarity search into a search in a graph metric space and derive small verification objects (VOs) to be transmitted to query clients. To further optimize GMTree, we study a sampling-based pivot selection method and an authenticated version of MCS computation.

  5. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Viswa S S

    2013-06-01

    Full Text Available Nowadays, web scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in hope of finding a certain type of image. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of the returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft do not use visual information when searching for images. The idea is to use visual information to re-rank and improve text-based image search results. This improves the precision of the text-based image search ranking by incorporating the information conveyed by the visual modality. The typical assumption that the top- images in the text-based search result are equally relevant is relaxed by linking the relevance of the images to their initial rank positions. Then, a number of images from the initial search result are employed as the prototypes that serve to visually represent the query and that are subsequently used to construct meta re-rankers, i.e., the most relevant images are found by visual similarity and the average scores are calculated. By applying different meta re-rankers to an image from the initial result, re-ranking scores are generated, which are then used to find the new rank position for an image in the re-ranked search result. Human supervision is introduced to learn the model weights offline, prior to the online re-ranking process. While model learning requires manual labelling of the results for a few queries, the resulting model is query independent and therefore applicable to any other query. The experimental results on a representative web image search dataset comprising 353 queries demonstrate that the proposed method outperforms the existing supervised and unsupervised re-ranking approaches. Moreover, it improves the performance over the text-based image search engine by more than 25.48%.

  6. Improving protein structure similarity searches using domain boundaries based on conserved sequence information

    Madej Tom; Wang Yanli; Thompson Kenneth; Bryant Stephen H

    2009-01-01

    Abstract Background The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method employed for computing related protein structures in NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we have investigated how alternative definit...

  7. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space

    Tao, Yufei

    2010-07-01

    Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query cost, or abandon its theoretical guarantee on the quality of query results. Motivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms a LSB-forest that has strong quality guarantees, but improves dramatically the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space, supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (i) iDistance (a famous technique for exact NN search) by two orders of magnitude, and (ii) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results. As a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, at which most of the existing solutions fail. We show that, using a LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, yet still ensuring very good quality. In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial

  8. A comparison of field-based similarity searching methods: CatShape, FBSS, and ROCS.

    Moffat, Kirstin; Gillet, Valerie J; Whittle, Martin; Bravi, Gianpaolo; Leach, Andrew R

    2008-04-01

    Three field-based similarity methods are compared in retrospective virtual screening experiments. The methods are the CatShape module of CATALYST, ROCS, and an in-house program developed at the University of Sheffield called FBSS. The programs are used in both rigid and flexible searches carried out in the MDL Drug Data Report. UNITY 2D fingerprints are also used to provide a comparison with a more traditional approach to similarity searching, and similarity based on simple whole-molecule properties is used to provide a baseline for the more sophisticated searches. Overall, UNITY 2D fingerprints and ROCS with the chemical force field option gave comparable performance and were superior to the shape-only 3D methods. When the flexible methods were compared with the rigid methods, it was generally found that the flexible methods gave slightly better results than their respective rigid methods; however, the increased performance did not justify the additional computational cost required. PMID:18351728

  9. Ligand scaffold hopping combining 3D maximal substructure search and molecular similarity

    Petitjean Michel

    2009-08-01

    Full Text Available Abstract Background Virtual screening methods are now well established as effective to identify hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural and energetics properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, the use of 3D-similarity search methods can be very valuable to many research projects, as integration of "3D knowledge" can facilitate the identification of not only related molecules but also of chemicals possessing distant scaffolds as compared to the query, and is therefore more inclined to scaffold hopping. To date, very few methods performing this task are easily available to the scientific community. Results We introduce a new approach (LigCSRre) to the 3D ligand similarity search of drug candidates. It combines a 3D maximum common substructure search algorithm independent of atom order with a tunable description of atomic compatibilities to prune the search and increase its physico-chemical relevance. We show, on 47 experimentally validated active compounds across five protein targets having different specificities, that for single compound search, the approach is able to recover on average 52% of the co-actives in the top 1% of the ranked list, which is better than gold standards of the field. Moreover, the combination of several runs on a single protein target using different query active compounds shows a remarkable improvement in enrichment. Such results demonstrate LigCSRre as a valuable tool for ligand-based screening. Conclusion LigCSRre constitutes a new efficient and generic approach to the 3D similarity screening of small compounds, whose flexible design opens the door to many enhancements. 
The program is freely available to the academics for non-profit research at: http://bioserv.rpbs.univ-paris-diderot.fr/LigCSRre.html.

  10. Similarity-based search of model organism, disease and drug effect phenotypes

    Hoehndorf, Robert

    2015-02-19

    Background: Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity. Results: We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet. Conclusions: Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.

  11. Twin Similarities in Holland Types as Shown by Scores on the Self-Directed Search

    Chauvin, Ida; McDaniel, Janelle R.; Miller, Mark J.; King, James M.; Eddlemon, Ondie L. M.

    2012-01-01

    This study examined the degree of similarity between scores on the Self-Directed Search from one set of identical twins. Predictably, a high congruence score was found. Results from a biographical sheet are discussed as well as implications of the results for career counselors.

  12. Accurate corresponding point search using sphere-attribute-image for statistical bone model generation

    Statistical deformable model based two-dimensional/three-dimensional (2-D/3-D) registration is a promising method for estimating the position and shape of patient bone in the surgical space. Since its accuracy depends on the statistical model capacity, we propose a method for accurately generating a statistical bone model from a CT volume. Our method employs the Sphere-Attribute-Image (SAI) and has improved the accuracy of corresponding point search in statistical model generation. At first, target bone surfaces are extracted as SAIs from the CT volume. Then the textures of SAIs are classified to some regions using Maximally-stable-extremal-regions methods. Next, corresponding regions are determined using Normalized cross-correlation (NCC). Finally, corresponding points in each corresponding region are determined using NCC. The application of our method to femur bone models was performed, and worked well in the experiments. (author)
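    The normalized cross-correlation (NCC) score used in the matching steps of this abstract can be illustrated on flat lists of intensity values. The Python sketch below is a generic NCC under assumed names (`ncc`, `best_match`). It does not reproduce the SAI extraction or region classification described in the paper.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists.

    Returns a value in [-1, 1], or 0.0 when either list has zero variance.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def best_match(patch, candidates):
    """Index of the candidate patch with the highest NCC against `patch`."""
    return max(range(len(candidates)), key=lambda i: ncc(patch, candidates[i]))


# Assumed toy data: the second candidate is a brightness-scaled copy of the patch.
index = best_match([1, 2, 3], [[3, 2, 1], [10, 20, 30], [1, 1, 1]])
```

    Because each list has its mean subtracted and is scaled by its own spread, NCC is insensitive to uniform brightness and contrast changes between the two patches.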

  13. Similarity and heterogeneity effects in visual search are mediated by "segmentability".

    Utochkin, Igor S; Yurevich, Maria A

    2016-07-01

    The heterogeneity of our visual environment typically reduces the speed with which a singleton target can be found. Visual search theories explain this phenomenon via nontarget similarities and dissimilarities that affect grouping, perceptual noise, and so forth. In this study, we show that increasing the heterogeneity of a display can facilitate rather than inhibit visual search for size and orientation singletons when heterogeneous features smoothly fill the transition between highly distinguishable nontargets. We suggest that this smooth transition reduces the "segmentability" of dissimilar items to otherwise separate subsets, causing the visual system to treat them as a near-homogenous set standing apart from a singleton. (PsycINFO Database Record) PMID:26784002

  14. Manifold Learning for Multivariate Variable-Length Sequences With an Application to Similarity Search.

    Ho, Shen-Shyang; Dai, Peng; Rudzicz, Frank

    2016-06-01

    Multivariate variable-length sequence data are becoming ubiquitous with the technological advancement in mobile devices and sensor networks. Such data are difficult to compare, visualize, and analyze due to the nonmetric nature of data sequence similarity measures. In this paper, we propose a general manifold learning framework for arbitrary-length multivariate data sequences driven by similarity/distance (parameter) learning in both the original data sequence space and the learned manifold. Our proposed algorithm transforms the data sequences in a nonmetric data sequence space into feature vectors in a manifold that preserves the data sequence space structure. In particular, the feature vectors in the manifold representing similar data sequences remain close to one another and far from the feature points corresponding to dissimilar data sequences. To achieve this objective, we assume a semisupervised setting where we have knowledge about whether some of data sequences are similar or dissimilar, called the instance-level constraints. Using this information, one learns the similarity measure for the data sequence space and the distance measures for the manifold. Moreover, we describe an approach to handle the similarity search problem given user-defined instance level constraints in the learned manifold using a consensus voting scheme. Experimental results on both synthetic data and real tropical cyclone sequence data are presented to demonstrate the feasibility of our manifold learning framework and the robustness of performing similarity search in the learned manifold. PMID:25781959

  15. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    Matija Korpar

    Full Text Available In recent years we have witnessed a growth in sequencing yield, the number of samples sequenced, and, as a result, the growth of publicly maintained sequence databases. The increase of data present all around has put high requirements on protein similarity search algorithms with two ever-opposite goals: how to keep the running times acceptable while maintaining a high-enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignments of a query to the whole database are usually too slow. Therefore, the majority of the protein similarity search methods apply heuristics, prior to doing the exact local alignment, to reduce the number of possible candidate sequences in the database. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-prot and Uniref90 databases.
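    To make the quadratic cost of the Smith-Waterman step mentioned in this abstract concrete, here is a minimal scoring-only sketch in Python. It is not SW#db code: the function name `smith_waterman`, the linear gap penalty and the match/mismatch values are assumptions chosen for illustration. Tools for protein search normally use substitution matrices and affine gap penalties.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (linear gap penalty).

    Fills a (len(query)+1) x (len(target)+1) table, hence the quadratic time.
    Returns the best score and the table cell (i, j) where that alignment ends.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == target[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in the target
            left = score[i][j - 1] + gap    # gap in the query
            # Clamping at zero lets an alignment restart anywhere (local).
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos


best_score, end_cell = smith_waterman("TTACGTT", "GGACGGG")
```

    Every query residue is compared with every target residue, which is why the heuristics that shrink the candidate set before this step matter so much for whole-database searches.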

  16. Wikipedia Chemical Structure Explorer: substructure and similarity searching of molecules from Wikipedia

    Ertl, Peter; Patiny, Luc; Sander, Thomas; Rufener, Christian; Zasso, Michaël

    2015-01-01

    Background Wikipedia, the world’s largest and most popular encyclopedia, is an indispensable source of chemistry information. It contains, among others, entries for over 15,000 chemicals including metabolites, drugs, agrochemicals and industrial chemicals. To provide easy access to this wealth of information we decided to develop a substructure and similarity search tool for chemical structures referenced in Wikipedia. Results We extracted chemical structures from entries in Wikipedia an...

  17. Target enhanced 2D similarity search by using explicit biological activity annotations and profiles

    Yu, Xiang; Geer, Lewis Y.; Han, Lianyi; Bryant, Stephen H

    2015-01-01

    Background The enriched biological activity information of compounds in large and freely-accessible chemical databases like the PubChem Bioassay Database has become a powerful research resource for the scientific research community. Currently, 2D fingerprint based conventional similarity search (CSS) is the most widely used approach for database screening, but it does not typically incorporate the relative importance of fingerprint bits to biological activity. Results In this study, a ...

  18. Protein similarity search with subset seeds on a dedicated reconfigurable hardware

    Peterlongo, Pierre; Noé, Laurent; Lavenier, Dominique; Georges, Gilles; Jacques, Julien; Kucherov, Gregory; Giraud, Mathieu

    2007-01-01

    Genome sequencing of numerous species raises the need for complete genome comparison with precise and fast similarity searches. Today, advanced seed-based techniques (spaced seeds, multiple seeds, subset seeds) provide better sensitivity/specificity ratios. We present an implementation of such a seed-based technique on parallel specialized hardware embedding a reconfigurable architecture (FPGA), where the FPGA is tightly connected to large capacity Flash memories. This parallel system allows l...

  19. Database searching for compounds with similar biological activity using short binary bit string representations of molecules.

    Xue, L; Godden, J W; Bajorath, J

    1999-01-01

    In an effort to identify biologically active molecules in compound databases, we have investigated similarity searching using short binary bit strings with a maximum of 54 bit positions. These "minifingerprints" (MFPs) were designed to account for the presence or absence of structural fragments and/or aromatic character, flexibility, and hydrogen-bonding capacity of molecules. MFP design was based on an analysis of distributions of molecular descriptors and structural fragments in two large compound collections. The performance of different MFPs and a reference fingerprint was tested by systematic "one-against-all" similarity searches of molecules in a database containing 364 compounds with different biological activities. For each fingerprint, the most effective similarity cutoff value was determined. An MFP accounting for only 32 structural fragments showed less than 2% false positive similarity matches and correctly assigned on average approximately 40% of the compounds with the same biological activity to a query molecule. Inclusion of three numerical two-dimensional (2D) molecular descriptors increased the performance by 15%. This MFP performed better than a complex 2D fingerprint. At a similarity cutoff value of 0.85, the 2D fingerprint totally eliminated false positives but recognized less than 10% of the compounds within the same activity class. PMID:10529986
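
    A similarity search over binary fingerprints with a cutoff can be sketched as follows. The abstract does not name the similarity coefficient, so the Tanimoto coefficient used here is an assumption, as are the function names and the representation of a fingerprint as the set of its "on" bit positions; the default cutoff of 0.85 is the value quoted above for the 2D fingerprint.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similarity_search(query_fp, database, cutoff=0.85):
    """Return (name, score) pairs with score >= cutoff, best first.

    `database` is a hypothetical mapping from compound name to fingerprint.
    """
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in database.items())
    hits = [(name, score) for name, score in scored if score >= cutoff]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

    Raising the cutoff trades recall for fewer false positives, which is the effect the abstract reports at 0.85.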

  20. WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTERN RETRIEVAL ALGORITHM

    Pushpa C N

    2013-02-01

    Full Text Available Semantic similarity measures play an important role in information retrieval, natural language processing and various tasks on the web such as relation extraction, community mining, document clustering, and automatic meta-data extraction. In this paper, we propose a Pattern Retrieval Algorithm (PRA) to compute the semantic similarity measure between words by combining the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous and nonsynonymous word pairs. The proposed approach aims to improve the correlation values, precision, recall, and F-measures compared to the existing methods. The proposed algorithm outperforms the existing methods, with a correlation value of 89.8%.

  1. Semantic similarity measures in the biomedical domain by leveraging a web search engine.

    Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching

    2013-07-01

    Various studies of web-related semantic similarity measures have been carried out. However, measuring semantic similarity between two terms remains a challenging task. The traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s); in practice, this assumption does not always hold. On the other hand, if the corpus is sufficiently large, corpus-based methodologies can overcome the limitation. The web is now a continuously and enormously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, as well as P AND Q, and the web search hit counts of the defined lexico-syntactic patterns. The similarity scores of the different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of semantic similarity measures. Experimental results validated against two datasets (dataset 1 provided by A. Hliaoutakis; dataset 2 provided by T. Pedersen) are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficient (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores compared with the measures of other methods. However, the correlation coefficients (SNOMED-CT: 0.496; MeSH: 0.539) with coder scores showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to those of physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database. PMID:25055314

  2. Gene network homology in prokaryotes using a similarity search approach: queries of quorum sensing signal transduction.

    David N Quan

    Full Text Available Bacterial cell-cell communication is mediated by small signaling molecules known as autoinducers. Importantly, autoinducer-2 (AI-2) is synthesized via the enzyme LuxS in over 80 species, some of which mediate their pathogenicity by recognizing and transducing this signal in a cell density dependent manner. AI-2 mediated phenotypes are not well understood however, as the means for signal transduction appears varied among species, while AI-2 synthesis processes appear conserved. Approaches to reveal the recognition pathways of AI-2 will shed light on pathogenicity as we believe recognition of the signal is likely as important, if not more, than the signal synthesis. LMNAST (Local Modular Network Alignment Similarity Tool) uses a local similarity search heuristic to study gene order, generating homology hits for the genomic arrangement of a query gene sequence. We develop and apply this tool for the E. coli lac and LuxS regulated (Lsr) systems. Lsr is of great interest as it mediates AI-2 uptake and processing. Both test searches generated results that were subsequently analyzed through a number of different lenses, each with its own level of granularity, from a binary phylogenetic representation down to trackback plots that preserve genomic organizational information. Through a survey of these results, we demonstrate the identification of orthologs, paralogs, hitchhiking genes, gene loss, gene rearrangement within an operon context, and also horizontal gene transfer (HGT). We found a variety of operon structures that are consistent with our hypothesis that the signal can be perceived and transduced by homologous protein complexes, while their regulation may be key to defining subsequent phenotypic behavior.

  3. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Viswa S S

    2013-01-01

    Nowadays, web scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in the hope of finding a certain type of image. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of the returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft have no Visual Information for searching of images. Using visual information...

  4. PHOG-BLAST – a new generation tool for fast similarity search of protein families

    Mironov Andrey A

    2006-06-01

    Full Text Available Abstract Background The need to compare protein profiles frequently arises in various protein research areas: comparison of protein families, domain searches, resolution of orthology and paralogy. The existing fast algorithms can only compare a protein sequence with a protein sequence and a profile with a sequence. Algorithms to compare profiles use dynamic programming and complex scoring functions. Results We developed a new algorithm called PHOG-BLAST for fast similarity search of profiles. This algorithm uses profile discretization to convert a profile to a finite alphabet and utilizes hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best if its parameters are chosen so that 20 profile clusters are obtained, which can be interpreted as ancestral amino acid residues. With these clusters, fewer than 2% of columns in multiple alignments fall outside the clusters. We tested the performance of PHOG-BLAST vs. PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance; on PFAM and BALIBASE, PHOG-BLAST was much superior to PSI-BLAST. PHOG-BLAST required 10–20 times less computer memory and computation time than PSI-BLAST. Conclusion Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing protein complements of different species and orthologous groups of different taxons on a personal computer in reasonable time. When it is applied to detect weak similarity between protein families, PHOG-BLAST is less

  5. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. PMID:25625550

  6. Breast cancer stories on the internet: improving search facilities to help patients find stories of similar others

    Overberg, Regina Ingrid

    2013-01-01

    The primary aim of this thesis is to gain insight into which search facilities for spontaneously published stories facilitate breast cancer patients in finding stories by other patients in a similar situation. According to the narrative approach, social comparison theory, and social cognitive theory, reading stories about similar others may have the most positive impact. The research followed a user-centred design: users of search facilities (i.e., patients who want to read stories written by...

  7. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site

    Kinoshita, Kengo; Nakamura, Haruki

    2003-01-01

    The identification of protein biochemical functions based on their three-dimensional structures is strongly required in the post-genome-sequencing era. We have developed a new method to identify and predict protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials on the surfaces. Our prediction system consists of a similarity search method based on a clique search algorithm and the molecular surface database eF-site (electrost...

  8. PSimScan: algorithm and utility for fast protein similarity search.

    Anna Kaznadzey

    Full Text Available In the era of metagenomics and diagnostics sequencing, the importance of protein comparison methods of boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that can be further used by the community to improve the speed and lower hardware requirements of bioinformatics software. The optimization starts at the lookup table construction, then the initial lookup table-based hits are passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects 'similarity zones' aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan performs 5 to 100 times faster than the standard NCBI BLASTP, depending on chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to the NCBI BLASTP's and decrease with the increase of speed, yet stay at the levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences. Comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI's non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with the NCBI BLASTP takes over 66 hours. We describe innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.
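
    The general idea of lookup-table hits grouped by diagonal can be sketched as below. This is a generic seed-and-diagonal illustration, not the PSimScan algorithm: the function names, the exact k-mer matching, and the fixed band width are all assumptions made for the example.

```python
from collections import defaultdict


def build_lookup(subject, k):
    """Map every k-mer of the subject sequence to its start positions."""
    table = defaultdict(list)
    for i in range(len(subject) - k + 1):
        table[subject[i:i + k]].append(i)
    return table


def diagonal_hits(query, subject, k=3):
    """Count exact k-mer hits per diagonal (subject position minus query position)."""
    table = build_lookup(subject, k)
    diagonals = defaultdict(int)
    for qi in range(len(query) - k + 1):
        for si in table.get(query[qi:qi + k], ()):
            diagonals[si - qi] += 1
    return dict(diagonals)


def best_diagonal_band(diagonals, width=2):
    """Return (first diagonal, hit count) of the densest band of adjacent diagonals."""
    best = None
    for d in range(min(diagonals, default=0), max(diagonals, default=-1) + 1):
        total = sum(diagonals.get(d + offset, 0) for offset in range(width))
        if best is None or total > best[1]:
            best = (d, total)
    return best
```

    Hits that share a diagonal, or fall on neighboring diagonals, are consistent with one gapped or ungapped alignment, so a dense band is a cheap signal that a more expensive alignment step is worth running.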

  9. The Cost of Search for Multiple Targets: Effects of Practice and Target Similarity

    Menneer, Tamaryn; Cave, Kyle R.; Donnelly, Nick

    2009-01-01

    With the use of X-ray images, performance in the simultaneous search for two target categories was compared with performance in two independent searches, one for each category. In all cases, displays contained one target at most. Dual-target search, for both categories simultaneously, produced a cost in accuracy, although the magnitude of this…

  10. Early Visual Tagging: Effects of Target-Distractor Similarity and Old Age on Search, Subitization, and Counting

    Watson, Derrick G.; Maylor, Elizabeth A.; Allen, Gareth E. J.; Bruce, Lucy A. M.

    2007-01-01

    Three experiments examined the effects of target-distractor (T-D) similarity and old age on the efficiency of searching for single targets and enumerating multiple targets. Experiment 1 showed that increasing T-D similarity selectively reduced the efficiency of enumerating small (less than 4) numerosities (subitizing) but had little effect on…

  11. Application of 3D Zernike descriptors to shape-based ligand similarity searching

    Venkatraman Vishwesh

    2009-12-01

    Full Text Available Abstract Background The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. Results In this study, we have examined the efficacy of the alignment independent three-dimensional Zernike descriptor (3DZD) for fast shape based similarity searching. Performance of this approach was compared with several other methods including the statistical moments based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance in a situation that simulates actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparable to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape based methods for specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability. Conclusion The 3DZD has a unique ability for fast comparison of the three-dimensional shape of compounds. The examples analyzed illustrate the advantages of the 3DZD and the room for improvement.

  12. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Viswa S S

    2013-06-01

    Full Text Available Nowadays, web scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in hope of finding a certain type of images. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft have no Visual Information for searching of images. Using visual information to re rank and improve text based image search results is the idea. This improves the precision of the text based image search ranking by incorporating the information conveyed by the visual modality. The typical assumption that the top-images in the text-based search result are equally relevant is relaxed by linking the relevance of the images to their initial rank positions. Then, a number of images from the initial search result are employed as the prototypes that serve to visually represent the query and that are subsequently used to construct meta re rankers, i.e. the most relevant images are found by visual similarity and the average scores are calculated. By applying different meta re rankers to an image from the initial result, re ranking scores are generated, which are then used to find the new rank position for an image in the re ranked search result. Human supervision is introduced to learn the model weights offline, prior to the online re ranking process. While model learning requires manual labelling of the results for a few queries, the resulting model is query independent and therefore applicable to any other query. The experimental results on a representative web image search dataset comprising 353 queries demonstrate that the proposed method outperforms the existing supervised and unsupervised Re ranking approaches. Moreover, it improves the performance over the text-based image search engine by more than 25.48%

  13. RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data

    Zhao, Yongan; Tang, Haixu; Ye, Yuzhen

    2011-01-01

    Summary: With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a ~20–90-fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify align...

  14. Accurate Image Search using Local Descriptors into a Compact Image Representation

    Soumia Benkrama

    2013-01-01

    Full Text Available Despite progress in image retrieval using low-level features such as colors, textures and shapes, performance is still unsatisfactory because gaps remain between low-level features and high-level semantic concepts. In this work, we present an improved implementation of the bag of visual words approach. We propose an image retrieval system based on the bag-of-features (BoF) model using the scale invariant feature transform (SIFT) and speeded up robust features (SURF). In the literature, SIFT and SURF give good results. Based on this observation, we decided to use a bag-of-features approach over quaternion Zernike moments (QZM). We compare the results of SIFT and SURF with those of QZM. We propose an indexing method for the content-based search task that retrieves a collection of images and returns a ranked list of objects in response to a query image. Experimental results with the Coil-100 and corel-1000 image databases demonstrate that QZM produces better performance than known representations (SIFT and SURF).
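
    The bag-of-features step itself is independent of which local descriptor is used. The following is a minimal sketch under stated assumptions: descriptors and codewords are plain numeric tuples, the codebook is taken as given (for example from a prior clustering step), and histogram intersection is used as the image similarity. None of these choices is confirmed by the abstract, and the function names are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def bof_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and return
    the normalized histogram of codeword counts for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda i: squared_distance(d, codebook[i]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms, between 0 and 1."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

    Ranking a database then amounts to computing one histogram per image and sorting images by their similarity to the query histogram.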

  15. In Search of an Accurate Evaluation of Intrahepatic Cholestasis of Pregnancy

    Manuela Martinefski

    2012-01-01

    Full Text Available Until now, the biochemical parameter most used for diagnosis of intrahepatic cholestasis of pregnancy (ICP) has been the rise of total serum bile acids (TSBA) above the upper normal limit of 11 μM. However, differential diagnosis is very difficult since overlapping values, calculated from bile acid determinations, are observed in different conditions of pregnancy, including the benign condition of pruritus gravidarum. The aim of this work was to determine better markers of ICP for a precise diagnosis, together with parameters associated with severity of symptoms and treatment evaluation. Serum bile acid profiles were evaluated using capillary electrophoresis in 38 healthy pregnant women and 32 ICP patients, and the sensitivity, specificity, accuracy, predictive values and relationships of certain individual bile acids in pregnant women were calculated in order to replace TSBA determinations. The evaluation of the results shows that LCA and the UDCA/LCA ratio provided information for a more complete and accurate diagnosis and evaluation of ICP than calculation of TSBA levels alone in pregnant women.

  16. Breast cancer stories on the internet : improving search facilities to help patients find stories of similar others

    Overberg, Regina Ingrid

    2013-01-01

    The primary aim of this thesis is to gain insight into which search facilities for spontaneously published stories facilitate breast cancer patients in finding stories by other patients in a similar situation. According to the narrative approach, social comparison theory, and social cognitive theory

  17. BlastMultAl, a Blast Extension for Similarity Searching with Alignment Graphs

    Nicodème, Pierre

    1996-01-01

    We describe a new method of processing similarity queries of a proteic multiple alignment with a set (database) of protein sequences, or similarity queries of a protein sequence with a set of protein alignments. We use a representation of multiple alignments as alignment-graphs. Comparisons with different classical methods are made. This new method allows the detection of subtle similarities which are not found by the other methods. It has direct applications for similarities querying with the...

  18. SHOP: receptor-based scaffold hopping by GRID-based similarity searches

    Bergmann, Rikke; Liljefors, Tommy; Sørensen, Morten D;

    2009-01-01

    find known active CDK2 scaffolds in a database. Additionally, SHOP was used for suggesting new inhibitors of p38 MAP kinase. Four p38 complexes were used to perform six scaffold searches. Several new scaffolds were suggested, and the resulting compounds were successfully docked into the query proteins....

  19. Finding and Reusing Learning Materials with Multimedia Similarity Search and Social Networks

    Little, Suzanne; Ferguson, Rebecca; Ruger, Stefan

    2012-01-01

    The authors describe how content-based multimedia search technologies can be used to help learners find new materials and learning pathways by identifying semantic relationships between educational resources in a social learning network. This helps users--both learners and educators--to explore and find material to support their learning aims.…

  20. FSim: A Novel Functional Similarity Search Algorithm and Tool for Discovering Functionally Related Gene Products

    2014-01-01

    Background. During the analysis of genomics data, it is often required to quantify the functional similarity of genes and their products based on the annotation information from gene ontology (GO) with hierarchical structure. A flexible and user-friendly way to estimate the functional similarity of genes utilizing GO annotation is therefore highly desired. Results. We proposed a novel algorithm using a level coefficient-weighted model to measure the functional similarity of gene products base...

  1. Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction

    Wichterich, Marc; Assent, Ira; Philipp, Kranen;

    2008-01-01

    The Earth Mover's Distance (EMD) was developed in computer vision as a flexible similarity model that utilizes similarities in feature space to define a high quality similarity measure in feature representation space. It has been successfully adopted in a multitude of applications with low to...... dimensionality reduction techniques for the EMD in a filter-and-refine architecture for efficient lossless retrieval. Thorough experimental evaluation on real world data sets demonstrates a substantial reduction of the number of expensive high-dimensional EMD computations and thus remarkably faster response...

  2. A Commodity Information Search Model of E-Commerce Search Engine Based on Semantic Similarity and Multi-Attribute Decision Method

    Ziming Zeng

    2010-01-01

    The paper presented an intelligent commodity information search model, which integrates semantic retrieval and a multi-attribute decision method. First, semantic similarity is computed by constructing a semantic vector space, in order to realize the semantic consistency between the retrieved result and the customer’s query. Besides, the TOPSIS method is also utilized to construct the comparison mechanism of commodities by calculating the utility value of each retrieved commodity. Finally, the experiment is conduct...

  3. Proposal for a Similar Question Search System on a Q&A Site

    Katsutoshi Kanamori

    2014-06-01

    Full Text Available There is a service to help Internet users obtain answers to specific questions when they visit a Q&A site. A Q&A site is very useful for the Internet user, but posted questions are often not answered immediately. This delay in answering occurs because in most cases another site user is answering the question manually. In this study, we propose a system that can present a question that is similar to a question posted by a user. An advantage of this system is that a user can refer to an answer to a similar question. This research measures the similarity of a candidate question based on word and dependency parsing. In an experiment, we examined the effectiveness of the proposed system for questions actually posted on the Q&A site. The result indicates that the system can show the questioner the answer to a similar question. However, the system still has a number of aspects that should be improved.

  4. Managing Biomedical Image Metadata for Search and Retrieval of Similar Images

    Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Cesar RODRIGUEZ; Beaulieu, Chris

    2010-01-01

    Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations (“semantic” metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM) to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and ...

  5. Integrating structure- and ligand-based virtual screening: comparison of individual, parallel, and fused molecular docking and similarity search calculations on multiple targets.

    Tan, Lu; Geppert, Hanna; Sisay, Mihiret T; Gütschow, Michael; Bajorath, Jürgen

    2008-10-01

    Similarity searching is often used to preselect compounds for docking, thereby decreasing the size of screening databases. However, integrated structure- and ligand-based screening schemes are rare at present. Docking and similarity search calculations using 2D fingerprints were carried out in a comparative manner on nine target enzymes, for which significant numbers of diverse inhibitors could be obtained. In the absence of knowledge-based docking constraints and target-directed parameter optimisation, fingerprint searching displayed a clear preference over docking calculations. Alternative combinations of docking and similarity search results were investigated and found to further increase compound recall of individual methods in a number of instances. When the results of similarity searching and docking were combined, parallel selection of candidate compounds from individual rankings was generally superior to rank fusion. We suggest that complementary results from docking and similarity searching can be captured by integrated compound selection schemes. PMID:18651695
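
    The two ways of combining rankings contrasted above can be sketched as follows. The abstract does not give the exact schemes, so both functions are common interpretations stated as assumptions: rank fusion here sums each compound's rank over all rankings, and parallel selection takes compounds alternately from the top of each individual ranking. The function names and the treatment of compounds missing from a ranking are hypothetical.

```python
def rank_fusion(rankings, top_n):
    """Sum-of-ranks fusion; a compound absent from a ranking is given
    a rank one past that ranking's end. Ties are broken by name."""
    compounds = set().union(*rankings)

    def total_rank(compound):
        return sum(r.index(compound) + 1 if compound in r else len(r) + 1
                   for r in rankings)

    return sorted(compounds, key=lambda c: (total_rank(c), c))[:top_n]


def parallel_selection(rankings, top_n):
    """Pick compounds round-robin from the top of each ranking, skipping duplicates."""
    selected = []
    max_len = max(len(r) for r in rankings)
    for depth in range(max_len):
        for r in rankings:
            if len(selected) == top_n:
                return selected
            if depth < len(r) and r[depth] not in selected:
                selected.append(r[depth])
    return selected
```

    With a docking ranking `["a", "b", "c", "d"]` and a fingerprint ranking `["e", "b", "c", "f"]`, parallel selection of two compounds keeps each method's top pick, while rank fusion favors the compound both methods place moderately high.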

  6. Web Similarity

    Cohen, Andrew; Vitányi, Paul

    2015-01-01

    Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a similarity on a scale from 0 (identical) to 1 (completely different). The NWD approximates the similarity according to all (upper semi)computable properties. We develop the theory and give applications. The derivation of ...
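
    For two search terms, a distance of this kind can be computed from aggregate page counts. The sketch below uses the well-known two-term normalized Google distance form. The set-based NWD of the paper generalizes this and is not reproduced here. The function name and all page counts are hypothetical.

```python
import math


def normalized_web_distance_pair(fx, fy, fxy, n):
    """Two-term distance from page counts.

    fx, fy: pages containing each term; fxy: pages containing both;
    n: total number of indexed pages (a normalizing constant).
    """
    if fxy == 0:
        return float("inf")  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Terms that always co-occur are at distance 0.
print(normalized_web_distance_pair(1000, 1000, 1000, 10**6))
```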

  7. SimSearch : a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences

    Deusdado, Sérgio; Carvalho, Paulo

    2009-01-01

    In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to filling the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is then applied when analysing the maximal extension of the pattern. Employing bit parallelism to analyse the global similar...

  8. Improving performance of content-based image retrieval schemes in searching for similar breast mass regions: an assessment

    This study aims to assess three methods commonly used in content-based image retrieval (CBIR) schemes and investigate the approaches to improve scheme performance. A reference database involving 3000 regions of interest (ROIs) was established. Among them, 400 ROIs were randomly selected to form a testing dataset. Three methods, namely mutual information, Pearson's correlation and a multi-feature-based k-nearest neighbor (KNN) algorithm, were applied to search for the 15 'the most similar' reference ROIs to each testing ROI. The clinical relevance and visual similarity of searching results were evaluated using the areas under receiver operating characteristic (ROC) curves (AZ) and average mean square difference (MSD) of the mass boundary spiculation level ratings between testing and selected ROIs, respectively. The results showed that the AZ values were 0.893 ± 0.009, 0.606 ± 0.021 and 0.699 ± 0.026 for the use of KNN, mutual information and Pearson's correlation, respectively. The AZ values increased to 0.724 ± 0.017 and 0.787 ± 0.016 for mutual information and Pearson's correlation when using ROIs with the size adaptively adjusted based on actual mass size. The corresponding MSD values were 2.107 ± 0.718, 2.301 ± 0.733 and 2.298 ± 0.743. The study demonstrates that due to the diversity of medical images, CBIR schemes using multiple image features and mass size-based ROIs can achieve significantly improved performance.

  9. Developing Molecular Interaction Database and Searching for Similar Pathways (MOLECULAR BIOLOGY AND INFORMATION-Biological Information Science)

    Kawashima, Shuichi; Katayama, Toshiaki; Kanehisa, Minoru

    1998-01-01

    We have developed a database named BRITE, which contains knowledge of interacting molecules and/or genes concerning cell cycle and early development. Here, we report an overview of the database and the method of automatic search for functionally common sub-pathways between two biological pathways in BRITE.

  10. Topology-based document similarity search algorithm

    杨艳; 朱戈; 范文彬

    2011-01-01

    Searching quickly and efficiently for similar documents in a large collection is an important and time-consuming problem. Existing algorithms first find a candidate document set and then sort the candidates by a document relevance score to identify the most relevant ones. A topology-based document similarity search algorithm, Hub-N, is put forward: the document similarity search problem is transformed into a graph search problem, and pruning techniques are applied to reduce the scope of scanned documents and significantly improve retrieval efficiency. Experiments show the algorithm to be effective and feasible.

  11. Novel DOCK clique driven 3D similarity database search tools for molecule shape matching and beyond: adding flexibility to the search for ligand kin.

    Good, Andrew C

    2007-10-01

    With readily available CPU power and copious disk storage, it is now possible to undertake rapid comparison of 3D properties derived from explicit ligand overlay experiments. With this in mind, shape software tools originally devised in the 1990s are revisited, modified and applied to the problem of ligand database shape comparison. The utility of Connolly surface data is highlighted using the program MAKESITE, which leverages surface normal data to create a ligand shape cast. This cast is applied directly within DOCK, allowing the program to be used unmodified as a shape searching tool. In addition, DOCK has undergone multiple modifications to create a dedicated ligand shape comparison tool KIN. Scoring has been altered to incorporate the original incarnation of Gaussian function derived shape description based on STO-3G atomic electron density. In addition, a tabu-like search refinement has been added to increase search speed by removing redundant starting orientations produced during clique matching. The ability to use exclusion regions, again based on Gaussian shape overlap, has also been integrated into the scoring function. The use of both DOCK with MAKESITE and KIN in database screening mode is illustrated using a published ligand shape virtual screening template. The advantages of using a clique-driven search paradigm are highlighted, including shape optimization within a pharmacophore constrained framework, and easy incorporation of additional scoring function modifications. The potential for further development of such methods is also discussed. PMID:17482856

  12. PHASE-RESOLVED INFRARED SPECTROSCOPY AND PHOTOMETRY OF V1500 CYGNI, AND A SEARCH FOR SIMILAR OLD CLASSICAL NOVAE

    We present phase-resolved near-infrared photometry and spectroscopy of the classical nova (CN) V1500 Cyg to explore whether cyclotron emission is present in this system. While the spectroscopy does not indicate the presence of discrete cyclotron harmonic emission, the light curves suggest that a sizable fraction of its near-infrared fluxes are due to this component. The light curves of V1500 Cyg appear to remain dominated by emission from the heated face of the secondary star in this system. We have used infrared spectroscopy and photometry to search for other potential magnetic systems among old CNe. We have found that the infrared light curves of V1974 Cyg superficially resemble those of V1500 Cyg, suggesting a highly irradiated companion. The old novae V446 Her and QV Vul have light curves with large amplitude variations like those seen in polars, suggesting they might have magnetic primaries. We extract photometry for 79 old novae from the Two Micron All Sky Survey Point Source Catalog and use those data to derive the mean, un-reddened infrared colors of quiescent novae. We also extract WISE data for these objects and find that 45 of them were detected. Surprisingly, a number of these systems were detected in the WISE 22 μm band. While two of those objects produced significant dust shells (V705 Cas and V445 Pup), the others did not. It appears that line emission from their ionized ejected shells is the most likely explanation for those detections.

  13. Improving gene expression similarity measurement using pathway-based analytic dimension

    2009-01-01

    Background: Gene expression similarity measuring methods were developed and applied to search rapidly growing public microarray databases. However, current expression similarity measuring methods need to be improved to accurately measure similarity between gene expression profiles from different platforms or different experiments. Results: We devised a new gene expression similarity measuring method based on pathway information. In short, the newly devised method measures similarity between gene expre...

  14. Similarity search for multidimensional QAR data subsequence

    杨慧; 张国振

    2013-01-01

    The high dimensionality of QAR data and the uncertain correlation among its dimensions mean that similarity search methods designed for low-dimensional time series no longer apply. Given the specific nature of the civil aviation industry, using QAR similarity search to identify aircraft faults also requires a special definition of similarity. In this paper, expert knowledge is combined with an analytic hierarchy process algorithm to calculate the weight of each dimension for a given aircraft fault. The QAR data are converted to a symbolic representation and indexed with a k-d tree, which makes similarity search on multidimensional QAR data subsequences possible. Shape and distance are used together to define similarity. Experiments in the paper show high precision at low cost.

  15. SPOT-Ligand: Fast and effective structure-based virtual screening by binding homology search according to ligand and receptor similarity.

    Yang, Yuedong; Zhan, Jian; Zhou, Yaoqi

    2016-07-01

    Structure-based virtual screening usually involves docking of a library of chemical compounds onto the functional pocket of the target receptor so as to discover novel classes of ligands. However, the overall success rate remains low and screening a large library is computationally intensive. An alternative to this "ab initio" approach is virtual screening by binding homology search. In this approach, potential ligands are predicted based on similar interaction pairs (similarity in receptors and ligands). SPOT-Ligand is an approach that integrates ligand similarity by Tanimoto coefficient and receptor similarity by protein structure alignment program SPalign. The method was found to yield a consistent performance in DUD and DUD-E docking benchmarks even if model structures were employed. It improves over docking methods (DOCK6 and AUTODOCK Vina) and has a performance comparable to or better than other binding-homology methods (FINDsite and PoLi) with higher computational efficiency. The server is available at http://sparks-lab.org. © 2016 Wiley Periodicals, Inc. PMID:27074979
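
    The ligand-similarity component mentioned above, the Tanimoto coefficient, can be sketched on binary fingerprints. This is a minimal sketch that represents each fingerprint as a set of on-bit indices. The fingerprints shown are hypothetical, and this is not SPOT-Ligand's own code.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical fingerprints: 2 shared bits out of 6 distinct bits.
print(tanimoto({1, 2, 3, 4}, {3, 4, 5, 6}))
```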

  16. Similarity Search in Document Collections

    Jordanov, Dimitar Dimitrov

    2009-01-01

    The main goal of this work is to evaluate the performance of the freely distributed Semantic Vectors package and the MoreLikeThis class from the Apache Lucene package. The work offers a comparison of these two approaches and introduces methods that may lead to improved search quality.

  17. Design of a bioactive small molecule that targets the myotonic dystrophy type 1 RNA via an RNA motif-ligand database and chemical similarity searching.

    Parkesh, Raman; Childs-Disney, Jessica L; Nakamori, Masayuki; Kumar, Amit; Wang, Eric; Wang, Thomas; Hoskins, Jason; Tran, Tuan; Housman, David; Thornton, Charles A; Disney, Matthew D

    2012-03-14

    Myotonic dystrophy type 1 (DM1) is a triplet repeating disorder caused by expanded CTG repeats in the 3'-untranslated region of the dystrophia myotonica protein kinase (DMPK) gene. The transcribed repeats fold into an RNA hairpin with multiple copies of a 5'CUG/3'GUC motif that binds the RNA splicing regulator muscleblind-like 1 protein (MBNL1). Sequestration of MBNL1 by expanded r(CUG) repeats causes splicing defects in a subset of pre-mRNAs including the insulin receptor, the muscle-specific chloride ion channel, sarco(endo)plasmic reticulum Ca(2+) ATPase 1, and cardiac troponin T. Based on these observations, the development of small-molecule ligands that target specifically expanded DM1 repeats could be of use as therapeutics. In the present study, chemical similarity searching was employed to improve the efficacy of pentamidine and Hoechst 33258 ligands that have been shown previously to target the DM1 triplet repeat. A series of in vitro inhibitors of the RNA-protein complex were identified with low micromolar IC(50)'s, which are >20-fold more potent than the query compounds. Importantly, a bis-benzimidazole identified from the Hoechst query improves DM1-associated pre-mRNA splicing defects in cell and mouse models of DM1 (when dosed with 1 mM and 100 mg/kg, respectively). Since Hoechst 33258 was identified as a DM1 binder through analysis of an RNA motif-ligand database, these studies suggest that lead ligands targeting RNA with improved biological activity can be identified by using a synergistic approach that combines analysis of known RNA-ligand interactions with chemical similarity searching. PMID:22300544

  18. Compression-based similarity

    Vitányi, Paul

    2011-01-01

    First we consider pair-wise distances for literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned, normalized, and can be viewed as similarity distances. Second, we consider pair-wise distances between names of objects, like "red" or "christianity." In this case the distances are based on searches of the Internet. Such a search can be performed by any search...
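
    A compression-based distance between two byte strings can be sketched with a real compressor standing in for the ideal one. The sketch below uses the usual normalized compression distance form with `zlib`. The choice of compressor and the sample inputs are assumptions made for illustration.

```python
import random
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


rng = random.Random(0)
repetitive = b"abc" * 200
noise = bytes(rng.randrange(256) for _ in range(600))

# A string is much closer to itself than to unrelated noise.
print(ncd(repetitive, repetitive) < ncd(repetitive, noise))
```

    With a real compressor the distance of an object to itself is small but usually not exactly 0, which is why the example compares distances rather than testing for equality.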

  19. Neural circuits of eye movements during performance of the visual exploration task, which is similar to the responsive search score task, in schizophrenia patients and normal subjects

    Abnormal exploratory eye movements have been studied as a biological marker for schizophrenia. Using functional MRI (fMRI), we investigated brain activations of 12 healthy and 8 schizophrenic subjects during performance of a visual exploration task that is similar to the responsive search score task to clarify the neural basis of the abnormal exploratory eye movement. Performance data, such as the number of eye movements, the reaction time, and the percentage of correct answers showed no significant differences between the two groups. Only the normal subjects showed activations at the bilateral thalamus and the left anterior medial frontal cortex during the visual exploration tasks. In contrast, only the schizophrenic subjects showed activations at the right anterior cingulate gyrus during the same tasks. The activation at the different locations between the two groups, the left anterior medial frontal cortex in normal subjects and the right anterior cingulate gyrus in schizophrenia subjects, was explained by the feature of the visual tasks. Hypoactivation at the bilateral thalamus supports a dysfunctional filtering theory of schizophrenia. (author)

  20. Textual Spatial Cosine Similarity

    Crocetti, Giancarlo

    2015-01-01

    When dealing with document similarity many methods exist today, like cosine similarity. More complex methods are also available based on the semantic analysis of textual information, which are computationally expensive and rarely used in the real time feeding of content as in enterprise-wide search environments. To address these real-time constraints, we developed a new measure of document similarity called Textual Spatial Cosine Similarity, which is able to detect similitude at the semantic ...
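
    The baseline named above, plain cosine similarity between documents, can be sketched on term-frequency vectors. This is the standard measure only and not the Textual Spatial Cosine Similarity proposed in the paper. Whitespace tokenization is an assumption made for brevity.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two term-frequency vectors."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("similarity search in documents",
                        "document similarity search"))
```

    Because the vectors ignore word order and meaning, "documents" and "document" count as different terms here. Gaps of this kind are what semantic measures try to close.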

  1. Similarity Search in Data Stream with Adaptive Segmental Approximations

    吴枫; 仲妍; 吴泉源; 贾焰; 杨树强

    2009-01-01

    Similarity search has attracted many researchers from various communities (real-time stock quotes, network security, sensor networks). Because data from these communities are infinite, continuous, fast and real-time, a method is needed for online similarity search in data streams. This paper first proposes the lower bound function LB_seg_WF_(global) for DTW (dynamic time warping) in the presence of global warping constraints and LB_seg_WF for DTW without global warping constraints, neither of which relies on an index structure. They are segmented DTW techniques, and can be applied to sequences and queries of varying lengths in data streams. Next, several tighter lower bounds are proposed to improve the approximation quality of LB_seg_WF_(global) and LB_seg_WF. Finally, to deal with the possibility that LB_seg_WF_(global) or LB_seg_WF remains ineffective over consecutive stream windows, the lower bound LB_WF_(global) (in the presence of global warping constraints) and the lower bound LB_WF and upper bound UB_WF (without global warping constraints) are proposed to estimate DTW quickly and, through incremental computation, avoid much redundant computation. Theoretical analysis and statistical experiments confirm the validity of the proposed methods.
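
    The quantity these bounds approximate is dynamic time warping, with or without a global warping constraint. It can be sketched as a plain dynamic program. The sketch below is textbook DTW with an optional band-style window. It is not the LB_seg_WF family of bounds, whose definitions are not given in the abstract, and the `window` parameter name is an assumption.

```python
import math


def dtw(a, b, window=None):
    """DTW distance between two numeric sequences.

    window: optional global constraint limiting |i - j|; None means
    unconstrained. The band is widened to cover the length difference.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# Sequences of different length that differ only by a repeated value.
print(dtw([1, 2, 3], [1, 2, 2, 3]))
print(dtw([1, 2, 3], [1, 2, 2, 3], window=1))
```

    A cheap lower bound is useful because this table costs O(nm) time per comparison. If the bound already exceeds the best distance found so far, the full computation can be skipped.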

  2. Concept Search

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Zaihrayeu, Ilya

    2008-01-01

    In this paper we present a novel approach, called Concept Search, which extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between concepts. The key idea of Concept Search is to operate on complex concepts and to maximally exploit the semantic information available, reducing to syntactic search only when necessary, i.e., when no semantic information is available. ...

  3. Modal Similarity

    Vigo, Dr. Ronaldo

    2009-01-01

    Just as Boolean rules define Boolean categories, the Boolean operators define higher-order Boolean categories referred to as modal categories. We examine the similarity order between these categories and the standard category of logical identity (i.e. the modal category defined by the biconditional or equivalence operator). Our goal is 4-fold: first, to introduce a similarity measure for determining this similarity order; second, to show that such a measure is a good predictor of the similari...

  4. Combination of 2D/3D Ligand-Based Similarity Search in Rapid Virtual Screening from Multimillion Compound Repositories. Selection and Biological Evaluation of Potential PDE4 and PDE5 Inhibitors

    Krisztina Dobi

    2014-05-01

    Rapid in silico selection of target focused libraries from commercial repositories is an attractive and cost effective approach. If structures of active compounds are available, rapid 2D similarity search can be performed on multimillion compound databases, but the generated library requires further focusing by various 2D/3D chemoinformatics tools. We report here a combination of the 2D approach with a ligand-based 3D method (Screen3D), which applies flexible matching to align reference and target compounds in a dynamic manner and thus to assess their structural and conformational similarity. In the first case study we compared the 2D and 3D similarity scores on an existing dataset derived from the biological evaluation of a PDE5 focused library. Based on the obtained similarity metrics a fusion score was proposed. The fusion score was applied to refine the 2D similarity search in a second case study where we aimed at selecting and evaluating a PDE4B focused library. The application of this fused 2D/3D similarity measure led to an increase of the hit rate from 8.5% (1st round, 47% inhibition at 10 µM) to 28.5% (2nd round, 50% inhibition at 10 µM), and the best two hits had 53 nM inhibitory activities.

  5. Cognitive residues of similarity

    OToole, Stephanie; Keane, Mark T.

    2013-01-01

    What are the cognitive after-effects of making a similarity judgement? What, cognitively, is left behind and what effect might these residues have on subsequent processing? In this paper, we probe for such after-effects using a visual search task, performed after a task in which pictures of real-world objects were compared. So, target objects were first presented in a comparison task (e.g., rate the similarity of this object to another) thus, presumably, modifying some of their features befor...

  6. Including Biological Literature Improves Homology Search

    Chang, Jeffrey T.; Raychaudhuri, Soumya; Altman, Russ B

    2001-01-01

    Annotating the tremendous amount of sequence information being generated requires accurate automated methods for recognizing homology. Although sequence similarity is only one of many indicators of evolutionary homology, it is often the only one used. Here we find that supplementing sequence similarity with information from biomedical literature is successful in increasing the accuracy of homology search results. We modified the PSI-BLAST algorithm to use literature similarity in each iterati...

  7. Gene functional similarity search tool (GFSST)

    Russo James J; Sheng Huitao; Zhang Jinghui; Zhang Peisen; Osborne Brian; Buetow Kenneth

    2006-01-01

    Abstract Background With the completion of the genome sequences of human, mouse, and other species and the advent of high throughput functional genomic research technologies such as biomicroarray chips, more and more genes and their products have been discovered and their functions have begun to be understood. Increasing amounts of data about genes, gene products and their functions have been stored in databases. To facilitate selection of candidate genes for gene-disease research, genetic as...

  8. Personalized Search

    AUTHOR|(SzGeCERN)749939

    2015-01-01

    As the volume of electronically available information grows, relevant items become harder to find. This work presents an approach to personalizing search results in scientific publication databases. This work focuses on re-ranking search results from existing search engines like Solr or ElasticSearch. This work also includes the development of Obelix, a new recommendation system used to re-rank search results. The project was proposed and performed at CERN, using the scientific publications available on the CERN Document Server (CDS). This work experiments with re-ranking using offline and online evaluation of users and documents in CDS. The experiments conclude that the personalized search results outperform both latest-first and word-similarity ranking in terms of click position in the search results for global search in CDS.

  9. Applying ligands profiling using multiple extended electron distribution based field templates and feature trees similarity searching in the discovery of new generation of urea-based antineoplastic kinase inhibitors.

    Eman M Dokla

    This study provides a comprehensive computational procedure for the discovery of novel urea-based antineoplastic kinase inhibitors while focusing on diversification of both chemotype and selectivity pattern. It presents a systematic structural analysis of the different binding motifs of urea-based kinase inhibitors and the corresponding configurations of the kinase enzymes. The computational model depends on simultaneous application of two protocols. The first protocol applies multiple consecutive validated virtual screening filters including SMARTS, support vector-machine model (ROC = 0.98), Bayesian model (ROC = 0.86) and structure-based pharmacophore filters based on urea-based kinase inhibitors complexes retrieved from literature. This is followed by hits profiling against different extended electron distribution (XED) based field templates representing different kinase targets. The second protocol enables cancericidal activity verification by using the algorithm of feature trees (Ftrees) similarity searching against NCI database. Being a proof-of-concept study, this combined procedure was experimentally validated by its utilization in developing a novel series of urea-based derivatives of strong anticancer activity. This new series is based on 3-benzylbenzo[d]thiazol-2(3H)-one scaffold which has interesting chemical feasibility and wide diversification capability. Antineoplastic activity of this series was assayed in vitro against NCI 60 tumor-cell lines showing very strong inhibition of GI(50) as low as 0.9 µM. Additionally, its mechanism was unleashed using KINEX™ protein kinase microarray-based small molecule inhibitor profiling platform and cell cycle analysis showing a peculiar selectivity pattern against Zap70, c-src, Mink1, csk and MeKK2 kinases. Interestingly, it showed activity on syk kinase confirming the recent studies finding of the high activity of diphenyl urea containing compounds against this kinase. Allover, the new series

  10. A cross-species analysis method to analyze animal models' similarity to human's disease state

    Yu Shuhao; Zheng Lulu; Li Yun; Li Chunyan; Ma Chenchen; Li Yixue; Li Xuan; Hao Pei

    2012-01-01

    Abstract Background Animal models are indispensable tools in studying the cause of human diseases and searching for the treatments. The scientific value of an animal model depends on the accurate mimicry of human diseases. The primary goal of the current study was to develop a cross-species method by using the animal models' expression data to evaluate the similarity to human diseases' and assess drug molecules' efficiency in drug research. Therefore, we hoped to reveal that it is feasible an...

  11. Memory support for desktop search

    Chen, Yi; Kelly, Liadh; Jones, Gareth J.F.

    2010-01-01

    The user's memory plays a very important role in desktop search. A search query with insufficiently or inaccurately recalled information may make the search dramatically less effective. In this paper, we discuss three approaches to support user’s memory during desktop search. These include extended types of well remembered search options, the use of past search queries and results, and search from similar items. We will also introduce our search system which incorporates these featur...

  12. Accurate Finite Difference Algorithms

    Goodrich, John W.

    1996-01-01

    Two families of finite difference algorithms for computational aeroacoustics are presented and compared. All of the algorithms are single step explicit methods, they have the same order of accuracy in both space and time, with examples up to eleventh order, and they have multidimensional extensions. One of the algorithm families has spectral like high resolution. Propagation with high order and high resolution algorithms can produce accurate results after $O(10^6)$ periods of propagation with eight grid points per wavelength.

  13. Accurate backgrounds to Higgs production at the LHC

    Kauer, N

    2007-01-01

    Corrections of 10-30% for backgrounds to the $H \to WW \to \ell^+ \ell^- \not{p}_T$ search in vector boson and gluon fusion at the LHC are reviewed to make the case for precise and accurate theoretical background predictions.

  14. Custom Search Engines: Tools & Tips

    Notess, Greg R.

    2008-01-01

    Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…

  15. Persistent Homology and Partial Similarity of Shapes

    Di Fabio, Barbara; Landi, Claudia

    2011-01-01

    The ability to perform shape retrieval based not only on full similarity, but also partial similarity is a key property for any content-based search engine. We prove that persistence diagrams can reveal a partial similarity between two shapes by showing a common subset of points. This can be explained using the Mayer-Vietoris formulas that we develop for ordinary, relative and extended persistent homology. An experiment outlines the potential of persistence diagrams as shape descriptors in re...

  16. Are Defect Profile Similarity Criteria Different Than Velocity Profile Similarity Criteria for the Turbulent Boundary Layer?

    Weyburne, David

    2015-01-01

    The use of the defect profile instead of the experimentally observed velocity profile for the search for similarity parameters has become firmly imbedded in the turbulent boundary layer literature. However, a search of the literature reveals that there are no theoretical reasons for this defect profile preference over the more traditional velocity profile. In the report herein, we use the flow governing equation approach to develop similarity criteria for the two profiles. Results show that t...

  17. Image Tracking for the High Similarity Drug Tablets Based on Light Intensity Reflective Energy and Artificial Neural Network

    Zhongwei Liang; Liang Zhou; Xiaochu Liu; Xiaogang Wang

    2014-01-01

    Tablet image tracking has a notable influence on the efficiency and reliability of high-speed drug mass production. In recent years it has also emerged as a difficult problem and a focus of production monitoring, owing to the highly similar shapes and random position distribution of the objects to be searched for. For the purpose of tracking tablets accurately in random distribution, through using surface fitting approach and transition...

  18. Finding Protein and Nucleotide Similarities with FASTA.

    Pearson, William R

    2016-01-01

    The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local, and global similarity searches (ssearch36, ggsearch36), and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity. The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases. The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. © 2016 by John Wiley & Sons, Inc. PMID:27010337

  19. Clustering by Pattern Similarity

    Hai-xun Wang; Jian Pei

    2008-01-01

    The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. However, in most of these models the concept of similarity is often based on such metrics as Manhattan distance, Euclidean distance or other Lp distances. In other words, similar objects must have close values in at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we proposed, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it is able to capture not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. In addition to the novel similarity model, this paper also introduces an effective and efficient algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its performance.
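
    The idea of a coherent pattern on a subset of dimensions can be sketched with a brute-force check. The abstract does not state the formal criterion, so the sketch assumes the pScore definition commonly given for the pCluster model: the score of a 2x2 submatrix is the absolute difference of the two row-wise differences, and a submatrix qualifies when every 2x2 score is at most a threshold `delta`. Function names and data are hypothetical, and this exhaustive check is not the efficient detection algorithm the paper introduces.

```python
from itertools import combinations


def p_score(x_a, x_b, y_a, y_b):
    """pScore of a 2x2 submatrix: objects x, y on attributes a, b."""
    return abs((x_a - x_b) - (y_a - y_b))


def is_pcluster(rows, cols, data, delta):
    """True if every 2x2 submatrix of data[rows][cols] has pScore <= delta."""
    for x, y in combinations(rows, 2):
        for a, b in combinations(cols, 2):
            score = p_score(data[x][a], data[x][b], data[y][a], data[y][b])
            if score > delta:
                return False
    return True


# Hypothetical expression levels: genes 0 and 1 rise and fall together
# at very different magnitudes; gene 2 moves the opposite way.
expression = [
    [1, 2, 3],
    [11, 12, 13],
    [3, 2, 1],
]
print(is_pcluster([0, 1], [0, 1, 2], expression, delta=0))
print(is_pcluster([0, 2], [0, 1, 2], expression, delta=1))
```

    Genes 0 and 1 are far apart under Euclidean distance yet form a perfect pattern cluster, which is the distinction the abstract draws.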

  20. Automatic face alignment by maximizing similarity score

    Boom, Bas; Spreeuwers, Luuk; Veldhuis, Raymond; Fred, A.; Jain, A. K.

    2007-01-01

    Accurate face registration is of vital importance to the performance of a face recognition algorithm. We propose a face registration method which searches for the optimal alignment by maximizing the score of a face recognition algorithm. In this paper we investigate the practical usability of our face registration method. Experiments show that our registration method achieves better results in face verification than the landmark based registration method. We even obtain face verification resu...

  1. Niche Genetic Algorithm with Accurate Optimization Performance

    LIU Jian-hua; YAN De-kun

    2005-01-01

    Based on a crowding mechanism, a novel niche genetic algorithm is proposed that dynamically records the evolutionary direction during evolution. After evolution, the precision of the solutions can be greatly improved by local searching along the recorded direction. Simulation shows that this algorithm not only maintains population diversity but also finds accurate solutions. Although this method takes more time than the standard GA, it is worth applying in cases that demand high solution precision.

  2. The semantic similarity ensemble

    Andrea Ballatore

    2013-12-01

    Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of each ensemble's member. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.
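
    The "panel of experts" idea can be sketched as an unweighted average of several similarity measures. The sketch below is only an illustration under stated assumptions: the three member measures are simple string-based stand-ins chosen so the example is self-contained (the paper's members are geo-semantic measures), and the function names and the choice of a plain mean are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def char_bigrams(term):
    """Set of adjacent character pairs in a lower-cased term."""
    t = term.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


def bigram_jaccard(a, b):
    """Jaccard overlap of character bigrams (stand-in measure 1)."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y) if (x | y) else 1.0


def sequence_ratio(a, b):
    """difflib matching-block ratio (stand-in measure 2)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of whitespace-separated tokens (stand-in measure 3)."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if (x | y) else 1.0


MEASURES = [bigram_jaccard, sequence_ratio, token_jaccard]


def ensemble_similarity(a, b, measures=MEASURES):
    """Unweighted mean of the members' scores, each assumed to lie in [0, 1]."""
    return mean(m(a, b) for m in measures)


score = ensemble_similarity("river bank", "river basin")
```

    Because the ensemble score is a mean, it always lies between the weakest and the strongest member's score for a given pair, which matches the abstract's observation that the best member can still outperform the ensemble.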

  3. Gender similarities and differences.

    Hyde, Janet Shibley

    2014-01-01

    Whether men and women are fundamentally different or similar has been debated for more than a century. This review summarizes major theories designed to explain gender differences: evolutionary theories, cognitive social learning theory, sociocultural theory, and expectancy-value theory. The gender similarities hypothesis raises the possibility of theorizing gender similarities. Statistical methods for the analysis of gender differences and similarities are reviewed, including effect sizes, meta-analysis, taxometric analysis, and equivalence testing. Then, relying mainly on evidence from meta-analyses, gender differences are reviewed in cognitive performance (e.g., math performance), personality and social behaviors (e.g., temperament, emotions, aggression, and leadership), and psychological well-being. The evidence on gender differences in variance is summarized. The final sections explore applications of intersectionality and directions for future research. PMID:23808917

  4. Cluster Tree Based Hybrid Document Similarity Measure

    M. Varshana Devi

    2015-10-01

    A similarity measure is established to measure the hybrid similarity. In a cluster tree, the hybrid similarity measure can be calculated for random data, even data that do not co-occur, and it generates different views. The different views of the tree can be combined, and the one that is most significant in cost is chosen. A method is proposed to combine the multiple views: views represented by different distance measures are merged into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity offers better feasibility for intelligent search. It also helps improve dimensionality reduction and semantic analysis.

  5. Constructive Similarity of Soils

    Koudelka, Petr

    Singapore : Design, CRC a iTEK CMS Web solutions, 2012 - (Phoon, K.; Beer, M.; Quek, S.; Pang, S.), pp. 206-211 ISBN 978-981-07-2218-0. [APS on Structural Reliability and Its Application – Sustainable Civil Infrastructures /5./. Singapore (SG), 23.05.2012-25.05.2012] Other grants: GA ČR(CZ) GAP105/11/1160 Institutional support: RVO:68378297 Keywords : model similarity * database of soil properties * soil similarity characteristic * statistical analysis * ultimate limit states Subject RIV: JM - Building Engineering

  6. Music Retrieval based on Melodic Similarity

    Typke, R.

    2007-01-01

    This thesis introduces a method for measuring melodic similarity for notated music such as MIDI files. This music search algorithm views music as sets of notes that are represented as weighted points in the two-dimensional space of time and pitch. Two point sets can be compared by calculating how mu

  7. Information Extraction Using Distant Supervision and Semantic Similarities

    PARK, Y.

    2016-02-01

    Information extraction is one of the main research tasks in natural language processing and text mining; it extracts useful information from unstructured sentences. Information extraction techniques include named entity recognition, relation extraction, and co-reference resolution. Among them, relation extraction refers to the task of extracting semantic relations between entities such as personal and geographic names in documents. This is an important research area, which is used in knowledge base construction and question answering systems. This study presents relation extraction using a distant supervision learning technique, one of the semi-supervised learning methods that have been spotlighted in recent years for reducing the manual work and costs required for supervised learning. That is, this study proposes a method that improves distant supervision by applying a clustering method to create a learning corpus and by applying semantic analysis to relations that are difficult to identify using existing distant supervision. Through comparison experiments with various semantic similarity comparison methods, similarity calculation methods that are useful for relation extraction using distant supervision are identified, and a large number of accurate relation triples can be extracted using the proposed structural advantages and semantic similarity comparison.

  8. Similarity of molecular shape.

    Meyer, A Y; Richards, W G

    1991-10-01

    The similarity of one molecule to another has usually been defined in terms of electron densities or electrostatic potentials or fields. Here it is expressed as a function of the molecular shape. Formulations of similarity (S) reduce to very simple forms, thus rendering the computerised calculation straightforward and fast. 'Elements of similarity' are identified, in the same spirit as 'elements of chirality', except that the former are understood to be variable rather than present-or-absent. Methods are presented which bypass the time-consuming mathematical optimisation of the relative orientation of the molecules. Numerical results are presented and examined, with emphasis on the similarity of isomers. At the extreme, enantiomeric pairs are considered, where it is the dissimilarity (D = 1 - S) that is of consequence. We argue that chiral molecules can be graded by dissimilarity, and show that D is the shape-analog of the 'chirality coefficient', with the simple form of the former opening up numerical access to the latter. PMID:1770379

  9. The Qualitative Similarity Hypothesis

    Paul, Peter V.; Lee, Chongmin

    2010-01-01

    Evidence is presented for the qualitative similarity hypothesis (QSH) with respect to children and adolescents who are d/Deaf or hard of hearing. The primary focus is on the development of English language and literacy skills, and some information is provided on the acquisition of English as a second language. The QSH is briefly discussed within…

  10. Limiting Similarity Revisited

    Szabo, P; Meszena, G.

    2005-01-01

    We reinvestigate the validity of the limiting similarity principle via numerical simulations of the Lotka-Volterra model. A Gaussian competition kernel is employed to describe decreasing competition with increasing difference in a one-dimensional phenotype variable. The simulations are initiated by a large number of species, evenly distributed along the phenotype axis. Exceptionally, the Gaussian carrying capacity supports coexistence of all species, initially present. In case of any other, d...
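
    The model setup described above can be sketched as a small simulation. The code below is a minimal illustration under assumptions that are not taken from the paper: it uses the standard competitive Lotka-Volterra form dn_i/dt = r n_i (1 - sum_j a_ij n_j / K_i), a Gaussian kernel a_ij = exp(-(x_i - x_j)^2 / (2 sigma^2)), explicit Euler time stepping, and hypothetical function and parameter names.

```python
import math


def gaussian_kernel(x_i, x_j, sigma):
    """Competition strength, decreasing with phenotype difference."""
    return math.exp(-((x_i - x_j) ** 2) / (2.0 * sigma ** 2))


def lv_step(n, x, K, sigma, r=1.0, dt=0.01):
    """One explicit Euler step of the competitive Lotka-Volterra equations."""
    updated = []
    for i in range(len(n)):
        # Total competitive pressure felt by species i from all species.
        competition = sum(
            gaussian_kernel(x[i], x[j], sigma) * n[j] for j in range(len(n))
        )
        growth = r * n[i] * (1.0 - competition / K[i])
        # Clamp at zero so densities never become negative.
        updated.append(max(n[i] + dt * growth, 0.0))
    return updated


def simulate(n0, x, K, sigma, steps, r=1.0, dt=0.01):
    """Iterate lv_step from initial densities n0 and return the final densities."""
    n = list(n0)
    for _ in range(steps):
        n = lv_step(n, x, K, sigma, r, dt)
    return n


# Eleven species evenly spaced on the phenotype axis, constant carrying capacity.
positions = [i / 10.0 for i in range(11)]
final = simulate([0.1] * 11, positions, [1.0] * 11, sigma=0.2, steps=2000)
```

    Swapping the constant carrying capacity for a Gaussian function of `positions` would be the natural next experiment for the case the abstract singles out, although the parameter values that reproduce the reported behaviour are not given in the excerpt.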

  11. The Application of Similar Image Retrieval in Electronic Commerce

    YuPing Hu

    2014-01-01

    Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers’ experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system.

  12. The application of similar image retrieval in electronic commerce.

    Hu, YuPing; Yin, Hua; Han, Dezhi; Yu, Fei

    2014-01-01

    Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers' experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system. PMID:24883411

  13. A new approach for finding semantic similar scientific articles

    Masumeh Islami Nasab; Reza Javidan

    2015-01-01

    Calculating article similarities enables users to find similar articles and documents in a collection of articles. Two similar documents are extremely helpful for text applications such as document-to-document similarity search, plagiarism checker, text mining for repetition, and text filtering. This paper proposes a new method for calculating the semantic similarities of articles. WordNet is used to find word semantic associations. The proposed technique first compares the similarity of each...

  14. An efficient and accurate 3D displacements tracking strategy for digital volume correlation

    Pan, Bing

    2014-07-01

    Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need to update the Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure that the 3D IC-GN algorithm converges accurately and rapidly and to avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer an accurate and complete initial guess of deformation to each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexities of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to the necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost. © 2014 Elsevier Ltd.

  15. The qualitative similarity hypothesis.

    Paul, Peter V; Lee, Chongmin

    2010-01-01

    Evidence is presented for the qualitative similarity hypothesis (QSH) with respect to children and adolescents who are d/Deaf or hard of hearing. The primary focus is on the development of English language and literacy skills, and some information is provided on the acquisition of English as a second language. The QSH is briefly discussed within the purview of two groups of cognitive models: those that emphasize the cognitive development of individuals and those that pertain to disciplinary or knowledge structures. It is argued that the QSH has scientific merit with implications for classroom instruction. Future research should examine the validity of the QSH in other disciplines such as mathematics and science and should include perspectives from social as well as cognitive models. PMID:20415280

  16. Self Similar Optical Fiber

    Lai, Zheng-Xuan

    This research proposes Self Similar optical fiber (SSF) as a new type of optical fiber. It has a special core that consists of a self similar structure. Such a structure is obtained by following the formula for generating iterated function systems (IFS) in Fractal Theory. The resulting SSF can be viewed as a true fractal object in optical fibers. In addition, the method of fabricating SSF makes it possible to generate desired structures exponentially in number, while also allowing lower scale units in the structure to be reduced in size exponentially. The invention of SSF is expected to greatly ease the production of optical fiber when a large number of small hollow structures are needed in the core of the optical fiber. This dissertation analyzes the core structure of SSF based on fractal theory. Possible properties arising from the structural characteristics and the corresponding applications are explained. Four SSF samples were obtained through actual fabrication in a laboratory environment. Unlike traditional conductive heating fabrication systems, I used an in-house designed furnace that incorporated a radiation heating method and was equipped with an automated temperature control system. The obtained samples were examined through spectrum tests. Results from the tests showed that SSF does have the optical property of delivering light in a certain wavelength range. However, SSF as a new type of optical fiber requires systematic research to find the theory that explains its structure and the associated optical properties. The fabrication and quality of SSF also need to be improved for product deployment. As a start of this extensive research, this dissertation work opens the door to a very promising new area in optical fiber research.

  17. The place of highly accurate methods by RNAA in metrology

    With the introduction of physical metrological concepts to chemical analysis, which require that a result be accompanied by an uncertainty statement expressed in SI units, several researchers started to consider ID-MS the only method fulfilling this requirement. However, recent publications revealed that in certain cases even expert laboratories using ID-MS and analyzing the same material produced results whose uncertainty statements did not overlap, which theoretically should not have happened. This shows that no monopoly is good in science and that it would be desirable to widen the set of methods acknowledged as primary in inorganic trace analysis. Moreover, ID-MS cannot be used for monoisotopic elements. The need to search for other methods of metrological quality similar to that of ID-MS seems obvious. In this paper, our long-time experience in devising highly accurate ('definitive') methods by RNAA for the determination of selected trace elements in biological materials is reviewed. The general idea of definitive methods, based on combining neutron activation with highly selective and quantitative isolation of the indicator radionuclide by column chromatography followed by gamma spectrometric measurement, is recalled and illustrated by examples of the performance of such methods in determining Cd, Co, Mo, etc. It is demonstrated that such methods can provide very reliable results with very low levels of uncertainty traceable to SI units

  18. Concept Search: Semantics Enabled Information Retrieval

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Zaihrayeu, Ilya

    2010-01-01

    In this paper we present a novel approach, called Concept Search, which extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between concepts. The key idea of Concept Search is to operate on complex concepts and to maximally exploit the semantic information available, reducing to syntactic search only when necessary, i.e., when no semantic information is available. ...

  19. Accurate guitar tuning by cochlear implant musicians.

    Thomas Lu

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  20. Accurate guitar tuning by cochlear implant musicians.

    Lu, Thomas; Huang, Juan; Zeng, Fan-Gang

    2014-01-01

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task. PMID:24651081
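
    The role of beats in the tuning task can be made concrete with a short numerical check. The sketch below relies only on the trigonometric identity sin A + sin B = 2 sin((A + B)/2) cos((A - B)/2): two simultaneous tones at f1 and f2 are equivalent to a tone at the mean frequency whose loudness waxes and wanes |f1 - f2| times per second. The function names and the 110 Hz example are illustrative assumptions, not values from the study.

```python
import math


def beat_frequency(f1, f2):
    """Number of loudness fluctuations per second heard for two simultaneous tones."""
    return abs(f1 - f2)


def two_tone(f1, f2, t):
    """Sum of two unit-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def envelope_form(f1, f2, t):
    """The same signal written as a slow envelope times a fast carrier."""
    slow = math.cos(math.pi * (f1 - f2) * t)   # envelope; its magnitude repeats |f1 - f2| times per second
    fast = math.sin(math.pi * (f1 + f2) * t)   # carrier at the mean frequency
    return 2 * slow * fast


# A string mistuned by 0.5 Hz against a 110 Hz reference (example values).
reference_hz, string_hz = 110.0, 110.5
beats_per_second = beat_frequency(reference_hz, string_hz)
```

    A slow, countable fluctuation in loudness is a temporal cue, which is consistent with the abstract's conclusion that the task shifts from spectral to temporal discrimination when the tones are presented simultaneously.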

  1. Towards accurate emergency response behavior

    Nuclear reactor operator emergency response behavior has persisted as a training problem through lack of information. The industry needs an accurate definition of operator behavior in adverse stress conditions, and training methods which will produce the desired behavior. Newly assembled information from fifty years of research into human behavior in both high and low stress provides a more accurate definition of appropriate operator response, and supports training methods which will produce the needed control room behavior. The research indicates that operator response in emergencies is divided into two modes, conditioned behavior and knowledge based behavior. Methods which assure accurate conditioned behavior, and provide for the recovery of knowledge based behavior, are described in detail

  2. Similarity search and data mining techniques for advanced database systems.

    Pryakhin, Alexey

    2006-01-01

    Modern automated methods for measurement, collection, and analysis of data in industry and science are providing more and more data with drastically increasing structure complexity. On the one hand, this growing complexity is justified by the need for a richer and more precise description of real-world objects, on the other hand it is justified by the rapid progress in measurement and analysis techniques that allow the user a versatile exploration of objects. In order to manage the huge volum...

  3. Time Searching for Similar Binary Vectors in Associative Memory

    Frolov, A. A.; Húsek, Dušan; Rachkovskij, D.

    2006-01-01

    Vol. 42, No. 5 (2006), pp. 615-623. ISSN 1060-0396 R&D Projects: GA MŠk(CZ) 1M0567 Institutional research plan: CEZ:AV0Z10300504 Keywords : associative memory * neural network * Hopfield network * binary vector * indexing * hashing Subject RIV: BB - Applied Statistics, Operational Research

  4. Efficient Similarity Retrieval in Music Databases

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard

    2006-01-01

    Audio music is increasingly becoming available in digital form, and the digital music collections of individuals continue to grow. Addressing the need for effective means of retrieving music from such collections, this paper proposes new techniques for content-based similarity search. Each music...... object is modeled as a time sequence of high-dimensional feature vectors, and dynamic time warping (DTW) is used as the similarity measure. To accomplish this, the paper extends techniques for time-series-length reduction and lower bounding of DTW distance to the multi-dimensional case. Further, the...
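
    Dynamic time warping over sequences of feature vectors can be sketched with the textbook dynamic-programming recurrence. This is a minimal, unoptimized illustration: it assumes Euclidean distance as the local cost between two vectors and does not include the length-reduction or lower-bounding techniques that the paper contributes. The function name and sample data are hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """DTW distance between two sequences of equal-dimension feature vectors.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    vectors of seq_a with the first j vectors of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances, seq_b repeats
                cost[i][j - 1],      # seq_b advances, seq_a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two-dimensional toy features: the second sequence is a time-stretched first.
original = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
stretched = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (2.0, 2.0)]
distance = dtw_distance(original, stretched)
```

    The full table costs time proportional to the product of the two sequence lengths, which is why reducing sequence length and pruning candidates with cheap lower bounds matters for retrieval over large collections.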

  5. Professional Microsoft search fast search, Sharepoint search, and search server

    Bennett, Mark; Kehoe, Miles; Voskresenskaya, Natalya

    2010-01-01

    Use Microsoft's latest search-based technology, FAST search, to plan, customize, and deploy your search solution. FAST is Microsoft's latest intelligent search-based technology that boasts robustness and an ability to integrate business intelligence with Search. This in-depth guide provides you with advanced coverage on FAST search and shows you how to use it to plan, customize, and deploy your search solution, with an emphasis on SharePoint 2010 and Internet-based search solutions. With a particular appeal for anyone responsible for implementing and managing enterprise search, this book presents t

  6. Accurate determination of antenna directivity

    Dich, Mikael

    1997-01-01

    The derivation of a formula for accurate estimation of the total radiated power from a transmitting antenna for which the radiated power density is known in a finite number of points on the far-field sphere is presented. The main application of the formula is determination of directivity from power...
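
    The relationship between sampled far-field power density and directivity can be sketched with a plain numerical integration. The code below is not the formula derived in the paper: it assumes the standard definition D = 4π U_max / P_rad, with P_rad obtained by a midpoint-rule sum of U(θ, φ) sin θ over a regular grid, and the function names and grid sizes are hypothetical.

```python
import math


def directivity(intensity, n_theta=180, n_phi=360):
    """Estimate directivity from radiation intensity sampled on a regular grid.

    `intensity(theta, phi)` returns radiation intensity (power per unit solid
    angle) for polar angle theta in [0, pi] and azimuth phi in [0, 2*pi].
    """
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total_power = 0.0
    peak = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta          # midpoint of the theta cell
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi          # midpoint of the phi cell
            u = intensity(theta, phi)
            total_power += u * math.sin(theta) * d_theta * d_phi
            peak = max(peak, u)
    return 4.0 * math.pi * peak / total_power


def isotropic(theta, phi):
    """Radiates equally in all directions."""
    return 1.0


def short_dipole(theta, phi):
    """Idealized short dipole pattern, proportional to sin^2(theta)."""
    return math.sin(theta) ** 2


d_iso = directivity(isotropic)
d_dipole = directivity(short_dipole)
```

    The estimate depends on how well the grid captures both the peak and the integral, so coarse sampling biases the result. That sensitivity is the practical motivation for a more accurate power-estimation formula of the kind the abstract announces.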

  7. Integrated Semantic Similarity Model Based on Ontology

    LIU Ya-Jun; ZHAO Yun

    2004-01-01

    To address inadequate semantic processing in intelligent question answering systems, this paper presents an integrated semantic similarity model that calculates semantic similarity using geometric distance and information content. Using the interrelationships between concepts, the information content of concepts, and the strength of the edges in the ontology network, we can calculate the semantic similarity between two concepts and provide information for the further calculation of the semantic similarity between a user's question and the answers in the knowledge base. Experimental results on the prototype show that the semantic problem in natural language processing can also be solved with the help of the knowledge and abundant semantic information in the ontology. The ontology-based intelligent question answering prototype achieved more than 90% accuracy with an average search time of less than 50 ms, which is a very satisfactory result.

  8. The similarity principle - on using models correctly

    Landberg, L.; Mortensen, N.G.; Rathmann, O.;

    2003-01-01

    This paper will present some guiding principles on the most accurate use of the WAsP program in particular, but the principle can be applied to the use of any linear model which predicts some quantity at one location based on another. We have felt a need to lay these principles out explicitly......, due to the many, many users and the uses (and misuses) of the WAsP program. Put simply, the similarity principle states that one should choose a predictor site which – in as many ways as possible – is similar to the predicted site....

  9. Search Cloud

    ... https://www.nlm.nih.gov/medlineplus/cloud.html Search Cloud Share the MedlinePlus search cloud with your users by embedding our search ...

  10. Search Tips

    ... do not need to use AND because the search engine automatically finds resources containing all of your search ... Use as a wildcard when you want the search engine to fill in the blank for you; you ...

  11. Search Cloud

    ... this page: https://medlineplus.gov/cloud.html Search Cloud ... Top 110 zoster vaccine Share the MedlinePlus search cloud with your users by embedding our search cloud ...

  12. Accurate pose estimation for forensic identification

    Merckx, Gert; Hermans, Jeroen; Vandermeulen, Dirk

    2010-04-01

    In forensic authentication, one aims to identify the perpetrator among a series of suspects or distractors. A fundamental problem in any recognition system that aims for identification of subjects in a natural scene is the lack of constraints on viewing and imaging conditions. In forensic applications, identification proves even more challenging, since most surveillance footage is of abysmal quality. In this context, robust methods for pose estimation are paramount. In this paper we will therefore present a new pose estimation strategy for very low quality footage. Our approach uses 3D-2D registration of a textured 3D face model with the surveillance image to obtain accurate far field pose alignment. Starting from an inaccurate initial estimate, the technique uses novel similarity measures based on the monogenic signal to guide a pose optimization process. We will illustrate the descriptive strength of the introduced similarity measures by using them directly as a recognition metric. Through validation, using both real and synthetic surveillance footage, our pose estimation method is shown to be accurate, and robust to lighting changes and image degradation.

  13. Notions of similarity for computational biology models

    Waltemath, Dagmar

    2016-03-21

    Computational models used in biology are rapidly increasing in complexity, size, and numbers. To build such large models, researchers need to rely on software tools for model retrieval, model combination, and version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of similarity may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here, we introduce a general notion of quantitative model similarities, survey the use of existing model comparison methods in model building and management, and discuss potential applications of model comparison. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on different model aspects. Potentially relevant aspects of a model comprise its references to biological entities, network structure, mathematical equations and parameters, and dynamic behaviour. Future similarity measures could combine these model aspects in flexible, problem-specific ways in order to mimic users' intuition about model similarity, and to support complex model searches in databases.

  14. SProt: sphere-based protein structure similarity algorithm

    2011-01-01

    Background Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus has shifted from sequences to structures. Structure similarity poses a major challenge, since not even a standard definition of optimal structure similarity exists in the field. Results We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of lo...

  15. Measuring Personalization of Web Search

    Hannak, Aniko; Sapiezynski, Piotr; Kakhki, Arash Molavi;

    2013-01-01

    Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing personalization is leading to concerns about Filter Bubble effects, where certain users...... are simply unable to access information that the search engines’ algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions....... First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users...

  16. Fast and accurate marker-based projective registration method for uncalibrated transmission electron microscope tilt series

    This paper presents a fast and accurate marker-based automatic registration technique for aligning uncalibrated projections taken from a transmission electron microscope (TEM) with different tilt angles and orientations. Most of the existing TEM image alignment methods estimate the similarity between images using the projection model with least-squares metric and guess alignment parameters by computationally expensive nonlinear optimization schemes. Approaches based on the least-squares metric which is sensitive to outliers may cause misalignment since automatic tracking methods, though reliable, can produce a few incorrect trajectories due to a large number of marker points. To decrease the influence of outliers, we propose a robust similarity measure using the projection model with a Gaussian weighting function. This function is very effective in suppressing outliers that are far from correct trajectories and thus provides a more robust metric. In addition, we suggest a fast search strategy based on the non-gradient Powell's multidimensional optimization scheme to speed up optimization as only meaningful parameters are considered during iterative projection model estimation. Experimental results show that our method brings more accurate alignment with less computational cost compared to conventional automatic alignment methods.
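
    The contrast drawn above between a least-squares metric and a Gaussian-weighted similarity measure can be illustrated with a minimal sketch. The function names, the residual values, and the choice of `sigma` are assumptions made for illustration; this is not the authors' projection model or their exact weighting function.

```python
import numpy as np

def least_squares_cost(residuals):
    """Sum of squared residuals: every marker counts, so one outlier dominates."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r ** 2))

def gaussian_weighted_similarity(residuals, sigma=2.0):
    """Sum of Gaussian weights exp(-r^2 / (2 sigma^2)): far outliers add almost nothing."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(np.exp(-(r ** 2) / (2.0 * sigma ** 2))))

# Hypothetical marker residuals (in pixels), with and without one bad trajectory.
inliers = np.array([0.5, -0.3, 0.8, -0.6])
with_outlier = np.append(inliers, 50.0)

ls_change = least_squares_cost(with_outlier) - least_squares_cost(inliers)
gw_change = gaussian_weighted_similarity(with_outlier) - gaussian_weighted_similarity(inliers)
```

    In this toy case the single 50-pixel outlier dominates the least-squares cost, while the Gaussian-weighted score is essentially unchanged, which is the behaviour the abstract attributes to its robust measure.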

  17. Search Patterns

    Morville, Peter

    2010-01-01

    What people are saying about Search Patterns "Search Patterns is a delight to read -- very thoughtful and thought provoking. It's the most comprehensive survey of designing effective search experiences I've seen." --Irene Au, Director of User Experience, Google "I love this book! Thanks to Peter and Jeffery, I now know that search (yes, boring old yucky who cares search) is one of the coolest ways around of looking at the world." --Dan Roam, author, The Back of the Napkin (Portfolio Hardcover) "Search Patterns is a playful guide to the practical concerns of search interface design. It cont

  18. Stochastic Self-Similar and Fractal Universe

    Iovane, G; Tortoriello, F S

    2004-01-01

    The structures formation of the Universe appears as if it were a classically self-similar random process at all astrophysical scales. An agreement is demonstrated for the present hypotheses of segregation with a size of astrophysical structures by using a comparison between quantum quantities and astrophysical ones. We present the observed segregated Universe as the result of a fundamental self-similar law, which generalizes the Compton wavelength relation. It appears that the Universe has a memory of its quantum origin as suggested by R.Penrose with respect to quasi-crystal. A more accurate analysis shows that the present theory can be extended from the astrophysical to the nuclear scale by using generalized (stochastically) self-similar random process. This transition is connected to the relevant presence of the electromagnetic and nuclear interactions inside the matter. In this sense, the presented rule is correct from a subatomic scale to an astrophysical one. We discuss the near full agreement at organic...

  19. Accurate ab initio spin densities

    Boguslawski, Katharina; Legeza, Örs; Reiher, Markus

    2012-01-01

    We present an approach for the calculation of spin density distributions for molecules that require very large active spaces for a qualitatively correct description of their electronic structure. Our approach is based on the density-matrix renormalization group (DMRG) algorithm to calculate the spin density matrix elements as basic quantity for the spatially resolved spin density distribution. The spin density matrix elements are directly determined from the second-quantized elementary operators optimized by the DMRG algorithm. As an analytic convergence criterion for the spin density distribution, we employ our recently developed sampling-reconstruction scheme [J. Chem. Phys. 2011, 134, 224101] to build an accurate complete-active-space configuration-interaction (CASCI) wave function from the optimized matrix product states. The spin density matrix elements can then also be determined as an expectation value employing the reconstructed wave function expansion. Furthermore, the explicit reconstruction of a CA...

  20. Accurate Modeling of Advanced Reflectarrays

    Zhou, Min

    of the incident field, the choice of basis functions, and the technique to calculate the far-field. Based on accurate reference measurements of two offset reflectarrays carried out at the DTU-ESA Spherical NearField Antenna Test Facility, it was concluded that the three latter factors are particularly important...... to the conventional phase-only optimization technique (POT), the geometrical parameters of the array elements are directly optimized to fulfill the far-field requirements, thus maintaining a direct relation between optimization goals and optimization variables. As a result, better designs can be obtained compared...... using the GDOT to demonstrate its capabilities. To verify the accuracy of the GDOT, two offset contoured beam reflectarrays that radiate a high-gain beam on a European coverage have been designed and manufactured, and subsequently measured at the DTU-ESA Spherical Near-Field Antenna Test Facility...

  1. Accurate thickness measurement of graphene

    Shearer, Cameron J.; Slattery, Ashley D.; Stapleton, Andrew J.; Shapter, Joseph G.; Gibson, Christopher T.

    2016-03-01

    Graphene has emerged as a material with a vast variety of applications. The electronic, optical and mechanical properties of graphene are strongly influenced by the number of layers present in a sample. As a result, the dimensional characterization of graphene films is crucial, especially with the continued development of new synthesis methods and applications. A number of techniques exist to determine the thickness of graphene films including optical contrast, Raman scattering and scanning probe microscopy techniques. Atomic force microscopy (AFM), in particular, is used extensively since it provides three-dimensional images that enable the measurement of the lateral dimensions of graphene films as well as the thickness, and by extension the number of layers present. However, in the literature AFM has proven to be inaccurate with a wide range of measured values for single layer graphene thickness reported (between 0.4 and 1.7 nm). This discrepancy has been attributed to tip-surface interactions, image feedback settings and surface chemistry. In this work, we use standard and carbon nanotube modified AFM probes and a relatively new AFM imaging mode known as PeakForce tapping mode to establish a protocol that will allow users to accurately determine the thickness of graphene films. In particular, the error in measuring the first layer is reduced from 0.1-1.3 nm to 0.1-0.3 nm. Furthermore, in the process we establish that the graphene-substrate adsorbate layer and imaging force, in particular the pressure the tip exerts on the surface, are crucial components in the accurate measurement of graphene using AFM. These findings can be applied to other 2D materials.

  2. P2P Concept Search: Some Preliminary Results

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Noori, S.R.H

    2009-01-01

    Concept Search extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between complex concepts. It allows us to deal with ambiguity of natural language. P2P Concept Search extends Concept Search by allowing distributed semantic search over structured P2P network. The key idea is to exploit distributed, rather than centralized, background knowledge and indices.

  3. Predicting user click behaviour in search engine advertisements

    Daryaie Zanjani, Mohammad; Khadivi, Shahram

    2015-10-01

    According to the specific requirements and interests of users, search engines select and display advertisements that match user needs and have higher probability of attracting users' attention based on their previous search history. New objects such as user, advertisement or query cause a deterioration of precision in targeted advertising due to their lack of history. This article surveys this challenge. In the case of new objects, we first extract similar observed objects to the new object and then we use their history as the history of new object. Similarity between objects is measured based on correlation, which is a relation between user and advertisement when the advertisement is displayed to the user. This method is used for all objects, so it has helped us to accurately select relevant advertisements for users' queries. In our proposed model, we assume that similar users behave in a similar manner. We find that users with few queries are similar to new users. We will show that correlation between users and advertisements' keywords is high. Thus, users who pay attention to advertisements' keywords, click similar advertisements. In addition, users who pay attention to specific brand names might have similar behaviours too.
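
    The idea of using the history of similar observed objects as the history of a new object can be sketched as follows. The profile vectors, the click rates, and the function name `borrow_history` are hypothetical; the abstract does not specify how its correlation is computed, so Pearson correlation is used here as an assumption.

```python
import numpy as np

def borrow_history(new_profile, known_profiles, known_ctr, k=2):
    """Estimate a click rate for a new object from its k most correlated known objects."""
    new_profile = np.asarray(new_profile, dtype=float)
    known_profiles = np.asarray(known_profiles, dtype=float)
    known_ctr = np.asarray(known_ctr, dtype=float)
    # Pearson correlation between the new profile and each known profile.
    corr = np.array([np.corrcoef(new_profile, p)[0, 1] for p in known_profiles])
    top = np.argsort(corr)[::-1][:k]          # indices of the k most similar objects
    weights = np.clip(corr[top], 0.0, None)   # ignore negatively correlated neighbours
    if weights.sum() == 0.0:
        return float(known_ctr.mean())        # fall back to the global average
    return float(np.dot(weights, known_ctr[top]) / weights.sum())

# Hypothetical keyword-attention profiles and observed click rates.
known_profiles = [[1, 0, 2, 0], [2, 0, 4, 0], [0, 3, 0, 1]]
known_ctr = [0.10, 0.20, 0.90]
estimate = borrow_history([3, 0, 6, 0], known_profiles, known_ctr, k=2)
```

    The new object here correlates perfectly with the first two known objects, so its estimate is the average of their click rates and the dissimilar third object is ignored.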

  4. Functional Similarity and Interpersonal Attraction.

    Neimeyer, Greg J.; Neimeyer, Robert A.

    1981-01-01

    Students participated in dyadic disclosure exercises over a five-week period. Results indicated members of high functional similarity dyads evidenced greater attraction to one another than did members of low functional similarity dyads. "Friendship" pairs of male undergraduates displayed greater functional similarity than did "nominal" pairs from…

  5. Contextual Factors for Finding Similar Experts

    Hofmann, Katja; Balog, Krisztian; Bogers, Toine;

    2010-01-01

    -seeking models, are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts......-centered perspective. The main focus has been on developing content-based algorithms similar to document search. These algorithms identify matching experts primarily on the basis of the textual content of documents with which experts are associated. Other factors, such as the ones identified by expertise......, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content-based...

  6. A More Accurate Fourier Transform

    Courtney, Elya

    2015-01-01

    Fourier transform methods are used to analyze functions and data sets to provide frequencies, amplitudes, and phases of underlying oscillatory components. Fast Fourier transform (FFT) methods offer speed advantages over evaluation of explicit integrals (EI) that define Fourier transforms. This paper compares frequency, amplitude, and phase accuracy of the two methods for well resolved peaks over a wide array of data sets including cosine series with and without random noise and a variety of physical data sets, including atmospheric $\mathrm{CO_2}$ concentrations, tides, temperatures, sound waveforms, and atomic spectra. The FFT uses MIT's FFTW3 library. The EI method uses the rectangle method to compute the areas under the curve via complex math. Results support the hypothesis that EI methods are more accurate than FFT methods. Errors range from 5 to 10 times higher when determining peak frequency by FFT, 1.4 to 60 times higher for peak amplitude, and 6 to 10 times higher for phase under a peak. The ability t...
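
    One way to see why an explicit integral can locate a peak more precisely than a plain FFT is that the integral can be evaluated at any frequency, whereas the FFT is tied to a fixed grid. The sketch below uses NumPy's FFT rather than FFTW3, and the test signal, sampling rate, and frequency grid are assumptions; it illustrates the rectangle-rule idea only and does not reproduce the paper's comparison.

```python
import numpy as np

def explicit_integral_spectrum(t, y, freqs):
    """Rectangle-rule estimate of |integral of y(t) exp(-2 pi i f t) dt| at arbitrary frequencies."""
    dt = t[1] - t[0]
    kernel = np.exp(-2j * np.pi * np.outer(freqs, t))  # one row per trial frequency
    return np.abs(kernel @ y) * dt

# Assumed test signal: a cosine whose frequency falls between FFT bins.
f_true = 10.3
t = np.arange(0.0, 1.0, 1.0 / 256.0)
y = np.cos(2 * np.pi * f_true * t)

# FFT peak: restricted to the grid k / T (1 Hz spacing for a 1 s record).
fft_freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])
fft_peak = fft_freqs[np.argmax(np.abs(np.fft.rfft(y)))]

# Explicit-integral peak: evaluated on a finer, user-chosen frequency grid.
fine_freqs = np.arange(8.0, 13.0, 0.01)
ei_peak = fine_freqs[np.argmax(explicit_integral_spectrum(t, y, fine_freqs))]
```

    Zero-padding or interpolating the FFT would also refine its frequency grid, so this sketch should be read as an illustration of grid resolution rather than as evidence for the accuracy figures quoted in the abstract.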

  7. Improved Search Techniques

    Albornoz, Caleb Ronald

    2012-01-01

    Thousands of millions of documents are stored and updated daily in the World Wide Web. Most of the information is not efficiently organized to build knowledge from the stored data. Nowadays, search engines are mainly used by users who rely on their skills to look for the information needed. This paper presents different techniques search engine users can apply in Google Search to improve the relevancy of search results. According to the Pew Research Center, the average person spends eight hours a month searching for the right information. For instance, a company that employs 1000 employees wastes $2.5 million on looking for nonexistent and/or not found information. The cost is very high because decisions are made based on the information that is readily available to use. Whenever the information necessary to formulate an argument is not available or found, poor decisions may be made and mistakes will be more likely to occur. Also, the survey indicates that only 56% of Google users feel confident with their current search skills. Moreover, just 76% of the information that is available on the Internet is accurate.

  8. Analytical Searching.

    Pappas, Marjorie L.

    1995-01-01

    Discusses analytical searching, a process that enables searchers of electronic resources to develop a planned strategy by combining words or phrases with Boolean operators. Defines simple and complex searching, and describes search strategies developed with Boolean logic and truncation. Provides guidelines for teaching students analytical…
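
    Combining terms with Boolean operators and truncation, as described above, can be sketched in a few lines. The function names, the `*` truncation symbol, and the sample documents are assumptions for illustration; real search systems differ in syntax and matching rules.

```python
def matches(term, words):
    """True if any word matches the term; a trailing '*' means truncation (prefix match)."""
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def search(docs, all_of=(), any_of=(), none_of=()):
    """Return ids of documents satisfying AND (all_of), OR (any_of) and NOT (none_of)."""
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if (all(matches(t, words) for t in all_of)
                and (not any_of or any(matches(t, words) for t in any_of))
                and not any(matches(t, words) for t in none_of)):
            hits.append(doc_id)
    return hits

# Hypothetical mini-collection.
docs = {
    1: "teaching students library search",
    2: "teachers use boolean logic",
    3: "students dislike homework",
}
narrow = search(docs, all_of=["teach*"], none_of=["boolean"])
broad = search(docs, any_of=["students", "logic"])
```

    Here `teach*` matches both "teaching" and "teachers", and the NOT term removes the second document from the narrow search.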

  9. Relativistic mergers of black hole binaries have large, similar masses, low spins and are circular

    Amaro-Seoane, Pau; Chen, Xian

    2016-05-01

    Gravitational waves are a prediction of general relativity, and with ground-based detectors now running in their advanced configuration, we will soon be able to measure them directly for the first time. Binaries of stellar-mass black holes are among the most interesting sources for these detectors. Unfortunately, the many different parameters associated with the problem make it difficult to promptly produce a large set of waveforms for the search in the data stream. To reduce the number of templates to develop, one must restrict some of the physical parameters to a certain range of values predicted by either (electromagnetic) observations or theoretical modelling. In this work, we show that `hyperstellar' black holes (HSBs) with masses 30 ≲ MBH/M⊙ ≲ 100, i.e. black holes significantly larger than the nominal 10 M⊙, will have an associated low value for the spin, i.e. a similar masses. We also address the distribution of the eccentricities of HSB binaries in dense stellar systems using a large suite of three-body scattering experiments that include binary-single interactions and long-lived hierarchical systems with a highly accurate integrator, including relativistic corrections up to O(1/c^5). We find that most sources in the detector band will have nearly zero eccentricities. This correlation between large, similar masses, low spin and low eccentricity will help to accelerate the searches for gravitational-wave signals.

  10. A new adaptive fast motion estimation algorithm based on local motion similarity degree (LMSD)

    LIU Long; HAN Chongzhao; BAI Yan

    2005-01-01

    In the motion vector field adaptive search technique (MVFAST) and the predictive motion vector field adaptive search technique (PMVFAST), the size of the largest motion vector from the three adjacent blocks (left, top, top-right) is compared with a threshold to select a search scheme. However, a suitable search center and search pattern will not be selected by the adaptive search technique when the adjacent motion vectors are not coherent in the local region. This paper presents an efficient adaptive search algorithm. The motion vector variation degree (MVVD) is considered a reasonable factor for adaptive search selection. Using the relationship between local motion similarity degree (LMSD) and the variation degree of motion vector (MVVD), the motion vectors are classified into three categories according to the corresponding LMSD; different proposed search schemes are then adopted for motion estimation. The experimental results show that the proposed algorithm achieves a significant computational speedup compared with the MVFAST and PMVFAST algorithms, and offers similar or even better performance.

  11. A COMPARISON OF SEMANTIC SIMILARITY MODELS IN EVALUATING CONCEPT SIMILARITY

    Q. X. Xu

    2012-08-01

    Semantic similarities are important in concept definition, recognition, categorization, interpretation, and integration. Many semantic similarity models have been established to evaluate semantic similarities of objects and/or concepts. To find out the suitability and performance of different models in evaluating concept similarities, we make a comparison of four main types of models in this paper: the geometric model, the feature model, the network model, and the transformational model. Fundamental principles and main characteristics of these models are first introduced and compared. Land use and land cover concepts of NLCD92 are employed as examples in the case study. The results demonstrate that correlations between these models are very high, possibly because all these models are designed to simulate the similarity judgement of the human mind.

  12. Learning Multi-modal Similarity

    McFee, Brian

    2010-01-01

    In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multi-media similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.

  13. Roget's Thesaurus and Semantic Similarity

    Jarmasz, Mario; Szpakowicz, Stan

    2012-01-01

    We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein ...
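
    Correlating a system's similarity scores with human judgements, as the evaluation above describes, is commonly done with a Pearson correlation coefficient. The sketch below assumes that measure; the scores are made-up placeholders, not the Miller and Charles ratings or any system's actual output.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

# Placeholder values only: hypothetical human ratings (0 to 4 scale)
# and hypothetical system scores (0 to 1 scale) for four noun pairs.
human_scores = [3.9, 3.5, 1.2, 0.4]
system_scores = [0.95, 0.80, 0.30, 0.05]
r = pearson(human_scores, system_scores)
```

    A value of `r` close to 1 would indicate that the system ranks and spaces the pairs much as the human judges did; the two scales do not need to match because the coefficient is invariant to linear rescaling.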

  14. Aggregated search: a new information retrieval paradigm

    Kopliku, Arlind; Pinel-Sauvagnat, Karen; Boughanem, Mohand

    2014-01-01

    Traditional search engines return ranked lists of search results. It is up to the user to scroll this list, scan within different documents and assemble information that fulfill his/her information need. Aggregated search represents a new class of approaches where the information is not only retrieved but also assembled. This is the current evolution in Web search, where diverse content (images, videos, ...) and relational content (similar entities, features) are included in search results. I...

  15. Similarity-Based Prediction of Travel Times for Vehicles Traveling on Known Routes

    Tiesyte, Dalia; Jensen, Christian Søndergaard

    2008-01-01

    , historical data in combination with real-time data may be used to predict the future travel times of vehicles more accurately, thus improving the experience of the users who rely on such information. We propose a Nearest-Neighbor Trajectory (NNT) technique that identifies the historical trajectory that is......The use of centralized, real-time position tracking is proliferating in the areas of logistics and public transportation. Real-time positions can be used to provide up-to-date information to a variety of users, and they can also be accumulated for uses in subsequent data analyses. In particular...... trajectories of vehicles that travel along known routes. In empirical studies with real data from buses, we evaluate how well the proposed distance functions are capable of predicting future vehicle movements. Second, we propose a main-memory index structure that enables incremental similarity search and that...

  16. Towards a more accurate concept of fuels

    The introduction of LEU in Atucha and the approval of CARA show an advancement in Argentine power station fuels, which stimulates and shows a direction to follow. In the first case, the use of enriched U fuel relaxes an important restriction related to neutron economy; that means that it is possible to design less penalized fuels using more Zry. The second case allows a decrease in the linear power of the rods, enabling a better performance of the fuel in normal and also in accident conditions. In this work we wish to emphasize this last point, trying to find a design in which the surface power of the rod is diminished. Hence, in accident conditions owing to lack of coolant, the cladding tube will not reach temperatures that will produce oxidation, with the corresponding H2 formation and with plasticity enough to form blisters, which will obstruct the reflooding and hydration that will produce fragility and rupture of the cladding tube, with the corresponding radioactive material dispersion. This work aims to find rod designs with quasi-rectangular geometry to lower the surface power of the rods, in order to obtain a lower central temperature of the rod. Thus, critical temperatures will not be reached in case of lack of coolant. This design is becoming a reality after PPFAE's efforts in search of cladding tube fabrication with different circumferential values, rectangular in particular. This geometry, with an appropriate pellet design, can minimize the pellet-cladding interaction and, through an accurate choice of width, non-rectified pellets could be used. This means an important economy in pellet production, as well as an advance in the fabrication of fuels in glove boxes and hot cells in the future. The sequence to determine critical geometrical parameters is described and some rod dispositions are explored.

  17. Search and Recommendation

    Bogers, Toine

    2014-01-01

    In just a little over half a century, the field of information retrieval has experienced spectacular growth and success, with IR applications such as search engines becoming a billion-dollar industry in the past decades. Recommender systems have seen an even more meteoric rise to success with wide......-scale application by companies like Amazon, Facebook, and Netflix. But are search and recommendation really two different fields of research that address different problems with different sets of algorithms in papers published at distinct conferences? In my talk, I want to argue that search and recommendation are...... more similar than they have been treated in the past decade. By looking more closely at the tasks and problems that search and recommendation try to solve, at the algorithms used to solve these problems and at the way their performance is evaluated, I want to show that there is no clear black and white...

  18. Personalized recommendation with corrected similarity

    Personalized recommendation has attracted a surge of interdisciplinary research. In particular, similarity-based methods have achieved great success in real recommendation systems. However, the computed similarities are overestimated or underestimated, in particular because of the defective strategy of unidirectional similarity estimation. In this paper, we address this drawback by leveraging mutual correction of forward and backward similarity estimations, and propose a new personalized recommendation index, i.e., corrected similarity based inference (CSI). In extensive experiments on four benchmark datasets, CSI shows a greater improvement than mainstream baselines. A detailed analysis is presented to unveil and understand the origin of the difference between CSI and mainstream indices.

  19. Quantifying the similarities within fold space.

    Harrison, Andrew; Pearl, Frances; Mott, Richard; Thornton, Janet; Orengo, Christine

    2002-11-01

    We have used GRATH, a graph-based structure comparison algorithm, to map the similarities between the different folds observed in the CATH domain structure database. Statistical analysis of the distributions of the fold similarities has allowed us to assess the significance for any similarity. Therefore we have examined whether it is best to represent folds as discrete entities or whether, in fact, a more accurate model would be a continuum wherein folds overlap via common motifs. To do this we have introduced a new statistical measure of fold similarity, termed gregariousness. For a particular fold, gregariousness measures how many other folds have a significant structural overlap with that fold, typically comprising 40% or more of the larger structure. Gregarious folds often contain commonly occurring super-secondary structural motifs, such as beta-meanders, greek keys, alpha-beta plait motifs or alpha-hairpins, which are matching similar motifs in other folds. Apart from one example, all the most gregarious folds, matching 20% or more of the other folds in the database, are alpha-beta proteins. They also occur in highly populated architectural regions of fold space, adopting sandwich-like arrangements containing two or more layers of alpha-helices and beta-strands. Domains that exhibit a low gregariousness are those that have very distinctive folds, with few common motifs or motifs that are packed in unusual arrangements. Most of the superhelices exhibit low gregariousness despite containing some commonly occurring super-secondary structural motifs. In these folds, these common motifs are combined in an unusual way and represent a small proportion of the fold (<10%). Our results suggest that fold space may be considered as continuous for some architectural arrangements (e.g. alpha-beta sandwiches), in that super-secondary motifs can be used to link neighbouring fold groups. However, in other regions of fold space much more discrete topologies are observed with
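
    The gregariousness measure described above counts, for each fold, how many other folds overlap it significantly. A minimal sketch under stated assumptions: the statistical significance test is replaced by a plain overlap threshold of 0.4 (echoing the "40% or more" figure), the result is expressed as a fraction of the other folds, and the overlap matrix is invented for illustration. It is not the GRATH algorithm.

```python
import numpy as np

def gregariousness(overlap, threshold=0.4):
    """For each fold, the fraction of other folds whose overlap meets the threshold."""
    overlap = np.asarray(overlap, dtype=float)
    significant = overlap >= threshold
    np.fill_diagonal(significant, False)  # a fold does not count as its own neighbour
    return significant.sum(axis=1) / (overlap.shape[0] - 1)

# Hypothetical pairwise overlap fractions between four folds.
overlap = [
    [1.00, 0.50, 0.60, 0.10],
    [0.50, 1.00, 0.20, 0.10],
    [0.60, 0.20, 1.00, 0.45],
    [0.10, 0.10, 0.45, 1.00],
]
scores = gregariousness(overlap)
```

    In this toy matrix the first and third folds each overlap two of the three other folds, so they would be the more gregarious ones.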

  20. 38 CFR 4.46 - Accurate measurement.

    2010-07-01

    ... 38 Pensions, Bonuses, and Veterans' Relief 1 2010-07-01 2010-07-01 false Accurate measurement. 4... RATING DISABILITIES Disability Ratings The Musculoskeletal System § 4.46 Accurate measurement. Accurate measurement of the length of stumps, excursion of joints, dimensions and location of scars with respect...

  1. Self-similar aftershock rates

    Davidsen, Jörn

    2016-01-01

    In many important systems exhibiting crackling noise --- intermittent avalanche-like relaxation response with power-law and, thus, self-similar distributed event sizes --- the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is in particular true for the case of seismicity and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high resolution earthquake data from Southern California we find excellent agreement, providing in particular clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved way of time-dependent seismic hazard assessment and earthquake forecasting.

  2. Learning Multi-modal Similarity

    McFee, Brian; Lanckriet, Gert

    2010-01-01

    In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning t...

  3. Similarity measures for protein ensembles

    Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper

    2009-01-01

    Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations...... synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single...

  4. SELF-SIMILAR TRAFFIC GENERATOR

    Linawati Linawati; I Made Suartika

    2009-01-01

    A network traffic generator can be built using OPNET. OPNET generates traffic as either explicit traffic or background traffic. This paper demonstrates generating traffic in OPNET 7.0 as background traffic. The traffic generator that was simulated produces self-similar traffic with different Hurst parameters. The simulation results showed that OPNET, with its background traffic function, can serve as a qualified self-similar traffic generator. These results can help in investigating and analysing network perfor...
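
    Outside OPNET, a common textbook way to obtain approximately self-similar traffic is to superpose many ON/OFF sources whose period lengths are heavy-tailed, with the Pareto tail index tied to the Hurst parameter by H = (3 - alpha) / 2. The sketch below follows that construction as an assumption; the function name, the parameters, and the unit rate per source are illustrative, and it says nothing about how OPNET's background traffic is implemented.

```python
import numpy as np

def onoff_traffic(n_sources, n_slots, hurst, rng):
    """Aggregate load per time slot from ON/OFF sources with Pareto period lengths."""
    alpha = 3.0 - 2.0 * hurst  # tail index implied by H = (3 - alpha) / 2
    load = np.zeros(n_slots, dtype=int)
    for _ in range(n_sources):
        t = 0
        on = rng.random() < 0.5  # each source starts ON or OFF at random
        while t < n_slots:
            # Pareto-distributed period (minimum 1 slot); heavy-tailed for 1 < alpha < 2.
            length = int(np.ceil(rng.pareto(alpha) + 1.0))
            if on:
                load[t:t + length] += 1  # an ON source sends one unit per slot
            t += length
            on = not on
    return load

rng = np.random.default_rng(0)
load = onoff_traffic(n_sources=20, n_slots=1000, hurst=0.8, rng=rng)
```

    Larger values of `hurst` (closer to 1) give heavier tails and longer bursts; checking the result would require a separate Hurst estimate, for example from a variance-time plot, which is not shown here.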

  5. Molecular similarity of MDR inhibitors

    Simon Gibbons; Mire Zloh

    2004-01-01

    Abstract: The molecular similarity of multidrug resistance (MDR) inhibitors was evaluated using the point centred atom charge approach in an attempt to find some common features of structurally unrelated inhibitors. A series of inhibitors of bacterial MDR were studied and there is a high similarity between these in terms of their shape, presence and orientation of aromatic ring moieties. A comparison of the lipophilic properties of these molecules has also been conducted suggesting that this ...

  6. HOW DISSIMILARLY SIMILAR ARE BIOSIMILARS?

    Ramshankar Vijayalakshmi; Kesavan Sabitha; Krishnamurthy Arvind

    2012-01-01

    Recently, biopharmaceuticals have become the new chemotherapeutic agents, called “biosimilars” or “follow-on protein products” by the European Medicines Agency (EMA) and the American regulatory agency (Food and Drug Administration), respectively. Biosimilars are extremely similar to the reference molecule but not identical, however close their similarities may be. A regulatory framework is therefore in place to assess applications for marketing authorisation of biosimilars. When a bi...

  7. Method of similarity for cavitation

    Determining whether cavitation can occur in subassembly nozzles of the fast reactor core requires a fluid dynamic model test. We propose a method of similarity based on the non-dimensionalization of the equation of motion for a viscous capillarity fluid derived from the Cahn and Hilliard model. Taking into account the dissolved gas effect, a condition of compatibility is determined. This condition must be respected by the fluid in the experiment, along with the scaling between the two similar flows.

  8. Supervised Learning with Similarity Functions

    Kar, Purushottam; Jain, Prateek

    2012-01-01

    We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given superv...

  9. Faceted Search

    Tunkelang, Daniel

    2009-01-01

    We live in an information age that requires us, more than ever, to represent, access, and use information. Over the last several decades, we have developed a modern science and technology for information retrieval, relentlessly pursuing the vision of a "memex" that Vannevar Bush proposed in his seminal article, "As We May Think." Faceted search plays a key role in this program. Faceted search addresses weaknesses of conventional search approaches and has emerged as a foundation for interactive information retrieval. User studies demonstrate that faceted search provides more

  10. Are search committees really searching?

    Hoffmeir, Patricia A

    2003-02-01

    Academic chair searches are admittedly a labor-intensive process, but they are made more difficult and often lead to less-than-optimal outcomes because search committees spend their time "advertising," "looking," but not truly searching for academic chairs. At the onset, certain "realities" must be acknowledged, including (1) understanding that unless your organization is renowned in the specialty for which you are conducting the search, candidates won't be pounding at your door for a job, (2) searches that fail to include an overall assessment of the department in question are likely to miss the mark, (3) chairs must have demonstrated not only clinical expertise but also business savvy, (4) the best candidate is not necessarily someone who is already a department chair, (5) when it comes to chair searches, it's a buyer's market, and (6) the search process is inextricably linked to the success of the search. Key to the process of conducting an academic chair search are the judicious formation of the search committee; committee members' willingness to do their homework, attend all committee meetings, and keep the committee's activities confidential; crafting, not revising, the current job description for the open chair position; interviewing viable candidates rather than all candidates and adhering to a coordinated interviewing process; and evaluating internal and external candidates according to the same parameters. PMID:12584089

  11. Representation is representation of similarities.

    Edelman, S

    1998-08-01

    Advanced perceptual systems are faced with the problem of securing a principled (ideally, veridical) relationship between the world and its internal representation. I propose a unified approach to visual representation, addressing the need for superordinate and basic-level categorization and for the identification of specific instances of familiar categories. According to the proposed theory, a shape is represented internally by the responses of a small number of tuned modules, each broadly selective for some reference shape, whose similarity to the stimulus it measures. This amounts to embedding the stimulus in a low-dimensional proximal shape space spanned by the outputs of the active modules. This shape space supports representations of distal shape similarities that are veridical as Shepard's (1968) second-order isomorphisms (i.e., correspondence between distal and proximal similarities among shapes, rather than between distal shapes and their proximal representations). Representation in terms of similarities to reference shapes supports processing (e.g., discrimination) of shapes that are radically different from the reference ones, without the need for the computationally problematic decomposition into parts required by other theories. Furthermore, a general expression for similarity between two stimuli, based on comparisons to reference shapes, can be used to derive models of perceived similarity ranging from continuous, symmetric, and hierarchical ones, as in multidimensional scaling (Shepard 1980), to discrete and nonhierarchical ones, as in the general contrast models (Shepard & Arabie 1979; Tversky 1977). PMID:10097019

  12. Hash: a program to accurately predict protein Hα shifts from neighboring backbone shifts

    Zeng Jianyang, E-mail: zengjy@gmail.com [Tsinghua University, Institute for Interdisciplinary Information Sciences (China); Zhou Pei [Duke University Medical Center, Department of Biochemistry (United States); Donald, Bruce Randall [Duke University, Department of Computer Science (United States)

    2013-01-15

    Chemical shifts provide not only peak identities for analyzing nuclear magnetic resonance (NMR) data, but also an important source of conformational information for studying protein structures. Current structural studies requiring Hα chemical shifts suffer from the following limitations. (1) For large proteins, the Hα chemical shifts can be difficult to assign using conventional NMR triple-resonance experiments, mainly due to the fast transverse relaxation rate of Cα that restricts the signal sensitivity. (2) Previous chemical shift prediction approaches either require homologous models with high sequence similarity or rely heavily on accurate backbone and side-chain structural coordinates. When neither sequence homologues nor structural coordinates are available, we must resort to other information to predict Hα chemical shifts. Predicting accurate Hα chemical shifts using other obtainable information, such as the chemical shifts of nearby backbone atoms (i.e., adjacent atoms in the sequence), can remedy the above dilemmas, and hence advance NMR-based structural studies of proteins. By specifically exploiting the dependencies on chemical shifts of nearby backbone atoms, we propose a novel machine learning algorithm, called Hash, to predict Hα chemical shifts. Hash combines a new fragment-based chemical shift search approach with a non-parametric regression model, called the generalized additive model, to effectively solve the prediction problem. We demonstrate that the chemical shifts of nearby backbone atoms provide a reliable source of information for predicting accurate Hα chemical shifts. Our testing results on different possible combinations of input data indicate that Hash has a wide range of potential NMR applications in structural and biological studies of proteins.

  13. Capacity Planning for Vertical Search Engines

    Badue, Claudine; Almeida, Virgilio; Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier; Ziviani, Artur; Ziviani, Nivio

    2010-01-01

    Vertical search engines focus on specific slices of content, such as the Web of a single country or the document collection of a large corporation. Despite this, like general open web search engines, they are expensive to maintain, expensive to operate, and hard to design. Because of this, predicting the response time of a vertical search engine is usually done empirically through experimentation, requiring a costly setup. An alternative is to develop a model of the search engine for predicting performance. However, this alternative is of interest only if its predictions are accurate. In this paper we propose a methodology for analyzing the performance of vertical search engines. Applying the proposed methodology, we present a capacity planning model based on a queueing network for search engines with a scale typically suitable for the needs of large corporations. The model is simple and yet reasonably accurate and, in contrast to previous work, considers the imbalance in query service times among homogeneous...

  14. Similarity measures for face recognition

    Vezzetti, Enrico

    2015-01-01

    Face recognition has several applications, including security (authentication and identification of device users and criminal suspects) and medicine (corrective surgery and diagnosis). Facial recognition programs rely on algorithms that can compare and compute the similarity between two sets of images. This eBook explains some of the similarity measures used in facial recognition systems in a single volume. Readers will learn about various measures including Minkowski distances, Mahalanobis distances, Hausdorff distances, and cosine-based distances, among other methods. The book also summarizes errors that may occur in face recognition methods. Computer scientists "facing face" and looking to select and test different methods of computing similarities will benefit from this book. The book is also a useful tool for students undertaking computer vision courses.

  15. A Novel Personalized Web Search Model

    ZHU Zhengyu; XU Jingqiu; TIAN Yunyan; REN Xiang

    2007-01-01

    A novel personalized Web search model is proposed. The new system, as a middleware between a user and a Web search engine, is set up on the client machine. It can learn a user's preference implicitly and then generate the user profile automatically. When the user inputs query keywords, the system can automatically generate a few personalized expansion words by computing the term-term associations according to the current user profile, and then these words together with the query keywords are submitted to a popular search engine such as Yahoo or Google. These expansion words help to express accurately the user's search intention. The new Web search model can make a common search engine personalized, that is, the search engine can return different search results to different users who input the same keywords. The experimental results show the feasibility and applicability of the presented work.

  16. Similarity Measures for Comparing Biclusterings.

    Horta, Danilo; Campello, Ricardo J G B

    2014-01-01

    The comparison of ordinary partitions of a set of objects is well established in the clustering literature, which comprehends several studies on the analysis of the properties of similarity measures for comparing partitions. However, similarity measures for clusterings are not readily applicable to biclusterings, since each bicluster is a tuple of two sets (of rows and columns), whereas a cluster is only a single set (of rows). Some biclustering similarity measures have been defined as minor contributions in papers which primarily report on proposals and evaluation of biclustering algorithms or comparative analyses of biclustering algorithms. The consequence is that some desirable properties of such measures have been overlooked in the literature. We review 14 biclustering similarity measures. We define eight desirable properties of a biclustering measure, discuss their importance, and prove which properties each of the reviewed measures has. We show examples drawn and inspired from important studies in which several biclustering measures convey misleading evaluations due to the absence of one or more of the discussed properties. We also advocate the use of a more general comparison approach that is based on the idea of transforming the original problem of comparing biclusterings into an equivalent problem of comparing clustering partitions with overlapping clusters. PMID:26356865

  17. Approaches to Sequence Similarity Representation

    Sokolov, Artem; Rachkovskij, Dmitri

    2006-01-01

    We discuss several approaches to similarity preserving coding of symbol sequences and possible connections of their distributed versions to metric embeddings. Interpreting sequence representation methods with embeddings can help develop an approach to their analysis and may lead to discovering useful properties.

  18. A square from similar rectangles

    Dorichenko, Sergey; Skopenkov, Mikhail

    2013-01-01

    In the present popular science paper we determine when a square can be dissected into rectangles similar to a given rectangle. The approach to the question is based on a physical interpretation using electrical networks. Only secondary school background is assumed in the paper.

  19. HOW DISSIMILARLY SIMILAR ARE BIOSIMILARS?

    Ramshankar Vijayalakshmi

    2012-05-01

    Full Text Available Biopharmaceuticals are the new chemotherapeutic agents that are called “biosimilars” or “follow-on protein products” by the European Medicines Agency (EMA) and the American regulatory agencies (Food and Drug Administration), respectively. Biosimilars are extremely similar to the reference molecule but not identical, however close their similarities may be. A regulatory framework is therefore in place to assess the application for marketing authorisation of biosimilars. When a biosimilar is similar to the reference biopharmaceutical in terms of safety, quality, and efficacy, it can be registered. It is important to document data from clinical trials with a view to demonstrating similar safety and efficacy. Whereas the development time for a generic medicine is around 3 years, a biosimilar takes about 6-9 years. Generic medicines need to demonstrate bioequivalence only, unlike biosimilars, which need to undergo Phase I and Phase III clinical trials. In this review, different biosimilars that are already being used successfully in the field of oncology are discussed. Their similarities, their differences, and the guidelines to be followed before a clinically informed decision is taken are discussed. More importantly, the regulatory guidelines that are operational in India, with a workflow for making a biosimilar and the relevant dos and don'ts, are discussed. In a large, populous country like India, the ageing population is increasing with improved treatments in all sectors, including oncology. For the health care of this sector, we need newer, cheaper, and effective biosimilars in the market. It therefore becomes important to understand the regulatory guidelines and the steps needed to come up with more biosimilars for the existing population, and more information is needed for practicing clinicians to translate these effectively into clinical practice.

  20. Practical fulltext search in medical records

    Vít Volšička

    2015-09-01

    Full Text Available Performing a search through previously existing documents, including medical reports, is an integral part of acquiring new information and educational processes. Unfortunately, finding relevant information is not always easy, since many documents are saved in free text formats, thereby making it difficult to search through them. A full-text search is a viable solution for searching through documents. The full-text search makes it possible to efficiently search through large numbers of documents and to find those that contain specific search phrases in a short time. All leading database systems currently offer full-text search, but some do not support the complex morphology of the Czech language. Apache Solr provides full support options and some full-text libraries. This programme provides the good support of the Czech language in the basic installation, and a wide range of settings and options for its deployment over any platform. The library had been satisfactorily tested using real data from the hospitals. Solr provided useful, fast, and accurate searches. However, there is still a need to make adjustments in order to receive effective search results, particularly by correcting typographical errors made not only in the text, but also when entering words in the search box and creating a list of frequently used abbreviations and synonyms for more accurate results.

  1. Search Engines Selection Based on Relevance Terms

    欧洁

    2003-01-01

    Metasearch can effectively search immense distributed electronic resources. It is built on top of several search engines, providing users with uniform access to these engines. Metasearch first passes the user's query to the underlying useful search engines, and then collects and reorganizes the results from the search engines used. The step in which metasearch selects the underlying useful search engines is called search engine selection. In this paper, we present a statistical method based on relevance terms to estimate the usefulness of a search engine for any given query, which is suitable for both Boolean queries and vector queries. Experimental results indicate that the proposed estimation method is quite accurate, especially when the critical similarity between the query and the results is high.

  2. Accurate hydrocarbon estimates attained with radioactive isotope

    To make accurate economic evaluations of new discoveries, an oil company needs to know how much gas and oil a reservoir contains. The porous rocks of these reservoirs are not completely filled with gas or oil, but contain a mixture of gas, oil and water. It is extremely important to know what volume percentage of this water--called connate water--is contained in the reservoir rock. The percentage of connate water can be calculated from electrical resistivity measurements made downhole. The accuracy of this method can be improved if a pure sample of connate water can be analyzed or if the chemistry of the water can be determined by conventional logging methods. Because of the similarity of the mud filtrate--the water in a water-based drilling fluid--and the connate water, this is not always possible. If the oil company cannot distinguish between connate water and mud filtrate, its oil-in-place calculations could be incorrect by ten percent or more. It is clear that unless an oil company can be sure that a sample of connate water is pure, or at the very least knows exactly how much mud filtrate it contains, its assessment of the reservoir's water content--and consequently its oil or gas content--will be distorted. The oil companies have opted for the Repeat Formation Tester (RFT) method. Label the drilling fluid with small doses of tritium--a radioactive isotope of hydrogen--and it will be easy to detect and quantify in the sample

  3. Implementation Of ROCK Clustering Algorithm For The Optimization Of Query Searching Time

    Ashwina Tyagi

    2012-05-01

    Full Text Available Clustering is a data mining technique for grouping similar data or queries together, which helps in identifying similar subject areas. The major problem is to identify heterogeneous subject areas where frequent queries are asked. There are a number of agglomerative clustering algorithms that are used to cluster data. The problem with these algorithms is that they use distance measures to calculate similarity. The best-suited algorithm for clustering categorical data is therefore the Robust Clustering Using Links (ROCK) [1] algorithm, because it uses the Jaccard coefficient instead of distance measures to find the similarity between the data or documents when forming clusters. The mechanism for classifying the clusters based on the similarity measure is applied to a given set of data. This method creates clusters of the data corresponding to different subject areas so that prior knowledge about similarity can be maintained, which in turn helps to discover accurate and consistent clusters and reduces the query response time. The main objective of our work is to implement ROCK [1] and to decrease the query response time by searching the documents in the resulting clusters instead of searching the whole database. This technique reduces the time needed to search for documents in the database.
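
    As an illustration of the similarity measure named above, the following is a minimal sketch of the Jaccard coefficient and of the "link" count (number of common neighbours) that ROCK builds on it, assuming each record is a set of categorical items. The function names, the sample records, and the threshold theta = 0.5 are illustrative assumptions and are not taken from the paper; a record is not counted as its own neighbour here, which is one of several possible conventions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_links(records, theta):
    """Return {(i, j): number of common neighbours} for all pairs i < j.

    Two records are neighbours when their Jaccard coefficient is >= theta.
    """
    n = len(records)
    neighbours = [
        {j for j in range(n) if j != i and jaccard(records[i], records[j]) >= theta}
        for i in range(n)
    ]
    return {
        (i, j): len(neighbours[i] & neighbours[j])
        for i, j in combinations(range(n), 2)
    }


# Hypothetical categorical records (e.g. keyword sets of queries).
records = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"x", "y"}]
links = count_links(records, theta=0.5)
```

    Pairs with many links are candidates for merging into one cluster; the merge criterion itself is omitted from this sketch.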

  4. Combinatorial Approaches to Accurate Identification of Orthologous Genes

    Shi, Guanqun

    2011-01-01

    The accurate identification of orthologous genes across different species is a critical and challenging problem in comparative genomics and has a wide spectrum of biological applications including gene function inference, evolutionary studies and systems biology. During the past several years, many methods have been proposed for ortholog assignment based on sequence similarity, phylogenetic approaches, synteny information, and genome rearrangement. Although these methods share many commonly a...

  5. Retrieval of similar chess positions

    Ganguly, Debasis; Leveling, Johannes; Jones, Gareth J.F.

    2014-01-01

    We address the problem of retrieving chess game positions similar to a given query position from a collection of archived chess games. We investigate this problem from an information retrieval (IR) perspective. The advantage of our proposed IR-based approach is that it allows using the standard inverted organization of stored chess positions, leading to an efficient retrieval. Moreover, in contrast to retrieving exactly identical board positions, the IR-based approach is able to provide approxim...

  6. Interfacial Molecular Searching Using Forager Dynamics

    Monserud, Jon H.; Schwartz, Daniel K.

    2016-03-01

    Many biological and technological systems employ efficient non-Brownian intermittent search strategies where localized searches alternate with long flights. Coincidentally, molecular species exhibit intermittent behavior at the solid-liquid interface, where periods of slow motion are punctuated by fast flights through the liquid phase. Single-molecule tracking was used here to observe the interfacial search process of DNA for complementary DNA. Measured search times were qualitatively consistent with an intermittent-flight model, and ~10 times faster than equivalent Brownian searches, suggesting that molecular searches for reactive sites benefit from similar efficiencies as biological organisms.

  7. SPATIO-TEXTUAL SIMILARITY JOIN

    Ch Shylaja and Supreethi K.P

    2015-07-01

    Full Text Available Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. Spatial databases store large amounts of space-related data, such as maps and preprocessed remote sensing or medical imaging data. Modern mobile phones and mobile devices are equipped with GPS, which is why location-based services have gained significant attention. These location-based services generate large amounts of spatio-textual data, which contain both a spatial location and a textual description. Spatio-textual objects have different representations because of deviations in GPS or differing user descriptions. This calls for efficient methods to integrate spatio-textual data, and the spatio-textual similarity join meets this need: given two sets of spatio-textual data, it finds all the similar pairs. A filter-and-refine framework will be developed to devise the algorithms. The prefix filter technique will be extended to generate spatial and textual signatures, and inverted indexes will be built on top of these signatures. Candidate pairs will be found using these indexes. Finally, the candidate pairs will be refined to get the result. An MBR-prefix based signature will be used to prune dissimilar objects. A hybrid signature will be used to support spatial and textual pruning simultaneously.
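
    The filter-and-refine idea described above can be sketched as follows. This is only a simplified illustration under stated assumptions, not the authors' algorithm: it performs a self-join over one list, uses the standard textual prefix filter for a Jaccard threshold in the filter step, and treats spatial similarity as a plain Euclidean distance bound checked in the refine step (the MBR-prefix and hybrid signatures from the abstract are not implemented). All names, the sample objects, and the thresholds are hypothetical.

```python
import math
from collections import defaultdict


def prefix(tokens, threshold, order):
    """First |x| - ceil(t * |x|) + 1 tokens of a set under a global token order."""
    ordered = sorted(tokens, key=lambda tok: order[tok])
    return ordered[: len(ordered) - math.ceil(threshold * len(ordered)) + 1]


def jaccard(a, b):
    """Jaccard coefficient of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_join(objects, threshold, max_dist):
    """objects: list of ((x, y), token_set). Returns similar index pairs (i < j)."""
    # Global order: rarest tokens first, ties broken alphabetically.
    freq = defaultdict(int)
    for _, toks in objects:
        for tok in toks:
            freq[tok] += 1
    ranked = sorted(freq, key=lambda t: (freq[t], t))
    order = {tok: rank for rank, tok in enumerate(ranked)}

    index = defaultdict(list)  # prefix token -> ids of objects seen so far
    result = []
    for j, (loc_j, toks_j) in enumerate(objects):
        pre = prefix(toks_j, threshold, order)
        candidates = set()
        for tok in pre:  # filter step: objects sharing a prefix token
            candidates.update(index[tok])
        for i in sorted(candidates):  # refine step: exact checks
            loc_i, toks_i = objects[i]
            if (jaccard(toks_i, toks_j) >= threshold
                    and math.dist(loc_i, loc_j) <= max_dist):
                result.append((i, j))
        for tok in pre:
            index[tok].append(j)
    return result


# Hypothetical points of interest: (location, description tokens).
objects = [
    ((0.0, 0.0), {"coffee", "shop", "main", "street"}),
    ((0.1, 0.0), {"coffee", "shop", "main", "st"}),
    ((50.0, 50.0), {"coffee", "shop", "main", "street"}),
    ((0.0, 0.1), {"book", "store"}),
]
pairs = similarity_join(objects, threshold=0.5, max_dist=1.0)
```

    The prefix filter is safe for Jaccard thresholds because two sets that reach the threshold must share at least one token within their prefixes, so only objects sharing a prefix token need exact verification.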

  8. Roget's Thesaurus and Semantic Similarity

    Jarmasz, Mario

    2012-01-01

    We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have also studied. Our Roget's-based system gets correlations of .878 for the smaller and .818 for the larger list of noun pairs; this is quite close to the .885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment. We further evaluate our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest questions: the correct synonym must be selected amongst a group of four words. Our system gets 78.75%, 82.00% and 74.33% of ...

  9. Landscape similarity, retrieval, and machine mapping of physiographic units

    Jasiewicz, Jaroslaw; Netzel, Pawel; Stepinski, Tomasz F.

    2014-09-01

    We introduce landscape similarity - a numerical measure that assesses affinity between two landscapes on the basis of similarity between the patterns of their constituent landform elements. Such a similarity function provides core technology for a landscape search engine - an algorithm that parses the topography of a study area and finds all places with landscapes broadly similar to a landscape template. A landscape search can yield answers to a query in real time, enabling a highly effective means to explore large topographic datasets. In turn, a landscape search facilitates auto-mapping of physiographic units within a study area. The country of Poland serves as a test bed for these novel concepts. The topography of Poland is given by a 30 m resolution DEM. The geomorphons method is applied to this DEM to classify the topography into ten common types of landform elements. A local landscape is represented by a square tile cut out of a map of landform elements. A histogram of cell-pair features is used to succinctly encode the composition and texture of a pattern within a local landscape. The affinity between two local landscapes is assessed using the Wave-Hedges similarity function applied to the two corresponding histograms. For a landscape search the study area is organized into a lattice of local landscapes. During the search the algorithm calculates the similarity between each local landscape and a given query. Our landscape search for Poland is implemented as a GeoWeb application called TerraEx-Pl and is available at http://sil.uc.edu/. Given a sample, or a number of samples, from a target physiographic unit the landscape search delineates this unit using the principles of supervised machine learning. Repeating this procedure for all units yields a complete physiographic map. The application of this methodology to topographic data of Poland results in the delineation of nine physiographic units. The resultant map bears a close resemblance to a conventional
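
    As a rough illustration of comparing two histograms with a Wave-Hedges measure, the sketch below uses the commonly cited Wave-Hedges distance, the sum over bins of 1 - min(p_i, q_i) / max(p_i, q_i). How the authors convert this distance into a similarity is not stated in the abstract; dividing by the number of bins that are non-empty in at least one histogram is an assumption made here so that the result lies in [0, 1]. The function names and sample histograms are hypothetical.

```python
def wave_hedges_distance(p, q):
    """Sum over bins of 1 - min(p_i, q_i) / max(p_i, q_i).

    Histograms are assumed non-negative; bins empty in both contribute 0.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(p, q):
        hi = max(a, b)
        if hi > 0:
            total += 1.0 - min(a, b) / hi
    return total


def wave_hedges_similarity(p, q):
    """Assumed normalisation: 1 - distance / (bins non-empty in p or q)."""
    active = sum(1 for a, b in zip(p, q) if max(a, b) > 0)
    if active == 0:
        return 1.0  # two empty histograms are treated as identical
    return 1.0 - wave_hedges_distance(p, q) / active


# Hypothetical histograms of cell-pair features for two local landscapes.
query = [2, 0, 4]
tile = [1, 0, 4]
score = wave_hedges_similarity(query, tile)
```

    A search over a lattice of tiles would evaluate this score between the query histogram and each tile's histogram and rank tiles by the result.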

  10. Turning Search into Knowledge Management.

    Kaufman, David

    2002-01-01

    Discussion of knowledge management for electronic data focuses on creating a high quality similarity ranking algorithm. Topics include similarity ranking and unstructured data management; searching, categorization, and summarization of documents; query evaluation; considering sentences in addition to keywords; and vector models. (LRW)

  11. Similarity Join Size Estimation using Locality Sensitive Hashing

    Lee, Hongrae; Shim, Kyuseok

    2011-01-01

    Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generalization of the previously studied set similarity join size estimation (SSJ) problem and can handle more interesting cases such as TF-IDF vectors. One of the key challenges in similarity join size estimation is that the join size can change dramatically depending on the input similarity threshold. We propose a sampling based algorithm that uses the Locality-Sensitive-Hashing (LSH) scheme. The proposed algorithm LSH-SS uses an LSH index to enable effective sampling even at high thresholds. We compare the proposed technique with random sampling and the state-of-the-art technique for SSJ (adapted to VSJ) and demonstrate LSH-SS offers more accurate estimates at both high and low similarity thresholds and small variance using real-world data sets.
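
    For context, the random-sampling baseline mentioned in the abstract can be sketched as below: sample pairs uniformly, count how many reach the similarity threshold, and scale up to the total number of pairs. This is not LSH-SS itself, which the abstract describes only at a high level; cosine similarity, the function names, and the default seed are assumptions for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_join_size(vectors, threshold, samples, seed=0):
    """Estimate the number of pairs (i < j) with cosine similarity >= threshold."""
    n = len(vectors)
    total_pairs = n * (n - 1) // 2
    if total_pairs == 0 or samples <= 0:
        return 0.0
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        i, j = rng.sample(range(n), 2)  # a uniformly random pair of distinct indices
        if cosine(vectors[i], vectors[j]) >= threshold:
            hits += 1
    # Scale the observed hit fraction up to the full set of pairs.
    return total_pairs * hits / samples
```

    At high thresholds very few sampled pairs qualify, so this estimator becomes noisy; according to the abstract, that is the situation in which sampling through an LSH index is intended to help.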

  12. Ontology driven Pre and Post Ranking based Information Retrieval in Web Search Engines

    2012-06-01

    Full Text Available With the tremendous growth of the World Wide Web, it has become necessary to organize information in such a way that end users can find the information they want efficiently and accurately. This requires a pre-ranking of the underlying similar documents after the formation of the index. Thereafter, the search results are ranked in response to a query, which provides relevant results to the user. This paper proposes an ontology-driven pre-ranking of documents with identical context, followed by a post-ranking of the search results using keyword matching between the expanded query terms and the document keywords in the pre-ranked search results.

  13. Mechanisms for similarity based cooperation

    Traulsen, A.

    2008-06-01

    Cooperation based on similarity has been discussed since Richard Dawkins introduced the term “green beard” effect. In these models, individuals cooperate based on an arbitrary signal (or tag) such as the famous green beard. Here, two different models for such tag-based cooperation are analysed. As neutral drift is important in both models, a finite population framework is applied. The first model, which we term “cooperative tags”, considers a situation in which groups of cooperators are formed by some joint signal. Defectors adopting the signal and exploiting the group can lead to a breakdown of cooperation. In this case, conditions are derived under which the average abundance of the more cooperative strategy exceeds 50%. The second model considers a situation in which individuals start defecting towards others that are not similar to them. This situation is termed “defective tags”. It is shown that in this case, individuals using tags to cooperate exclusively with their own kind dominate over unconditional cooperators.

  14. A Short Survey of Document Structure Similarity Algorithms

    Buttler, D

    2004-02-27

    This paper provides a brief survey of document structural similarity algorithms, including the optimal Tree Edit Distance algorithm and various approximation algorithms. The approximation algorithms include the simple weighted tag similarity algorithm, Fourier transforms of the structure, and a new application of the shingle technique to structural similarity. We show three surprising results. First, the Fourier transform technique proves to be the least accurate of the approximation algorithms, while also being the slowest. Second, optimal Tree Edit Distance algorithms may not be the best technique for clustering pages from different sites. Third, the simplest approximation to structure may be the most effective and efficient mechanism for many applications.
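
    One way to picture the shingle technique applied to structure is sketched below: reduce each page to its sequence of opening tag names, form overlapping runs of k tags (shingles), and compare the two shingle sets with the Jaccard coefficient. This is a simplified illustration and not the paper's implementation; the regex-based tag scan, the function names, and k = 3 are assumptions.

```python
import re


def tag_sequence(html):
    """Opening-tag names in document order (a crude regex scan, not a real parser)."""
    return [name.lower() for name in re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html)]


def shingles(tokens, k):
    """Set of all runs of k consecutive tokens (empty if fewer than k tokens)."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def structural_similarity(html_a, html_b, k=3):
    """Jaccard coefficient of the tag-shingle sets of two documents."""
    a = shingles(tag_sequence(html_a), k)
    b = shingles(tag_sequence(html_b), k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical pages that share a layout prefix but differ in content structure.
page_a = "<html><body><div><p>x</p><p>y</p></div></body></html>"
page_b = "<html><body><div><p>z</p><ul><li>1</li></ul></div></body></html>"
score = structural_similarity(page_a, page_b)
```

    Text content is ignored entirely, so two pages generated from the same template score highly even when their wording differs.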

  15. Internet Search Engines

    Fatmaa El Zahraa Mohamed Abdou

    2004-01-01

    A general study of internet search engines. The study covers seven main points: the difference between search engines and search directories, the components of search engines, the percentage of sites covered by search engines, the cataloging of sites, the time needed for sites to appear in search engines, search capabilities, and types of search engines.

  16. Internet Search Engines

    Fatmaa El Zahraa Mohamed Abdou

    2004-09-01

    Full Text Available A general study of internet search engines. The study covers seven main points: the difference between search engines and search directories, the components of search engines, the percentage of sites covered by search engines, the cataloging of sites, the time needed for sites to appear in search engines, search capabilities, and types of search engines.

  17. Performance Indexes: Similarities and Differences

    André Machado Caldeira

    2013-06-01

    Full Text Available The investor of today is more rigorous in monitoring a portfolio of financial assets. He no longer thinks only in terms of the expected return (one dimension), but in terms of risk-return (two dimensions). This new perception is more complex, since the risk measurement can vary according to each person's perception; some use the standard deviation, while others disagree with this measure and propose alternatives. In addition to this difficulty, there is the problem of how to consider these two dimensions together. The objective of this essay is to study the main performance indexes through an empirical study in order to verify the differences and similarities for some selected assets. A performance index proposed in Caldeira (2005) is included in this analysis.

  18. Features Based Text Similarity Detection

    Kent, Chow Kok

    2010-01-01

    As the Internet helps us cross cultural borders by providing diverse information, plagiarism is bound to arise. As a result, plagiarism detection is increasingly in demand. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those detection tools. However, when handling large articles, the fingerprint matching technique has some weaknesses, especially in space and time consumption. In this paper, we propose a new approach to plagiarism detection that integrates the fingerprint matching technique with four key features to assist in the detection process. These proposed features are capable of choosing the main points or key sentences in the articles to be compared. The selected sentences then undergo the fingerprint matching process in order to detect the similarity between the sentences. Hence, time and space usage for the comparison process is r...

  19. Laboratory Building for Accurate Determination of Plutonium

    2008-01-01

    The accurate determination of plutonium is one of the most important assay techniques for nuclear fuel; it is also the key to chemical measurement transfer and the basis of the nuclear material balance. An

  20. A similarity-based data warehousing environment for medical images.

    Teixeira, Jefferson William; Annibal, Luana Peixoto; Felipe, Joaquim Cezar; Ciferri, Ricardo Rodrigues; Ciferri, Cristina Dutra de Aguiar

    2015-11-01

    A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To comply with this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing. PMID:26414378

  1. Alaska, Gulf spills share similarities

    The accidental Exxon Valdez oil spill in Alaska and the deliberate dumping of crude oil into the Persian Gulf as a tactic of war contain both glaring differences and surprising similarities. Public reaction and public response were much greater to the Exxon Valdez spill in pristine Prince William Sound than to the war-related tragedy in the Persian Gulf. More than 12,000 workers helped in the Alaskan cleanup; only 350 have been involved in Kuwait. But in both instances, environmental damages appear to be less than anticipated. Nature's highly effective self-cleansing action is primarily responsible for minimizing the damages. One positive action growing out of the two incidents is increased international cooperation and participation in oil-spill clean-up efforts. In 1990, in the aftermath of the Exxon Valdez spill, 94 nations signed an international accord on cooperation in future spills. The spills can be historic environmental landmarks leading to the creation of more sophisticated response systems worldwide.

  2. Relativistic Self-similar Disks

    Cai, Mike J.; Shu, Frank H.

    2002-01-01

    We formulate and solve by semi-analytic means the axisymmetric equilibria of relativistic self-similar disks of infinitesimal vertical thickness. These disks are supported in the horizontal directions against their self-gravity by a combination of isothermal (two-dimensional) pressure and a flat rotation curve. The dragging of inertial frames restricts possible solutions to rotation speeds that are always less than 0.438 times the speed of light, a result first obtained by Lynden-Bell and Pineault in 1978 for a cold disk. We show that prograde circular orbits of massive test particles exist and are stable for all of our model disks, but retrograde circular orbits cannot be maintained with particle velocities less than the speed of light once the disk develops an ergoregion. We also compute photon trajectories, planar and non-planar, in the resulting spacetime, for disks with and without ergoregions. We find that all photon orbits, except for a set of measure zero, tend to be focused by the gravity of the flat...

  3. Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

    Pesole Graziano

    2009-09-01

    Full Text Available Abstract Background The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. In the context of recent findings that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it is desirable to discriminate between coding and non-coding conserved sequences without recourse to sequence similarity searches of protein databases, as such approaches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by the presence in databases of sequences which are erroneously annotated as coding. Results Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine which designates the alignment coding or non-coding with an associated probability score. Conclusion We show that our approach is both sensitive and accurate with respect to comparable methods and illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.

  4. A New Generalized Similarity-Based Topic Distillation Algorithm

    ZHOU Hongfang; DANG Xiaohui

    2007-01-01

    The procedure of hypertext induced topic search (HITS) based on a semantic relation model is analyzed, and the topic drift of the HITS algorithm is found to arise because Web pages are projected onto a wrong latent semantic basis. A new concept, generalized similarity, is introduced and, based on this, a new topic distillation algorithm, GSTDA (generalized similarity based topic distillation algorithm), is presented to improve the quality of topic distillation. GSTDA is applied not only to avoid topic drift but also to explore topics related to the user query. The experimental results on 10 queries show that GSTDA reduces the topic drift rate by 10% to 58% compared with the HITS algorithm and discovers several related topics for queries that have multiple meanings.
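
    For readers unfamiliar with the baseline, the following is a minimal Python sketch of the standard HITS hub/authority iteration that GSTDA is compared against. It is not GSTDA itself, whose generalized-similarity step is not specified in the abstract. The function name hits, the dictionary-based link graph, and the fixed iteration count are illustrative assumptions.

```python
def hits(links, iterations=50):
    """Plain HITS power iteration.

    links maps each page to the list of pages it links to.
    Returns (hub, authority) score dictionaries with unit Euclidean norm.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub update: sum the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Two pages that both point at a third: "c" should be the sole authority.
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

    Topic drift in this scheme happens when a densely linked but off-topic cluster dominates the principal eigenvector, which is the failure mode the abstract says GSTDA targets.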

  5. Vaccine-related internet search activity predicts H1N1 and HPV vaccine coverage: implications for vaccine acceptance.

    Kalichman, Seth C; Kegler, Christopher

    2015-01-01

    The Internet is a primary source for health-related information, and Internet search activity is associated with infectious disease outbreaks. The authors hypothesized that Internet search activity for vaccine-related information would predict vaccination coverage. They examined Internet search activity for H1N1 and human papilloma virus (HPV) disease and vaccine information in relation to H1N1 and HPV vaccine uptake. Google Insight for Search was used to assess the volume of Internet search queries for H1N1- and vaccine-related terms in the United States in 2009, the year of the H1N1 pandemic. Vaccine coverage data were also obtained from the Centers for Disease Control and Prevention at the state level for H1N1 vaccinations in 2009. These same measures were collected at the state level for HPV- and vaccine-related search terms in 2010 as well as HPV vaccine uptake in that year. Analyses showed that the search terms H1N1 and vaccine were correlated with H1N1 vaccine uptake; ordinal regression found the H1N1 search term was independently associated with H1N1 vaccine coverage. Similarly, the correlation between vaccine search volume and HPV coverage was significant; ordinal regression showed the search term vaccine independently predicted HPV vaccination coverage. This is among the first studies to show that Internet search activity is associated with vaccination coverage. The Internet should be exploited as an opportunity to dispel vaccine misinformation by providing accurate information to support vaccine decision making. PMID:25222149

  6. Accurate numerical solution of compressible, linear stability equations

    Malik, M. R.; Chuang, S.; Hussaini, M. Y.

    1982-01-01

    The present investigation is concerned with a fourth order accurate finite difference method and its application to the study of the temporal and spatial stability of the three-dimensional compressible boundary layer flow on a swept wing. This method belongs to the class of compact two-point difference schemes discussed by White (1974) and Keller (1974). The method was apparently first used for solving the two-dimensional boundary layer equations. Attention is given to the governing equations, the solution technique, and the search for eigenvalues. A general purpose subroutine is employed for solving a block tridiagonal system of equations. The computer time can be reduced significantly by exploiting the special structure of two matrices.

  7. Invariant Image Watermarking Using Accurate Zernike Moments

    Ismail A. Ismail

    2010-01-01

    Full Text Available Problem statement: Digital image watermarking is the most popular method for image authentication, copyright protection and content description. Zernike moments are the most widely used moments in image processing and pattern recognition. The magnitudes of Zernike moments are rotation invariant, so they can be used directly as a watermark signal or be further modified to carry embedded data. Zernike moments computed in Cartesian coordinates are not accurate because of geometric and numerical errors. Approach: In this study, we employed a robust image-watermarking algorithm using accurate Zernike moments. These moments are computed in polar coordinates, where both approximation and geometric errors are removed. Accurate Zernike moments are used in image watermarking and are shown to be robust against different kinds of geometric attacks. The performance of the proposed algorithm is evaluated using standard images. Results: Experimental results show that accurate Zernike moments achieve a higher degree of robustness than approximated ones against rotation, scaling, flipping, shearing and affine transformation. Conclusion: By computing accurate Zernike moments, the embedded watermark bits can be extracted at a low error rate.

  8. An assessment of orthographic similarity measures for several African languages

    Keet, C. Maria

    2016-01-01

    Natural Language Interfaces and tools such as spellcheckers and Web search in one's own language are known to be useful in ICT-mediated communication. Most languages in Southern Africa are under-resourced, however. Therefore, it would be very useful if both the generic and the few language-specific NLP tools could be reused or easily adapted across languages. This depends on the notion, and extent, of similarity between the languages. We assess this from the angle of orthography and corpora. ...

  9. Fundamentals of database indexing and searching

    Bhattacharya, Arnab

    2014-01-01

    Fundamentals of Database Indexing and Searching presents well-known database searching and indexing techniques. It focuses on similarity search queries, showing how to use distance functions to measure the notion of dissimilarity. After defining database queries and similarity search queries, the book organizes the most common and representative index structures according to their characteristics. The author first describes low-dimensional index structures, memory-based index structures, and hierarchical disk-based index structures. He then outlines useful distance measures and index structures
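
    As a small illustration of a similarity search query driven by a distance function, the Python sketch below answers a k-nearest-neighbour query by linear scan. It is not taken from the book; the function names, the dictionary-based database, and the choice of Euclidean distance are assumptions. Index structures of the kind the book surveys exist to avoid exactly this full scan.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(database, query, k, distance=euclidean):
    """Return the k (distance, key) pairs closest to the query.

    database maps a key to its feature vector. Every object is scanned,
    so the cost is linear in the database size.
    """
    scored = ((distance(vec, query), key) for key, vec in database.items())
    return heapq.nsmallest(k, scored)


db = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
nearest = knn_query(db, (0.0, 0.0), k=2)
```

    Passing the distance as a parameter keeps the query logic independent of the dissimilarity measure, so another metric can be swapped in without touching the scan.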

  10. A study of Consistency in the Selection of Search Terms and Search Concepts: A Case Study in National Taiwan University

    Mu-hsuan Huang

    2001-12-01

    Full Text Available This article analyzes the consistency in the selection of search terms and search concepts by college and graduate students at National Taiwan University when using the PsycLIT CD-ROM database. Thirty-one students conducted pre-assigned searches, performing 59 searches that generated 609 search terms. The study finds that the consistency in the selection of search terms is 22.14% at the first level and 35% at the second level. These results are similar to those of other studies. As for consistency in search concepts, both the overlap of retrieved articles and the overlap of articles judged relevant are lower than in other studies. [Article content in Chinese]

  11. Quantum search followed by classical search versus quantum search alone

    Sousa, P. R. M.; Mendes, F. V.; Ramos, R. V.

    2015-01-01

    In this work, we show that using a quantum gate that gives extra information about the searched solution makes it possible to improve the performance of the search algorithm by switching from quantum to classical search at the appropriate moment. A comparison with the case where only quantum search is used is also made.
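
    As background on why the stopping point of a quantum search matters, the Python sketch below evaluates the textbook success probability of Grover search with a single marked item after a given number of iterations. It does not model the extra-information gate or the switching rule of the paper, which the abstract does not specify; the function names are illustrative.

```python
import math


def grover_success_probability(n_items, iterations):
    """Probability of measuring the single marked item among n_items
    after the given number of Grover iterations: sin^2((2k + 1) * theta),
    with theta = asin(1 / sqrt(n_items))."""
    theta = math.asin(1.0 / math.sqrt(n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2


def optimal_iterations(n_items):
    """Iteration count floor(pi / (4 * theta)), close to the first peak
    of the success probability."""
    theta = math.asin(1.0 / math.sqrt(n_items))
    return math.floor(math.pi / (4 * theta))


# Success probability for a 1024-item search at a few iteration counts.
profile = {k: grover_success_probability(1024, k) for k in (0, 5, 15, 25)}
```

    The probability rises and then falls again as iterations continue, so any scheme that stops the quantum phase early and finishes classically has to trade the partial quantum amplification against the cost of the remaining classical checks.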

  12. Improved Scatter Search Using Cuckoo Search

    Ahmed T.Sadiq Al-Obaidi

    2013-01-01

    Scatter Search (SS) is a deterministic strategy that has been applied successfully to some combinatorial and continuous optimization problems. Cuckoo Search (CS) is a heuristic search algorithm inspired by the reproduction strategy of cuckoos. This paper presents an enhanced scatter search algorithm using the CS algorithm. The improvement provides Scatter Search with random exploration of the problem's search space and with more diversification and intensification for promising solutions. The origi...

  13. Accurate atomic data for industrial plasma applications

    Griesmann, U.; Bridges, J.M.; Roberts, J.R.; Wiese, W.L.; Fuhr, J.R. [National Inst. of Standards and Technology, Gaithersburg, MD (United States)

    1997-12-31

    Reliable branching fraction, transition probability and transition wavelength data for radiative dipole transitions of atoms and ions in plasma are important in many industrial applications. Optical plasma diagnostics and modeling of the radiation transport in electrical discharge plasmas (e.g. in electrical lighting) depend on accurate basic atomic data. NIST has an ongoing experimental research program to provide accurate atomic data for radiative transitions. The new NIST UV-vis-IR high resolution Fourier transform spectrometer has become an excellent tool for accurate and efficient measurements of numerous transition wavelengths and branching fractions in a wide wavelength range. Recently, the authors have also begun to employ photon counting techniques for very accurate measurements of branching fractions of weaker spectral lines with the intent to improve the overall accuracy for experimental branching fractions to better than 5%. They have now completed their studies of transition probabilities of Ne I and Ne II. The results agree well with recent calculations and for the first time provide reliable transition probabilities for many weak intercombination lines.

  14. More accurate picture of human body organs

    Computerized tomography and nuclear magnetic resonance tomography (NMRT) are revolutionary contributions to radiodiagnosis because they make it possible to obtain a more accurate image of human body organs. The principles of both methods are described. Attention is mainly devoted to NMRT, which has been in clinical use for only three years. It does not burden the organism with ionizing radiation. (Ha)

  15. Search for $\

    Astier, Pierre; Baldisseri, Alberto; Baldo-Ceolin, Massimilla; Banner, M; Bassompierre, Gabriel; Benslama, K; Besson, N; Bird, I; Blumenfeld, B; Bobisut, F; Bouchez, J; Boyd, S; Bueno, A G; Bunyatov, S; Camilleri, L L; Cardini, A; Cattaneo, Paolo Walter; Cavasinni, V; Cervera-Villanueva, A; Challis, R C; Chukanov, A; Collazuol, G; Conforto, G; Conta, C; Contalbrigo, M; Cousins, R; Daniels, D; Degaudenzi, H M; Del Prete, T; De Santo, A; Dignan, T; Di Lella, L; do Couto e Silva, E; Dumarchez, J; Ellis, M; Feldman, G J; Ferrari, R; Ferrère, D; Flaminio, Vincenzo; Fraternali, M; Gaillard, J M; Gangler, E; Geiser, A; Geppert, D; Gibin, D; Gninenko, S N; Godley, A; Gómez-Cadenas, J J; Gosset, J; Gössling, C; Gouanère, M; Grant, A; Graziani, G; Guglielmi, A M; Hagner, C; Hernando, J A; Hubbard, D B; Hurst, P; Hyett, N; Iacopini, E; Joseph, C L; Juget, F R; Kent, N; Kirsanov, M M; Klimov, O; Kokkonen, J; Kovzelev, A; Krasnoperov, A V; Kustov, D; Lacaprara, S; Lachaud, C; Lakic, B; Lanza, A; La Rotonda, L; Laveder, M; Letessier-Selvon, A A; Lévy, J M; Linssen, Lucie; Ljubicic, A; Long, J; Lupi, A; Marchionni, A; Martelli, F; Méchain, X; Mendiburu, J P; Meyer, J P; Mezzetto, Mauro; Mishra, S R; Moorhead, G F; Naumov, D V; Nédélec, P; Nefedov, Yu A; Nguyen-Mau, C; Orestano, D; Pastore, F; Peak, L S; Pennacchio, E; Pessard, H; Petti, R; Placci, A; Polesello, G; Pollmann, D; Polyarush, A Yu; Popov, B; Poulsen, C; Rebuffi, L; Renò, R; Rico, J; Riemann, P; Roda, C; Rubbia, André; Salvatore, F; Schahmaneche, K; Schmidt, B; Schmidt, T; Sconza, A; Sevior, M E; Sillou, D; Soler, F J P; Sozzi, G; Steele, D; Stiegler, U; Stipcevic, M; Stolarczyk, T; Tareb-Reyes, M; Taylor, G; Tereshchenko, V V; Toropin, A N; Touchard, A M; Tovey, Stuart N; Tran, M T; Tsesmelis, E; Ulrichs, J; Vacavant, L; Valdata-Nappi, M; Valuev, V Y; Vannucci, François; Varvell, K E; Veltri, M; Vercesi, V; Vidal-Sitjes, G; Vieira, J M; Vinogradova, T G; Weber, F V; Weisse, T; Wilson, F F; Winton, L J; Yabsley, B D; Zaccone, Henri; Zuber, K; Zuccon, P

    2001-01-01

    We present the results of a search for nu(mu)-->nu(e) oscillations in the NOMAD experiment at CERN. The experiment looked for the appearance of nu(e) in a predominantly nu(mu) wide-band neutrino beam at the CERN SPS. No evidence for oscillations was found. The 90% confidence limits obtained are delta m^2 10 eV^2.

  16. Search for $\

    Astier, Pierre; Baldisseri, Alberto; Baldo-Ceolin, Massimilla; Banner, M; Bassompierre, Gabriel; Benslama, K; Besson, N; Bird, I; Blumenfeld, B; Bobisut, F; Bouchez, J; Boyd, S; Bueno, A G; Bunyatov, S A; Camilleri, L L; Cardini, A; Cattaneo, Paolo Walter; Cavasinni, V; Cervera-Villanueva, A; Challis, R C; Chukanov, A; Collazuol, G; Conforto, G; Conta, C; Contalbrigo, M; Cousins, R D; Daniels, D; De Santo, A; Degaudenzi, H M; Del Prete, T; Di Lella, L; Dignan, T; Dumarchez, J; Feldman, G J; Ferrari, A; Ferrari, R; Ferrère, D; Flaminio, Vincenzo; Fraternali, M; Gaillard, J M; Gangler, E; Geiser, A; Geppert, D; Gibin, D; Gninenko, S N; Godley, A; Gosset, J; Gouanère, M; Grant, A; Graziani, G; Guglielmi, A M; Gómez-Cadenas, J J; Gössling, C; Hagner, C; Hernando, J; Hong, T M; Hubbard, D B; Hurst, P; Hyett, N; Iacopini, E; Joseph, C L; Juget, F R; Kent, N; Kirsanov, M M; Klimov, O; Kokkonen, J; Kovzelev, A; Krasnoperov, A V; Kustov, D; La Rotonda, L; Lacaprara, S; Lachaud, C; Lakic, B; Lanza, A; Laveder, M; Letessier-Selvon, A A; Linssen, Lucie; Ljubicic, A; Long, J; Lupi, A; Lévy, J M; Marchionni, A; Martelli, F; Mendiburu, J P; Meyer, J P; Mezzetto, Mauro; Mishra, S R; Moorhead, G F; Méchain, X; Naumov, D V; Nefedov, Yu A; Nguyen-Mau, C; Nédélec, P; Orestano, D; Pastore, F; Peak, L S; Pennacchio, E; Pessard, H; Petti, R; Placci, A; Polesello, G; Pollmann, D; Polyarush, A Yu; Popov, B; Poulsen, C; Rebuffi, L; Renò, R; Rico, J; Riemann, P; Roda, C; Rubbia, André; Salvatore, F; Schahmaneche, K; Schmidt, B; Schmidt, T; Sconza, A; Sevior, M E; Shih, D; Sillou, D; Soler, F J P; Sozzi, G; Steele, D; Stiegler, U; Stipcevic, M; Stolarczyk, T; Tareb-Reyes, M; Taylor, G N; Tereshchenko, V V; Toropin, A N; Touchard, A M; Tovey, Stuart N; Tran, M T; Tsesmelis, E; Ulrichs, J; Vacavant, L; Valdata-Nappi, M; Valuev, V Yu; Vannucci, François; Varvell, K E; Veltri, M; Vercesi, V; Vidal-Sitjes, G; Vieira, J M; Vinogradova, T G; Weber, F V; Weisse, T; Wilson, F F; Winton, L J; Yabsley, B D; Zaccone, Henri; Zuber, K; Zuccon, P; do Couto e Silva, E

    2003-01-01

    We present the results of a search for nu_mu → nu_e oscillations in the NOMAD experiment at CERN. The experiment looked for the appearance of nu_e in a predominantly nu_mu wide-band neutrino beam at the CERN SPS. No evidence for oscillations was found. The 90% confidence limits obtained are Delta m^2 ~ 10 eV^2.

  17. Mapping of VSG similarities in Trypanosoma brucei.

    Weirather, Jason L; Wilson, Mary E; Donelson, John E

    2012-02-01

    The protozoan parasite Trypanosoma brucei switches its variant surface glycoprotein (VSG) to subvert its mammalian hosts' immune responses. The T. brucei genome contains as many as 1600 VSG genes (VSGs), but most are silent noncoding pseudogenes. Only one functional VSG, located in a telomere-linked expression site, is transcribed at a time. Silent VSGs are copied into a VSG expression site through gene conversion. Truncated gene conversion events can generate new mosaic VSGs with segments of sequence identity to other VSGs. To examine the VSG family sub-structure within which these events occur, we combined the available VSG sequences and annotations with scripted BLAST searches to map the relationships among VSGs in the T. brucei genome. Clusters of related VSGs were visualized in 2- and 3-dimensions for different N- and C-terminal regions. Five types of N-termini (N1-N5) were observed, within which gene recombinational events are likely to occur, often with fully-coding 'functional' or 'atypical' VSGs centrally located between more dissimilar VSGs. Members of types N1, N3 and N4 are most closely related in the middle of the N-terminal region, whereas type N2 members are more similar near the N-terminus. Some preference occurs in pairing between specific N- and C-terminal types. Statistical analyses indicated no overall tendency for more related VSGs to be located closer in the genome than less related VSGs, although exceptions were noted. Many potential mosaic gene formation events within each N-terminal type were identified, contrasted by only one possible mosaic gene formation between N-terminal types (N1 and N2). These data suggest that mosaic gene formation is a major contributor to the overall VSG diversity, even though gene recombinational events between members of different N-terminal types occur only rarely. PMID:22079099

  18. Enhancing Solution Similarity in Multi-Objective Vehicle Routing Problems with Different Demand Periods

    Murata, Tadahiko; Itai, Ryota

    2008-01-01

    In this chapter, we proposed a local search that can be used in a two-fold EMO algorithm for multiple-objective VRPs with different demands. The simulation results show that the proposed method is effective in enhancing the similarity of the routes obtained for vehicles. Although the local search slightly worsens the maximum duration, it improves the similarity of the routes, which may decrease the possibility of drivers losing their way. If drivers lose their way duri...

  19. Looking at the cooler hosts - accurate metallicity determination of M dwarfs with and without planets

    Lindgren, Sara; Heiter, Ulrike

    2015-12-01

    M dwarfs constitute 70% of the stars in the local Galaxy and are becoming attractive targets in the search for Earth-sized planets and planets within the habitable zone. With our research we aim to extend the current understanding of planet formation theory and explore the planet-host metallicity correlation for these cooler hosts. Unlike that of their solar-type counterparts, the metallicity of M dwarfs is difficult to determine. Their low surface temperature results in plenty of diatomic and triatomic molecules in the photospheric layers. Especially in the optical wavelength region, these molecules give rise to a forest of millions of weak lines, making accurate spectroscopy nearly impossible. Previous studies of M dwarfs have therefore established different metallicity calibrations using photometric colors or spectral indices. But these methods exclude the possibility of detailed chemical analysis. High-resolution spectrographs operating in the infrared have recently opened up a new window for investigating M dwarfs. In the infrared the number of molecular transitions is greatly reduced, allowing an accurate continuum placement, and a large number of unblended atomic lines are available. This enabled us to use methods similar to those standard for warmer solar-like stars and to determine the overall metallicity through synthetic spectral fitting. In the first part of our work we used high-resolution spectra taken in the J band (1100-1400 nm) with the CRIRES spectrograph, VLT, to verify our method internally and externally by analyzing both components in several M+FGK binaries. In the second part of this study we are analyzing 20 single M dwarfs to achieve a good coverage of effective temperature and metallicity, where our sample covers subtypes M0-M6 and estimated metallicities ranging from +0.8 to -0.8 dex. With these data we aim to derive the most accurate relationship to date between photometric colors and metallicity for M dwarfs. We will present the current status of our

  20. Visualizing Search Behavior with Adaptive Discriminations

    Cook, Robert G.; Qadri, Muhammad A. J.

    2013-01-01

    We examined different aspects of the visual search behavior of a pigeon using an open-ended, adaptive testing procedure controlled by a genetic algorithm. The animal had to accurately search for and peck a gray target element randomly located from among a variable number of surrounding darker and lighter distractor elements. Display composition was controlled by a genetic algorithm involving the multivariate configuration of different parameters or genes (number of distractors, element size, ...

  1. Optimal directed searches for continuous gravitational waves

    Ming, J.; Krishnan, B.; Papa, M.; Aulbert, C.; Fehrmann, H.

    2016-01-01

    Wide parameter space searches for long lived continuous gravitational wave signals are computationally limited. It is therefore critically important that available computational resources are used rationally. In this paper we consider directed searches, i.e. targets for which the sky position is known accurately but the frequency and spindown parameters are completely unknown. Given a list of such potential astrophysical targets, we therefore need to prioritize. On which target(s) should we s...

  2. Interacting with image hierarchies for fast and accurate object segmentation

    Beard, David V.; Eberly, David H.; Hemminger, Bradley M.; Pizer, Stephen M.; Faith, R. E.; Kurak, Charles; Livingston, Mark

    1994-05-01

    Object definition is an increasingly important area of medical image research. Accurate and fairly rapid object definition is essential for measuring the size and, perhaps more importantly, the change in size of anatomical objects such as kidneys and tumors. Rapid and fairly accurate object definition is essential for 3D real-time visualization, including both surgery planning and radiation oncology treatment planning. One approach to object definition involves the use of 3D image hierarchies, such as Eberly's Ridge Flow. However, the image hierarchy segmentation approach requires user interaction in selecting regions and subtrees. Further, visualizing and comprehending the anatomy and the selected portions of the hierarchy can be problematic. In this paper we describe the Magic Crayon tool, which allows a user to define various anatomical objects rapidly and accurately by interacting with image hierarchies such as those generated with Eberly's Ridge Flow algorithm as well as other 3D image hierarchies. Preliminary results suggest that fairly complex anatomical objects can be segmented in under a minute with sufficient accuracy for 3D surgery planning, 3D radiation oncology treatment planning, and similar applications. Potential modifications to the approach for improved accuracy are summarized.

  3. Reading and visual search: a developmental study in normal children.

    Magali Seassau

    Full Text Available Studies dealing with developmental aspects of binocular eye movement behaviour during reading are scarce. In this study we have explored binocular strategies during reading and during visual search tasks in a large population of normal young readers. Binocular eye movements were recorded using an infrared video-oculography system in sixty-nine children (aged 6 to 15) and in a group of 10 adults (aged 24 to 39). The main findings are: (i) in both tasks the number of progressive saccades (to the right) and regressive saccades (to the left) decreases with age; (ii) the amplitude of progressive saccades increases with age in the reading task only; (iii) in both tasks, the duration of fixations as well as the total duration of the task decreases with age; (iv) in both tasks, the amplitude of disconjugacy recorded during and after the saccades decreases with age; (v) children are significantly more accurate in reading than in visual search after 10 years of age. The data reported here confirm and expand previous studies on children's reading. The new finding is that younger children show poorer coordination than adults, both while reading and while performing a visual search task. Both reading skills and binocular saccade coordination improve with age, and children reach a level similar to that of adults after the age of 10. This finding is most likely related to the fact that learning mechanisms responsible for saccade yoking develop during childhood until adolescence.

  4. Multimode Process Fault Detection Using Local Neighborhood Similarity Analysis

    Xiaogang Deng; Xuemin Tian

    2014-01-01

    Traditional data-driven fault detection methods assume a unimodal distribution of process data, so they often do not perform well in chemical processes with multiple operating modes. In order to monitor multimode chemical processes effectively, this paper presents a novel fault detection method based on local neighborhood similarity analysis (LNSA). In the proposed method, prior process knowledge is not required and only the multimode normal operation data are used to construct a reference dataset. For online monitoring of the process state, LNSA applies a moving window technique to obtain a current snapshot data window. Then a neighborhood searching technique is used to acquire the corresponding local neighborhood data window from the reference dataset. Similarity analysis between the snapshot and neighborhood data windows is performed, which includes the calculation of a principal component analysis (PCA) similarity factor and a distance similarity factor. The PCA similarity factor captures the change of data direction, while the distance similarity factor monitors the shift of the data center position. Based on these similarity factors, two monitoring statistics are built for multimode process fault detection. Finally, a simulated continuous stirred tank system is used to demonstrate the effectiveness of the proposed method. The simulation results show that LNSA can detect multimode process changes effectively and performs better than traditional fault detection methods.

  5. Feedback about more accurate versus less accurate trials: differential effects on self-confidence and activation.

    Badami, Rokhsareh; VaezMousavi, Mohammad; Wulf, Gabriele; Namazizadeh, Mahdi

    2012-06-01

    One purpose of the present study was to examine whether self-confidence or anxiety would be differentially affected by feedback from more accurate rather than less accurate trials. The second purpose was to determine whether arousal variations (activation) would predict performance. On day 1, participants performed a golf putting task under one of two conditions: one group received feedback on the most accurate trials, whereas another group received feedback on the least accurate trials. On day 2, participants completed an anxiety questionnaire and performed a retention test. Skin conductance level, as a measure of arousal, was determined. The results indicated that feedback about more accurate trials resulted in more effective learning as well as increased self-confidence. Also, activation was a predictor of performance. PMID:22808705

  6. Dependency Similarity, Attraction and Perceived Happiness.

    Pandey, Janak

    1978-01-01

    Subjects were asked to evaluate either a similar personality or a dissimilar personality. Subjects rated similar others more positively than dissimilar others and, additionally, perceived similar others as more helpful and sympathetic than dissimilar others. (Author)

  7. The Search for Directed Intelligence

    Lubin, Philip

    2016-01-01

    We propose a search for sources of directed energy systems such as those now becoming technologically feasible on Earth. Recent advances in our own abilities allow us to foresee our own capability that will radically change our ability to broadcast our presence. We show that systems of this type have the ability to be detected at vast distances and indeed can be detected across the entire horizon. This profoundly changes the possibilities for searches for extra-terrestrial technology advanced civilizations. We show that even modest searches can be extremely effective at detecting or limiting many civilization classes. We propose a search strategy that will observe more than 10^12 stellar and planetary systems, with possible extensions to more than 10^20 systems, allowing us to test the hypothesis that other similarly or more advanced civilizations with this same capability exist and are broadcasting.

  8. Comparing NEO Search Telescopes

    Myhrvold, Nathan

    2015-01-01

    Multiple terrestrial and space-based telescopes have been proposed for detecting and tracking near-Earth objects (NEOs). Detailed simulations of the search performance of these systems have used complex computer codes that are not widely available, which hinders accurate cross-comparison of the proposals and obscures whether they have consistent assumptions. Moreover, some proposed instruments would survey infrared (IR) bands, whereas others would operate in the visible band, and differences among asteroid thermal and visible light models used in the simulations further complicate like-to-like comparisons. I use simple physical principles to estimate basic performance metrics for the ground-based Large Synoptic Survey Telescope and three space-based instruments - Sentinel, NEOCam, and a Cubesat constellation. The performance is measured against two different NEO distributions, the Bottke et al. distribution of general NEOs, and the Veres et al. distribution of Earth-impacting NEOs. The results of the comparis...

  9. How Accurate is inv(A)*b?

    Druinsky, Alex

    2012-01-01

    Several widely-used textbooks lead the reader to believe that solving a linear system of equations Ax = b by multiplying the vector b by a computed inverse inv(A) is inaccurate. Virtually all other textbooks on numerical analysis and numerical linear algebra advise against using computed inverses without stating whether this is accurate or not. In fact, under reasonable assumptions on how the inverse is computed, x = inv(A)*b is as accurate as the solution computed by the best backward-stable solvers. This fact is not new, but obviously obscure. We review the literature on the accuracy of this computation and present a self-contained numerical analysis of it.
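
    The comparison discussed in this abstract can be tried on a small example. The following is a minimal sketch, not the paper's analysis: it assumes a hand-picked, well-conditioned 3x3 matrix and a hypothetical pure-Python elimination routine (`solve`, `inverse` and the test data are illustrative names and values, not taken from the source). It solves Ax = b once directly and once through a computed inverse, then measures both errors against a known solution.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented working copy so the caller's data is left untouched.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def inverse(A):
    """Compute inv(A) column by column by solving A x = e_j."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    # cols[j] is column j of the inverse; transpose to row-major form.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


# Illustrative data (an assumption, not from the paper).
A = [[4.0, 1.0, 2.0], [1.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
x_true = [1.0, -2.0, 3.0]
b = matvec(A, x_true)

x_solve = solve(A, b)            # direct solve
x_inv = matvec(inverse(A), b)    # x = inv(A) * b

err_solve = max_abs_diff(x_solve, x_true)
err_inv = max_abs_diff(x_inv, x_true)
```

    On a matrix this small and well conditioned both errors sit near machine precision, which is consistent with the abstract's claim but does not test it; ill-conditioned matrices and the assumptions on how the inverse is computed are where the paper's analysis matters.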

  10. Cerebral fat embolism: Use of MR spectroscopy for accurate diagnosis

    Laxmi Kokatnur

    2015-01-01

    Cerebral fat embolism (CFE) is an uncommon but serious complication following orthopedic procedures. It usually presents with altered mental status, and can be a part of fat embolism syndrome (FES) if associated with cutaneous and respiratory manifestations. Because of the presence of other common factors affecting the mental status, particularly in the postoperative period, the diagnosis of CFE can be challenging. Magnetic resonance imaging (MRI) of brain typically shows multiple lesions distributed predominantly in the subcortical region, which appear as hyperintense lesions on T2 and diffusion weighted images. Although the location offers a clue, the MRI findings are not specific for CFE. Watershed infarcts, hypoxic encephalopathy, disseminated infections, demyelinating disorders, diffuse axonal injury can also show similar changes on MRI of brain. The presence of fat in these hyperintense lesions, identified by MR spectroscopy as raised lipid peaks will help in accurate diagnosis of CFE. Normal brain tissue or conditions producing similar MRI changes will not show any lipid peak on MR spectroscopy. We present a case of CFE initially misdiagnosed as brain stem stroke based on clinical presentation and cranial computed tomography (CT) scan, and later, MR spectroscopy elucidated the accurate diagnosis.

  11. Searching for uranium

    In the not-so-distant past, the search for uranium usually followed a conceptual approach in which an unexplored terrain was selected because of its presumed similarities with one that is known to contain one or more deposits. A description, in general terms, is given of the methodology adopted during the different stages of the exploration programme, up to the point of a discovery. Three case histories prove that, in order to reach this point, a certain amount of improvisation and luck is usually required. (author)

  12. Accurate Finite Difference Methods for Option Pricing

    Persson, Jonas

    2006-01-01

    Stock options are priced numerically using space- and time-adaptive finite difference methods. European options on one and several underlying assets are considered. These are priced with adaptive numerical algorithms including a second order method and a more accurate method. For American options we use the adaptive technique to price options on one stock with and without stochastic volatility. In all these methods emphasis is put on the control of errors to fulfill predefined tolerance level...

  13. Accurate, reproducible measurement of blood pressure.

    Campbell, N. R.; Chockalingam, A; Fodor, J. G.; McKay, D. W.

    1990-01-01

    The diagnosis of mild hypertension and the treatment of hypertension require accurate measurement of blood pressure. Blood pressure readings are altered by various factors that influence the patient, the techniques used and the accuracy of the sphygmomanometer. The variability of readings can be reduced if informed patients prepare in advance by emptying their bladder and bowel, by avoiding over-the-counter vasoactive drugs the day of measurement and by avoiding exposure to cold, caffeine con...

  14. Accurate variational forms for multiskyrmion configurations

    Jackson, A.D.; Weiss, C.; Wirzba, A.; Lande, A.

    1989-04-17

    Simple variational forms are suggested for the fields of a single skyrmion on a hypersphere, S_3(L), and of a face-centered cubic array of skyrmions in flat space, R_3. The resulting energies are accurate at the level of 0.2%. These approximate field configurations provide a useful alternative to brute-force solutions of the corresponding Euler equations.

  15. Efficient Accurate Context-Sensitive Anomaly Detection

    2007-01-01

    For program behavior-based anomaly detection, the only way to ensure accurate monitoring is to construct an efficient and precise program behavior model. A new program behavior-based anomaly detection model, called the combined pushdown automaton (CPDA) model, is proposed; it is based on static binary executable analysis. The CPDA model incorporates an optimized call stack walk and a code instrumentation technique to gain complete context information. The proposed method can thereby detect more attacks while retaining good performance.

  16. Towards accurate modeling of moving contact lines

    Holmgren, Hanna

    2015-01-01

    The present thesis treats the numerical simulation of immiscible incompressible two-phase flows with moving contact lines. The conventional Navier–Stokes equations combined with a no-slip boundary condition leads to a non-integrable stress singularity at the contact line. The singularity in the model can be avoided by allowing the contact line to slip. Implementing slip conditions in an accurate way is not straight-forward and different regularization techniques exist where ad-hoc procedures ...

  17. Accurate phase-shift velocimetry in rock

    Shukla, Matsyendra Nath; Vallatos, Antoine; Phoenix, Vernon R.; Holmes, William M.

    2016-06-01

    Spatially resolved Pulsed Field Gradient (PFG) velocimetry techniques can provide precious information concerning flow through opaque systems, including rocks. This velocimetry data is used to enhance flow models in a wide range of systems, from oil behaviour in reservoir rocks to contaminant transport in aquifers. Phase-shift velocimetry is the fastest way to produce velocity maps but critical issues have been reported when studying flow through rocks and porous media, leading to inaccurate results. Combining PFG measurements for flow through Bentheimer sandstone with simulations, we demonstrate that asymmetries in the molecular displacement distributions within each voxel are the main source of phase-shift velocimetry errors. We show that when flow-related average molecular displacements are negligible compared to self-diffusion ones, symmetric displacement distributions can be obtained while phase measurement noise is minimised. We elaborate a complete method for the production of accurate phase-shift velocimetry maps in rocks and low porosity media and demonstrate its validity for a range of flow rates. This development of accurate phase-shift velocimetry now enables more rapid and accurate velocity analysis, potentially helping to inform both industrial applications and theoretical models.

  18. Autonomous Search

    Hamadi, Youssef; Saubion, Frédéric

    2012-01-01

    Decades of innovations in combinatorial problem solving have produced better and more complex algorithms. These new methods are better since they can solve larger problems and address new application domains. They are also more complex which means that they are hard to reproduce and often harder to fine-tune to the peculiarities of a given problem. This last point has created a paradox where efficient tools are out of reach of practitioners. Autonomous search (AS) represents a new research field defined to precisely address the above challenge. Its major strength and originality consist in the

  19. Web Search Engines: Search Syntax and Features.

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  20. Database search van tijdreeksen, met toepassing in de firma Medtronic

    KELLENS, Tom

    2006-01-01

    In this thesis we focus on the problem of similarity search in time series. Similarity search can be divided into two categories, namely whole matching and subsequence matching. The latter can be regarded as a generalization of whole matching. In this work we will see how this generalization can be realized using sliding window techniques. Similarity search is essentially based on a distance function. Based on the type of ...
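
    The relationship between whole matching and subsequence matching described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's method: the Euclidean distance, the brute-force scan and the names `euclidean` and `best_subsequence_match` are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_subsequence_match(series, query, distance=euclidean):
    """Slide a window of len(query) over series and return
    (offset, distance) of the closest window.

    When len(query) == len(series) there is exactly one window, so the
    function reduces to whole matching.
    """
    w = len(query)
    if w == 0 or w > len(series):
        raise ValueError("query must be non-empty and no longer than series")
    best_offset, best_dist = 0, float("inf")
    for offset in range(len(series) - w + 1):
        d = distance(series[offset:offset + w], query)
        if d < best_dist:
            best_offset, best_dist = offset, d
    return best_offset, best_dist


# Illustrative data (not from the thesis).
series = [0.0, 0.1, 0.0, 1.0, 2.0, 1.0, 0.0, 0.1]
query = [1.0, 2.0, 1.0]
offset, dist = best_subsequence_match(series, query)
```

    The scan costs one distance evaluation per window; practical systems usually add an index or a lower-bounding filter on top of the same sliding-window idea.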

  1. The Hofmethode: Computing Semantic Similarities between E-Learning Products

    Oliver Michel

    2009-11-01

    The key task in building useful e-learning repositories is to develop a system with an algorithm allowing users to retrieve information that corresponds to their specific requirements. To achieve this, products (or their verbal descriptions, i.e. as presented in metadata) need to be compared and structured according to the results of this comparison. Such structuring is crucial insofar as there are many search results that correspond to the entered keyword. The Hofmethode is an algorithm (based on psychological considerations) to compute semantic similarities between texts and therefore offer a way to compare e-learning products. The computed similarity values are used to build semantic maps in which the products are visually arranged according to their similarities. The paper describes how the Hofmethode is implemented in the online database edulap, and how it contributes to help the user to explore the data in which he is interested.

  2. Enhancing Divergent Search through Extinction Events

    Lehman, Joel; Miikkulainen, Risto

    2015-01-01

    capacity to evolve. This hypothesis is tested through experiments in two evolutionary robotics domains. The results show that combining extinction events with divergent search increases evolvability, while combining them with convergent search offers no similar benefit. The conclusion is that extinction...

  3. Missing Links in Middle School: Developing Use of Disciplinary Relatedness in Evaluating Internet Search Results

    Keil, Frank C.; Kominsky, Jonathan F.

    2013-01-01

    In the “digital native” generation, internet search engines are a commonly used source of information. However, adolescents may fail to recognize relevant search results when they are related in discipline to the search topic but lack other cues. Middle school students, high school students, and adults rated simulated search results for relevance to the search topic. The search results were designed to contrast deep discipline-based relationships with lexical similarity to the search topic. R...

  4. High Frequency QRS ECG Accurately Detects Cardiomyopathy

    Schlegel, Todd T.; Arenare, Brian; Poulin, Gregory; Moser, Daniel R.; Delgado, Reynolds

    2005-01-01

    High frequency (HF, 150-250 Hz) analysis over the entire QRS interval of the ECG is more sensitive than conventional ECG for detecting myocardial ischemia. However, the accuracy of HF QRS ECG for detecting cardiomyopathy is unknown. We obtained simultaneous resting conventional and HF QRS 12-lead ECGs in 66 patients with cardiomyopathy (EF = 23.2 ± 6.1%, mean ± SD) and in 66 age- and gender-matched healthy controls using PC-based ECG software recently developed at NASA. The single most accurate ECG parameter for detecting cardiomyopathy was an HF QRS morphological score that takes into consideration the total number and severity of reduced amplitude zones (RAZs) present plus the clustering of RAZs together in contiguous leads. This RAZ score had an area under the receiver operator curve (ROC) of 0.91, and was 88% sensitive, 82% specific and 85% accurate for identifying cardiomyopathy at optimum score cut-off of 140 points. Although conventional ECG parameters such as the QRS and QTc intervals were also significantly longer in patients than controls (P < 0.001, BBBs excluded), these conventional parameters were less accurate (area under the ROC = 0.77 and 0.77, respectively) than HF QRS morphological parameters for identifying underlying cardiomyopathy. The total amplitude of the HF QRS complexes, as measured by summed root mean square voltages (RMSVs), also differed between patients and controls (33.8 ± 11.5 vs. 41.5 ± 13.6 mV, respectively, P < 0.003), but this parameter was even less accurate in distinguishing the two groups (area under ROC = 0.67) than the HF QRS morphologic and conventional ECG parameters. Diagnostic accuracy was optimal (86%) when the RAZ score from the HF QRS ECG and the QTc interval from the conventional ECG were used simultaneously with cut-offs of ≥ 40 points and ≥ 445 ms, respectively. In conclusion 12-lead HF QRS ECG employing
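
    The reported sensitivity, specificity and accuracy of the RAZ score can be checked against each other with a short calculation. The sketch below assumes the standard definition of accuracy as the fraction of correctly classified subjects; the function name `accuracy` is illustrative, while the rates (88%, 82%) and group sizes (66 patients, 66 controls) are the ones quoted in the abstract.

```python
def accuracy(sensitivity, specificity, n_patients, n_controls):
    """Overall accuracy implied by sensitivity and specificity.

    Expected true positives are sensitivity * n_patients and expected
    true negatives are specificity * n_controls.
    """
    true_positives = sensitivity * n_patients
    true_negatives = specificity * n_controls
    return (true_positives + true_negatives) / (n_patients + n_controls)


# Values quoted in the abstract for the RAZ score.
acc = accuracy(0.88, 0.82, 66, 66)
```

    With equal group sizes the accuracy is simply the mean of sensitivity and specificity, (0.88 + 0.82) / 2 = 0.85, which matches the 85% stated in the abstract.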

  5. INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Reddy, Rachamalla Maheedhar; Reddy, Chennareddy Venkata Siva Kumar; Singh, Nitin Kumar; Sharmila S Mande

    2011-01-01

    Background Taxonomic classification of metagenomic sequences is the first step in metagenomic analysis. Existing taxonomic classification approaches are of two types, similarity-based and composition-based. Similarity-based approaches, though accurate and specific, are extremely slow. Since metagenomic projects generate millions of sequences, adopting similarity-based approaches becomes virtually infeasible for research groups having modest computational resources. In this study, we present ...

  6. Local Search –The Perfect Guide

    Tanmay Kadam

    2014-03-01

    This paper contributes a new idea to the field of local search in computer science. Local search is used to provide the client with the most accurate information on the way from source to destination. This paper builds on this idea and helps the client reach the destination without any dilemma. It aims to provide the client with a route from source to destination with a choice of three alternative paths, one shorter than the other. Furthermore, this paper also explains the use of local search via mobile, helping the client reach the destination with the help of web mapping services.

  7. How accurately can we calculate thermal systems?

    The objective was to determine how accurately simple reactor lattice integral parameters can be determined, considering user input, differences in the methods, source data and the data processing procedures and assumptions. Three simple square lattice test cases with different fuel to moderator ratios were defined. The effect of the thermal scattering models was shown to be important and much bigger than the spread in the results. Nevertheless, differences of up to 0.4% in the K-eff calculated by continuous energy Monte Carlo codes were observed even when the same source data were used. (author)

  8. Accurate diagnosis is essential for amebiasis

    2004-01-01

    Amebiasis is one of the three most common causes of death from parasitic disease, and Entamoeba histolytica is one of the most widely distributed parasites in the world. In particular, Entamoeba histolytica infection is a significant health problem in amebiasis-endemic areas of developing countries, with a significant impact on infant mortality[1]. In recent years a worldwide increase in the number of patients with amebiasis has refocused attention on this important infection. On the other hand, improving the quality of parasitological methods and widespread use of accurate techniques have improved our knowledge about the disease.

  9. Investigations on Accurate Analysis of Microstrip Reflectarrays

    Zhou, Min; Sørensen, S. B.; Kim, Oleksiy S.;

    2011-01-01

    An investigation on accurate analysis of microstrip reflectarrays is presented. Sources of error in reflectarray analysis are examined and solutions to these issues are proposed. The focus is on two sources of error, namely the determination of the equivalent currents to calculate the radiation...... pattern, and the inaccurate mutual coupling between array elements due to the lack of periodicity. To serve as reference, two offset reflectarray antennas have been designed, manufactured and measured at the DTU-ESA Spherical Near-Field Antenna Test Facility. Comparisons of simulated and measured data are...

  10. Improved Scatter Search Using Cuckoo Search

    Ahmed T.Sadiq Al-Obaidi

    2013-02-01

    The Scatter Search (SS) is a deterministic strategy that has been applied successfully to some combinatorial and continuous optimization problems. Cuckoo Search (CS) is a heuristic search algorithm inspired by the reproduction strategy of cuckoos. This paper presents an enhanced scatter search algorithm using the CS algorithm. The improvement provides Scatter Search with random exploration of the problem's search space and more diversity and intensification for promising solutions. The original and improved Scatter Search have been tested on the Traveling Salesman Problem. A computational experiment with benchmark instances is reported. The results demonstrate that the improved Scatter Search algorithm performs better than the original Scatter Search algorithm. The improvement in the value of average fitness is 23.2% compared with the original SS. The developed algorithm has been compared with other algorithms for the same problem, and the result was competitive with some algorithms and inferior to others.

  12. Data mining technique for fast retrieval of similar waveforms in Fusion massive databases

    Vega, J. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain)], E-mail: jesus.vega@ciemat.es; Pereira, A.; Portas, A. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain); Dormido-Canto, S.; Farias, G.; Dormido, R.; Sanchez, J.; Duro, N. [Departamento de Informatica y Automatica, UNED, Madrid (Spain); Santos, M. [Departamento de Arquitectura de Computadores y Automatica, UCM, Madrid (Spain); Sanchez, E. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain); Pajares, G. [Departamento de Arquitectura de Computadores y Automatica, UCM, Madrid (Spain)

    2008-01-15

    Fusion measurement systems generate similar waveforms for reproducible behavior. A major difficulty related to data analysis is the identification, in a rapid and automated way, of a set of discharges with comparable behaviour, i.e. discharges with 'similar' waveforms. Here we introduce a new technique for rapid searching and retrieval of 'similar' signals. The approach consists of building a classification system that avoids traversing the whole database looking for similarities. The classification system diminishes the problem dimensionality (by means of waveform feature extraction) and reduces the searching space to just the most probable 'similar' waveforms (clustering techniques). In the searching procedure, the input waveform is classified in any of the existing clusters. Then, a similarity measure is computed between the input signal and all cluster elements in order to identify the most similar waveforms. The inner product of normalized vectors is used as the similarity measure as it allows the searching process to be independent of signal gain and polarity. This development has been applied recently to TJ-II stellarator databases and has been integrated into its remote participation system.
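
    The similarity measure named in this abstract, the inner product of normalized vectors, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: taking the absolute value to obtain polarity independence is an assumption about how that property is achieved, and the names `normalized_similarity` and `most_similar` as well as the sample waveforms are hypothetical.

```python
import math


def normalized_similarity(u, v):
    """Absolute inner product of the unit-normalized vectors u and v.

    Normalization removes the effect of signal gain; the absolute value
    (an assumption of this sketch) removes the effect of polarity.
    The result lies in [0, 1].
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("similarity is undefined for an all-zero waveform")
    return abs(dot) / (norm_u * norm_v)


def most_similar(query, cluster):
    """Return the index of the cluster element most similar to query."""
    return max(range(len(cluster)),
               key=lambda i: normalized_similarity(query, cluster[i]))


# Illustrative waveforms (not from the paper).
w = [1.0, 2.0, 3.0, 2.0, 1.0]
scaled_inverted = [-5.0 * x for x in w]   # different gain, opposite polarity
other = [3.0, 1.0, 0.0, 1.0, 3.0]

s_same = normalized_similarity(w, scaled_inverted)
s_other = normalized_similarity(w, other)
best = most_similar(w, [other, scaled_inverted])
```

    In the scheme the abstract describes, this comparison would run only against the elements of the cluster the input waveform is classified into, not against the whole database.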

  13. Accurate radiative transfer calculations for layered media.

    Selden, Adrian C

    2016-07-01

    Simple yet accurate results for radiative transfer in layered media with discontinuous refractive index are obtained by the method of K-integrals. These are certain weighted integrals applied to the angular intensity distribution at the refracting boundaries. The radiative intensity is expressed as the sum of the asymptotic angular intensity distribution valid in the depth of the scattering medium and a transient term valid near the boundary. Integrated boundary equations are obtained, yielding simple linear equations for the intensity coefficients, enabling the angular emission intensity and the diffuse reflectance (albedo) and transmittance of the scattering layer to be calculated without solving the radiative transfer equation directly. Examples are given of half-space, slab, interface, and double-layer calculations, and extensions to multilayer systems are indicated. The K-integral method is orders of magnitude more accurate than diffusion theory and can be applied to layered scattering media with a wide range of scattering albedos, with potential applications to biomedical and ocean optics. PMID:27409700

  14. Accurate basis set truncation for wavefunction embedding

    Barnes, Taylor A.; Goodpaster, Jason D.; Manby, Frederick R.; Miller, Thomas F.

    2013-07-01

    Density functional theory (DFT) provides a formally exact framework for performing embedded subsystem electronic structure calculations, including DFT-in-DFT and wavefunction theory-in-DFT descriptions. In the interest of efficiency, it is desirable to truncate the atomic orbital basis set in which the subsystem calculation is performed, thus avoiding high-order scaling with respect to the size of the MO virtual space. In this study, we extend a recently introduced projection-based embedding method [F. R. Manby, M. Stella, J. D. Goodpaster, and T. F. Miller III, J. Chem. Theory Comput. 8, 2564 (2012), doi:10.1021/ct300544e] to allow for the systematic and accurate truncation of the embedded subsystem basis set. The approach is applied to both covalently and non-covalently bound test cases, including water clusters and polypeptide chains, and it is demonstrated that errors associated with basis set truncation are controllable to well within chemical accuracy. Furthermore, we show that this approach allows for switching between accurate projection-based embedding and DFT embedding with approximate kinetic energy (KE) functionals; in this sense, the approach provides a means of systematically improving upon the use of approximate KE functionals in DFT embedding.

  15. Accurate determination of characteristic relative permeability curves

    Krause, Michael H.; Benson, Sally M.

    2015-09-01

    A recently developed technique to accurately characterize sub-core scale heterogeneity is applied to investigate the factors responsible for flowrate-dependent effective relative permeability curves measured on core samples in the laboratory. The dependency of laboratory measured relative permeability on flowrate has long been both supported and challenged by a number of investigators. Studies have shown that this apparent flowrate dependency is a result of both sub-core scale heterogeneity and outlet boundary effects. However this has only been demonstrated numerically for highly simplified models of porous media. In this paper, flowrate dependency of effective relative permeability is demonstrated using two rock cores, a Berea Sandstone and a heterogeneous sandstone from the Otway Basin Pilot Project in Australia. Numerical simulations of steady-state coreflooding experiments are conducted at a number of injection rates using a single set of input characteristic relative permeability curves. Effective relative permeability is then calculated from the simulation data using standard interpretation methods for calculating relative permeability from steady-state tests. Results show that simplified approaches may be used to determine flowrate-independent characteristic relative permeability provided flow rate is sufficiently high, and the core heterogeneity is relatively low. It is also shown that characteristic relative permeability can be determined at any typical flowrate, and even for geologically complex models, when using accurate three-dimensional models.

  16. Accurate shear measurement with faint sources

    Zhang, Jun; Foucaud, Sebastien [Center for Astronomy and Astrophysics, Department of Physics and Astronomy, Shanghai Jiao Tong University, 955 Jianchuan road, Shanghai, 200240 (China); Luo, Wentao, E-mail: betajzhang@sjtu.edu.cn, E-mail: walt@shao.ac.cn, E-mail: foucaud@sjtu.edu.cn [Key Laboratory for Research in Galaxies and Cosmology, Shanghai Astronomical Observatory, Nandan Road 80, Shanghai, 200030 (China)

    2015-01-01

    For cosmic shear to become an accurate cosmological probe, systematic errors in the shear measurement method must be unambiguously identified and corrected for. Previous work of this series has demonstrated that cosmic shears can be measured accurately in Fourier space in the presence of background noise and finite pixel size, without assumptions on the morphologies of galaxy and PSF. The remaining major source of error is source Poisson noise, due to the finiteness of source photon number. This problem is particularly important for faint galaxies in space-based weak lensing measurements, and for ground-based images of short exposure times. In this work, we propose a simple and rigorous way of removing the shear bias from the source Poisson noise. Our noise treatment can be generalized for images made of multiple exposures through MultiDrizzle. This is demonstrated with the SDSS and COSMOS/ACS data. With a large ensemble of mock galaxy images of unrestricted morphologies, we show that our shear measurement method can achieve sub-percent level accuracy even for images of signal-to-noise ratio less than 5 in general, making it the most promising technique for cosmic shear measurement in the ongoing and upcoming large scale galaxy surveys.

  17. Studying dream content using the archive and search engine on DreamBank.net.

    Domhoff, G William; Schneider, Adam

    2008-12-01

    This paper shows how the dream archive and search engine on DreamBank.net, a Web site containing over 22,000 dream reports, can be used to generate new findings on dream content, some of which raise interesting questions about the relationship between dreaming and various forms of waking thought. It begins with studies that draw dream reports from DreamBank.net for studies of social networks in dreams, and then demonstrates the usefulness of the search engine by employing word strings relating to religious and sexual elements. Examples from two lengthy individual dream series are used to show how the dreams of one person can be studied for characters, activities, and emotions. A final example shows that accurate inferences about a person's religious beliefs can be made on the basis of reading through dreams retrieved with a few keywords. The overall findings are similar to those in studies using traditional forms of content analysis. PMID:18682331

  18. Fast and Accurate Brain Image Retrieval Using Gabor Wavelet Algorithm

    J.Esther

    2014-01-01

    Content-based image retrieval (CBIR) in medical image databases is used to assist physicians in diagnosing diseases and to aid diagnosis by identifying similar past cases. The goal is fast, accurate and effective retrieval of similar images from a large data set. The pre-processing step is extraction of the brain: it removes unwanted non-brain areas such as the scalp, skull, neck, eyes and ears from MRI head scan images. Once the non-brain regions have been removed, similar images can be retrieved much more effectively. This paper proposes a brain extraction technique using fuzzy morphological operators. For the experiments, 1200 MRI images were taken from a scan centre and some brain images were collected from the web; these were processed with the popular brain extraction algorithms Graph-Cut Algorithm (GCUT) and Expectation Maximization Algorithm (EMA). The experimental results show that the proposed fuzzy morphological operator algorithm (FMOA) gives the most promising results. Using the FMOA result, brain images were retrieved from a large collection of databases using the Gabor wavelet transform.

  19. Fast and accurate fitting of relaxation dispersion data using the flexible software package GLOVE

    Sugase, Kenji; Konuma, Tsuyoshi [Suntory Foundation for Life Sciences, Bioorganic Research Institute (Japan); Lansing, Jonathan C. [Momenta Pharmaceuticals, Inc. (United States); Wright, Peter E., E-mail: wright@scripps.edu [Scripps Research Institute, Department of Integrative Structural and Computational Biology and Skaggs Institute of Chemical Biology (United States)

    2013-07-15

    Relaxation dispersion spectroscopy is one of the most widely used techniques for the analysis of protein dynamics. To obtain a detailed understanding of the protein function from the view point of dynamics, it is essential to fit relaxation dispersion data accurately. The grid search method is commonly used for relaxation dispersion curve fits, but it does not always find the global minimum that provides the best-fit parameter set. Also, the fitting quality does not always improve with increase of the grid size although the computational time becomes longer. This is because relaxation dispersion curve fitting suffers from a local minimum problem, which is a general problem in non-linear least squares curve fitting. Therefore, in order to fit relaxation dispersion data rapidly and accurately, we developed a new fitting program called GLOVE that minimizes global and local parameters alternately, and incorporates a Monte-Carlo minimization method that enables fitting parameters to pass through local minima with low computational cost. GLOVE also implements a random search method, which sets up initial parameter values randomly within user-defined ranges. We demonstrate here that the combined use of the three methods can find the global minimum more rapidly and more accurately than grid search alone.
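
    The contrast drawn above between grid search and randomly initialised local minimisation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the one-dimensional `objective` is a hypothetical multi-minimum misfit, not a relaxation dispersion model, and `local_minimize` is a generic derivative-free descent, not the GLOVE algorithm.

```python
import math
import random


def objective(x):
    # Hypothetical 1-D misfit with several local minima (illustration only).
    return x * x + 10.0 * math.sin(3.0 * x)


def local_minimize(f, x, step=0.05, tol=1e-9):
    """Derivative-free descent: move while f decreases, otherwise halve the step."""
    fx = f(x)
    while step > tol:
        for candidate in (x + step, x - step):
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            step /= 2.0
    return x, fx


def grid_search(f, lo, hi, n_points):
    """Evaluate f on an evenly spaced grid and return the best grid point."""
    xs = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    best = min(xs, key=f)
    return best, f(best)


def random_start_search(f, lo, hi, n_starts, seed=0):
    """Draw starting values uniformly in [lo, hi] and refine each one locally."""
    rng = random.Random(seed)
    results = [local_minimize(f, rng.uniform(lo, hi)) for _ in range(n_starts)]
    return min(results, key=lambda r: r[1])


grid_x, grid_f = grid_search(objective, -5.0, 5.0, 5)
best_x, best_f = random_start_search(objective, -5.0, 5.0, 100)
print(f"coarse grid:   x = {grid_x:.3f}, f = {grid_f:.3f}")
print(f"random starts: x = {best_x:.3f}, f = {best_f:.3f}")
```

    On this toy objective the five-point grid lands in a neighbouring basin, while the randomly started local searches reach the deepest minimum. A finer grid would also find it, at the cost of more evaluations.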

  20. Dynamic Search and Working Memory in Social Recall

    Hills, Thomas T.; Pachur, Thorsten

    2012-01-01

    What are the mechanisms underlying search in social memory (e.g., remembering the people one knows)? Do the search mechanisms involve dynamic local-to-global transitions similar to semantic search, and are these transitions governed by the general control of attention, associated with working memory span? To find out, we asked participants to…

  1. Asthma and COPD: Differences and Similarities

    This article has been reviewed ... or you could have Chronic Obstructive Pulmonary Disease (COPD), such as emphysema or chronic bronchitis. Because asthma ...

  2. A Quantum-Based Similarity Method in Virtual Screening

    Mohammed Mumtaz Al-Dabbagh

    2015-10-01

    One of the most widely used techniques for ligand-based virtual screening is similarity searching. This study adopted concepts from quantum mechanics to present a state-of-the-art molecular similarity method inspired by quantum theory. The representation of molecular compounds in a mathematical quantum space plays a vital role in the development of the quantum-based similarity approach. One of the key concepts of quantum theory is the use of complex numbers. Hence, this study proposed three techniques to embed and re-represent molecular compounds in complex-number format. The quantum-based similarity method developed in this study, which depends on a complex pure Hilbert space of molecules, is called Standard Quantum-Based (SQB). The recall of retrieved active molecules was measured at the top 1% and top 5%, and a significance test was used to evaluate the proposed methods. The MDL Drug Data Report (MDDR), Maximum Unbiased Validation (MUV) and Directory of Useful Decoys (DUD) data sets were used for the experiments and were represented by 2D fingerprints. Simulated virtual screening experiments show that the effectiveness of the SQB method was significantly increased, owing to the representational power of molecular compounds in complex-number form, compared with the Tanimoto benchmark similarity measure.
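
    For orientation, the Tanimoto measure used as the benchmark above can be sketched for binary fingerprints as follows. This is only the standard baseline coefficient, not the SQB method; the bit indices and the names `mol_a`, `mol_b`, `mol_c` are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / union


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical on-bit indices; real 2D fingerprints have hundreds or thousands of bits.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9, 12},
    "mol_b": {2, 4, 8},
    "mol_c": {1, 4, 7, 9},
}
ranking = rank_by_similarity(query, library)
print(ranking)
```

    In a screening experiment of the kind described, recall at the top 1% or 5% would then be the fraction of known actives that appear in that leading slice of such a ranking.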

  3. The FLUKA Code: An Accurate Simulation Tool for Particle Therapy

    Battistoni, Giuseppe; Bauer, Julia; Boehlen, Till T.; Cerutti, Francesco; Chin, Mary P. W.; Dos Santos Augusto, Ricardo; Ferrari, Alfredo; Ortega, Pablo G.; Kozłowska, Wioletta; Magro, Giuseppe; Mairani, Andrea; Parodi, Katia; Sala, Paola R.; Schoofs, Philippe; Tessonnier, Thomas; Vlachoudis, Vasilis

    2016-01-01

    Monte Carlo (MC) codes are increasingly spreading in the hadrontherapy community due to their detailed description of radiation transport and interaction with matter. The suitability of a MC code for application to hadrontherapy demands accurate and reliable physical models capable of handling all components of the expected radiation field. This becomes extremely important for correctly performing not only physical but also biologically based dose calculations, especially in cases where ions heavier than protons are involved. In addition, accurate prediction of emerging secondary radiation is of utmost importance in innovative areas of research aiming at in vivo treatment verification. This contribution will address the recent developments of the FLUKA MC code and its practical applications in this field. Refinements of the FLUKA nuclear models in the therapeutic energy interval lead to an improved description of the mixed radiation field as shown in the presented benchmarks against experimental data with both 4He and 12C ion beams. Accurate description of ionization energy losses and of particle scattering and interactions lead to the excellent agreement of calculated depth–dose profiles with those measured at leading European hadron therapy centers, both with proton and ion beams. In order to support the application of FLUKA in hospital-based environments, Flair, the FLUKA graphical interface, has been enhanced with the capability of translating CT DICOM images into voxel-based computational phantoms in a fast and well-structured way. The interface is capable of importing also radiotherapy treatment data described in DICOM RT standard. In addition, the interface is equipped with an intuitive PET scanner geometry generator and automatic recording of coincidence events. Clinically, similar cases will be presented both in terms of absorbed dose and biological dose calculations describing the various available features. PMID:27242956

  4. The FLUKA Code: An Accurate Simulation Tool for Particle Therapy.

    Battistoni, Giuseppe; Bauer, Julia; Boehlen, Till T; Cerutti, Francesco; Chin, Mary P W; Dos Santos Augusto, Ricardo; Ferrari, Alfredo; Ortega, Pablo G; Kozłowska, Wioletta; Magro, Giuseppe; Mairani, Andrea; Parodi, Katia; Sala, Paola R; Schoofs, Philippe; Tessonnier, Thomas; Vlachoudis, Vasilis

    2016-01-01

    Monte Carlo (MC) codes are increasingly spreading in the hadrontherapy community due to their detailed description of radiation transport and interaction with matter. The suitability of a MC code for application to hadrontherapy demands accurate and reliable physical models capable of handling all components of the expected radiation field. This becomes extremely important for correctly performing not only physical but also biologically based dose calculations, especially in cases where ions heavier than protons are involved. In addition, accurate prediction of emerging secondary radiation is of utmost importance in innovative areas of research aiming at in vivo treatment verification. This contribution will address the recent developments of the FLUKA MC code and its practical applications in this field. Refinements of the FLUKA nuclear models in the therapeutic energy interval lead to an improved description of the mixed radiation field as shown in the presented benchmarks against experimental data with both (4)He and (12)C ion beams. Accurate description of ionization energy losses and of particle scattering and interactions lead to the excellent agreement of calculated depth-dose profiles with those measured at leading European hadron therapy centers, both with proton and ion beams. In order to support the application of FLUKA in hospital-based environments, Flair, the FLUKA graphical interface, has been enhanced with the capability of translating CT DICOM images into voxel-based computational phantoms in a fast and well-structured way. The interface is capable of importing also radiotherapy treatment data described in DICOM RT standard. In addition, the interface is equipped with an intuitive PET scanner geometry generator and automatic recording of coincidence events. Clinically, similar cases will be presented both in terms of absorbed dose and biological dose calculations describing the various available features. PMID:27242956

  5. Shape Similarity Measures of Linear Entities

    2002-01-01

    The essence of feature matching technology lies in how to measure the similarity of spatial entities. Among all the possible similarity measures, the shape similarity measure is one of the most important because the necessary parameters are easy to collect and it matches human intuition well. In this paper, a new shape similarity measure for linear entities, based on the differences in direction change along each line, is presented and its effectiveness is illustrated.
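
    The abstract does not give the measure itself. As a rough sketch of what a comparison based on direction change could look like, the code below computes the turning angle at each interior vertex of a polyline and averages the absolute differences between two lines. The function names, the equal-vertex-count restriction and the averaging rule are all assumptions of this sketch, not the paper's definition.

```python
import math


def turning_angles(points):
    """Signed direction change (radians) at each interior vertex of a polyline."""
    headings = [
        math.atan2(y2 - y1, x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    # Wrap each heading difference into [-pi, pi).
    return [
        (h2 - h1 + math.pi) % (2.0 * math.pi) - math.pi
        for h1, h2 in zip(headings, headings[1:])
    ]


def direction_change_distance(line_a, line_b):
    """Mean absolute difference of turning angles; 0 means the same turning pattern."""
    ta, tb = turning_angles(line_a), turning_angles(line_b)
    if len(ta) != len(tb) or not ta:
        raise ValueError("this sketch needs polylines with the same number (>= 3) of vertices")
    return sum(abs(a - b) for a, b in zip(ta, tb)) / len(ta)


zigzag = [(0, 0), (1, 0), (1, 1), (2, 1)]
moved = [(5, 3), (5, 4), (4, 4), (4, 5)]      # zigzag rotated by 90 degrees and shifted
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(direction_change_distance(zigzag, moved))
print(direction_change_distance(zigzag, straight))
```

    Because only changes of direction enter the comparison, translating or rotating a line leaves the result unchanged, which is one reason such measures tend to agree with visual intuition.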

  6. Learning Good Edit Similarities with Generalization Guarantees

    Bellet, Aurélien; Habrard, Amaury; Sebban, Marc

    2011-01-01

    Similarity and distance functions are essential to many learning algorithms, thus training them has attracted a lot of interest. When it comes to dealing with structured data (e.g., strings or trees), edit similarities are widely used, and there exists a few methods for learning them. However, these methods offer no theoretical guarantee as to the generalization performance and discriminative power of the resulting similarities. Recently, a theory of learning with good similarity functions wa...

  7. Composing Measures for Computing Text Similarity

    Bär, Daniel; Zesch, Torsten; Gurevych, Iryna

    2015-01-01

    We present a comprehensive study of computing similarity between texts. We start from the observation that while the concept of similarity is well grounded in psychology, text similarity is much less well-defined in the natural language processing community. We thus define the notion of text similarity and distinguish it from related tasks such as textual entailment and near-duplicate detection. We then identify multiple text dimensions, i.e. characteristics inherent to texts that can be used...

  8. Appropriate Similarity Measures for Author Cocitation Analysis

    van Eck, Nees Jan; Waltman, Ludo

    2007-01-01

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not ...
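
    To make the two measures named above concrete, here is a small sketch that computes the cosine and the Pearson correlation for two cocitation profiles. The counts are invented, and the zero-padding comparison is an illustration added here rather than an argument taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(u, v):
    """Pearson correlation, i.e. the cosine of the mean-centred profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return cosine([a - mu for a in u], [b - mv for b in v])


# Hypothetical cocitation counts of two authors with four other authors.
a = [3, 0, 2, 5]
b = [1, 0, 4, 2]

# The same profiles after adding two authors with whom neither is ever cocited.
a_padded = a + [0, 0]
b_padded = b + [0, 0]

print(cosine(a, b), cosine(a_padded, b_padded))
print(pearson(a, b), pearson(a_padded, b_padded))
```

    Appending joint zeros leaves the cosine unchanged but shifts the Pearson value, because the means used for centring change.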

  9. Accurate Telescope Mount Positioning with MEMS Accelerometers

    Mészáros, László; Pál, András; Csépány, Gergely

    2014-01-01

    This paper describes the advantages and challenges of applying microelectromechanical accelerometer systems (MEMS accelerometers) in order to attain precise, accurate and stateless positioning of telescope mounts. This provides a completely independent method from other forms of electronic, optical, mechanical or magnetic feedback or real-time astrometry. Our goal is to reach the sub-arcminute range, which is much smaller than the field-of-view of conventional imaging telescope systems. Here we present how this sub-arcminute accuracy can be achieved with very cheap MEMS sensors and we also detail how our procedures can be extended in order to attain even finer measurements. In addition, our paper discusses how a complete system design can be implemented as part of a telescope control system.

  10. Accurate estimation of indoor travel times

    Prentow, Thor Siiger; Blunck, Henrik; Stisen, Allan;

    2014-01-01

    the InTraTime method for accurately estimating indoor travel times via mining of historical and real-time indoor position traces. The method learns during operation both travel routes, travel times and their respective likelihood---both for routes traveled as well as for sub-routes thereof. InTraTime...... allows to specify temporal and other query parameters, such as time-of-day, day-of-week or the identity of the traveling individual. As input the method is designed to take generic position traces and is thus interoperable with a variety of indoor positioning systems. The method's advantages include...... a minimal-effort setup and self-improving operations due to unsupervised learning---as it is able to adapt implicitly to factors influencing indoor travel times such as elevators, rotating doors or changes in building layout. We evaluate and compare the proposed InTraTime method to indoor adaptions...

  11. Accurate sky background modelling for ESO facilities

    Ground-based measurements such as high-resolution spectroscopy are heavily influenced by several physical processes. Among others, line absorption/emission, airglow from OH molecules, and scattering of photons within the Earth's atmosphere make observations a challenge, in particular from facilities like the future European Extremely Large Telescope. Additionally, emission from unresolved extrasolar objects, the zodiacal light, the Moon and even thermal emission from the telescope and the instrument contribute significantly to the broad-band background over a wide wavelength range. In our talk we review these influences and give an overview of how they can be accurately modeled to increase the overall precision of spectroscopic and imaging measurements. (author)

  12. Toward Accurate and Quantitative Comparative Metagenomics.

    Nayfach, Stephen; Pollard, Katherine S

    2016-08-25

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. PMID:27565341

  13. Accurate valence band width of diamond

    An accurate width is determined for the valence band of diamond by imaging photoelectron momentum distributions for a variety of initial- and final-state energies. The experimental result of 23.0±0.2 eV agrees well with first-principles quasiparticle calculations (23.0 and 22.88 eV) and significantly exceeds the local-density-functional width, 21.5±0.2 eV. This difference quantifies effects of creating an excited hole state (with associated many-body effects) in a band measurement vs studying ground-state properties treated by local-density-functional calculations. © 1997 The American Physical Society

  14. Accurate Weather Forecasting for Radio Astronomy

    Maddalena, Ronald J.

    2010-01-01

    The NRAO Green Bank Telescope routinely observes at wavelengths from 3 mm to 1 m. As with all mm-wave telescopes, observing conditions depend upon the variable atmospheric water content. The site provides over 100 days/yr when opacities are low enough for good observing at 3 mm, but winds on the open-air structure reduce the time suitable for 3-mm observing where pointing is critical. Thus, to maximize productivity, the observing wavelength needs to match weather conditions. For 6 years the telescope has used a dynamic scheduling system (recently upgraded; www.gb.nrao.edu/DSS) that requires accurate multi-day forecasts for winds and opacities. Since opacity forecasts are not provided by the National Weather Service (NWS), I have developed an automated system that takes available forecasts, derives forecasted opacities, and deploys the results on the web in user-friendly graphical overviews (www.gb.nrao.edu/ rmaddale/Weather). The system relies on the "North American Mesoscale" models, which are updated by the NWS every 6 hrs, have a 12 km horizontal resolution, 1 hr temporal resolution, run to 84 hrs, and have 60 vertical layers that extend to 20 km. Each forecast consists of a time series of ground conditions, cloud coverage, etc., and, most importantly, temperature, pressure, and humidity as a function of height. I use Liebe's MWP model (Radio Science, 20, 1069, 1985) to determine the absorption in each layer for each hour for 30 observing wavelengths. Radiative transfer provides, for each hour and wavelength, the total opacity and the radio brightness of the atmosphere, which contributes substantially at some wavelengths to Tsys and the observational noise. Comparisons of measured and forecasted Tsys at 22.2 and 44 GHz imply that the forecasted opacities are good to about 0.01 Nepers, which is sufficient for forecasting and accurate calibration. Reliability is high out to 2 days and degrades slowly for longer-range forecasts.
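
    The radiative transfer step described above can be sketched for a plane-parallel stack of layers. This is a textbook-style simplification and not the author's code: the per-layer opacities, which the abstract obtains from an absorption model, are simply given here as hypothetical inputs, and contributions from outside the atmosphere are ignored.

```python
import math


def atmosphere_opacity_and_brightness(layers):
    """Sum layer contributions for a zenith line of sight.

    layers: (opacity in nepers, temperature in K) tuples ordered from the ground upward.
    Returns the total opacity and the brightness temperature seen from the ground.
    """
    tau_below = 0.0   # opacity between the ground and the current layer
    brightness = 0.0
    for tau_layer, temperature in layers:
        emitted = temperature * (1.0 - math.exp(-tau_layer))
        # Emission from this layer is attenuated by everything beneath it.
        brightness += emitted * math.exp(-tau_below)
        tau_below += tau_layer
    return tau_below, brightness


# Hypothetical three-layer atmosphere at one wavelength and one forecast hour.
layers = [(0.02, 280.0), (0.01, 260.0), (0.005, 230.0)]
total_tau, sky_brightness = atmosphere_opacity_and_brightness(layers)
print(total_tau, sky_brightness)
```

    For an isothermal atmosphere the sum collapses to T(1 - exp(-tau)), which gives a quick sanity check on any implementation of this kind.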

  15. Similarity Structure of Wave-Collapse

    Rypdal, Kristoffer; Juul Rasmussen, Jens; Thomsen, Kenneth

    1985-01-01

    The significance of similarity in the evolution of a collapsing wave packet is investigated. Numerical solutions in radial symmetry demonstrate that the similarity behaviour is local in space and time, and that some similarity solutions must be classified as improper solutions. The nature of the...

  16. Accurate measurement of streamwise vortices using dual-plane PIV

    Waldman, Rye M.; Breuer, Kenneth S. [Brown University, School of Engineering, Providence, RI (United States)

    2012-11-15

    Low Reynolds number aerodynamic experiments with flapping animals (such as bats and small birds) are of particular interest due to their application to micro air vehicles which operate in a similar parameter space. Previous PIV wake measurements described the structures left by bats and birds and provided insight into the time history of their aerodynamic force generation; however, these studies have faced difficulty drawing quantitative conclusions based on said measurements. The highly three-dimensional and unsteady nature of the flows associated with flapping flight are major challenges for accurate measurements. The challenge of animal flight measurements is finding small flow features in a large field of view at high speed with limited laser energy and camera resolution. Cross-stream measurement is further complicated by the predominately out-of-plane flow that requires thick laser sheets and short inter-frame times, which increase noise and measurement uncertainty. Choosing appropriate experimental parameters requires compromise between the spatial and temporal resolution and the dynamic range of the measurement. To explore these challenges, we do a case study on the wake of a fixed wing. The fixed model simplifies the experiment and allows direct measurements of the aerodynamic forces via load cell. We present a detailed analysis of the wake measurements, discuss the criteria for making accurate measurements, and present a solution for making quantitative aerodynamic load measurements behind free-flyers. (orig.)

  17. Partial Recurrent Laryngeal Nerve Paralysis or Paresis? In Search for the Accurate Diagnosis

    Alexander Delides

    2015-01-01

    “Partial paralysis” of the larynx is a term often used to describe a hypomobile vocal fold as is the term “paresis.” We present a case of a dysphonic patient with a mobility disorder of the vocal fold, for whom idiopathic “partial paralysis” was the diagnosis made after laryngeal electromyography, and discuss a proposition for a different implementation of the term.

  18. A Distributed Weighted Voting Approach for Accurate Eye Center Estimation

    Gagandeep Singh

    2013-05-01

    This paper proposes a novel approach for accurate estimation of the eye center in face images. A distributed voting approach is adopted in which every pixel votes for potential eye center candidates. The votes are distributed over a subset of pixels that lie in the direction opposite to the gradient direction, and the weight of the votes is distributed according to a novel mechanism. First, the image is normalized to eliminate illumination variations and its edge map is generated using the Canny edge detector. Distributed voting is applied to the edge image to generate different eye center candidates. Morphological closing and local maxima search are used to reduce the number of candidates. A classifier based on spatial and intensity information is used to choose the correct candidates for the eye center locations. The proposed approach was tested on the BioID face database and resulted in a better iris detection rate than the state of the art. The proposed approach is robust against illumination variation, small pose variations, the presence of eyeglasses and partial occlusion of the eyes. Defence Science Journal, 2013, 63(3), pp. 292-297, DOI: http://dx.doi.org/10.14429/dsj.63.2763

  19. Search-Based Peer Firms: Aggregating Investor Perceptions Through Internet Co-Searches

    Lee, Charles M.C.; Ma, Paul; Wang, Changyi Chang-Yi

    2015-01-01

    Applying a "co-search" algorithm to Internet traffic at the SEC's EDGAR website, we develop a novel method for identifying economically-related peer firms and for measuring their relative importance. Our results show that firms appearing in chronologically adjacent searches by the same individual (Search-Based Peers or SBPs) are fundamentally similar on multiple dimensions. In direct tests, SBPs dominate GICS6 industry peers in explaining cross-sectional variations in base firms' out-of-sampl...
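
    A minimal sketch of the co-search idea, counting which firm a user looks up immediately after a given base firm, might look like the following. The log layout, the tickers and the one-directional counting rule are assumptions made for illustration; the paper's own data handling is not described in the excerpt.

```python
from collections import Counter, defaultdict


def co_search_fractions(log):
    """log: iterable of (user, timestamp, firm). Returns {base_firm: {peer: fraction}}."""
    by_user = defaultdict(list)
    for user, timestamp, firm in log:
        by_user[user].append((timestamp, firm))

    counts = defaultdict(Counter)
    for searches in by_user.values():
        searches.sort()  # chronological order within one user
        for (_, first), (_, second) in zip(searches, searches[1:]):
            if first != second:
                counts[first][second] += 1

    return {
        base: {peer: n / sum(peers.values()) for peer, n in peers.items()}
        for base, peers in counts.items()
    }


# Hypothetical search log: (user id, time, ticker).
log = [
    ("u1", 1, "AAA"), ("u1", 2, "BBB"), ("u1", 3, "CCC"),
    ("u2", 1, "AAA"), ("u2", 2, "BBB"),
    ("u3", 5, "AAA"), ("u3", 6, "CCC"), ("u3", 7, "AAA"),
]
fractions = co_search_fractions(log)
print(fractions["AAA"])
```

    The resulting fractions give both a peer set for each base firm and a weight for each peer, which corresponds to the two goals stated above: identifying related firms and measuring their relative importance.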

  20. Web authentic and similar texts detection using AR digital signature

    Πούλος, Μάριος; Σκιαδόπουλος, Σπύρος; Μπώκος, Γιώργος Δ.

    2010-01-01

    In this paper, we propose a new identification technique, in web form, based on an AR model with O(n) time complexity, with the aim of creating a unique serial number for texts and detecting authentic or similar texts. For this purpose, we used an Autoregressive (AR) model of order 15, and for the identification procedure, we employed the cross-correlation algorithm. Empirical investigation showed that the proposed method may be used as an accurate method for id...

  1. Transmission coefficient of radiation doses for soil and similar materials

    In this paper, a simple exponential formula is proposed to define the coefficient of transmitted neutron and secondary gamma radiation doses through soil slabs thicker than 50 g/cm². The radiation originates from an isotropic fission neutron source. The unknown parameters of the formula are determined by least-squares fit and approximation by a rational function. Results obtained by applying this procedure are accurate enough for the purpose of preliminary shield design. Although this formula is derived from calculated data for soil and was originally meant for soil, it can be applied with satisfactory accuracy to similar materials such as sand and brick. (authors). 8 refs., 4 tabs., 3 figs
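
    The abstract does not state the formula. Assuming the simplest single-exponential form T(x) = A exp(-x / L), with x the slab thickness in g/cm², a least-squares fit of its two parameters can be sketched as below. The parameter names `amplitude` and `relaxation_length`, and the synthetic data, are hypothetical; the rational-function step mentioned in the abstract is not reproduced.

```python
import math


def fit_exponential(thickness, transmission):
    """Least-squares fit of ln T = ln A - x / L; returns (A, L)."""
    n = len(thickness)
    logs = [math.log(t) for t in transmission]
    mean_x = sum(thickness) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in thickness)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(thickness, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -1.0 / slope


# Synthetic data generated from assumed values A = 0.8 and L = 30 g/cm^2.
thickness = [50.0, 75.0, 100.0, 125.0, 150.0]
transmission = [0.8 * math.exp(-x / 30.0) for x in thickness]

amplitude, relaxation_length = fit_exponential(thickness, transmission)
print(amplitude, relaxation_length)
```

    Fitting in log space turns the problem into ordinary linear regression, which is usually adequate for a preliminary estimate when the data span several relaxation lengths.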

  2. A fingerprint based metric for measuring similarities of crystalline structures

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings

  3. Initial Experiences with Retrieving Similar Objects in Simulation Data

    Cheung, S-C S; Kamath, C

    2003-02-21

    Comparing the output of a physics simulation with an experiment, referred to as 'code validation,' is often done by visually comparing the two outputs. In order to determine which simulation is a closer match to the experiment, more quantitative measures are needed. In this paper, we describe our early experiences with this problem by considering the slightly simpler problem of finding objects in an image that are similar to a given query object. Focusing on a dataset from a fluid mixing problem, we report on our experiments with different features that are used to represent the objects of interest in the data. These early results indicate that the features must be chosen carefully to correctly represent the query object and the goal of the similarity search.

  4. A fingerprint based metric for measuring similarities of crystalline structures

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Grauzinyte, Migle; Goedecker, Stefan, E-mail: stefan.goedecker@unibas.ch [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Amsler, Maximilian [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States); Faraji, Somayeh; Rostami, Samare; Ghasemi, S. Alireza [Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan (Iran, Islamic Republic of); Sadeghi, Ali [Physics Department, Shahid Beheshti University, G. C., Evin, 19839 Tehran (Iran, Islamic Republic of); Wolverton, Chris [Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States)

    2016-01-21

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  5. A fingerprint based metric for measuring similarities of crystalline structures

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samara; Ghasemi, S Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Christopher; Goedecker, Stefan

    2015-01-01

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and allow one to define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures. The new method is a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms and high-throughput screenings.

  6. A fingerprint based metric for measuring similarities of crystalline structures

    Zhu, Li; Amsler, Maximilian; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samare; Ghasemi, S. Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Chris; Goedecker, Stefan

    2016-01-01

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  7. Faceted Semantic Search for Personalized Social Search

    Mas, Massimiliano Dal

    2012-01-01

    Current social networks (such as Facebook, Twitter, LinkedIn, ...) need to deal with vagueness arising from ontological indeterminacy. This paper analyzes the prototyping of a faceted semantic search for personalized social search using the "joint meaning" in a community environment. User searches in a "collaborative" environment defined by folksonomies can be supported by the most common features of faceted semantic search. A solution for the context-aware personalized search is based on "join...

  8. Search Engine Marketing and Search Engine Optimization

    Šmidrkal, Jan

    2012-01-01

    This thesis surveys the general state of web page promotion in the areas of internet marketing and internet search engines. It examines current trends in optimizing web pages for search engines and in paid promotion in search engines, especially in PPC systems. The theoretical part focuses on the effect of particular factors and optimization methods on the final position of pages in the search engine results page and a summary of possibilities of paid promotion by the he...

  9. An accurate and portable eye movement detector for studying sleep in small animals.

    Sánchez-López, Álvaro; Escudero, Miguel

    2015-08-01

    Although eye movements are a highly valuable variable in attempts to precisely identify different periods of the sleep-wake cycle, their indirect measurement by electrooculography is not good enough. The present article describes an accurate and portable scleral search coil that allows the detection of tonic and phasic characteristics of eye movements in freely moving animals. Six adult Wistar rats were prepared for chronic recording of electroencephalography, electromyography and eye movements using the scleral search coil technique. We developed a miniature magnetic field generator made with two coils, consisting of 35 turns and 15 mm diameter of insulated 0.2 mm copper wire, mounted in a frame of carbon fibre. This portable scleral search coil was fixed on the head of the animal, with each magnetic coil parallel to the eye coil and at 5 mm from each eye. Eye movements detected by the portable scleral search coil were compared with those measured by a commercial scleral search coil requiring immobilizing the head of the animal. No qualitative differences were found between the two scleral search coil systems in their capabilities to detect eye movements. This innovative portable scleral search coil system is an essential tool to detect slow changes in eye position and miniature rapid eye movements during sleep. The portable scleral search coil is much more suitable for detecting eye movements than any previously available system because of its precision and simplicity, and because it does not require immobilization of the animal's head. PMID:25590417

  10. Approaching system equilibrium with accurate or not accurate feedback information in a two-route system

    Zhao, Xiao-mei; Xie, Dong-fan; Li, Qi

    2015-02-01

    With the development of intelligent transport systems, advanced information feedback strategies have been developed to reduce traffic congestion and enhance capacity. However, previous strategies provide accurate information to travelers, and our simulation results show that accurate information brings negative effects, especially when the information is delayed. Travelers with accurate information prefer the route reported to be in the best condition, but delayed information reflects past rather than current traffic conditions. Travelers then make wrong routing decisions, which decreases capacity, increases oscillations, and drives the system away from equilibrium. To avoid this negative effect, bounded rationality is taken into account by introducing a boundedly rational threshold BR. When the difference between the two routes is less than BR, the routes have equal probability of being chosen. Bounded rationality helps improve efficiency in terms of capacity, oscillation, and the gap from the system equilibrium.
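
    The threshold rule in this abstract can be illustrated with a minimal sketch. It assumes each route is summarized by a single reported cost (for example travel time) and that a traveler outside the threshold simply takes the cheaper route; the function name and the cost measure are hypothetical and are not taken from the paper.

```python
import random


def choose_route(cost_a, cost_b, br, rng=random):
    """Pick route 'A' or 'B' from reported costs under a threshold br.

    Assumption for illustration: lower cost is better, and the reported
    costs stand in for whatever feedback quantity the paper uses.
    """
    if abs(cost_a - cost_b) < br:
        # The difference is below the boundedly rational threshold, so
        # both routes are chosen with equal probability.
        return rng.choice(["A", "B"])
    # Otherwise the traveler takes the route reported as better.
    return "A" if cost_a < cost_b else "B"
```

    With br = 0 the rule reduces to always following the reported best route, which is the accurate-information case the abstract criticizes.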

  11. Code Similarity on High Level Programs

    Bernal, M Miron; Nazuno, J Figueroa

    2007-01-01

    This paper presents a new approach to code similarity for high-level programs. Our technique is based on Fast Dynamic Time Warping, which builds a warp path, or point relation, with local restrictions. The source code is represented as time series using the operators of the programming language, which makes comparison possible. It also makes it possible to detect subsequences that represent similar code instructions. In contrast with other code similarity algorithms, we do not perform feature extraction. The experiments show that two source codes are similar when their respective time series are similar.
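
    As a rough illustration of the comparison step, the sketch below computes the classic quadratic-time dynamic time warping distance between two numeric sequences. It is not the Fast Dynamic Time Warping variant the abstract names, and how source code is mapped to numbers is left open; the input sequences are assumed to be already encoded.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    a and b are sequences of numbers; the local cost is the absolute
    difference between aligned elements.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

    A distance of zero means one sequence can be stretched onto the other exactly, so a repeated element does not count as a difference.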

  12. Areal Feature Matching Based on Similarity Using Critic Method

    Kim, J.; Yu, K.

    2015-10-01

    In this paper, we propose an areal feature matching method that can be applied for many-to-many matching, which involves matching a simple entity with an aggregate of several polygons or two aggregates of several polygons, with less user intervention. To this end, an affine transformation is applied to two datasets by using polygon pairs for which the building name is the same. Then, the two datasets are overlaid, and intersected polygon pairs are selected as candidate matching pairs. If many polygons intersect at this time, we calculate the inclusion function between such polygons. When the value is more than 0.4, the polygons are aggregated into single polygons by using a convex hull. Finally, the shape similarity is calculated between the candidate pairs according to the linear sum of the weights computed in the CRITIC method and the position similarity, shape ratio similarity, and overlap similarity. The candidate pairs for which the value of the shape similarity is more than 0.7 are determined as matching pairs. We applied the method to two geospatial datasets: the digital topographic map and the KAIS map in South Korea. As a result, the visual evaluation showed that polygons had been well detected by using the proposed method. The statistical evaluation indicates that the proposed method is accurate when using our test dataset, with a high F-measure of 0.91.

  13. Internet search. Types of search engines

    Kralina, Anna Sergeevna

    2010-01-01

    For effective searching on the Internet, it is necessary to know what types of search engines exist. The first type consists of engines with classified lists of resources. The second type consists of engines that use queries about the resource. The third type consists of catalog engines. The fourth type is represented by meta-engines.

  14. Div-BLAST: Diversification of Sequence Search Results

    Eser, Elif; Can, Tolga; Ferhatosmanoğlu, Hakan

    2014-01-01

    Sequence similarity tools, such as BLAST, seek sequences most similar to a query from a database of sequences. They return results significantly similar to the query sequence and that are typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy ...

  15. Stability of similarity measurements for bipartite networks

    Liu, Jian-Guo; Pan, Xue; Guo, Qiang; Zhou, Tao

    2015-01-01

    Similarity is a fundamental measure in network analyses and machine learning algorithms, with wide applications ranging from personalized recommendation to socio-economic dynamics. We argue that an effective similarity measurement should guarantee stability even under some information loss. With six bipartite networks, we investigate the stabilities of fifteen similarity measurements by comparing the similarity matrices of two data samples which are randomly divided from original data sets. Results show that the fifteen measurements can be well classified into three clusters according to their stabilities, and measurements in the same cluster have similar mathematical definitions. In addition, we develop a top-$n$-stability method for personalized recommendation, and find that the unstable similarities would recommend false information to users, and the performance of recommendation would be largely improved by using stable similarity measurements. This work provides a novel dimension to analyze and eval...

  16. Trajectories through similarity space produced by local neocortical circuits

    Beggs, John; Chen, Wei; Hobbs, Jon; Tang, Aonan

    2009-03-01

    The dynamics found in local cortical networks strongly impact the types of computations they can perform. Major classes of cortical network models assume that spatio-temporal activity evolves with either ultra-stable, chaotic or neutral dynamics. While experimental evidence has demonstrated that repeatable activity states can exist in cortical networks, it is still unclear what the spatio-temporal dynamics near these states are. To accurately address this question, the trajectories of similar, but not identical, inputs must be quantified. We use 60 channel microelectrode arrays to measure spatio-temporal trajectories through similarity space at 4 ms resolution in organotypic cortical cultures and acute cortical slices. Here we show that while attractive, chaotic and neutral trajectories can exist in these networks, the average trajectory has a Lyapunov exponent near zero (0.01 ± 0.2, mean ± s.d.), indicating that neutral dynamics prevail.

  17. Studying of Semantic Similarity Methods in Ontology

    Vahideh Reshadat

    2012-06-01

    Humans are able to easily judge if a pair of concepts are related in some way. Understanding how humans are able to perform this task is not easy. Semantic similarity denotes computing the similarity between concepts that have the same meaning or related information but are not necessarily lexically similar. Semantic similarity between concepts plays an important role in the Semantic Web, knowledge sharing, Web mining, semantic sense understanding and text summarization. It is also an important problem in Natural Language Processing and Information Retrieval research. These techniques are becoming important components of most Information Retrieval (IR), Information Extraction (IE) and other intelligent knowledge based systems. Therefore it has received considerable attention in the literature. Ontology has a good hierarchical structure of concepts. In the ontology, semantic information can be realized through the semantic relationship of concepts. Ontology-based semantic similarity techniques can estimate the semantic similarity between two hierarchically expressed concepts in a given ontology or taxonomy. Semantic similarity is usually computed by mapping concepts to ontology and by examining their relationships in it. The most popular semantic similarity methods are implemented and evaluated using WordNet and MeSH. Several algorithmic approaches for computing semantic similarity have been proposed. This paper discusses the various approaches used for identifying semantically similar concepts in ontology.

  18. Tabu Search Based Strategies for Conformational Search

    Stepanenko, Svetlana; Engels, Bernd

    2009-09-01

    This paper presents an application of the new nonlinear global optimization routine gradient only tabu search (GOTS) to conformational search problems. It is based on the tabu search strategy which tries to determine the global minimum of a function by the steepest descent-modest ascent strategy. The refinement of the ranking procedure of the original GOTS method and the exploitation of simulated annealing elements are described, and the modifications of the GOTS algorithm necessary to adapt it to conformation searches are explained. The utility of the GOTS for conformational search problems is tested using various examples.

  19. SEARCHING LOST PEOPLE WITH UAVS: THE SYSTEM AND RESULTS OF THE CLOSE-SEARCH PROJECT

    Molina, P.; I. Colomina; Vitoria, T.; P.F. Silva; J. Skaloud; Kornus, W.; R. Prades; Aguilera, C.

    2012-01-01

    This paper will introduce the goals, concept and results of the project named CLOSE-SEARCH, which stands for ’Accurate and safe EGNOS-SoL Navigation for UAV-based low-cost Search-And-Rescue (SAR) operations’. The main goal is to integrate a medium-size, helicopter-type Unmanned Aerial Vehicle (UAV), a thermal imaging sensor and an EGNOS-based multi-sensor navigation system, including an Autonomous Integrity Monitoring (AIM) capability, to support search operations in difficult-to-acc...

  20. Comparing NEO Search Telescopes

    Myhrvold, Nathan

    2016-04-01

    Multiple terrestrial and space-based telescopes have been proposed for detecting and tracking near-Earth objects (NEOs). Detailed simulations of the search performance of these systems have used complex computer codes that are not widely available, which hinders accurate cross-comparison of the proposals and obscures whether they have consistent assumptions. Moreover, some proposed instruments would survey infrared (IR) bands, whereas others would operate in the visible band, and differences among asteroid thermal and visible-light models used in the simulations further complicate like-to-like comparisons. I use simple physical principles to estimate basic performance metrics for the ground-based Large Synoptic Survey Telescope and three space-based instruments—Sentinel, NEOCam, and a Cubesat constellation. The performance is measured against two different NEO distributions, the Bottke et al. distribution of general NEOs, and the Veres et al. distribution of Earth-impacting NEO. The results of the comparison show simplified relative performance metrics, including the expected number of NEOs visible in the search volumes and the initial detection rates expected for each system. Although these simplified comparisons do not capture all of the details, they give considerable insight into the physical factors limiting performance. Multiple asteroid thermal models are considered, including FRM, NEATM, and a new generalized form of FRM. I describe issues with how IR albedo and emissivity have been estimated in previous studies, which may render them inaccurate. A thermal model for tumbling asteroids is also developed and suggests that tumbling asteroids may be surprisingly difficult for IR telescopes to observe.

  1. Fast and Provably Accurate Bilateral Filtering.

    Chaudhury, Kunal N; Dabhade, Swapnil D

    2016-06-01

    The bilateral filter is a non-linear filter that uses a range filter along with a spatial filter to perform edge-preserving smoothing of images. A direct computation of the bilateral filter requires O(S) operations per pixel, where S is the size of the support of the spatial filter. In this paper, we present a fast and provably accurate algorithm for approximating the bilateral filter when the range kernel is Gaussian. In particular, for box and Gaussian spatial filters, the proposed algorithm can cut down the complexity to O(1) per pixel for any arbitrary S. The algorithm has a simple implementation involving N+1 spatial filterings, where N is the approximation order. We give a detailed analysis of the filtering accuracy that can be achieved by the proposed approximation in relation to the target bilateral filter. This allows us to estimate the order N required to obtain a given accuracy. We also present comprehensive numerical results to demonstrate that the proposed algorithm is competitive with the state-of-the-art methods in terms of speed and accuracy. PMID:27093722
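
    For orientation, the direct O(S)-per-sample computation that the abstract takes as its baseline can be sketched as follows. This is a one-dimensional illustration with a box spatial filter and a Gaussian range kernel; it is not the fast approximation proposed in the paper, and the parameter names radius and sigma_r are chosen here for illustration.

```python
import math


def bilateral_1d(signal, radius, sigma_r):
    """Direct bilateral filter on a 1-D signal.

    Spatial filter: box window of half-width radius (support S = 2*radius + 1).
    Range kernel: Gaussian with standard deviation sigma_r.
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        # Visit every sample in the spatial window: O(S) work per output sample.
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            diff = signal[j] - signal[i]
            w = math.exp(-(diff * diff) / (2.0 * sigma_r * sigma_r))
            num += w * signal[j]
            den += w
        # den >= 1 because the centre sample always has weight 1.
        out.append(num / den)
    return out
```

    On a step such as [0, 0, 0, 10, 10, 10] with a small sigma_r, samples across the edge receive negligible weight, which is the edge-preserving behaviour the abstract describes.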

  2. Accurate adiabatic correction in the hydrogen molecule

    Pachucki, Krzysztof, E-mail: krp@fuw.edu.pl [Faculty of Physics, University of Warsaw, Pasteura 5, 02-093 Warsaw (Poland); Komasa, Jacek, E-mail: komasa@man.poznan.pl [Faculty of Chemistry, Adam Mickiewicz University, Umultowska 89b, 61-614 Poznań (Poland)

    2014-12-14

    A new formalism for the accurate treatment of adiabatic effects in the hydrogen molecule is presented, in which the electronic wave function is expanded in the James-Coolidge basis functions. Systematic increase in the size of the basis set permits estimation of the accuracy. Numerical results for the adiabatic correction to the Born-Oppenheimer interaction energy reveal a relative precision of 10^−12 at an arbitrary internuclear distance. Such calculations have been performed for 88 internuclear distances in the range of 0 < R ⩽ 12 bohrs to construct the adiabatic correction potential and to solve the nuclear Schrödinger equation. Finally, the adiabatic correction to the dissociation energies of all rovibrational levels in H2, HD, HT, D2, DT, and T2 has been determined. For the ground state of H2 the estimated precision is 3 × 10^−7 cm^−1, which is almost three orders of magnitude higher than that of the best previous result. The achieved accuracy removes the adiabatic contribution from the overall error budget of the present day theoretical predictions for the rovibrational levels.

  3. Accurate fission data for nuclear safety

    Solders, A; Jokinen, A; Kolhinen, V S; Lantz, M; Mattera, A; Penttila, H; Pomp, S; Rakopoulos, V; Rinta-Antila, S

    2013-01-01

    The Accurate fission data for nuclear safety (AlFONS) project aims at high precision measurements of fission yields, using the renewed IGISOL mass separator facility in combination with a new high current light ion cyclotron at the University of Jyvaskyla. The 30 MeV proton beam will be used to create fast and thermal neutron spectra for the study of neutron induced fission yields. Thanks to a series of mass separating elements, culminating with the JYFLTRAP Penning trap, it is possible to achieve a mass resolving power on the order of a few hundred thousand. In this paper we present the experimental setup and the design of a neutron converter target for IGISOL. The goal is to have a flexible design. For studies of exotic nuclei far from stability a high neutron flux (10^12 neutrons/s) at energies 1 - 30 MeV is desired, while for reactor applications neutron spectra that resemble those of thermal and fast nuclear reactors are preferred. It is also desirable to be able to produce (semi-)monoenergetic neutrons...

  4. K-Means Clustering For Segment Web Search Results

    Hasitha Indika Arumawadu; R. M. Kapila Tharanga Rathnayaka; S. K. Illangarathne

    2015-01-01

    Clustering is a powerful technique for segmenting relevant data into different levels. This study proposes a K-means clustering method for clustering web search results for search engines. To represent documents we used the vector space model, and we used cosine similarity to measure similarity between the user query and the search results. As an improvement to K-means clustering, we used the distortion curve method to identify the optimal initial number of clusters.
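
    The query-to-result similarity step can be sketched as below. The sketch assumes raw term counts over whitespace-separated, lower-cased tokens as the vector space representation; the study's actual term weighting is not stated in the abstract, and the clustering step itself is not shown.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts using raw term counts.

    Assumption for illustration: tokens are whitespace-separated and
    lower-cased; no stemming, stop-word removal, or TF-IDF weighting.
    """
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Dot product over the terms of the first vector (missing terms count as 0).
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction in the vector space
    return dot / (norm_a * norm_b)
```

    Texts that share no terms score 0, and texts with identical term counts score 1 regardless of word order.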

  5. Phase Coherent Observations and Millisecond Pulsar Searches

    Shrauner, Jay Arthur

    1997-07-01

    We have built a new radio astronomical receiving system designed specifically for very high precision timing and polarimetry of fast pulsars. Unlike most detectors currently used to study pulsars, this instrument does not square the received signal at the time of observation. Instead, voltages proportional to the instantaneous electric vectors of incoming signals are digitized, time-tagged, and recorded on high speed magnetic media. During processing, the data streams are convolved with an inverse 'chirp' function that completely removes the phase retardation introduced by interstellar dispersion. The intrinsic time resolution of this system is the inverse of the system bandwidth, typically well under 1 μs. We have tested this and another phase-coherent observing system in observations using the Arecibo 305 m and Green Bank 140 foot telescopes. With these two sets of observations we have studied giant pulses, performed high precision timing, and obtained high-resolution polarization profiles and accurate dispersion measures. We have verified the existence of pulses with intensities hundreds of times the mean for both the main pulse and interpulse of PSR B1937+21, and have established that the amplitudes of both types of giant pulses have similar power-law distributions. The giant pulses are narrower than the average pulses, systematically delayed by 40-50 μs, and many are nearly 100% circularly polarized. We have also conducted two searches of the Northern hemisphere for pulsars. The first used the original pulsar discovery telescope in Cambridge, England to search the entire Northern hemisphere at 81.5 MHz, with an average sensitivity to slow pulsars of 230 mJy. Although we obtained flux densities and pulse profiles of 20 known pulsars, no new pulsars were discovered. The second search effort covered a total of 384 deg^2 of previously unsearched sky at 430 MHz using the Arecibo telescope, with an average sensitivity to slow pulsars of 0.83 mJy. We discovered 7

  6. Good edit similarity learning by loss minimization

    Bellet, Aurélien; Habrard, Amaury; Sebban, Marc

    2012-01-01

    Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, edit distance-based measures are widely used, and there exist a few methods for learning them from data. However, these methods offer no theoretical guarantee as to the generalization ability and discriminative power of the learned similarities. In this paper, we propose a loss minimization-based edit similarity learning approach, called GESL. It is driven by the not...
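
    For readers unfamiliar with the edit distance-based measures mentioned here, the following sketch computes the standard Levenshtein distance with unit costs for insertion, deletion, and substitution. It is only the fixed-cost baseline; learning the costs from data, as GESL is said to do, is not shown.

```python
def edit_distance(s, t):
    """Levenshtein distance between strings s and t with unit costs."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]
```

    A learned edit similarity would replace the unit costs with weights per operation and symbol pair, which is where the theoretical guarantees discussed in the abstract come in.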

  7. Dark matter halo's and self similarity

    Alard, C.

    2012-01-01

    This paper explores the self similar solutions of the Vlasov-Poisson system and their relation to the gravitational collapse of dynamically cold systems. Analytic solutions are derived for power law potential in one dimension, and extensions of these solutions in three dimensions are proposed. Next the self similarity of the collapse of cold dynamical systems is investigated numerically. The fold system in phase space is consistent with analytic self similar solutions, the solutions present ...

  8. Learning music similarity from relative user ratings

    Wolff, D.; Weyde, T.

    2013-01-01

    Computational modelling of music similarity is an increasingly important part of personalisation and optimisation in music information retrieval and research in music perception and cognition. The use of relative similarity ratings is a new and promising approach to modelling similarity that avoids well known problems with absolute ratings. In this article, we use relative ratings from the MagnaTagATune dataset with new and existing variants of state-of-the-art algorithms and provide the firs...

  9. Similarity and a Duality for Fullerenes

    Jennifer J. Edmond; Graver, Jack E.

    2015-01-01

    Fullerenes are molecules of carbon that are modeled by trivalent plane graphs with only pentagonal and hexagonal faces. Scaling up a fullerene gives a notion of similarity, and fullerenes are partitioned into similarity classes. In this expository article, we illustrate how the values of two important fullerene parameters can be deduced for all fullerenes in a similarity class by computing the values of these parameters for just the three smallest representatives of that class. In addition, i...

  10. Molecular quantum similarity using conceptual DFT descriptors

    Patrick Bultinck; Ramon Carbó-Dorca

    2005-09-01

    This paper reports a Molecular Quantum Similarity study for a set of congeneric steroid molecules, using as basic similarity descriptors the electron density ρ(r), the shape function σ(r), the Fukui functions f+(r) and f−(r), and the local softness s+(r) and s−(r). Correlations are investigated between similarity indices for each pair of descriptors used and compared to assess whether these different descriptors sample different information and to investigate what information is revealed by each descriptor.

  11. Quadruplet-Wise Image Similarity Learning

    Law M.T.; Thome N.; Cord M.

    2013-01-01

    This paper introduces a novel similarity learning framework. Working with inequality constraints involving quadruplets of images, our approach aims at efficiently modeling similarity from rich or complex semantic label relationships. From these quadruplet-wise constraints, we propose a similarity learning framework relying on a convex optimization scheme. We then study how our metric learning scheme can exploit specific class relationships, such as class ranking (r...

  12. Semantic Features for Classifying Referring Search Terms

    May, Chandler J.; Henry, Michael J.; McGrath, Liam R.; Bell, Eric B.; Marshall, Eric J.; Gregory, Michelle L.

    2012-05-11

    When an internet user clicks on a result in a search engine, a request is submitted to the destination web server that includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors' needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests' countries of origin. A system that can accurately predict the country of origin from query text may be a valuable complement to IP lookup methods, which are susceptible to the obfuscation of dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.

  13. The Search for Another Earth

    2016-07-01

    Is there life anywhere else in the vast cosmos? Are there planets similar to the Earth? For centuries, these questions baffled curious minds. Either a positive or negative answer, if found one day, would carry a deep philosophical significance for our very existence in the universe. Although the search for extra-terrestrial intelligence was initiated decades ago, a systematic scientific and global quest towards achieving a convincing answer began in 1995 with the discovery of the first confirmed planet orbiting around the solar-type star 51 Pegasi. Since then, astronomers have discovered many exoplanets using two main techniques, radial velocity and transit measurements. In the first part of this article, we shall describe the different astronomical methods through which the extrasolar planets of various kinds are discovered. In the second part of the article we shall discuss the various kinds of exoplanets, in particular about the habitable planets discovered till date and the present status of our search for a habitable planet similar to the Earth.

  14. Efficient Proposed Framework for Semantic Search Engine using New Semantic Ranking Algorithm

    M. M. El-gayar

    2015-08-01

    The amount of information stored in billions of databases grows every year, and there is an urgent need to search that information with a specialized tool called a search engine. Many search engines are available today, but their main challenge is that most of them cannot retrieve meaningful information intelligently. Semantic web technology is a solution that keeps data in a readable format that helps machines smartly match this data with related information based on meaning. In this paper, we introduce a proposed semantic framework that includes four phases: crawling, indexing, ranking, and retrieval. This semantic framework operates over a sorting RDF by using an efficient proposed ranking algorithm and an enhanced crawling algorithm. The enhanced crawling algorithm crawls relevant forum content from the web with minimal overhead. The proposed ranking algorithm orders and evaluates similar meaningful data in order to make the retrieval process faster, easier, and more accurate. We applied our work to a standard database and achieved 99 percent effectiveness on semantic performance in minimum time and less than 1 percent error rate compared with the other semantic systems.

  15. Semantic Web Based Efficient Search Using Ontology and Mathematical Model

    K.Palaniammal

    2014-01-01

    The semantic web is the forthcoming technology in the world of search engines. It focuses mainly on search that is more meaningful than the syntactic search prevailing now. This proposed work concerns semantic search with respect to the educational domain. In this paper, we propose a semantic web based efficient search using ontology and a mathematical model that takes into account misleading or unmatched service information, lack of relevant domain knowledge, and wrong service queries. To solve these issues, the framework is designed to make three major contributions: an ontology knowledge base, Natural Language Processing (NLP) techniques, and a search model. The ontology knowledge base stores domain-specific service ontologies and service description entity (SDE) metadata. The search model, which includes the mathematical model, retrieves SDE metadata efficiently for Education lenders. The Natural Language Processing techniques are used for spell-check and synonym based search. The results are retrieved and stored in an ontology, which in turn prevents data redundancy. The results are more accurate and are sensitive to spell check and synonymous context. This paper reduces the user's time and complexity in finding the correct results for his/her search text, and our model provides more accurate results. A series of experiments are conducted in order to evaluate the mechanism and the employed mathematical model, respectively.

  16. A Survey of Meta Search Engine

    张卫丰; 徐宝文; 周晓宇; 李东; 许蕾

    2001-01-01

    With the explosive increase of network information, it is more and more difficult for people to look up information. The emergence of Web search engines overcomes this problem to some degree. However, because different search engines use different mechanisms, scopes, and algorithms, the overlap of the search results for the same query is no more than 34%. To get relatively full-scale, accurate search results, multiple search engines should be used, and so meta search engines emerged. In this paper, meta search engines are surveyed. First, the history, the principles, and the elements of meta search engines are discussed. Then, the related criteria of meta search engines are analyzed and several typical meta search engines are compared. Finally, on this basis, the trend of meta search engines is introduced.

  17. Accurate orbit propagation with planetary close encounters

    Baù, Giulio; Milani Comparetti, Andrea; Guerra, Francesca

    2015-08-01

    We tackle the problem of accurately propagating the motion of those small bodies that undergo close approaches with a planet. The literature is lacking on this topic and the reliability of the numerical results is not sufficiently discussed. The high-frequency components of the perturbation generated by a close encounter make the propagation particularly challenging both from the point of view of the dynamical stability of the formulation and the numerical stability of the integrator. In our approach a fixed step-size and order multistep integrator is combined with a regularized formulation of the perturbed two-body problem. When the propagated object enters the region of influence of a celestial body, the latter becomes the new primary body of attraction. Moreover, the formulation and the step-size will also be changed if necessary. We present: 1) the restarter procedure applied to the multistep integrator whenever the primary body is changed; 2) new analytical formulae for setting the step-size (given the order of the multistep, formulation and initial osculating orbit) in order to control the accumulation of the local truncation error and guarantee the numerical stability during the propagation; 3) a new definition of the region of influence in the phase space. We test the propagator with some real asteroids subject to the gravitational attraction of the planets, the Yarkovsky and relativistic perturbations. Our goal is to show that the proposed approach improves the performance of both the propagator implemented in the OrbFit software package (which is currently used by the NEODyS service) and of the propagator represented by a variable step-size and order multistep method combined with Cowell's formulation (i.e. direct integration of position and velocity in either the physical or a fictitious time).

  18. Accurate paleointensities - the multi-method approach

    de Groot, Lennart

    2016-04-01

    The accuracy of models describing rapid changes in the geomagnetic field over the past millennia critically depends on the availability of reliable paleointensity estimates. Over the past decade methods to derive paleointensities from lavas (the only recorder of the geomagnetic field that is available all over the globe and through geologic times) have seen significant improvements and various alternative techniques were proposed. The 'classical' Thellier-style approach was optimized and selection criteria were defined in the 'Standard Paleointensity Definitions' (Paterson et al., 2014). The Multispecimen approach was validated and the importance of additional tests and criteria to assess Multispecimen results must be emphasized. Recently, a non-heating, relative paleointensity technique was proposed, the pseudo-Thellier protocol, which shows great potential in both accuracy and efficiency, but currently lacks a solid theoretical underpinning. Here I present work using all three of the aforementioned paleointensity methods on suites of young lavas taken from the volcanic islands of Hawaii, La Palma, Gran Canaria, Tenerife, and Terceira. Many of the sampled cooling units are <100 years old; the actual field strength at the time of cooling is therefore reasonably well known. Rather intuitively, flows that produce coherent results from two or more different paleointensity methods yield the most accurate estimates of the paleofield. Furthermore, the results for some flows pass the selection criteria for one method, but fail in other techniques. Scrutinizing and combining all acceptable results yielded reliable paleointensity estimates for 60-70% of all sampled cooling units - an exceptionally high success rate. This 'multi-method paleointensity approach' therefore has high potential to provide the much-needed paleointensities to improve geomagnetic field models for the Holocene.

  19. Towards Accurate Application Characterization for Exascale (APEX)

    Hammond, Simon David [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)]

    2015-09-01

    Sandia National Laboratories has been engaged in hardware and software codesign activities for a number of years; indeed, it might be argued that prototyping of clusters as far back as the CPLANT machines and many large capability resources including ASCI Red and RedStorm were examples of codesigned solutions. As the research supporting our codesign activities has moved closer to investigating on-node runtime behavior, a natural hunger has grown for detailed analysis of both hardware and algorithm performance from the perspective of low-level operations. The Application Characterization for Exascale (APEX) LDRD was a project conceived to address some of these concerns. Primarily, the research was intended to focus on generating accurate and reproducible low-level performance metrics using tools that could scale to production-class code bases. Alongside this research was an advocacy and analysis role associated with evaluating tools for production use, working with leading industry vendors to develop and refine solutions required by our code teams, and directly engaging with production code developers to form a context for the application analysis and a bridge to the research community within Sandia. On each of these counts significant progress has been made, particularly, as this report will cover, in the low-level analysis of operations for important classes of algorithms. This report summarizes the development of a collection of tools under the APEX research program and leaves to other SAND and L2 milestone reports the description of codesign progress with Sandia’s production users/developers.

  20. Fast, accurate standardless XRF analysis with IQ+

    Due to both chemical and physical effects, the most accurate XRF data are derived from calibrations set up using in-type standards, necessitating some prior knowledge of the samples being analysed. Whilst this is often the case for routine samples, particularly in production control, for completely unknown samples the identification and availability of in-type standards can be problematic. Under these circumstances standardless analysis can offer a viable solution. Successful analysis of completely unknown samples requires a complete chemical overview of the specimen together with the flexibility of a fundamental parameters (FP) algorithm to handle wide-ranging compositions. Although FP algorithms are improving all the time, most still require set-up samples to define the spectrometer response to a particular element. Whilst such materials may be referred to as standards, the emphasis in this kind of analysis is that only a single calibration point is required per element and that the standard chosen does not have to be in-type. The high sensitivities of modern XRF spectrometers, together with recent developments in detector counting electronics that possess a large dynamic range and high-speed data processing capacity, bring significant advances to fast, standardless analysis. Illustrated with a tantalite-columbite heavy-mineral concentrate grading use-case, this paper will present the philosophy behind the semi-quantitative IQ+ software and the required hardware. This combination can give a rapid scan-based overview and quantification of the sample in less than two minutes, together with the ability to define channels for specific elements of interest where higher accuracy and lower levels of quantification are required. The accuracy, precision and limitations of standardless analysis will be assessed using certified reference materials of widely differing chemical and physical composition.
Copyright (2002) Australian X-ray Analytical Association Inc

  1. Sound Search Engine Concept

    2006-01-01

    Sound search is provided by the major search engines; however, indexing is text based, not sound based. We will establish a dedicated sound search service based on sound feature indexing. The current demo shows the concept of the sound search engine. The first engine will be released June...

  2. Recommending search experiences

    Saaya, Zurina; Smyth, Barry; Coyle, Maurice; Briggs, Peter

    2011-01-01

    In this paper we focus on a multi-case case-based reasoning system to support users during collaborative search tasks. In particular we describe how repositories of search experiences/knowledge can be recommended to users at search time. These recommendations are evaluated using real-world search data.

  3. Web Search Engines

    Rajashekar, TB

    1998-01-01

    The World Wide Web is emerging as an all-in-one information source. Tools for searching Web-based information include search engines, subject directories and meta search tools. We take a look at key features of these tools and suggest practical hints for effective Web searching.

  4. How doctors search

    Lykke, Marianne; Price, Susan; Delcambre, Lois

    2012-01-01

    context-specific aspects of the main topic of the documents. We have tested the model in an interactive searching study with family doctors with the purpose to explore doctors’ querying behaviour, how they applied the means for specifying a search, and how these features contributed to the search outcome....... In general, the doctors were capable of exploiting system features and search tactics during the searching. Most searchers produced well-structured queries that contained appropriate search facets. When searches failed it was not due to query structure or query length. Failures were mostly caused by...

  5. Large Neighborhood Search

    Pisinger, David; Røpke, Stefan

    2010-01-01

    Heuristics based on large neighborhood search have recently shown outstanding results in solving various transportation and scheduling problems. Large neighborhood search methods explore a complex neighborhood by use of heuristics. Using large neighborhoods makes it possible to find better...... candidate solutions in each iteration and hence traverse a more promising search path. Starting from the large neighborhood search method, we give an overview of very large scale neighborhood search methods and discuss recent variants and extensions like variable depth search and adaptive large neighborhood...... search....

  6. Similarity indices I: what do they measure

    A method for estimating the effects of environmental effusions on ecosystems is described. The characteristics of 25 similarity indices used in studies of ecological communities were investigated. The type of data structure, to which these indices are frequently applied, was described as consisting of vectors of measurements on attributes (species) observed in a set of samples. A general similarity index was characterized as the result of a two-step process defined on a pair of vectors. In the first step an attribute similarity score is obtained for each attribute by comparing the attribute values observed in the pair of vectors. The result is a vector of attribute similarity scores. These are combined in the second step to arrive at the similarity index. The operation in the first step was characterized as a function, g, defined on pairs of attribute values. The second operation was characterized as a function, F, defined on the vector of attribute similarity scores from the first step. Usually, F was a simple sum or weighted sum of the attribute similarity scores. It is concluded that similarity indices should not be used as the test statistic to discriminate between two ecological communities

  7. Similarity indices I: what do they measure.

    Johnston, J.W.

    1976-11-01

    A method for estimating the effects of environmental effusions on ecosystems is described. The characteristics of 25 similarity indices used in studies of ecological communities were investigated. The type of data structure, to which these indices are frequently applied, was described as consisting of vectors of measurements on attributes (species) observed in a set of samples. A general similarity index was characterized as the result of a two-step process defined on a pair of vectors. In the first step an attribute similarity score is obtained for each attribute by comparing the attribute values observed in the pair of vectors. The result is a vector of attribute similarity scores. These are combined in the second step to arrive at the similarity index. The operation in the first step was characterized as a function, g, defined on pairs of attribute values. The second operation was characterized as a function, F, defined on the vector of attribute similarity scores from the first step. Usually, F was a simple sum or weighted sum of the attribute similarity scores. It is concluded that similarity indices should not be used as the test statistic to discriminate between two ecological communities.
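
    The two-step construction described above (an attribute-level function g, then a combining function F) can be sketched as follows. This is a minimal illustration only: the function name `similarity_index` and the sample vectors are assumptions, and the choice g = min, F = sum is just one familiar instance (percentage similarity on proportion vectors), not necessarily one of the 25 indices studied.

```python
def similarity_index(x, y, g, F):
    """General two-step similarity index on a pair of sample vectors.

    Step 1: g compares the two values observed for each attribute (species).
    Step 2: F combines the resulting vector of attribute similarity scores.
    """
    if len(x) != len(y):
        raise ValueError("both samples must cover the same attributes")
    attribute_scores = [g(a, b) for a, b in zip(x, y)]
    return F(attribute_scores)


# Hypothetical species proportions observed in two samples.
sample_1 = [0.5, 0.3, 0.2]
sample_2 = [0.2, 0.3, 0.5]

# g = min and F = sum: the shared proportion summed over species.
print(similarity_index(sample_1, sample_2, g=min, F=sum))
```

    Passing g and F as arguments mirrors the abstract's point that most indices differ only in these two choices.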

  8. Perceived Similarity, Proactive Adjustment, and Organizational Socialization

    Kammeyer-Mueller, John D.; Livingston, Beth A.; Liao, Hui

    2011-01-01

    The present study explores how perceived demographic and attitudinal similarity can influence proactive behavior among organizational newcomers. We propose that newcomers who perceive themselves as similar to their co-workers will be more willing to seek new information or build relationships, which in turn will lead to better long-term…

  9. Some Effects of Similarity Self-Disclosure

    Murphy, Kevin C.; Strong, Stanley R.

    1972-01-01

    College males were interviewed about how college had altered their friendships, values, and plans. The interviewers disclosed experiences and feelings similar to those revealed by the students. Results support Byrne's Law of Similarity in generating interpersonal attraction in the interview and suggest that the timing of self-disclosures is…

  10. Quaternionic Salkowski Curves and Similar Curves

    Önder, Mehmet

    2012-01-01

    In this paper, we give the definitions and characterizations of spatial quaternionic Salkowski, anti-Salkowski and similar curves. We show that quaternionic Salkowski and anti-Salkowski curves are quaternionic slant helices. Moreover, we obtain that the families of quaternionic Salkowski and anti-Salkowski curves form the families of quaternionic similar curves.

  11. Unprecedented accurate abundances: signatures of other Earths?

    Melendez, J; Gustafsson, B; Yong, D; Ramírez, I

    2009-01-01

    For more than 140 years the chemical composition of our Sun has been considered typical of solar-type stars. Our highly differential elemental abundance analysis of unprecedented accuracy (~0.01 dex) of the Sun relative to solar twins, shows that the Sun has a peculiar chemical composition with a ~20% depletion of refractory elements relative to the volatile elements in comparison with solar twins. The abundance differences correlate strongly with the condensation temperatures of the elements. A similar study of solar analogs from planet surveys shows that this peculiarity also holds in comparisons with solar analogs known to have close-in giant planets while the majority of solar analogs without detected giant planets show the solar abundance pattern. The peculiarities in the solar chemical composition can be explained as signatures of the formation of terrestrial planets like our own Earth.

  12. Measure of Node Similarity in Multilayer Networks

    Mollgaard, Anders; Dammeyer, Jesper; Jensen, Mogens H; Lehmann, Sune; Mathiesen, Joachim

    2016-01-01

    The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographics variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a...

  13. Semantic Similarity Calculation of Chinese Word

    Liqiang Pan

    2014-08-01

    This paper puts forward a two-layer computing method to calculate the semantic similarity of Chinese words. First, a Latent Dirichlet Allocation (LDA) topic model is used to generate the topic space. Words are then mapped into the topic space to form topic distributions, which are used to calculate the semantic similarity of words (the first-layer computation). Finally, the semantic dictionary "HowNet" is used to further mine the semantic similarity of words (the second-layer computation). This method not only overcomes the problem that LDA alone is not specific enough for calculating the semantic similarity of words, but also addresses problems of similarity calculation based on the semantic dictionary "HowNet", such as new words (not yet added to the dictionary) and the lack of consideration of specific context. Experimental comparison demonstrates the feasibility, availability and advantages of the calculation method.
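
    A minimal sketch of the first-layer idea, comparing two words through their topic distributions, is given below. The abstract does not state which measure is applied to the distributions, so cosine similarity is an assumption here, as are the function name and the example vectors; the dictionary-based second layer is not sketched.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # A zero vector carries no topic information; treat it as dissimilar.
    return dot / norm if norm else 0.0


# Hypothetical topic distributions for three words over four topics.
word_a = [0.70, 0.20, 0.05, 0.05]
word_b = [0.60, 0.30, 0.05, 0.05]
word_c = [0.05, 0.05, 0.20, 0.70]

print(cosine_similarity(word_a, word_b))  # words sharing dominant topics
print(cosine_similarity(word_a, word_c))  # words with different dominant topics
```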

  14. Similarity and a Duality for Fullerenes

    Jennifer J. Edmond

    2015-11-01

    Fullerenes are molecules of carbon that are modeled by trivalent plane graphs with only pentagonal and hexagonal faces. Scaling up a fullerene gives a notion of similarity, and fullerenes are partitioned into similarity classes. In this expository article, we illustrate how the values of two important fullerene parameters can be deduced for all fullerenes in a similarity class by computing the values of these parameters for just the three smallest representatives of that class. In addition, it turns out that there is a natural duality theory for similarity classes of fullerenes based on one of the most important fullerene construction techniques: leapfrog construction. The literature on fullerenes is very extensive, and since this is a general interest journal, we will summarize and illustrate the fundamental results that we will need to develop similarity and this duality.

  15. Criteria for dynamic similarity in bouncing gaits.

    Bullimore, Sharon R; Donelan, J Maxwell

    2008-01-21

    Animals of different sizes tend to move in a dynamically similar manner when travelling at speeds corresponding to equal values of a dimensionless parameter (DP) called the Froude number. Consequently, the Froude number has been widely used for defining equivalent speeds and predicting speeds of locomotion by extinct species and on other planets. However, experiments using simulated reduced gravity have demonstrated that equality of the Froude number does not guarantee dynamic similarity. This has cast doubt upon the usefulness of the Froude number in locomotion research. Here we use dimensional analysis of the planar spring-mass model, combined with Buckingham's Pi-Theorem, to demonstrate that four DPs must be equal for dynamic similarity in bouncing gaits such as trotting, hopping and bipedal running. This can be reduced to three DPs by applying the constraint of maintaining a constant average speed of locomotion. Sensitivity analysis indicates that all of these DPs are important for predicting dynamic similarity. We show that the reason humans do not run in a dynamically similar manner at equal Froude number in different levels of simulated reduced gravity is that dimensionless leg stiffness decreases as gravity increases. The reason that the Froude number can predict dynamic similarity in Earth gravity is that dimensionless leg stiffness and dimensionless vertical landing speed are both independent of size. In conclusion, although equal Froude number is not sufficient for dynamic similarity, it is a necessary condition. Therefore, to detect fundamental differences in locomotion, animals of different sizes should be compared at equal Froude number, so that they can be as close to dynamic similarity as possible. More generally, the concept of dynamic similarity provides a powerful framework within which similarities and differences in locomotion can be interpreted. PMID:17983630
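
    To make "equal values of the Froude number" concrete, the sketch below uses the form common in locomotion research, Fr = v^2 / (g L), with L taken as leg length. The abstract does not spell out the definition, so this form, the function names and the example numbers are assumptions.

```python
import math

EARTH_GRAVITY = 9.81  # m/s^2


def froude_number(speed, leg_length, gravity=EARTH_GRAVITY):
    """Dimensionless Froude number, Fr = v^2 / (g * L)."""
    return speed ** 2 / (gravity * leg_length)


def equivalent_speed(froude, leg_length, gravity=EARTH_GRAVITY):
    """Speed at which an animal with the given leg length reaches `froude`."""
    return math.sqrt(froude * gravity * leg_length)


# Hypothetical example: a 0.9 m leg moving at 3.0 m/s, and the speed a
# 0.225 m leg needs for the same Froude number.
fr = froude_number(3.0, 0.9)
print(fr)
print(equivalent_speed(fr, 0.225))
```

    Because speed scales with the square root of leg length at fixed Fr and g, a leg one quarter as long gives half the equivalent speed. As the abstract stresses, matching Fr alone is necessary but not sufficient for dynamic similarity.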

  16. Ontology-based prior art search

    Bondarenok, A.

    2003-01-01

    This article describes a method of prior art document search based on semantic similarities of a user query and indexed documents. The ontology-based technology of knowledge extraction and representation is used to build document and query images, which are compared using the semantic similarity technique. Documents are ranked according to their semantic similarities to the query, and the top results are shown to the user.

  17. Measure of Node Similarity in Multilayer Networks.

    Mollgaard, Anders; Zettler, Ingo; Dammeyer, Jesper; Jensen, Mogens H; Lehmann, Sune; Mathiesen, Joachim

    2016-01-01

    The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographics variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dis-similarity for the nodes connected by the strongest links. We finally analyze the overlap between layers in the network for different levels of acquaintanceships. PMID:27300084

  18. Efficient Privacy Preserving Protocols for Similarity Join

    Bilal Hawashin

    2012-04-01

    During the similarity join process, one or more sources may not allow sharing their data with other sources. In this case, a privacy preserving similarity join is required. We showed in our previous work [4] that using long attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, could improve the similarity join accuracy using supervised learning. However, the existing secure protocols for similarity join methods cannot be used to join sources using these long attributes. Moreover, the majority of the existing privacy‐preserving protocols do not consider the semantic similarities during the similarity join process. In this paper, we introduce a secure efficient protocol to semantically join sources when the join attributes are long attributes. We provide two secure protocols for both scenarios when a training set exists and when there is no available training set. Furthermore, we introduce the multi‐label supervised secure protocol and the expandable supervised secure protocol. Results show that our protocols can efficiently join sources using the long attributes by considering the semantic relationships among the long string values. Therefore, they improve the overall secure similarity join performance.

  19. The baryonic self similarity of dark matter

    Alard, C., E-mail: alard@iap.fr [Institut d' Astrophysique de Paris, 98bis boulevard Arago, F-75014 Paris (France)

    2014-06-20

    Cosmological simulations indicate that dark matter halos have specific self-similar properties. However, the halo similarity is affected by the baryonic feedback. By using momentum-driven winds as a model to represent the baryon feedback, an equilibrium condition is derived which directly implies the emergence of a new type of similarity. The new self-similar solution has constant acceleration at a reference radius for both dark matter and baryons. This model receives strong support from the observations of galaxies. The new self-similar properties imply that the total acceleration at larger distances is scale-free, the transition between the dark matter and baryon dominated regimes occurs at a constant acceleration, and the maximum amplitude of the velocity curve at larger distances is proportional to M^{1/4}. These results demonstrate that this self-similar model is consistent with the basics of modified Newtonian dynamics (MOND) phenomenology. In agreement with the observations, the coincidence between the self-similar model and MOND breaks at the scale of clusters of galaxies. Some numerical experiments show that the behavior of the density near the origin is closely approximated by an Einasto profile.

  20. Accurate calculation of (31)P NMR chemical shifts in polyoxometalates.

    Pascual-Borràs, Magda; López, Xavier; Poblet, Josep M

    2015-04-14

    We search for the best density functional theory strategy for the determination of (31)P nuclear magnetic resonance (NMR) chemical shifts, δ((31)P), in polyoxometalates. Among the variables governing the quality of the quantum modelling, we tackle herein the influence of the functional and the basis set. The spin-orbit and solvent effects were routinely included. To do so we analysed the family of structures α-[P2W18-xMxO62](n-) with M = Mo(VI), V(V) or Nb(V); [P2W17O62(M'R)](n-) with M' = Sn(IV), Ge(IV) and Ru(II) and [PW12-xMxO40](n-) with M = Pd(IV), Nb(V) and Ti(IV). The main results suggest that, to date, the best procedure for the accurate calculation of δ((31)P) in polyoxometalates is the combination of TZP/PBE//TZ2P/OPBE (for NMR//optimization step). The hybrid functionals (PBE0, B3LYP) tested herein, which were applied to the NMR step, are more CPU-consuming and do not outperform pure GGA functionals. Although previous studies on (183)W NMR suggested that the use of very large basis sets like QZ4P was needed for geometry optimization, the present results indicate that TZ2P suffices if the functional is optimal. Moreover, scaling corrections were applied to the results, providing low mean absolute errors below 1 ppm for δ((31)P), which is a step forward in order to confirm or predict chemical shifts in polyoxometalates. Finally, via a simplified molecular model, we establish how the small variations in δ((31)P) arise from energy changes in the occupied and virtual orbitals of the PO4 group. PMID:25738630

  1. Similarity-based pattern analysis and recognition

    Pelillo, Marcello

    2013-01-01

    This accessible text/reference presents a coherent overview of the emerging field of non-Euclidean similarity learning. The book presents a broad range of perspectives on similarity-based pattern analysis and recognition methods, from purely theoretical challenges to practical, real-world applications. The coverage includes both supervised and unsupervised learning paradigms, as well as generative and discriminative models. Topics and features: explores the origination and causes of non-Euclidean (dis)similarity measures, and how they influence the performance of traditional classification alg

  2. Collaborative Search Trails for Video Search

    Hopfgartner, Frank; Vallet, David; Halvey, Martin; Jose, Joemon

    2009-01-01

    In this paper we present an approach for supporting users in the difficult task of searching for video. We use collaborative feedback mined from the interactions of earlier users of a video search system to help users in their current search tasks. Our objective is to improve the quality of the results that users find, and in doing so also assist users to explore a large and complex information space. It is hoped that this will lead to them considering search options that they may not have co...

  3. Searching and Indexing Genomic Databases via Kernelization

    Travis Gagie

    2015-02-01

    The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper we survey the twenty-year history of this idea and discuss its relation to kernelization in parameterized complexity.

  4. Searching and Indexing Genomic Databases via Kernelization

    Gagie, Travis; Puglisi, Simon J.

    2015-01-01

    The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper, we survey the 20-year history of this idea and discuss its relation to kernelization in parameterized complexity. PMID:25710001

  5. Using the Dual-Target Cost to Explore the Nature of Search Target Representations

    Stroud, Michael J.; Menneer, Tamaryn; Cave, Kyle R.; Donnelly, Nick

    2012-01-01

    Eye movements were monitored to examine search efficiency and infer how color is mentally represented to guide search for multiple targets. Observers located a single color target very efficiently by fixating colors similar to the target. However, simultaneous search for 2 colors produced a dual-target cost. In addition, as the similarity between…

  6. The search for the missing elements

    The theory behind the Periodic Table of elements predicts that some "super heavy" elements should exist with atomic number 114. These would have stable spherical nuclear shells completely filled with protons and neutrons. This article describes the search, at various laboratories, to discover and isolate these superheavy elements which occur at the so-called Magic Island of stability. The search has proved harder than predicted. Even though the elements' half-lives are too short for them to exist in nature, more sensitive detectors may be able to reveal their existence in the future. Future work will focus on detecting spontaneous fission fragments more efficiently and accurately measuring their atomic numbers. (UK)

  7. Interpersonal attraction and personality: what is attractive--self similarity, ideal similarity, complementarity or attachment security?

    Klohnen, Eva C; Luo, Shanhong

    2003-10-01

    Little is known about whether personality characteristics influence initial attraction. Because adult attachment differences influence a broad range of relationship processes, the authors examined their role in 3 experimental attraction studies. The authors tested four major attraction hypotheses--self similarity, ideal-self similarity, complementarity, and attachment security--and examined both actual and perceptual factors. Replicated analyses across samples, designs, and manipulations showed that actual security and self similarity predicted attraction. With regard to perceptual factors, ideal similarity, self similarity, and security all were significant predictors. Whereas perceptual ideal and self similarity had incremental predictive power, perceptual security's effects were subsumed by perceptual ideal similarity. Perceptual self similarity fully mediated actual attachment similarity effects, whereas ideal similarity was only a partial mediator. PMID:14561124

  8. Discovering Music Structure via Similarity Fusion

    Arenas-García, Jerónimo; Parrado-Hernandez, Emilio; Meng, Anders;

    Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get a satisfactory performance from such systems, one should incorporate as much information about songs similarity as possible; however...... semantics”, in such a way that all observed similarities can be satisfactorily explained using the latent semantics. Therefore, one can think of these semantics as the real structure in music, in the sense that they can explain the observed similarities among songs. The suitability of the PLSA model for...... representing music structure is studied in a simplified scenario consisting of 4412 songs and two similarity measures among them. The results suggest that the PLSA model is a useful framework to combine different sources of information, and provides a reasonable space for song representation....

  9. Distance and Similarity Measures for Soft Sets

    Kharal, Athar

    2010-01-01

    In [P. Majumdar, S. K. Samanta, Similarity measure of soft sets, New Mathematics and Natural Computation 4(1)(2008) 1-12], the authors use matrix representation based distances of soft sets to introduce matching function and distance based similarity measures. We first give counterexamples to show that their Definition 2.7 and Lemma 3.5(3) contain errors, then improve their Lemma 4.4, making it a corollary of our result. The fundamental assumption of Majumdar et al. has been shown to be flawed. This motivates us to introduce set operations based measures. We present a case (Example 28) where the Majumdar-Samanta similarity measure produces an erroneous result but the measure proposed herein decides correctly. Several properties of the new measures have been presented and finally the new similarity measures have been applied to the problem of financial diagnosis of firms.

  10. HYPOTHESIS TESTING WITH THE SIMILARITY INDEX

    Multilocus DNA fingerprinting methods have been used extensively to address genetic issues in wildlife populations. Hypotheses concerning population subdivision and differing levels of diversity can be addressed through the use of the similarity index (S), a band-sharing coeffic...

  11. Bilateral Trade Flows and Income Distribution Similarity

    2016-01-01

    Current models of bilateral trade neglect the effects of income distribution. This paper addresses the issue by accounting for non-homothetic consumer preferences and hence investigating the role of income distribution in the context of the gravity model of trade. A theoretically justified gravity model is estimated for disaggregated trade data (Dollar volume is used as dependent variable) using a sample of 104 exporters and 108 importers for 1980–2003 to achieve two main goals. We define and calculate new measures of income distribution similarity and empirically confirm that greater similarity of income distribution between countries implies more trade. Using distribution-based measures as a proxy for demand similarities in gravity models, we find consistent and robust support for the hypothesis that countries with more similar income-distributions trade more with each other. The hypothesis is also confirmed at disaggregated level for differentiated product categories. PMID:27137462

  12. Correlation between social proximity and mobility similarity

    Fan, Chao; Huang, Junming; Rong, Zhihai; Zhou, Tao

    2016-01-01

    Human behaviors exhibit ubiquitous correlations in many aspects, such as individual and collective levels, temporal and spatial dimensions, content, social and geographical layers. With rich Internet data of online behaviors becoming available, it attracts academic interests to explore human mobility similarity from the perspective of social network proximity. Existent analysis shows a strong correlation between online social proximity and offline mobility similarity, namely, mobile records between friends are significantly more similar than between strangers, and those between friends with common neighbors are even more similar. We argue the importance of the number and diversity of common friends, with a counterintuitive finding that the number of common friends has no positive impact on mobility similarity while the diversity plays a key role, disagreeing with previous studies. Our analysis provides a novel view for better understanding the coupling between human online and offline behaviors, and will...

  13. Learning content similarity for music recommendation

    McFee, Brian; Lanckriet, Gert

    2011-01-01

    Many tasks in music information retrieval, such as recommendation, and playlist generation for online radio, fall naturally into the query-by-example setting, wherein a user queries the system by providing a song, and the system responds with a list of relevant or similar song recommendations. Such applications ultimately depend on the notion of similarity between items to produce high-quality results. Current state-of-the-art systems employ collaborative filter methods to represent musical items, effectively comparing items in terms of their constituent users. While collaborative filter techniques perform well when historical data is available for each item, their reliance on historical data impedes performance on novel or unpopular items. To combat this problem, practitioners rely on content-based similarity, which naturally extends to novel items, but is typically out-performed by collaborative filter methods. In this article, we propose a method for optimizing content-based similarity by learning from a sa...

  14. Bilateral Trade Flows and Income Distribution Similarity.

    Martínez-Zarzoso, Inmaculada; Vollmer, Sebastian

    2016-01-01

    Current models of bilateral trade neglect the effects of income distribution. This paper addresses the issue by accounting for non-homothetic consumer preferences and hence investigating the role of income distribution in the context of the gravity model of trade. A theoretically justified gravity model is estimated for disaggregated trade data (Dollar volume is used as dependent variable) using a sample of 104 exporters and 108 importers for 1980-2003 to achieve two main goals. We define and calculate new measures of income distribution similarity and empirically confirm that greater similarity of income distribution between countries implies more trade. Using distribution-based measures as a proxy for demand similarities in gravity models, we find consistent and robust support for the hypothesis that countries with more similar income-distributions trade more with each other. The hypothesis is also confirmed at disaggregated level for differentiated product categories. PMID:27137462

  15. Distances and similarities in intuitionistic fuzzy sets

    Szmidt, Eulalia

    2014-01-01

    This book presents the state-of-the-art in theory and practice regarding similarity and distance measures for intuitionistic fuzzy sets. Quantifying similarity and distances is crucial for many applications, e.g. data mining, machine learning, decision making, and control. The work provides readers with a comprehensive set of theoretical concepts and practical tools for both defining and determining similarity between intuitionistic fuzzy sets. It describes an automatic algorithm for deriving intuitionistic fuzzy sets from data, which can aid in the analysis of information in large databases. The book also discusses other important applications, e.g. the use of similarity measures to evaluate the extent of agreement between experts in the context of decision making.

  16. Interpersonal Congruency, Attitude Similarity, and Interpersonal Attraction

    Touhey, John C.

    1975-01-01

    As no experimental study has examined the effects of congruency on attraction, the present investigation orthogonally varied attitude similarity and interpersonal congruency in order to compare the two independent variables as determinants of interpersonal attraction. (Author/RK)

  17. Media segmentation using self-similarity decomposition

    Foote, Jonathan T.; Cooper, Matthew L.

    2003-01-01

    We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digital music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame spectral similarity. The digital audio can be robustly segmented by correlating a kernel along the diagonal of the similarity matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals the structure of the digital music in a set of segment boundaries and labels. Finally, the music is summarized by selecting clusters with repeated segments throughout the piece. The summaries can be customized for various applications based on the structure of the original music.
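
    The first step described in this abstract (a similarity matrix, then a kernel correlated along its diagonal) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine measure, the unweighted checkerboard kernel, the toy two-dimensional "frames", and all function names are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed frame measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(frames):
    """S[i][j] holds the similarity between frame i and frame j."""
    return [[cosine(a, b) for b in frames] for a in frames]

def checkerboard(half):
    """Kernel with +1 in the two within-segment quadrants and -1 elsewhere."""
    size = 2 * half
    return [[1.0 if (i < half) == (j < half) else -1.0 for j in range(size)]
            for i in range(size)]

def novelty(S, half):
    """Correlate the kernel along the main diagonal of S.

    scores[t] is large when frames before t and frames from t onward are
    each self-similar but differ from one another.
    """
    K = checkerboard(half)
    n = len(S)
    scores = [0.0] * n
    for t in range(half, n - half + 1):
        total = 0.0
        for i in range(2 * half):
            for j in range(2 * half):
                total += K[i][j] * S[t - half + i][t - half + j]
        scores[t] = total
    return scores

# Toy "audio": four frames of one spectrum followed by four of another.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
scores = novelty(similarity_matrix(frames), half=2)
boundary = max(range(len(scores)), key=scores.__getitem__)
```

    In this toy input the novelty score peaks at index 4, the first frame of the second segment. Real spectral frames would need a larger kernel and some peak-picking rule, neither of which is specified in the abstract.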

  18. Two-way migration between similar countries

    Kreickemeier, Udo; Wrona, Jens

    2011-01-01

    We develop a model to explain two-way migration of high-skilled individuals between countries that are similar in their economic characteristics. High-skilled migration is explained by a combination of two features: In both countries there is a continuum of workers with differing abilities, which are private knowledge, and the production technology gives incentives to firms for hiring workers of similar ability. In the presence of migration cost, high-skilled workers self-select into the grou...

  19. Privacy-preserving matching of similar patients.

    Vatsalan, Dinusha; Christen, Peter

    2016-02-01

    The identification of similar entities represented by records in different databases has drawn considerable attention in many application areas, including in the health domain. One important type of entity matching application that is vital for quality healthcare analytics is the identification of similar patients, known as similar patient matching. A key component of identifying similar records is the calculation of similarity of the values in attributes (fields) between these records. Due to increasing privacy and confidentiality concerns, using the actual attribute values of patient records to identify similar records across different organizations is becoming non-trivial because the attributes in such records often contain highly sensitive information such as personal and medical details of patients. Therefore, the matching needs to be based on masked (encoded) values while being effective and efficient to allow matching of large databases. Bloom filter encoding has widely been used as an efficient masking technique for privacy-preserving matching of string and categorical values. However, no work on Bloom filter-based masking of numerical data, such as integer (e.g. age), floating point (e.g. body mass index), and modulus (numbers wrap around upon reaching a certain value, e.g. date and time), which are commonly required in the health domain, has been presented in the literature. We propose a framework with novel methods for masking numerical data using Bloom filters, thereby facilitating the calculation of similarities between records. We conduct an empirical study on publicly available real-world datasets which shows that our framework provides efficient masking and achieves similar matching accuracy compared to the matching of actual unencoded patient records. PMID:26707453
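
    As background for the Bloom filter masking mentioned in this abstract, the sketch below encodes the character bigrams of a string into a Bloom filter and compares two filters with the Dice coefficient. It illustrates only the established string case, not the numerical masking methods the paper proposes. The filter size, the number of hash functions, the salted SHA-256 hashing, and all function names are assumptions chosen for the example.

```python
import hashlib

def qgrams(value, q=2):
    """Set of character q-grams of a lower-cased string."""
    s = value.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def bloom_encode(value, size=256, num_hashes=2, q=2):
    """Return the set of bit positions switched on for `value`.

    Each q-gram is hashed `num_hashes` times (salted SHA-256, an assumption)
    and every hash selects one of `size` bit positions.
    """
    bits = set()
    for gram in qgrams(value, q):
        for k in range(num_hashes):
            digest = hashlib.sha256(f"{k}:{gram}".encode("utf-8")).hexdigest()
            bits.add(int(digest, 16) % size)
    return bits

def dice(a, b):
    """Dice coefficient of two bit-position sets; 1.0 means identical filters."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# The parties exchange only the encoded filters, never the raw strings.
sim_close = dice(bloom_encode("peter"), bloom_encode("pete"))
sim_far = dice(bloom_encode("peter"), bloom_encode("xyzzy"))
```

    Because every bigram of "pete" also occurs in "peter", the first pair shares most of its set bits, while the second pair overlaps only through chance hash collisions.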

  20. Limit theorems for self-similar tilings

    Bufetov, Alexander I

    2012-01-01

    We study deviation of ergodic averages for dynamical systems given by self-similar tilings on the plane and in higher dimensions. The main object of our paper is a special family of finitely-additive measures for our systems. An asymptotic formula is given for ergodic integrals in terms of these finitely-additive measures, and, as a corollary, limit theorems are obtained for dynamical systems given by self-similar tilings.

  1. Some more similarities between Peirce and Skinner

    Moxley, Roy A

    2002-01-01

    C. S. Peirce is noted for pioneering a variety of views, and the case is made here for the similarities and parallels between his views and B. F. Skinner's radical behaviorism. In addition to parallels previously noted, these similarities include an advancement of experimental science, a behavioral psychology, a shift from nominalism to realism, an opposition to positivism, a selectionist account for strengthening behavior, the importance of a community of selves, a recursive approach to meth...

  2. Computing Similarity Dependencies with Pattern Structures

    Baixeries, Jaume; Kaytoue, Mehdi; Napoli, Amedeo

    2013-01-01

    Functional dependencies provide valuable knowledge on the relations between the attributes of a data table. To extend their use, generalizations have been proposed, among which purity and approximate dependencies. After discussing those generalizations, we provide an alternative definition, the similarity dependencies, to handle a similarity relation between data-values, hence un-crisping the basic definition of functional dependencies. This work is rooted in formal concept analysis, and we s...

  3. A probabilistic approach to melodic similarity

    Bernabeu Briones, José Francisco; CALERA RUBIO, JORGE; Iñesta Quereda, José Manuel; Rizo Valero, David

    2009-01-01

    Melodic similarity is an important research topic in music information retrieval. The representation of symbolic music by means of trees has proven to be suitable in melodic similarity computation, because they are able to code rhythm in their structure leaving only pitch representations as a degree of freedom for coding. In order to compare trees, different edit distances have been previously used. In this paper, stochastic k-testable tree-models, formerly used in other domains like structur...

  4. Learning semantic similarity for very short texts

    De Boom, Cedric; Van Canneyt, Steven; Bohez, Steven; Demeester, Thomas; Dhoedt, Bart

    2015-01-01

    Leveraging data on social media, such as Twitter and Facebook, requires information retrieval algorithms to become able to relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine-similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is little or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level....
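
    The limitation of word-overlap methods noted in this abstract can be seen in a small tf-idf cosine-similarity sketch. The three example texts, the whitespace tokenizer, and the smoothed idf formula are assumptions made for illustration; the paper's own method is not reproduced here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse tf-idf vectors (term -> weight) for whitespace-tokenized docs."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed idf (an assumed variant): never zero, never divides by zero.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "cheap flights to rome",
    "low cost airfare italy",   # related meaning, no shared words
    "cheap flights to paris",   # shares three of four words
]
vecs = tfidf_vectors(docs)
related_no_overlap = cosine(vecs[0], vecs[1])
shared_words = cosine(vecs[0], vecs[2])
```

    The first two texts are about the same topic but share no tokens, so their cosine similarity is exactly zero, whereas the first and third score well above zero purely because of shared words.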

  5. Similarity Theory of Withdrawn Water Temperature Experiment

    Yunpeng Han; Xueping Gao

    2015-01-01

    Selective withdrawal from a thermally stratified reservoir has been widely utilized in managing reservoir water withdrawal. Besides theoretical analysis and numerical simulation, model test was also necessary in studying the temperature of withdrawn water. However, information on the similarity theory of the withdrawn water temperature model remains lacking. Considering flow features of selective withdrawal, the similarity theory of the withdrawn water temperature model was analyzed theoretical...

  6. SIMILARITY NETWORK FOR SEMANTIC WEB SERVICES SUBSTITUTION

    Cherifi, Chantal

    2013-01-01

    Web services substitution is one of the most challenging tasks for automating the composition process of multiple Web services. It aims to improve performances and to deal efficiently with Web services failures. Many existing solutions have approached the problem through classification of substitutable Web services. To go a step further, we propose in this paper a network based approach where nodes are Web services operations and links join similar operations. Four similarity measures based o...

  7. Scalable Similarity Matching in Streaming Time Series

    Marascu, Alice; Ali Khan, Suleiman; Palpanas, Themis

    2011-01-01

    Nowadays online monitoring of data streams is essential in many real life applications, like sensor network monitoring, manufacturing process control, and video surveillance. One major problem in this area is the online identification of streaming sequences similar to a predefined set of pattern-sequences. In this paper, we present a novel solution that extends the state of the art both in terms of effectiveness and efficiency. We propose the first online similarity matching algorithm based o...

  8. Automatic Planning of External Search Engine Optimization

    Vita Jasevičiūtė

    2015-07-01

    This paper describes an investigation of an external search engine optimization (SEO) action planning tool, designed to automatically extract a small set of the most important keywords for each month over a whole year. The keywords in the set are extracted according to externally measured parameters, such as the average number of searches during the year and for every month individually. Additionally, the position of the optimized web site for each keyword is taken into account. The generated optimization plan is similar to the optimization plans prepared manually by SEO professionals and can be successfully used as a support tool for web site search engine optimization.

  9. Combining local and global visual feature similarity using a text search engine

    Amato, Giuseppe; Bolettieri, Paolo; Falchi, Fabrizio; Gennaro, Claudio; Rabitti, Fausto

    2011-01-01

    In this paper we propose a novel approach that allows processing image content based queries expressed as arbitrary combinations of local and global visual features, by using a single index realized as an inverted file. The index was implemented on top of the Lucene retrieval engine. This is particularly useful to allow people to efficiently and interactively check the quality of the retrieval result by exploiting combinations of features, by using a single index realized as an inverted file....

  10. NFFinder: an online bioinformatics tool for searching similar transcriptomics experiments in the context of drug repositioning.

    Setoain, Javier; Franch, Mònica; Martínez, Marta; Tabas-Madrid, Daniel; Sorzano, Carlos O S; Bakker, Annette; Gonzalez-Couto, Eduardo; Elvira, Juan; Pascual-Montano, Alberto

    2015-07-01

    Drug repositioning, using known drugs for treating conditions different from those the drug was originally designed to treat, is an important drug discovery tool that allows for a faster and cheaper development process by using drugs that are already approved or in an advanced trial stage for another purpose. This is especially relevant for orphan diseases because they affect too few people to make drug research de novo economically viable. In this paper we present NFFinder, a bioinformatics tool for identifying potentially useful drugs in the context of orphan diseases. NFFinder uses transcriptomic data to find relationships between drugs, diseases and a phenotype of interest, as well as identifying experts having published on that domain. The application shows in a dashboard a series of graphics and tables designed to help researchers formulate repositioning hypotheses and identify potential biological relationships between drugs and diseases. NFFinder is freely available at http://nffinder.cnb.csic.es. PMID:25940629

  11. Similarity search and mining in uncertain spatial and spatio-temporal databases

    Züfle, Andreas

    2013-01-01

    Both the current trends in technology such as smart phones, general mobile devices, stationary sensors and satellites as well as a new user mentality of utilizing this technology to voluntarily share information produce a huge flood of geo-spatial and geo-spatio-temporal data. This data flood provides a tremendous potential of discovering new and possibly useful knowledge. In addition to the fact that measurements are imprecise, due to the physical limitation of the devices, some form of inte...

  12. List of clustered permutations in secondary memory for proximity searching

    Roggero, Patricia; Reyes, Nora Susana; Figueroa, Karina; Paredes, Rodrigo

    2015-01-01

    Similarity search is a difficult problem and various indexing schemas have been defined to process similarity queries efficiently in many applications, including multimedia databases and other repositories handling complex objects. Metric indices support efficient similarity searches, but most of them are designed for main memory. Thus, they can handle only small datasets, suffering serious performance degradations when the objects reside on disk. Most real-life database applications require i...

  13. Collaborative Personalized Web Recommender System using Entropy based Similarity Measure

    Mehta, Harita; Bedi, Punam; Dixit, V S

    2012-01-01

    On the internet, web surfers searching for information continually look for recommendations. Generating recommendations becomes more difficult as the information domain grows exponentially day by day. In this paper, we calculate entropy-based similarity between users to address the scalability problem. Using this concept, we have implemented an online user-based collaborative web recommender system. In this model-based collaborative system, the user session is divided into two levels, and entropy is calculated at both levels. It is shown that, from the set of valuable recommenders obtained at level I, only those recommenders having lower entropy at level II than at level I serve as trustworthy recommenders. Finally, top-N recommendations are generated from such trustworthy recommenders for an online user.

  14. Search for neutral leptons

    At present we know of three kinds of neutral leptons: the electron neutrino, the muon neutrino, and the tau neutrino. This paper reviews the search for additional neutral leptons. The method and significance of a search depends upon the model used for the neutral lepton being sought. Some models for the properties and decay modes of proposed neutral leptons are described. Past and present searches are reviewed. The limits obtained by some completed searches are given, and the methods of searches in progress are described. Future searches are discussed. 41 references

  15. Genetic algorithms as global random search methods

    Peck, Charles C.; Dhawan, Atam P.

    1995-01-01

    Genetic algorithm behavior is described in terms of the construction and evolution of the sampling distributions over the space of candidate solutions. This novel perspective is motivated by analysis indicating that the schema theory is inadequate for completely and properly explaining genetic algorithm behavior. Based on the proposed theory, it is argued that the similarities of candidate solutions should be exploited directly, rather than encoding candidate solutions and then exploiting their similarities. Proportional selection is characterized as a global search operator, and recombination is characterized as the search process that exploits similarities. Sequential algorithms and many deletion methods are also analyzed. It is shown that by properly constraining the search breadth of recombination operators, convergence of genetic algorithms to a global optimum can be ensured.

  16. Efficient and Accurate WLAN Positioning with Weighted Graphs

    Hansen, René; Thomsen, Bent

    This paper concerns indoor location determination by using existing WLAN infrastructures and WLAN enabled mobile devices. The location fingerprinting technique performs localization by first constructing a radio map of signal strengths from nearby access points. The radio map is subsequently searched using a classification algorithm to determine a location estimate. This paper addresses two distinct challenges of location fingerprinting incurred by positioning moving users. Firstly, movement affects the positioning accuracy negatively due to increased signal strength fluctuations. Secondly, tracking moving users requires a low-latency overhead which translates into efficient computations to be done on a mobile device with limited capabilities. We present a technique to simultaneously improve the positioning accuracy and computational efficiency. The technique utilizes a weighted graph model of the indoor environment to improve positioning accuracy and computational efficiency by only considering the subset of locations in the radio map that are feasible to reach from a previously estimated position. The technique is general and can be used on top of any existing location system. Our results indicate that we are able to achieve similar dynamic localization accuracy to static localization. Effectively, we are able to counter the adverse effects of added signal fluctuations caused by movement. However, as some of our experiments testify, any location system is fundamentally constrained by the underlying environment. We give pointers to research which allows such problems to be detected early and thereby avoided before deploying a system.

  17. Similarity of Symbol Frequency Distributions with Heavy Tails

    Gerlach, Martin; Font-Clos, Francesc; Altmann, Eduardo G.

    2016-04-01

    Quantifying the similarity between symbolic sequences is a traditional problem in information theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA over music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf's law). The large number of low-frequency symbols in these distributions poses major difficulties to the estimation of the similarity between sequences; e.g., they hinder an accurate finite-size estimation of entropies. Here, we show analytically how the systematic (bias) and statistical (fluctuations) errors in these estimations depend on the sample size N and on the exponent γ of the heavy-tailed distribution. Our results are valid for the Shannon entropy (α = 1), its corresponding similarity measures (e.g., the Jensen-Shannon divergence), and also for measures based on the generalized entropy of order α. For small α's, including α = 1, the errors decay slower than the 1/N decay observed in short-tailed distributions. For α larger than a critical value α* = 1 + 1/γ ≤ 2, the 1/N decay is recovered. We show the practical significance of our results by quantifying the evolution of the English language over the last two centuries using a complete α spectrum of measures. We find that frequent words change more slowly than less frequent words and that α = 2 provides the most robust measure to quantify language change.
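
    For reference, the Jensen-Shannon divergence named in this abstract can be computed from observed symbol frequencies as in the sketch below. This is the plain plug-in estimate for the Shannon case (α = 1), that is, the kind of finite-sample estimate whose bias and fluctuations the paper analyzes. It does not implement the paper's analysis or the generalized order-α measures, and the function names and toy sequences are assumptions.

```python
import math
from collections import Counter

def distribution(symbols):
    """Relative frequency of each symbol in a sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def shannon_entropy(p):
    """Shannon entropy in bits of a distribution given as symbol -> probability."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: H(m) - (H(p) + H(q)) / 2, m = (p + q) / 2."""
    support = set(p) | set(q)
    m = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in support}
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))

p = distribution("aabb")
q = distribution("ccdd")
jsd_identical = jensen_shannon(p, p)   # same frequencies
jsd_disjoint = jensen_shannon(p, q)    # no shared symbols
```

    With base-2 logarithms the divergence lies between 0 for identical distributions and 1 bit for distributions with disjoint support, which the two toy cases illustrate.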

  18. On the similarity of symbol frequency distributions with heavy tails

    Gerlach, Martin; Altmann, Eduardo G

    2015-01-01

    Quantifying the similarity between symbolic sequences is a traditional problem in Information Theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA over music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf's law). The large number of low-frequency symbols in these distributions poses major difficulties to the estimation of the similarity between sequences, e.g., they hinder an accurate finite-size estimation of entropies. Here we show how the accuracy of estimations depends on the sample size $N$, not only for the Shannon entropy $(\alpha=1)$ and its corresponding similarity measures (e.g., the Jensen-Shannon divergence) but also for measures based on the generalized entropy of order $\alpha$. For small $\alpha$'s, including $\alpha=1$, the bias and fluctuations in the estimations decay slower than the $1/N$ decay observed in short-tailed distributions. For $\alpha$ larger ...

  19. Similar Words Identification Using Naive and TF-IDF Method

    Divya K.S.

    2014-10-01

    Requirement satisfaction is one of the most important factors in the success of software. All the requirements specified by the customer should be satisfied in every phase of software development. Satisfaction assessment is the determination of whether each component of a requirement has been addressed in the design document. The objective of this paper is to implement two methods to identify the satisfied requirements in the design document. To identify the satisfied requirements, similar words in both documents are determined. Two methods, Naive satisfaction assessment and TF-IDF satisfaction assessment, are used to determine the similar words that are present in the requirements document and the design documents. The two methods are evaluated on the basis of precision and recall. Porter's stemming algorithm is used to perform the stemming. The satisfaction assessment methods determine the similarity between the requirement and design documents. The final result gives an accurate picture of requirement satisfaction, so that defects can be found at an early stage of software development. Since the defects are found at an early stage, the cost of correcting them is low.

  20. Identifying mechanistic similarities in drug responses

    Zhao, C.

    2012-05-15

    Motivation: In early drug development, it would be beneficial to be able to identify those dynamic patterns of gene response that indicate that drugs targeting a particular gene will be likely or not to elicit the desired response. One approach would be to quantitate the degree of similarity between the responses that cells show when exposed to drugs, so that consistencies in the regulation of cellular response processes that produce success or failure can be more readily identified. Results: We track drug response using fluorescent proteins as transcription activity reporters. Our basic assumption is that drugs inducing very similar alteration in transcriptional regulation will produce similar temporal trajectories on many of the reporter proteins and hence be identified as having similarities in their mechanisms of action (MOA). The main body of this work is devoted to characterizing similarity in temporal trajectories/signals. To do so, we must first identify the key points that determine mechanistic similarity between two drug responses. Directly comparing points on the two signals is unrealistic, as it cannot handle delays and speed variations on the time axis. Hence, to capture the similarities between reporter responses, we develop an alignment algorithm that is robust to noise, time delays and is able to find all the contiguous parts of signals centered about a core alignment (reflecting a core mechanism in drug response). Applying the proposed algorithm to a range of real drug experiments shows that the result agrees well with the prior drug MOA knowledge. © The Author 2012. Published by Oxford University Press. All rights reserved.

  1. Semantic Search among Heterogeneous Biological Databases Based on Gene Ontology

    Shun-Liang CAO; Lei QIN; Wei-Zhong HE; Yang ZHONG; Yang-Yong ZHU; Yi-Xue LI

    2004-01-01

    Semantic search is a key issue in integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table to correlate Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table to record similarity scores derived from any pair of GO terms. Based on the two tables, multifarious ways for semantic search are provided and the corresponding entries in heterogeneous biological databases in semantic terms can be expediently searched.

  2. GeoSearch: A lightweight broking middleware for geospatial resources discovery

    Gui, Z.; Yang, C.; Liu, K.; Xia, J.

    2012-12-01

    With petabytes of geodata, thousands of geospatial web services available over the Internet, it is critical to support geoscience research and applications by finding the best-fit geospatial resources from the massive and heterogeneous resources. Past decades' developments witnessed the operation of many service components to facilitate geospatial resource management and discovery. However, efficient and accurate geospatial resource discovery is still a big challenge due to the following reasons: 1) The entry barriers (also called "learning curves") hinder the usability of discovery services to end users. Different portals and catalogues always adopt various access protocols, metadata formats and GUI styles to organize, present and publish metadata. It is hard for end users to learn all these technical details and differences. 2) The cost for federating heterogeneous services is high. To provide sufficient resources and facilitate data discovery, many registries adopt periodic harvesting mechanism to retrieve metadata from other federated catalogues. These time-consuming processes lead to network and storage burdens, data redundancy, and also the overhead of maintaining data consistency. 3) The heterogeneous semantics issues in data discovery. Since the keyword matching is still the primary search method in many operational discovery services, the search accuracy (precision and recall) is hard to guarantee. Semantic technologies (such as semantic reasoning and similarity evaluation) offer a solution to solve these issues. However, integrating semantic technologies with existing service is challenging due to the expandability limitations on the service frameworks and metadata templates. 4) The capabilities to help users make final selection are inadequate. Most of the existing search portals lack intuitive and diverse information visualization methods and functions (sort, filter) to present, explore and analyze search results. Furthermore, the presentation of the value

  3. TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING

    Khadoudja Ghanem

    2013-03-01

    An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increases. For this reason, it is important to have efficient computational methods and algorithms that can be applied to large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this context, we present in this paper a more accurate simple process to speed up ML methods. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to develop an efficient Hidden Markov Model (HMM) training. The idea of the proposed process consists of two steps. In the first step, training instances with similar inputs are clustered and a weight factor which represents the frequency of these instances is assigned to each representative cluster. The Dynamic Time Warping technique is used as a dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) associated with the number of training instances are modified to include the weight factor in appropriate terms. This process significantly accelerates HMM training while maintaining the same initial, transition and emission probability matrices as those obtained with the classical HMM training algorithm. Accordingly, the classification accuracy is preserved. Depending on the size of the training set, speedups of up to 2200 times are possible when the size is about 100,000 instances. The proposed approach is not limited to training HMMs, but can be employed for a large variety of ML methods.
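
    The first step described above (clustering similar training sequences under a Dynamic Time Warping dissimilarity and attaching a frequency weight to each representative) can be sketched as follows. This is only an illustrative sketch: the abstract does not name the clustering algorithm, so the greedy "first representative within a threshold" rule, the absolute-difference local cost, and the names `dtw_distance`, `cluster_with_weights` and `threshold` are all assumptions made here.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the absolute difference as local cost (an assumption) and the
    standard O(len(a) * len(b)) dynamic programme.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def cluster_with_weights(sequences, threshold):
    """Greedy clustering (hypothetical rule, not taken from the paper).

    Each sequence joins the first representative whose DTW distance is at
    most `threshold`; otherwise it becomes a new representative. Returns a
    list of (representative, weight) pairs, where weight is the number of
    sequences absorbed by that representative.
    """
    clusters = []  # each entry is [representative, weight]
    for seq in sequences:
        for entry in clusters:
            if dtw_distance(entry[0], seq) <= threshold:
                entry[1] += 1
                break
        else:
            clusters.append([seq, 1])
    return [(rep, weight) for rep, weight in clusters]
```

    The resulting weights are what a weighted training procedure would consume in place of repeated instances; how they enter the EM update formulas is specific to the paper and is not reproduced here.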

  4. Towards More Accurate Clustering Method by Using Dynamic Time Warping

    Khadoudja Ghanem

    2013-04-01

    An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increases. For this reason, it is important to have efficient computational methods and algorithms that can be applied to large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this context, we present in this paper a more accurate simple process to speed up ML methods. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to develop an efficient Hidden Markov Model (HMM) training. The idea of the proposed process consists of two steps. In the first step, training instances with similar inputs are clustered and a weight factor which represents the frequency of these instances is assigned to each representative cluster. The Dynamic Time Warping technique is used as a dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) associated with the number of training instances are modified to include the weight factor in appropriate terms. This process significantly accelerates HMM training while maintaining the same initial, transition and emission probability matrices as those obtained with the classical HMM training algorithm. Accordingly, the classification accuracy is preserved. Depending on the size of the training set, speedups of up to 2200 times are possible when the size is about 100,000 instances. The proposed approach is not limited to training HMMs, but can be employed for a large variety of ML methods.

  5. Similarity Metrics for Closed Loop Dynamic Systems

    Whorton, Mark S.; Yang, Lee C.; Bedrossian, Naz; Hall, Robert A.

    2008-01-01

    To what extent and in what ways can two closed-loop dynamic systems be said to be "similar?" This question arises in a wide range of dynamic systems modeling and control system design applications. For example, bounds on error models are fundamental to the controller optimization with modern control design methods. Metrics such as the structured singular value are direct measures of the degree to which properties such as stability or performance are maintained in the presence of specified uncertainties or variations in the plant model. Similarly, controls-related areas such as system identification, model reduction, and experimental model validation employ measures of similarity between multiple realizations of a dynamic system. Each area has its tools and approaches, with each tool more or less suited for one application or the other. Similarity in the context of closed-loop model validation via flight test is subtly different from error measures in the typical controls oriented application. Whereas similarity in a robust control context relates to plant variation and the attendant effect on stability and performance, in this context similarity metrics are sought that assess the relevance of a dynamic system test for the purpose of validating the stability and performance of a "similar" dynamic system. Similarity in the context of system identification is much more relevant than are robust control analogies in that errors between one dynamic system (the test article) and another (the nominal "design" model) are sought for the purpose of bounding the validity of a model for control design and analysis. Yet system identification typically involves open-loop plant models which are independent of the control system (with the exception of limited developments in closed-loop system identification which is nonetheless focused on obtaining open-loop plant models from closed-loop data).
Moreover the objectives of system identification are not the same as a flight test and

  6. CAST: a new program package for the accurate characterization of large and flexible molecular systems.

    Grebner, Christoph; Becker, Johannes; Weber, Daniel; Bellinger, Daniel; Tafipolski, Maxim; Brückner, Charlotte; Engels, Bernd

    2014-09-15

    The presented program package, Conformational Analysis and Search Tool (CAST), allows the accurate treatment of large and flexible (macro)molecular systems. For the determination of thermally accessible minima CAST offers the newly developed TabuSearch algorithm, but algorithms such as Monte Carlo (MC), MC with minimization, and molecular dynamics are implemented as well. For the determination of reaction paths, CAST provides the PathOpt, Nudged Elastic Band, and umbrella sampling approaches. Access to free energies is possible through the free energy perturbation approach. Along with a number of standard force fields, a newly developed symmetry-adapted perturbation theory-based force field is included. Semiempirical computations are possible through DFTB+ and MOPAC interfaces. For calculations based on density functional theory, a Message Passing Interface (MPI) interface to the Graphics Processing Unit (GPU)-accelerated TeraChem program is available. The program is available on request. PMID:25056524

  7. Combining visual dictionary, kernel-based similarity and learning strategy for image category retrieval

    Gosselin, Philippe-Henri; Cord, Matthieu; Philipp-Foliguet, Sylvie

    2008-01-01

    This paper presents a search engine architecture, RETIN, aiming at retrieving complex categories in large image databases. For indexing, a scheme based on a two-step quantization process is presented to compute visual codebooks. The similarity between images is represented in a kernel framework. Such a similarity is combined with online learning strategies motivated by recent Machine-Learning developments such as Active Learning. Additionally, an offline supervised learning is embedded in ...

  8. Characteristics of Optimal Function for Ontology Similarity Measure via Multi-dividing

    Wei Gao; Tianwei Xu

    2012-01-01

    As a powerful tool, ontology has been widely applied in social science, medical science and computer science. In computer networks, especially, ontology is used for search extension, thus boosting the quality of information retrieval. Ontology concept similarity calculation is an essential problem in these applications. A new method to get similarity between vertices on an ontology graph is by machine learning, and the multi-dividing algorithm is suitable for the ontology problem. It is usually get an ont...

  9. Nanocrystal energetics via quantum similarity measures

    We first develop a descriptor-based representation of atomic environments by devising two local similarity indices defined from an atom-partitioned quantum-chemical descriptor. Then, we employ this representation to explore the size-, shape- and composition-dependent nanocrystal energetics. For this purpose, we utilize an energy difference μ that is related to the atomic chemical potential, which enables one to characterize energetic heterogeneities. Employing first-principles calculations based on the density functional theory for a set of database systems, namely unary atomic clusters in the shape of regular polyhedra and the bulk solids of C, Si, Pd and Pt, we explore the correlations between the energy difference μ and similarity indices. We find that there exists an interconnection between nanocrystal energetics and quantum similarity measures. Accordingly, we develop a means for computing total energy differences from the similarity indices via interpolation, and utilize a test set comprising a variety of unary nanocrystals and binary nanoalloys/nanocompounds for validation. Our findings indicate that the similarity-based energies could be utilized in computer-aided design of nanoparticles. (paper)

  10. Image fusion using bi-directional similarity

    Bai, Chunshan; Luo, Xiaoyan

    2015-05-01

    Infrared images are widely used in practical applications to capture abundant information. However, it is still challenging to enhance the infrared image by the visual image. In this paper, we propose an effective method using bidirectional similarity. In the proposed method, we aim to find an optimal solution from many feasible solutions without introducing an intermediate image. We employ some a priori constraints to meet the requirements of image fusion, which can be detailed to preserve both good characteristics in the infrared image and spatial information in the visual image. In the iterative step, we use the matrix with the square of the difference between images to integrate the image holding most information. We call this matrix the bidirectional similarity distance. By the bidirectional similarity distance, we can get the transitive images. Then, we fuse the images according to the weight. Experimental results show that, compared to the traditional image fusion algorithm, fusion images from the bidirectional similarity fusion algorithm are greatly improved in subjective vision, entropy, and structural similarity index measurement. We believe that the proposed scheme can have wide applications.

  11. The prediction method of similar cycles

    Zhan-Le Du; Hua-Ning Wang

    2011-01-01

    The concept of degree of similarity (η) is proposed to quantitatively describe the similarity of a parameter (e.g. the maximum amplitude Rmax) of a solar cycle relative to a referenced one, and the prediction method of similar cycles is further developed. For two parameters, the solar minimum (Rmin) and rising rate (βa), which can be directly measured a few months after the minimum, a synthesis degree of similarity (ηs) is defined as the weighted average of the η values around Rmin and βa, with the weights given by the coefficients of determination of Rmax with Rmin and βa, respectively. The monthly values of the whole referenced cycle can be predicted by averaging the corresponding values in the most similar cycles with the weights given by the ηs values. As an application, Cycle 24 is predicted to peak around January 2013 ±8 (month) with a size of about Rmax = 84 + 17 and to end around September 2019.
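
    The two weighted averages named in this abstract can be illustrated with a short sketch. It covers only the averaging steps; the abstract does not give the formula for η itself, so the η values are taken as inputs. The function names, argument names and the normalisation by the sum of the weights are assumptions made for illustration.

```python
def synthesis_similarity(eta_rmin, eta_beta, r2_rmin, r2_beta):
    """Weighted average of two degrees of similarity.

    eta_rmin, eta_beta: similarity values around Rmin and the rising rate.
    r2_rmin, r2_beta:   coefficients of determination of Rmax with Rmin
                        and with the rising rate, used as weights.
    Normalising by the weight sum is an assumption of this sketch.
    """
    return (r2_rmin * eta_rmin + r2_beta * eta_beta) / (r2_rmin + r2_beta)


def predict_cycle(similar_cycles, weights):
    """Predict monthly values as the weighted mean over similar cycles.

    similar_cycles: list of monthly-value lists, one per similar cycle.
    weights:        one synthesis similarity value per cycle.
    Cycles of unequal length are truncated to the shortest one.
    """
    total = sum(weights)
    n_months = min(len(cycle) for cycle in similar_cycles)
    return [
        sum(w * cycle[t] for cycle, w in zip(similar_cycles, weights)) / total
        for t in range(n_months)
    ]
```

    For example, two hypothetical cycles with monthly values `[10, 20]` and `[30, 40]` and weights `1` and `3` give a prediction of `[25.0, 35.0]`.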

  12. Which fast nearest neighbour search algorithm to use?

    Serrano Díaz-Carrasco, Aureo; Micó Andrés, Luisa; Oncina Carratalá, Jose

    2013-01-01

    Choosing which fast Nearest Neighbour search algorithm to use depends on the task we face. Usually the kd-tree search algorithm is selected when the similarity function is the Euclidean or the Manhattan distance. Generic fast search algorithms (algorithms that work with any distance function) are only used when there are no specific fast search algorithms for the distance function involved. In this work we show that in real data problems generic search algorithms (i.e. MDF-tree) can be faster t...

  13. Faster and More Accurate Sequence Alignment with SNAP

    Zaharia, Matei; Curtis, Kristal; Fox, Armando; Patterson, David; Shenker, Scott; Stoica, Ion; Karp, Richard M; Sittler, Taylor

    2011-01-01

    We present the Scalable Nucleotide Alignment Program (SNAP), a new short and long read aligner that is both more accurate (i.e., aligns more reads with fewer errors) and 10-100x faster than state-of-the-art tools such as BWA. Unlike recent aligners based on the Burrows-Wheeler transform, SNAP uses a simple hash index of short seed sequences from the genome, similar to BLAST's. However, SNAP greatly reduces the number and cost of local alignment checks performed through several measures: it uses longer seeds to reduce the false positive locations considered, leverages larger memory capacities to speed index lookup, and excludes most candidate locations without fully computing their edit distance to the read. The result is an algorithm that scales well for reads from one hundred to thousands of bases long and provides a rich error model that can match classes of mutations (e.g., longer indels) that today's fast aligners ignore. We calculate that SNAP can align a dataset with 30x coverage of a human genome in le...
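
    The general idea of a hash index of seed sequences, with cheap rejection of candidate locations, can be sketched as below. This is not SNAP's implementation: the function names are hypothetical, the voting scheme is a simplification, and the early-exit check counts substitutions only rather than computing an edit distance.

```python
from collections import defaultdict


def build_seed_index(genome, seed_len):
    """Map every substring of length seed_len to its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - seed_len + 1):
        index[genome[pos:pos + seed_len]].append(pos)
    return index


def candidate_locations(read, index, seed_len):
    """Return (read_start, votes) pairs, most-voted first.

    Every seed of the read that hits the index votes for the genome
    position at which the read would have to start.
    """
    votes = defaultdict(int)
    for offset in range(len(read) - seed_len + 1):
        for pos in index.get(read[offset:offset + seed_len], ()):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


def within_mismatch_limit(read, genome, start, limit):
    """Check a candidate, stopping as soon as `limit` is exceeded.

    Substitution-only simplification: a real aligner would use a bounded
    edit distance that also handles insertions and deletions.
    """
    if start + len(read) > len(genome):
        return False
    mismatches = 0
    for r, g in zip(read, genome[start:start + len(read)]):
        if r != g:
            mismatches += 1
            if mismatches > limit:
                return False
    return True
```

    Longer seeds make each index bucket smaller, so fewer candidate starts receive votes; the early exit then avoids a full comparison for most of the remaining false candidates.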

  14. Accurate measurement of streamwise vortices in low speed aerodynamic flows

    Waldman, Rye M.; Kudo, Jun; Breuer, Kenneth S.

    2010-11-01

    Low Reynolds number experiments with flapping animals (such as bats and small birds) are of current interest in understanding biological flight mechanics, and due to their application to Micro Air Vehicles (MAVs) which operate in a similar parameter space. Previous PIV wake measurements have described the structures left by bats and birds, and provided insight to the time history of their aerodynamic force generation; however, these studies have faced difficulty drawing quantitative conclusions due to significant experimental challenges associated with the highly three-dimensional and unsteady nature of the flows, and the low wake velocities associated with lifting bodies that only weigh a few grams. This requires the high-speed resolution of small flow features in a large field of view using limited laser energy and finite camera resolution. Cross-stream measurements are further complicated by the high out-of-plane flow which requires thick laser sheets and short interframe times. To quantify and address these challenges we present data from a model study on the wake behind a fixed wing at conditions comparable to those found in biological flight. We present a detailed analysis of the PIV wake measurements, discuss the criteria necessary for accurate measurements, and present a new dual-plane PIV configuration to resolve these issues.

  15. How Accurate Is Pierce's Theory of Traveling Wave Tube?

    Simon, D. H.; Chernin, D.; Wong, P.; Zhang, P.; Lau, Y. Y.; Dong, C. F.; Hoff, B.; Gilgenbach, R. M.

    2015-11-01

    This paper provides a rigorous test of the accuracy of Pierce's classical theory of traveling wave tubes (TWTs). The EXACT dispersion relation for a dielectric TWT is derived, from which the spatial amplification rate, ki, is calculated. This ki is compared with that obtained from Pierce's widely used 3-wave theory and his more general 4-wave theory (which includes the reverse propagating circuit mode). We have used various procedures to extract Pierce's gain parameter C and space charge parameter Q from the exact dispersion relation. We find that, in general, the 3-wave theory is a poor representation of the exact dispersion relation if C > 0.05. However, the 4-wave theory gives excellent agreement even for C as high as 0.12 and over more than 20 percent bandwidth, if the quantity (k2 × C3) is evaluated accurately as a function of frequency, and if Q is expanded to first order in the wavenumber k, where Q is the difference between the exact dispersion relation and its 4-wave representation in which Q is set to zero. Similar tests will be performed on the disk-on-rod slow wave TWT, for which the hot tube dispersion relation including all space harmonics has been obtained. Supported by AFOSR FA9550-14-1-0309, FA9550-15-1-0097, AFRL FA9451-14-1-0374, and L-3 Communications.

  16. Downhole temperature tool accurately measures well bore profile

    This paper reports that an inexpensive temperature tool provides accurate temperature measurements during drilling operations for better design of cement jobs, workovers, well stimulation, and well bore hydraulics. Valid temperature data during specific wellbore operations can improve initial job design, fluid testing, and slurry placement, ultimately enhancing well bore performance. This improvement applies to cement slurries, breaker activation for slurries, breaker activation for stimulation and profile control, and fluid rheological properties for all downhole operations. The temperature tool has been run standalone mounted inside drill pipe, on slick wire line and braided cable, and as a free-fall tool. It has also been run piggyback on both directional surveys (slick line and free-fall) and standard logging runs. This temperature measuring system has been used extensively in field well bores to depths of 20,000 ft. The temperature tool is completely reusable in the field, very similar to the standard directional survey tools used on many drilling rigs. The system includes a small, rugged, programmable temperature sensor, a standard body housing, various adapters for specific applications, and a personal computer (PC) interface.

  17. Visual Similarity Based Document Layout Analysis

    Di Wen; Xiao-Qing Ding

    2006-01-01

    In this paper, a visual similarity based document layout analysis (DLA) scheme is proposed, which by using a clustering strategy can adaptively deal with documents in different languages, with different layout structures and skew angles. Aiming at a robust and adaptive DLA approach, the authors first find a set of representative filters and statistics to characterize typical texture patterns in document images through a visual similarity testing process. Texture features are then extracted from these filters and passed into a dynamic clustering procedure, which is called visual similarity clustering. Finally, text contents are located from the clustered results. Benefiting from this scheme, the algorithm demonstrates strong robustness and adaptability on a wide variety of documents, which previous traditional DLA approaches do not possess.

  18. Efficient Similarity Join Method Using Unsupervised Learning

    Bilal Hawashin

    2012-11-01

    This paper proposes an efficient similarity join method using unsupervised learning, when no labeled data is available. In our previous work, we showed that the performance of similarity join could improve when long string attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, are used under supervised learning, where a training set exists. In this work, we adopt long string attributes during the similarity join under unsupervised learning. Besides being important when no labeled data exists, unsupervised learning also acts as a quick preprocessing method for huge datasets. Here, we show that using long attributes during unsupervised learning can further enhance the performance. Moreover, we provide an efficient, dynamically expandable algorithm for databases with frequent transactions.

  19. Modelling of flashing flows using similarity fluids

    A method is presented for the investigation of thermodynamic and fluid dynamic similarity in the two-phase, liquid-vapor region. A simplified model fluid is developed based on a set of heuristic assumptions. The fundamental governing equations are reduced to dimensionless form through the introduction of appropriate scales. Although the methodology outlined is general in its scope, it has been applied to the case of similarity between water substance and refrigerant 114 (R114). Sample calculations are presented for the solution of the flow of a two-phase fluid from the flash point to the choking point. The correspondence based on our similarity analysis is shown to be very good. The advantages of being able to substitute R114 for water in laboratory experiments include lower pressures, temperatures, and flow rates as well as a significant reduction in the physical size of the apparatus

  20. PHOG analysis of self-similarity in aesthetic images

    Amirshahi, Seyed Ali; Koch, Michael; Denzler, Joachim; Redies, Christoph

    2012-03-01

    In recent years, there have been efforts in defining the statistical properties of aesthetic photographs and artworks using computer vision techniques. However, it is still an open question how to distinguish aesthetic from non-aesthetic images with a high recognition rate. This is possibly because aesthetic perception is influenced also by a large number of cultural variables. Nevertheless, the search for statistical properties of aesthetic images has not been futile. For example, we have shown that the radially averaged power spectrum of monochrome artworks of Western and Eastern provenance falls off according to a power law with increasing spatial frequency (1/f² characteristics). This finding implies that this particular subset of artworks possesses a Fourier power spectrum that is self-similar across different scales of spatial resolution. Other types of aesthetic images, such as cartoons, comics and mangas also display this type of self-similarity, as do photographs of complex natural scenes. Since the human visual system is adapted to encode images of natural scenes in a particularly efficient way, we have argued that artists imitate these statistics in their artworks. In support of this notion, we presented results that artists portray human faces with the self-similar Fourier statistics of complex natural scenes although real-world photographs of faces are not self-similar. In view of these previous findings, we investigated other statistical measures of self-similarity to characterize aesthetic and non-aesthetic images. In the present work, we propose a novel measure of self-similarity that is based on the Pyramid Histogram of Oriented Gradients (PHOG). For every image, we first calculate PHOG up to pyramid level 3. The similarity between the histograms of each section at a particular level is then calculated relative to the parent section at the previous level (or to the histogram at the ground level).
The proposed approach is tested on datasets of aesthetic and

  1. Quantifying the similarity of seismic polarizations

    Jones, Joshua P.; Eaton, David W.; Caffagni, Enrico

    2016-02-01

    Assessing the similarities of seismic attributes can help identify tremor, low signal-to-noise (S/N) signals and converted or reflected phases, in addition to diagnosing site noise and sensor misalignment in arrays. Polarization analysis is a widely accepted method for studying the orientation and directional characteristics of seismic phases via computed attributes, but similarity is ordinarily discussed using qualitative comparisons with reference values or known seismic sources. Here we introduce a technique for quantitative polarization similarity that uses weighted histograms computed in short, overlapping time windows, drawing on methods adapted from the image processing and computer vision literature. Our method accounts for ambiguity in azimuth and incidence angle and variations in S/N ratio. Measuring polarization similarity allows easy identification of site noise and sensor misalignment and can help identify coherent noise and emergent or low S/N phase arrivals. Dissimilar azimuths during phase arrivals indicate misaligned horizontal components, dissimilar incidence angles during phase arrivals indicate misaligned vertical components and dissimilar linear polarization may indicate a secondary noise source. Using records of the Mw = 8.3 Sea of Okhotsk earthquake, from Canadian National Seismic Network broad-band sensors in British Columbia and Yukon Territory, Canada, and a vertical borehole array at Hoadley gas field, central Alberta, Canada, we demonstrate that our method is robust to station spacing. Discrete wavelet analysis extends polarization similarity to the time-frequency domain in a straightforward way. Time-frequency polarization similarities of borehole data suggest that a coherent noise source may have persisted above 8 Hz several months after peak resource extraction from a `flowback' type hydraulic fracture.

  2. Searching Online for 'Hemorrhoids'?

    ... he believes most people would rather search anonymously online for information about hemorrhoids than ask their doctors, ...

  3. Giant African pouched rats (Cricetomys gambianus) that work on tilled soil accurately detect land mines.

    Edwards, Timothy L; Cox, Christophe; Weetjens, Bart; Tewelde, Tesfazghi; Poling, Alan

    2015-09-01

    Pouched rats were employed as mine-detection animals in a quality-control application where they searched for mines in areas previously processed by a mechanical tiller. The rats located 58 mines and fragments in this 28,050 m² area with a false indication rate of 0.4 responses per 100 m². Humans with metal detectors found no mines that were not located by the rats. These findings indicate that pouched rats can accurately detect land mines in disturbed soil and suggest that they can play multiple roles in humanitarian demining. PMID:25962550

  4. LSH Ensemble: Internet Scale Domain Search

    Zhu, Erkang; Nargesian, Fatemeh; Pu, Ken Q.; Miller, Renée J.

    2016-01-01

    We study the problem of domain search where a domain is a set of values from an unspecified universe. We use set containment, defined as $|Q \cap X|/|Q|$, as the measure of relevance of a domain $X$ to a query domain $Q$. Our choice of set containment over Jaccard similarity as a measure of relevance makes our work particularly suitable for searching Open Data and data on the web, as Jaccard similarity is known to have poor performance over sets with large differences in their sizes. We demon...
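
    A small sketch makes the contrast between the two measures concrete. It computes both scores exactly on in-memory sets and does not attempt the LSH-based indexing that the paper is about; the function names and the example values are hypothetical.

```python
def containment(query, domain):
    """Set containment |Q ∩ X| / |Q| of a query domain Q in a domain X."""
    q, x = set(query), set(domain)
    if not q:
        raise ValueError("query domain must be non-empty")
    return len(q & x) / len(q)


def jaccard(query, domain):
    """Jaccard similarity |Q ∩ X| / |Q ∪ X|."""
    q, x = set(query), set(domain)
    union = q | x
    return len(q & x) / len(union) if union else 0.0


# A two-value query fully contained in a 100-value domain.
small_query = {"ON", "QC"}
large_domain = small_query | {f"v{i}" for i in range(98)}

print(containment(small_query, large_domain))  # 1.0
print(jaccard(small_query, large_domain))      # 0.02
```

    The query is entirely covered by the domain, which containment reports as 1.0, while the size difference alone drags the Jaccard score down to 0.02.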

  5. Unveiling Music Structure Via PLSA Similarity Fusion

    Arenas-García, Jerónimo; Meng, Anders; Petersen, Kaare Brandt;

    2007-01-01

    Nowadays there is an increasing interest in developing methods for building music recommendation systems. In order to get a satisfactory performance from such a system, one needs to incorporate as much information about song similarity as possible; however, how to do so is not obvious. In this...... observed similarities can be satisfactorily explained using the latent semantics. Additionally, this approach significantly simplifies the song retrieval phase, leading to a more practical system implementation. The suitability of the PLSA model for representing music structure is studied in a simplified...

  6. Self-Similarity Limits of Genomic Signatures

    Wu, Z B

    2002-01-01

    It is shown that the metric representation of DNA sequences is one-to-one. By using the metric representation method, suppression of nucleotide strings in the DNA sequences is determined. For a DNA sequence, an optimal string length to display the genomic signature in chaos game representation is obtained by eliminating effects of the finite sequence. The optimal string length is further shown to be a self-similarity limit in computing the information dimension. By using the method, self-similarity limits of complete bacterial genomic signatures are further determined.
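
    For readers unfamiliar with the chaos game representation mentioned here, a minimal sketch of the standard construction follows. It is not the metric representation studied in the paper. The corner assignment (A, C, G, T at the corners of the unit square) is one common convention and is an assumption, as are the function name and the choice to skip unknown symbols.

```python
# Corner of the unit square assigned to each nucleotide (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the chaos game representation points of a DNA sequence.

    Starting at the centre of the unit square, each nucleotide moves the
    current point halfway towards that nucleotide's corner. Symbols
    outside A/C/G/T are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

    After k steps the point lies in a sub-square of side 2^-k determined by the last k nucleotides, which is why the occupancy pattern at a given resolution reflects the frequencies of strings of that length.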

  7. Outer boundaries of self-similar tiles

    Drenning, Shawn; Palagallo, Judith; Price, Thomas; Strichartz, Robert S.

    2005-01-01

    There are many examples of self-similar tiles that are connected, but whose interior is disconnected. For such tiles we show that the boundary of a component of the interior may be decomposed into a finite union of pieces, each similar to a subset of the outer boundary of the tile. This is significant because the outer boundary typically has lower dimension than the full boundary. We describe a method to realize the outer boundary as the invariant set of a graph-directed iterated function sys...

  8. Some more similarities between Peirce and Skinner.

    Moxley, Roy A

    2002-01-01

    C. S. Peirce is noted for pioneering a variety of views, and the case is made here for the similarities and parallels between his views and B. F. Skinner's radical behaviorism. In addition to parallels previously noted, these similarities include an advancement of experimental science, a behavioral psychology, a shift from nominalism to realism, an opposition to positivism, a selectionist account for strengthening behavior, the importance of a community of selves, a recursive approach to method, and the probabilistic nature of truth. Questions are raised as to the extent to which Skinner's radical behaviorism, as distinguished from his S-R positivism, may be seen as an extension of Peirce's pragmatism. PMID:22478387

  9. Differences and similarities in breast cancer risk assessment models in clinical practice: which model to choose?

    Jacobi, Catharina E.; de Bock, Geertruida H.; Siegerink, Bob; van Asperen, Christi J.

    2009-01-01

    To show differences and similarities between risk estimation models for breast cancer in healthy women from BRCA1/2-negative or untested families. After a systematic literature search seven models were selected: Gail-2, Claus Model, Claus Tables, BOADICEA, Jonker Model, Claus-Extended Formula, and T

  10. A new similarity computing method based on concept similarity in Chinese text processing

    PENG Jing; YANG DongQing; TANG ShiWei; WANG TengJiao; GAO Jun

    2008-01-01

    The paper proposes a new text similarity computing method based on concept similarity in Chinese text processing. The new method first converts text to a word vector space model, and then splits words into sets of concepts. By computing the inner products between concepts, it obtains the similarity between words. Finally, the new method computes the similarity of texts based on the similarity of words. The contributions of the paper include: 1) proposing a new formula for computing the similarity between words; 2) proposing a new text similarity computing method based on word similarity; 3) successfully applying the method to similarity computing of Web news; and 4) demonstrating the validity of the method through extensive experiments.

  11. Integrated vs. Federated Search

    Løvschall, Kasper

    2009-01-01

    Presentation on the differences and similarities between integrated and federated search in a library context. Given at the theme day on "Integrated Search - samsøgning i alle kilder" at Danmarks Biblioteksskole on 22 January 2009.

  12. Search engine credibility

    2005-01-01

    While search engines have become increasingly popular over the past years, little research is concerned with how they attend to credibility. Through interviews with six Norwegian search engine companies, this study reveals how search engines attend to areas affecting credibility. Search engines appear focused towards areas affecting credibility, yet their understanding of online credibility appears to be low. The study then compares the findings with a previous study of how users assess credi...

  13. Search Based Software Engineering

    Jaspreet Bedi; Kuljit Kaur

    2014-01-01

    This paper reviews the search based software engineering research and finds the major milestones in this direction. The SBSE approach has been the topic of several surveys and reviews. Search Based Software Engineering (SBSE) consists of the application of search-based optimization to software engineering. Using SBSE, a software engineering task is formulated as a search problem by defining a suitable candidate solution representation and a fitness function to differentiate be...

  14. Searching Databases with Keywords

    Shan Wang; Kun-Long Zhang

    2005-01-01

    Traditionally, the SQL query language is used to search the data in databases. However, it is inappropriate for end-users, since it is complex and hard to learn. End-users need to search databases with keywords, as in web search engines. This paper presents a survey of work on keyword search in databases. It also includes a brief introduction to the SEEKER system, which has been developed.

  15. Updating collection representations for federated search

    Shokouhi, M; Baillie, M; Azzopardi, L.

    2007-01-01

    To facilitate the search for relevant information across a set of online distributed collections, a federated information retrieval system typically represents each collection, centrally, by a set of vocabularies or sampled documents. Accurate retrieval is therefore related to how precisely each representation reflects the underlying content stored in that collection. As collections evolve over time, collection representations should also be updated to reflect any change; however, a current sol...

  16. A comprehensive system for evaluation of remote sequence similarity detection

    Kim Bong-Hyun

    2007-08-01

    Abstract Background Accurate and sensitive performance evaluation is crucial both for the effective development of better structure prediction methods based on sequence similarity and for the comparative analysis of existing methods. To date, there has been no satisfactory comprehensive evaluation method that (i) is based on a large and statistically unbiased set of proteins with clearly defined relationships; and (ii) covers all performance aspects of sequence-based structure predictors, such as sensitivity and specificity, alignment accuracy and coverage, and structure template quality. Results With the aim of designing such a method, we (i) select a statistically balanced set of divergent protein domains from SCOP, and define similarity relationships for the majority of these domains by complementing the best of the information available in SCOP with a rigorous SVM-based algorithm; and (ii) develop protocols for the assessment of similarity detection and alignment quality from several complementary perspectives. The evaluation of similarity detection is based on ROC-like curves and includes several complementary approaches to the definition of true/false positives. Reference-dependent approaches use the 'gold standard' of pre-defined domain relationships and structure-based alignments. Reference-independent approaches assess the quality of the structural match predicted by the sequence alignment, with respect to the whole domain length (global mode) or to the aligned region only (local mode). Similarly, the evaluation of alignment quality includes several reference-dependent and -independent measures, in global and local modes. As an illustration, we use our benchmark to compare the performance of several methods for the detection of remote sequence similarities, and show that different aspects of evaluation reveal different properties of the evaluated methods, highlighting their advantages, weaknesses, and potential for further development. 
Conclusion The
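
    The abstract above evaluates similarity detection with ROC-like curves built from true and false positives. As a rough illustration of that general idea (not the authors' protocol), the sketch below ranks hits by score and accumulates true/false positive counts. The function names and the toy hit list are assumptions, and tied scores are not given any special treatment.

```python
def roc_points(scored_hits):
    """Return cumulative (false_positives, true_positives) counts.

    scored_hits is a list of (score, is_true_relationship) pairs; hits are
    visited from the highest score to the lowest, as if lowering a threshold.
    """
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    tp = fp = 0
    points = [(0, 0)]
    for _score, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points


def roc_area(points):
    """Normalized area under the curve, using the trapezoid rule."""
    total_fp, total_tp = points[-1]
    if total_fp == 0 or total_tp == 0:
        return 0.0
    area = 0.0
    for (fp0, tp0), (fp1, tp1) in zip(points, points[1:]):
        area += (fp1 - fp0) * (tp0 + tp1) / 2.0
    return area / (total_fp * total_tp)


if __name__ == "__main__":
    # Toy data: (score, whether the pair is a true relationship).
    hits = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
    pts = roc_points(hits)
    print(pts)
    print(roc_area(pts))
```

    Changing how `is_true_relationship` is assigned (for example, from a reference set versus from the quality of the predicted structural match) corresponds to the different definitions of true/false positives mentioned in the abstract.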

  17. Personalized online information search and visualization

    Orthner Helmuth F

    2005-03-01

    Abstract Background The rapid growth of online publications such as Medline and other sources raises the question of how to retrieve relevant information efficiently. It is important for a bench scientist, for example, to monitor related publications constantly. It is also important for a clinician, for example, to access patient records anywhere and anytime. Although time-consuming, this kind of search procedure is usually repetitive and simple. Typically, it involves a search engine and a visualization interface. Different words or combinations of words reflect different research topics. The objective of this study is to automate this tedious procedure by recording those words/terms and online sources in a database, and to use the information for automated search and retrieval. The retrieved information will be available anytime and anywhere through a secure web server. Results We developed a database that stores search terms, journals, etc., and implemented a piece of software for automatically searching medical subject heading-indexed sources such as Medline and other online sources. The returned information was stored locally, as is, on a server and made visible through a Web-based interface. The search was performed daily or as otherwise scheduled, and users could log on to the website at any time without typing any search words. The system has the potential to retrieve information similarly from non-medical subject heading-indexed literature or from a privileged information source such as a clinical information system. Issues such as security, presentation and visualization of the retrieved information were thus addressed. One presentation issue, wireless access, was also tested. A user survey showed that the personalized online searches saved time and increased relevancy. Handheld devices could also be used to access the stored information, but less satisfactorily. 
Conclusion The Web-searching software or similar system has potential to be an efficient

  18. Multi Agent Architecture for Search Engine

    Disha Verma

    2016-03-01

    The process of retrieving information is becoming more ambiguous day by day because of the huge collection of documents on the web. A single keyword produces millions of results related to a given query, but these results do not meet user expectations. The search results produced by traditional text search engines may be relevant or irrelevant. The underlying reason is that Web documents are HTML documents that do not contain semantic descriptors and annotations. The paper proposes a multi-agent architecture to produce fewer but personalized results. The purpose of the research is to provide a platform for domain-specific personalized search. Personalized search delivers web pages in accordance with the user's interests and domain. The proposed architecture uses client-side as well as server-side personalization to provide the user with fewer, personalized but more accurate results. The multi-agent search engine architecture uses semantic descriptors to acquire knowledge about a given domain, leading to personalized search results. Semantic descriptors are represented as a network graph that holds the relationships within a given problem in the form of a hierarchy. This hierarchical classification is termed a taxonomy.

  19. Cultural Similarities and Differences on Idiom Translation

    黄频频; 陈于全

    2010-01-01

    Both English and Chinese abound with idioms. Idioms are an important part of the language and culture of a society. English and Chinese idioms carved with cultural characteristics account for a great part of translation. This paper studies the translation of idioms with regard to their cultural similarities, cultural differences and translation principles.

  20. Structural similarity of genetically interacting proteins

    Nussinov Ruth

    2008-07-01

    Abstract Background The study of gene mutants and their interactions is fundamental to understanding gene function and backup mechanisms within the cell. The recent availability of large scale genetic interaction networks in yeast and worm allows the investigation of the biological mechanisms underlying these interactions at a global scale. To date, less than 2% of the known genetic interactions in yeast or worm can be accounted for by sequence similarity. Results Here, we perform a genome-scale structural comparison among protein pairs in the two species. We show that significant fractions of genetic interactions involve structurally similar proteins, spanning 7–10% and 14% of all known interactions in yeast and worm, respectively. We identify several structural features that are predictive of genetic interactions and show their superiority over sequence-based features. Conclusion Structural similarity is an important property that can explain and predict genetic interactions. According to the available data, the most abundant mechanism for genetic interactions among structurally similar proteins is a common interacting partner shared by two genetically interacting proteins.

  1. The Case of the Similar Trees.

    Meyer, Rochelle Wilson

    1982-01-01

    A possible logical flaw based on similar triangles in the Sherlock Holmes mystery "The Musgrave Ritual" is discussed. The possible flaw concerns the need for two trees to have equal growth rates over a 250-year period in order for the solution presented to work. (MP)

  2. Similarity counting architecture for object detection

    Lee, Chin-Hwa

    1986-01-01

    A new algorithm for detecting objects in images is presented. It can achieve results similar to those of the two-dimensional correlation method with a shorter execution time. An architecture using content-addressable memory is implemented in the SCALD CAD environment. The design of the shift counter unit is described in detail.

  3. Learning by similarity in coordination problems

    Steiner, Jakub; Stewart, C.

    -, No. 324 (2007), pp. 1-40. ISSN 1211-3298 R&D Projects: GA MŠk LC542 Institutional research plan: CEZ:AV0Z70850503 Keywords: similarity * learning * case-based reasoning Subject RIV: AH - Economics http://www.cerge-ei.cz/pdf/wp/Wp324.pdf

  4. Some Similarity between Contractions and Kannan Mappings

    Tomonari Suzuki

    2008-03-01

    Contractions are always continuous, whereas Kannan mappings are not necessarily continuous. This is a major difference between the two kinds of mappings. However, it is known that relaxed versions of both mappings are quite similar. In this paper, we discuss both mappings from a new point of view.

  5. Large-Scale Similarity Joins With Guarantees

    Pagh, Rasmus

    2015-01-01

    The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the database community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering te...

  6. Measuring structural similarity in large online networks.

    Shi, Yongren; Macy, Michael

    2016-09-01

    Structural similarity based on bipartite graphs can be used to detect meaningful communities, but the networks have been tiny compared to massive online networks. Scalability is important in applications involving tens of millions of individuals with highly skewed degree distributions. Simulation analysis holding underlying similarity constant shows that two widely used measures - Jaccard index and cosine similarity - are biased by the distribution of out-degree in web-scale networks. However, an alternative measure, the Standardized Co-incident Ratio (SCR), is unbiased. We apply SCR to members of Congress, musical artists, and professional sports teams to show how massive co-following on Twitter can be used to map meaningful affiliations among cultural entities, even in the absence of direct connections to one another. Our results show how structural similarity can be used to map cultural alignments and demonstrate the potential usefulness of social media data in the study of culture, politics, and organizations across the social and behavioral sciences. PMID:27480374
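
    For reference, the two widely used measures named in this abstract, the Jaccard index and cosine similarity, can be computed on follower sets as in the minimal sketch below. The account names and follower IDs are made up for illustration, and the Standardized Co-incident Ratio (SCR) is not shown because the abstract does not define it.

```python
import math


def jaccard(followers_a, followers_b):
    """Jaccard index: shared followers divided by followers of either node."""
    union = followers_a | followers_b
    if not union:
        return 0.0
    return len(followers_a & followers_b) / len(union)


def cosine(followers_a, followers_b):
    """Cosine similarity of the binary follower vectors of two nodes."""
    if not followers_a or not followers_b:
        return 0.0
    shared = len(followers_a & followers_b)
    return shared / math.sqrt(len(followers_a) * len(followers_b))


if __name__ == "__main__":
    # Hypothetical follower ID sets for two accounts.
    artist_x = {1, 2, 3, 4}
    artist_y = {3, 4, 5, 6}
    print(jaccard(artist_x, artist_y))
    print(cosine(artist_x, artist_y))
```

    Both measures depend on the sizes of the two follower sets as well as on their overlap, which is the kind of dependence on degree that the abstract reports as a source of bias.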

  7. Explaining Sibling Similarities: Perceptions of Sibling Influences

    Whiteman, Shawn D.; McHale, Susan M.; Crouter, Ann C.

    2007-01-01

    This study examined older siblings' influence on their younger brothers and sisters by assessing the connections between youth's perceptions of sibling influence and sibling similarities in four domains: Risky behavior, peer competence, sports interests, and art interests. Participants included two adolescent-age siblings (firstborn age M=17.34;…

  8. Cross-kingdom similarities in microbiome functions

    Mendes, R.; Raaijmakers, J.M.

    2016-01-01

    Recent advances in medical research have revealed how humans rely on their microbiome for diverse traits and functions. Similarly, microbiomes of other higher organisms play key roles in disease, health, growth and development of their host. Exploring microbiome functions across kingdoms holds enorm

  9. The Similarity Renormalization Group with Novel Generators

    Li, W.; Anderson, E. R.; Furnstahl, R. J.

    2011-01-01

    The choice of generator in the Similarity Renormalization Group (SRG) flow equation determines the evolution pattern of the Hamiltonian. The kinetic energy has been used in the generator for most prior applications to nuclear interactions, and other options have been largely unexplored. Here we show how variations of this standard choice can allow the evolution to proceed more efficiently without losing its advantages.

  10. Discovering Music Structure via Similarity Fusion

    Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get satisfactory performance from such systems, one should incorporate as much information about song similarity as possible; however...

  11. Partial order similarity based on mutual information

    Tibély, Gergely; Palla, Gergely

    2016-01-01

    Comparing the ranking of candidates by different voters is an important topic in social and information science, with high relevance for practical applications. In general, ties and pairs of incomparable candidates may occur; thus, the alternative rankings are described by partial orders. Various distance measures between partial orders have already been introduced, where zero distance corresponds to a perfect match between a pair of partial orders, and larger values signal greater differences. Here we take a different approach and propose a similarity measure based on adjusted mutual information. In general, a similarity value of unity corresponds to exactly matching partial orders, while a low similarity is associated with a pair of independent partial orders. The time complexity of the computation of this similarity measure is $\mathcal{O}(\left|{\mathcal C}\right|^3)$ in the worst case, and $\mathcal{O}(\left|{\mathcal C}\right|^2\ln \left|{\mathcal C}\right|)$ in the typi...

  12. Similarity, trust in institutions, affect, and populism

    Scholderer, Joachim; Finucane, Melissa L.

    affect is a quicker, easier, and a more efficient way of navigating in a complex and uncertain world. Hence, many theorists give affect a direct and primary role in motivating behavior. Taken together, the results provide uncannily strong support for the value-similarity hypothesis, strengthening the...

  13. Black hole physics: More similar than knot

    Gómez, José L.

    2016-08-01

    The detection of a discrete knot of particle emission from the active galaxy M81* reveals that black hole accretion is self-similar with regard to mass, producing the same knotty jets irrespective of black hole mass and accretion rate.

  14. SELF-SIMILARITY OF VERTICAL BUBBLY JETS

    I. E. Lima Neto

    2015-06-01

    Abstract An integral model for vertical bubbly jets with nearly monodisperse bubble sizes is presented. The model is based on the Gaussian-type self-similarity of mean liquid velocity, bubble velocity and void fraction, as well as on functional relationships for initial liquid jet velocity and radius, bubble diameter and relative velocity. Adjusting the model to experimental data available in the literature for a wide range of densimetric Froude numbers provides constant values for the entrainment coefficient, momentum amplification factor, and spreading ratio of the bubble core for different flow conditions. Consistency and sensitivity of key model parameters are also verified. Overall, the deviations between model predictions and axial/radial profiles of mean liquid velocity, bubble velocity and void fraction are lower than about 20%, which suggests that bubbly jets tend to behave as self-preserving shear flows, similarly to single-phase jets and plumes. Furthermore, model simulations indicate a behavior similar to that of single-phase buoyant jets and slurry jets, but some differences with respect to confined bubbly jets are highlighted. This article provides not only a contribution to the problem of self-similarity in two-phase jets, but also a comprehensive model that can be used for analysis of artificial aeration/mixing systems involving bubbly jets.

  15. Self-similarity of Proton Spin

    Tokarev, M. V.; Aparin, A. A.; Zborovský, Imrich

    Trieste: International School for Advanced Studies, 2015, p. 037. ISSN 1824-8039. [XXII International Baldin Seminar on High Energy Physics Problems. Dubna (RU), 15.09.2014-20.09.2014] R&D Projects: GA MŠk(CZ) LG13031 Institutional support: RVO:61389005 Keywords: Self-similarity * proton-proton collisions * asymmetry Subject RIV: BE - Theoretical Physics

  16. Image Tracking for the High Similarity Drug Tablets Based on Light Intensity Reflective Energy and Artificial Neural Network

    Liang, Zhongwei; Zhou, Liang; Liu, Xiaochu; Wang, Xiaogang

    2014-01-01

    It is obvious that tablet image tracking exerts a notable influence on the efficiency and reliability of high-speed drug mass production, and, simultaneously, it also emerges as a big difficult problem and targeted focus during production monitoring in recent years, due to the high similarity shape and random position distribution of those objectives to be searched for. For the purpose of tracking tablets accurately in random distribution, through using surface fitting approach and transitional vector determination, the calibrated surface of light intensity reflective energy can be established, describing the shape topology and topography details of objective tablet. On this basis, the mathematical properties of these established surfaces have been proposed, and thereafter artificial neural network (ANN) has been employed for classifying those moving targeted tablets by recognizing their different surface properties; therefore, the instantaneous coordinate positions of those drug tablets on one image frame can then be determined. By repeating identical pattern recognition on the next image frame, the real-time movements of objective tablet templates were successfully tracked in sequence. This paper provides reliable references and new research ideas for the real-time objective tracking in the case of drug production practices. PMID:25143781

  17. Image tracking for the high similarity drug tablets based on light intensity reflective energy and artificial neural network.

    Liang, Zhongwei; Zhou, Liang; Liu, Xiaochu; Wang, Xiaogang

    2014-01-01

    It is obvious that tablet image tracking exerts a notable influence on the efficiency and reliability of high-speed drug mass production, and, simultaneously, it also emerges as a big difficult problem and targeted focus during production monitoring in recent years, due to the high similarity shape and random position distribution of those objectives to be searched for. For the purpose of tracking tablets accurately in random distribution, through using surface fitting approach and transitional vector determination, the calibrated surface of light intensity reflective energy can be established, describing the shape topology and topography details of objective tablet. On this basis, the mathematical properties of these established surfaces have been proposed, and thereafter artificial neural network (ANN) has been employed for classifying those moving targeted tablets by recognizing their different surface properties; therefore, the instantaneous coordinate positions of those drug tablets on one image frame can then be determined. By repeating identical pattern recognition on the next image frame, the real-time movements of objective tablet templates were successfully tracked in sequence. This paper provides reliable references and new research ideas for the real-time objective tracking in the case of drug production practices. PMID:25143781

  18. Image Tracking for the High Similarity Drug Tablets Based on Light Intensity Reflective Energy and Artificial Neural Network

    Zhongwei Liang

    2014-01-01

    It is obvious that tablet image tracking exerts a notable influence on the efficiency and reliability of high-speed drug mass production, and, simultaneously, it also emerges as a big difficult problem and targeted focus during production monitoring in recent years, due to the high similarity shape and random position distribution of those objectives to be searched for. For the purpose of tracking tablets accurately in random distribution, through using surface fitting approach and transitional vector determination, the calibrated surface of light intensity reflective energy can be established, describing the shape topology and topography details of objective tablet. On this basis, the mathematical properties of these established surfaces have been proposed, and thereafter artificial neural network (ANN) has been employed for classifying those moving targeted tablets by recognizing their different surface properties; therefore, the instantaneous coordinate positions of those drug tablets on one image frame can then be determined. By repeating identical pattern recognition on the next image frame, the real-time movements of objective tablet templates were successfully tracked in sequence. This paper provides reliable references and new research ideas for the real-time objective tracking in the case of drug production practices.

  19. Humans process dog and human facial affect in similar ways.

    Annett Schirmer

    Humans share aspects of their facial affect with other species such as dogs. Here we asked whether untrained human observers with and without dog experience are sensitive to these aspects and recognize dog affect with better-than-chance accuracy. Additionally, we explored similarities in the way observers process dog and human expressions. The stimulus material comprised naturalistic facial expressions of pet dogs and human infants obtained through positive (i.e., play) and negative (i.e., social isolation) provocation. Affect recognition was assessed explicitly in a rating task using full face images and images cropped to reveal the eye region only. Additionally, affect recognition was assessed implicitly in a lexical decision task using full faces as primes and emotional words and pseudowords as targets. We found that untrained human observers rated full face dog expressions from the positive and negative condition more accurately than would be expected by chance. Although dog experience was unnecessary for this effect, it significantly facilitated performance. Additionally, we observed a range of similarities between human and dog face processing. First, the facial expressions of both species facilitated lexical decisions to affectively congruous target words suggesting that their processing was equally automatic. Second, both dog and human negative expressions were recognized from both full and cropped faces. Third, female observers were more sensitive to affective information than were male observers and this difference was comparable for dog and human expressions. Together, these results extend existing work on cross-species similarities in facial emotions and provide evidence that these similarities are naturally exploited when humans interact with dogs.

  20. Tales from the Field: Search Strategies Applied in Web Searching

    Soohyung Joo; Iris Xie

    2010-01-01

    In their web search processes users apply multiple types of search strategies, which consist of different search tactics. This paper identifies eight types of information search strategies with associated cases based on sequences of search tactics during the information search process. Thirty-one participants representing the general public were recruited for this study. Search logs and verbal protocols offered rich data for the identification of different types of search strategies. Based on...