WorldWideScience

Sample records for accurate similarity search

  1. Application of kernel functions for accurate similarity search in large chemical databases

    OpenAIRE

    2010-01-01

    Background Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to do the query. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions...

  2. Approximate similarity search

    OpenAIRE

    Amato, Giuseppe

    2000-01-01

    Similarity searching is fundamental in various application areas. Recently it has attracted much attention in the database community because of the growing need to deal with large volumes of data. Consequently, efficiency has become a central design concern. Although much has been done to develop structures able to perform fast similarity search, results are still not satisfactory, and more research is needed. The performance of similarity search for complex features deteriorates and does...

  3. Protein structural similarity search by Ramachandran codes

    OpenAIRE

    Chang Chih-Hung; Huang Po-Jung; Lo Wei-Cheng; Lyu Ping-Chiang

    2007-01-01

    Abstract Background Protein structural data have increased exponentially, such that fast and accurate tools are necessary for structural similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, accuracy is usually sacrificed and the speed is still unable to match that of sequence similarity search tools. Here, we ai...

  4. Multivariate Time Series Similarity Searching

    OpenAIRE

    Jimin Wang; Yuelong Zhu; Shijin Li; Dingsheng Wan; Pengcheng Zhang

    2014-01-01

    Multivariate time series (MTS) datasets are very common in various financial, multimedia, and hydrological fields. In this paper, a dimension-combination method is proposed to search similar sequences for MTS. Firstly, the similarity of single-dimension series is calculated; then the overall similarity of the MTS is obtained by synthesizing each of the single-dimension similarity based on weighted BORDA voting method. The dimension-combination method could use the existing similarity searchin...
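
    The weighted BORDA voting step described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the per-dimension rankings, candidate ids, and weights are made-up stand-ins for similarities produced by an existing single-dimension method.

```python
# Sketch: combine per-dimension similarity rankings of candidate MTS
# sequences with a weighted Borda count. Hypothetical data and weights.

def borda_combine(rankings, weights):
    """rankings: list (one per dimension) of candidate-id lists, best
    first. weights: per-dimension weights. Returns candidates sorted by
    weighted Borda score (higher = more similar overall)."""
    scores = {}
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for pos, cand in enumerate(ranking):
            # Borda points: n-1 for rank 1, down to 0 for last place.
            scores[cand] = scores.get(cand, 0.0) + w * (n - 1 - pos)
    return sorted(scores, key=scores.get, reverse=True)

# Three dimensions rank four candidate sequences differently.
rankings = [["a", "b", "c", "d"],   # e.g. a water-level dimension
            ["b", "a", "d", "c"],   # e.g. a rainfall dimension
            ["a", "c", "b", "d"]]   # e.g. a flow dimension
weights = [0.5, 0.3, 0.2]
print(borda_combine(rankings, weights))  # "a" ranks first overall
```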

  5. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Full Text Available Abstract Background Protein structural data have increased exponentially, such that fast and accurate tools are necessary for structural similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, accuracy is usually sacrificed and the speed is still unable to match that of sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to structural similarity search. Its accuracy is similar to Combinatorial Extension (CE), and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever-increasing number of published protein structures in this post-genomic era.

  6. Scaling Group Testing Similarity Search

    OpenAIRE

    Iscen, Ahmet; Amsaleg, Laurent; Furon, Teddy

    2016-01-01

    The large dimensionality of modern image feature vectors, up to thousands of dimensions, challenges high-dimensional indexing techniques. Traditional approaches fail at returning good-quality results within a response time that is usable in practice. However, similarity search techniques inspired by the group testing framework have recently been proposed in an attempt to specifically defeat the curse of dimensionality. Yet, group testing does not scale and fails at indexing very large...

  7. Semantically enabled image similarity search

    Science.gov (United States)

    Casterline, May V.; Emerick, Timothy; Sadeghi, Kolia; Gosse, C. A.; Bartlett, Brent; Casey, Jason

    2015-05-01

    Georeferenced data of various modalities are increasingly available for intelligence and commercial use; however, effectively exploiting these sources demands a unified data space capable of capturing the unique contribution of each input. This work presents a suite of software tools for representing geospatial vector data and overhead imagery in a shared high-dimension vector or "embedding" space that supports fused learning and similarity search across dissimilar modalities. While the approach is suitable for fusing arbitrary input types, including free text, the present work exploits the obvious but computationally difficult relationship between GIS and overhead imagery. GIS provides temporally-smoothed but information-limited content, while overhead imagery provides an information-rich but temporally-limited perspective. This processing framework includes some important extensions of concepts in the literature but, more critically, presents a means to accomplish them as a unified framework at scale on commodity cloud architectures.

  8. Biosequence Similarity Search on the Mercury System

    OpenAIRE

    Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph

    2007-01-01

    Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high...

  9. Similarity Measures for Boolean Search Request Formulations.

    Science.gov (United States)

    Radecki, Tadeusz

    1982-01-01

    Proposes a means for determining the similarity between search request formulations in online information retrieval systems, and discusses the use of similarity measures for clustering search formulations and document files in such systems. Experimental results using the proposed methods are presented in three tables. A reference list is provided.
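
    A minimal sketch of comparing two Boolean request formulations by their term sets with the Jaccard coefficient. Radecki's proposed measures operate on the reduced disjunctive normal forms of the formulations; this simpler term-set version is an illustration only, and the queries are invented.

```python
# Sketch: similarity between Boolean search request formulations via
# the Jaccard coefficient of their index-term sets (operators ignored).

def terms(formulation):
    """Extract the set of index terms from a Boolean formulation,
    ignoring AND/OR/NOT operators and parentheses."""
    ops = {"AND", "OR", "NOT"}
    cleaned = formulation.replace("(", " ").replace(")", " ")
    return {t for t in cleaned.split() if t.upper() not in ops}

def jaccard(f1, f2):
    a, b = terms(f1), terms(f2)
    return len(a & b) / len(a | b) if a | b else 1.0

q1 = "(retrieval AND indexing) OR thesaurus"
q2 = "retrieval AND (indexing OR classification)"
print(round(jaccard(q1, q2), 2))  # 2 shared terms of 4 total -> 0.5
```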

  10. Ultra accurate collaborative information filtering via directed user similarity

    OpenAIRE

    Guo, Qiang; Song, Wen-Jun; Liu, Jian-Guo

    2014-01-01

    A key challenge of collaborative filtering (CF) information filtering is how to obtain reliable and accurate results with the help of peers' recommendations. Since the similarities from small-degree users to large-degree users would be larger than the ones in the opposite direction, the large-degree users' selections are recommended extensively by the traditional second-order CF algorithms. By considering the users' similarity direction and the second-order correlations to depress the influen...

  11. Secure sketch search for document similarity

    OpenAIRE

    Örencik, Cengiz; Orencik, Cengiz; Alewiwi, Mahmoud Khaled; SAVAŞ, Erkay; Savas, Erkay

    2015-01-01

    Document similarity search is an important problem with many applications, especially for outsourced data. With the widespread adoption of cloud computing, users tend to outsource their data to remote servers which are not necessarily trusted. This leads to the problem of protecting the privacy of sensitive data. We design and implement two secure similarity search schemes for textual documents utilizing locality sensitive hashing techniques for cosine similarity. While the first one provides very ...
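
    The underlying primitive, locality sensitive hashing for cosine similarity, can be sketched as follows: each hash bit is the sign of a dot product with a random hyperplane, so documents with a small angle between their vectors collide with high probability. The vectors and dimensions are illustrative, and the paper's secure variants add cryptographic protection on top of this primitive.

```python
# Sketch of random-hyperplane LSH for cosine similarity.
import random

def lsh_signature(vec, hyperplanes):
    """One bit per random hyperplane: 1 if vec is on its positive side."""
    return tuple(
        1 if sum(v * h for v, h in zip(vec, hp)) >= 0 else 0
        for hp in hyperplanes)

random.seed(7)
dim, nbits = 8, 16
hyperplanes = [[random.gauss(0, 1) for _ in range(dim)]
               for _ in range(nbits)]

doc = [0.9, 0.1, 0.0, 0.4, 0.0, 0.7, 0.2, 0.0]   # toy term vectors
near = [0.85, 0.15, 0.0, 0.35, 0.05, 0.75, 0.2, 0.0]  # close to doc
far = [0.0, 0.0, 0.9, 0.0, 0.8, 0.0, 0.0, 0.6]        # unrelated

sig = lambda v: lsh_signature(v, hyperplanes)
ham = lambda a, b: sum(x != y for x, y in zip(a, b))
# The near-duplicate shares far more signature bits than the unrelated doc.
print(ham(sig(doc), sig(near)), ham(sig(doc), sig(far)))
```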

  12. Efficient Authentication of Outsourced String Similarity Search

    OpenAIRE

    Dong, Boxiang; Wang, Hui

    2016-01-01

    Cloud computing enables the outsourcing of big data analytics, where a third party server is responsible for data storage and processing. In this paper, we consider the outsourcing model that provides string similarity search as the service. In particular, given a similarity search query, the service provider returns all strings from the outsourced dataset that are similar to the query string. A major security concern of the outsourcing paradigm is to authenticate whether the service provider...

  13. Mobile P2P Fast Similarity Search

    OpenAIRE

    Bocek, T; Hecht, F. V.; Hausheer, D; Hunt, E; Stiller, B.

    2009-01-01

    In informal data sharing environments, misspellings cause problems for data indexing and retrieval. This is even more pronounced in mobile environments, in which devices with limited input capabilities are used. In a mobile environment, similarity search algorithms for finding misspelled data need to account for limited CPU and bandwidth. This demo shows P2P fast similarity search (P2PFastSS) running on mobile phones and laptops that is tailored to uncertain data entry and use...

  14. Multiresolution Similarity Search in Image Databases

    OpenAIRE

    Heczko, Martin; Hinneburg, Alexander; Keim, Daniel A.; Wawryniuk, Markus

    2004-01-01

    Typically searching image collections is based on features of the images. In most cases the features are based on the color histogram of the images. Similarity search based on color histograms is very efficient, but the quality of the search results is often rather poor. One of the reasons is that histogram-based systems only support a specific form of global similarity using the whole histogram as one vector. But there is more information in a histogram than the distribution of colors. This ...
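
    The histogram-as-one-vector baseline this abstract criticizes can be sketched with histogram intersection; the 4-bin histograms below are invented stand-ins for real images.

```python
# Sketch: global image similarity from whole color histograms compared
# as single vectors, via histogram intersection.

def normalize(h):
    s = float(sum(h))
    return [x / s for x in h]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))

sunset1 = [40, 30, 20, 10]   # hypothetical red/orange-heavy image
sunset2 = [35, 35, 20, 10]
forest = [5, 10, 25, 60]     # hypothetical green-heavy image
print(round(intersection(sunset1, sunset2), 2))  # -> 0.95
print(round(intersection(sunset1, forest), 2))   # -> 0.45
```

    As the abstract notes, this global comparison discards everything about where colors occur, which is one reason result quality is often poor.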

  15. Representation Independent Proximity and Similarity Search

    OpenAIRE

    Chodpathumwan, Yodsawalai; Aleyasin, Amirhossein; Termehchy, Arash; Sun, Yizhou

    2015-01-01

    Finding similar or strongly related entities in a graph database is a fundamental problem in data management and analytics with applications in similarity query processing, entity resolution, and pattern matching. Similarity search algorithms usually leverage the structural properties of the data graph to quantify the degree of similarity or relevance between entities. Nevertheless, the same information can be represented in many different structures and the structural properties observed ove...

  16. Ultra-accurate collaborative information filtering via directed user similarity

    Science.gov (United States)

    Guo, Q.; Song, W.-J.; Liu, J.-G.

    2014-07-01

    A key challenge of collaborative filtering (CF) information filtering is how to obtain reliable and accurate results with the help of peers' recommendations. Since the similarities from small-degree users to large-degree users would be larger than the ones in the opposite direction, the large-degree users' selections are recommended extensively by the traditional second-order CF algorithms. By considering the users' similarity direction and the second-order correlations to depress the influence of mainstream preferences, we present the directed second-order CF (HDCF) algorithm specifically to address the challenge of accuracy and diversity of the CF algorithm. The numerical results for two benchmark data sets, MovieLens and Netflix, show that the accuracy of the new algorithm outperforms the state-of-the-art CF algorithms. Compared with the CF algorithm based on random walks proposed by Liu et al. (Int. J. Mod. Phys. C, 20 (2009) 285), the average ranking score could reach 0.0767 and 0.0402, which is enhanced by 27.3% and 19.1% for MovieLens and Netflix, respectively. In addition, the diversity, precision and recall are also enhanced greatly. Without relying on any context-specific information, tuning the similarity direction of CF algorithms could obtain accurate and diverse recommendations. This work suggests that the user similarity direction is an important factor in improving personalized recommendation performance.
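
    The asymmetry the abstract describes can be sketched with a directed similarity: overlap normalized by the source user's degree, so similarity from a small-degree user to a large-degree user exceeds the reverse. This illustrates the direction effect only; it is not necessarily HDCF's exact formula, and the item sets are invented.

```python
# Sketch of a *directed* (asymmetric) user similarity.

def directed_sim(items_i, items_j):
    """Similarity from user i to user j: shared items over i's degree."""
    return len(items_i & items_j) / len(items_i)

casual = {"m1", "m2"}                                     # small-degree user
heavy = {"m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"}  # large-degree user

print(directed_sim(casual, heavy))  # 2/2 = 1.0
print(directed_sim(heavy, casual))  # 2/8 = 0.25
```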

  17. Effective semantic search using thematic similarity

    Directory of Open Access Journals (Sweden)

    Sharifullah Khan

    2014-07-01

    Full Text Available Most existing semantic search systems expand search keywords using domain ontology to deal with semantic heterogeneity. They focus on matching the semantic similarity of individual keywords in a multiple-keywords query; however, they ignore the semantic relationships that exist among the keywords of the query themselves. The systems return less relevant answers for these types of queries. More relevant documents for a multiple-keywords query can be retrieved if the systems know the relationships that exist among multiple keywords in the query. The proposed search methodology matches patterns of keywords for capturing the context of keywords, and then the relevant documents are ranked according to their pattern relevance score. A prototype system has been implemented to validate the proposed search methodology. The system has been compared with existing systems for evaluation. The results demonstrate improvement in precision and recall of search.

  18. Web Search Results Summarization Using Similarity Assessment

    Directory of Open Access Journals (Sweden)

    Sawant V.V.

    2014-06-01

    Full Text Available The internet has become part of everyday life, and the WWW is its most important service because it allows presenting information such as documents and images. The WWW grows rapidly and caters to diverse levels and categories of users, and web search results are extracted for user-specified queries. With millions of pieces of information pouring online, users have no time to read the contents completely; moreover, the available information is often repeated or duplicated. This creates the need to restructure search results so that they can be presented in summarized form. The proposed approach extracts different features of web pages. Web page visual similarity assessment has been employed to address problems in fields including phishing, web archiving, and web search engines. In this approach, the search results returned for a user query are first stored. The Earth Mover's Distance (EMD) is used to assess web page visual similarity: each web page is treated as a low-resolution image, a signature of that image is created from color and coordinate features, and the distance between web pages is computed by applying EMD. The layout similarity value is computed using a tag comparison algorithm and a template comparison algorithm. Textual similarity is computed using cosine similarity, and hyperlink analysis is performed to compare outward links. The final similarity value is calculated by fusing the layout, text, hyperlink, and EMD values. Once the similarity matrix is found, clustering is performed using connected components. Finally, groups of similar web pages, i.e., summarized results, are displayed to the user. Experiments demonstrate the effectiveness of the four methods in generating summarized results for different web pages and user queries.

  19. SEAL: Spatio-Textual Similarity Search

    CERN Document Server

    Fan, Ju; Zhou, Lizhu; Chen, Shanshan; Hu, Jun

    2012-01-01

    Location-based services (LBS) have become more and more ubiquitous recently. Existing methods focus on finding relevant points-of-interest (POIs) based on users' locations and query keywords. Nowadays, modern LBS applications generate a new kind of spatio-textual data, regions-of-interest (ROIs), containing region-based spatial information and textual description, e.g., mobile user profiles with active regions and interest tags. To satisfy search requirements on ROIs, we study a new research problem, called spatio-textual similarity search: Given a set of ROIs and a query ROI, we find the similar ROIs by considering spatial overlap and textual similarity. Spatio-textual similarity search has many important applications, e.g., social marketing in location-aware social networks. It calls for an efficient search method to support large scales of spatio-textual data in LBS systems. To this end, we introduce a filter-and-verification framework to compute the answers. In the filter step, we generate signatures for ...

  20. New similarity search based glioma grading

    Energy Technology Data Exchange (ETDEWEB)

    Haegler, Katrin; Brueckmann, Hartmut; Linn, Jennifer [Ludwig-Maximilians-University of Munich, Department of Neuroradiology, Munich (Germany); Wiesmann, Martin; Freiherr, Jessica [RWTH Aachen University, Department of Neuroradiology, Aachen (Germany); Boehm, Christian [Ludwig-Maximilians-University of Munich, Department of Computer Science, Munich (Germany); Schnell, Oliver; Tonn, Joerg-Christian [Ludwig-Maximilians-University of Munich, Department of Neurosurgery, Munich (Germany)

    2012-08-15

    MR-based differentiation between low- and high-grade gliomas is predominately based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences such as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information of CE-T1w and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian Mixture Models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (= probability-based classifier) was defined and used for grading. Biopsy was used as gold standard, and similarity based grading was compared to grading solely based on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integration of perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves the potential of MR-based glioma grading compared to CE-T1w data alone. (orig.)

  1. New similarity search based glioma grading

    International Nuclear Information System (INIS)

    MR-based differentiation between low- and high-grade gliomas is predominately based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences such as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information of CE-T1w and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian Mixture Models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (= probability-based classifier) was defined and used for grading. Biopsy was used as gold standard, and similarity based grading was compared to grading solely based on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integration of perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves the potential of MR-based glioma grading compared to CE-T1w data alone. (orig.)

  2. Efficient Video Similarity Measurement and Search

    Energy Technology Data Exchange (ETDEWEB)

    Cheung, S-C S

    2002-12-19

    The amount of information on the world wide web has grown enormously since its creation in 1990. Duplication of content is inevitable because there is no central management on the web. Studies have shown that many similar versions of the same text documents can be found throughout the web. This redundancy problem is more severe for multimedia content such as web video sequences, as they are often stored in multiple locations and different formats to facilitate downloading and streaming. Similar versions of the same video can also be found, unknown to content creators, when web users modify and republish original content using video editing tools. Identifying similar content can benefit many web applications and content owners. For example, it will reduce the number of similar answers to a web search and identify inappropriate use of copyright content. This dissertation presents a system architecture and corresponding algorithms to efficiently measure, search, and organize similar video sequences found on any large database such as the web.

  3. Outsourced similarity search on metric data assets

    KAUST Repository

    Yiu, Man Lung

    2012-02-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

  4. Earthquake detection through computationally efficient similarity search.

    Science.gov (United States)

    Yoon, Clara E; O'Reilly, Ossian; Bergen, Karianne J; Beroza, Gregory C

    2015-12-01

    Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms. Earthquake detection, the identification of seismic events in continuous data, is a fundamental operation for observational seismology. We developed an efficient method to detect earthquakes using waveform similarity that overcomes the disadvantages of existing detection methods. Our method, called Fingerprint And Similarity Thresholding (FAST), can analyze a week of continuous seismic waveform data in less than 2 hours, or 140 times faster than autocorrelation. FAST adapts a data mining algorithm, originally designed to identify similar audio clips within large databases; it first creates compact "fingerprints" of waveforms by extracting key discriminative features, then groups similar fingerprints together within a database to facilitate fast, scalable search for similar fingerprint pairs, and finally generates a list of earthquake detections. FAST detected most (21 of 24) cataloged earthquakes and 68 uncataloged earthquakes in 1 week of continuous data from a station located near the Calaveras Fault in central California, achieving detection performance comparable to that of autocorrelation, with some additional false detections. FAST is expected to realize its full potential when applied to extremely long duration data sets over a distributed network of seismic stations. The widespread application of FAST has the potential to aid in the discovery of unexpected seismic signals, improve seismic monitoring, and promote a greater understanding of a variety of earthquake processes. PMID:26665176

  5. Earthquake detection through computationally efficient similarity search

    Science.gov (United States)

    Yoon, Clara E.; O’Reilly, Ossian; Bergen, Karianne J.; Beroza, Gregory C.

    2015-01-01

    Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms. Earthquake detection—identification of seismic events in continuous data—is a fundamental operation for observational seismology. We developed an efficient method to detect earthquakes using waveform similarity that overcomes the disadvantages of existing detection methods. Our method, called Fingerprint And Similarity Thresholding (FAST), can analyze a week of continuous seismic waveform data in less than 2 hours, or 140 times faster than autocorrelation. FAST adapts a data mining algorithm, originally designed to identify similar audio clips within large databases; it first creates compact “fingerprints” of waveforms by extracting key discriminative features, then groups similar fingerprints together within a database to facilitate fast, scalable search for similar fingerprint pairs, and finally generates a list of earthquake detections. FAST detected most (21 of 24) cataloged earthquakes and 68 uncataloged earthquakes in 1 week of continuous data from a station located near the Calaveras Fault in central California, achieving detection performance comparable to that of autocorrelation, with some additional false detections. FAST is expected to realize its full potential when applied to extremely long duration data sets over a distributed network of seismic stations. The widespread application of FAST has the potential to aid in the discovery of unexpected seismic signals, improve seismic monitoring, and promote a greater understanding of a variety of earthquake processes. PMID:26665176
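
    FAST's fingerprint-grouping step can be sketched in simplified form: binary fingerprints of waveform windows are split into bands, and windows that agree exactly on any band become candidate similar pairs, avoiding the all-pairs comparison of autocorrelation. The fingerprints below are toy bit strings, not features extracted from real seismograms.

```python
# Simplified sketch of banded fingerprint grouping for candidate-pair
# generation (the search structure behind FAST-style detection).
from collections import defaultdict
from itertools import combinations

def candidate_pairs(fingerprints, band_size=4):
    """fingerprints: dict window_id -> bit string. Windows that agree
    exactly on at least one band are returned as candidate pairs."""
    buckets = defaultdict(set)
    for wid, bits in fingerprints.items():
        for start in range(0, len(bits), band_size):
            buckets[(start, bits[start:start + band_size])].add(wid)
    pairs = set()
    for group in buckets.values():
        pairs.update(combinations(sorted(group), 2))
    return pairs

fps = {
    "w1": "1010110010100111",
    "w2": "1010110010100101",   # near-duplicate of w1 (one bit differs)
    "w3": "0101001101011000",   # unrelated noise window
}
print(candidate_pairs(fps))  # only ("w1", "w2") collide in some band
```

    Candidate pairs would then be verified with an exact similarity measure and thresholded to produce detections.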

  6. Performance Evaluation and Optimization of Math-Similarity Search

    OpenAIRE

    Zhang, Qun; Youssef, Abdou

    2015-01-01

    Similarity search in math is to find mathematical expressions that are similar to a user's query. We conceptualized the similarity factors between mathematical expressions, and proposed an approach to math similarity search (MSS) by defining metrics based on those similarity factors [11]. Our preliminary implementation indicated the advantage of MSS compared to non-similarity based search. In order to more effectively and efficiently search similar math expressions, MSS is further optimized. ...

  7. Highly accurate recommendation algorithm based on high-order similarities

    CERN Document Server

    Liu, Jian-Guo; Wang, Bing-Hong; Zhang, Yi-Cheng

    2008-01-01

    In this Letter, we introduce a modified collaborative filtering (MCF) algorithm, which has remarkably higher accuracy than the standard collaborative filtering. In the MCF, instead of the standard Pearson coefficient, the user-user similarities are obtained by a diffusion process. Furthermore, by considering the second order similarities, we design an effective algorithm that depresses the influence of mainstream preferences. The corresponding algorithmic accuracy, measured by the ranking score, is further improved by 24.9% in the optimal case. In addition, two significant criteria of algorithmic performance, diversity and popularity, are also taken into account. Numerical results show that the algorithm based on second order similarity can outperform the MCF simultaneously in all three criteria.
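
    The second-order idea can be sketched as combining a first-order similarity matrix S with its square: H = S + lam * S@S, where a negative lam depresses similarity that flows through popular intermediate users. The toy matrix and lam value are assumptions; the Letter obtains S by a diffusion process rather than the Pearson coefficient.

```python
# Sketch: fold second-order correlations into a user-user similarity
# matrix. Pure-Python matrix ops on a 3-user toy example.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def second_order(s, lam=-0.5):
    """H = S + lam * S^2; negative lam depresses mainstream influence."""
    s2 = matmul(s, s)
    n = len(s)
    return [[s[i][j] + lam * s2[i][j] for j in range(n)] for i in range(n)]

S = [[0.0, 0.8, 0.6],
     [0.8, 0.0, 0.5],
     [0.6, 0.5, 0.0]]
H = second_order(S)
print(round(H[0][1], 2))  # direct link 0.8 reduced by the two-step path
```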

  8. Outsourced similarity search on metric data assets

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Assent, Ira; Jensen, Christian Søndergaard;

    2012-01-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example...

  9. A Similarity Search Using Molecular Topological Graphs

    Directory of Open Access Journals (Sweden)

    Yoshifumi Fukunishi

    2009-01-01

    Full Text Available A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The combination of these two matrices gave a similarity measure. This method was applied to in silico drug screening, and the results showed that it was effective as a similarity measure.
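
    The two matrix views described above can be sketched for a small molecular graph: the adjacency matrix captures local structure, and the all-pairs minimum-path-length matrix (computed here with Floyd-Warshall) captures global structure. The 4-atom chain is illustrative only, and the actual measure also incorporates atomic partial charges.

```python
# Sketch: the two topological matrices of a molecule graph.
INF = float("inf")

def adjacency(n, bonds):
    a = [[0] * n for _ in range(n)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1
    return a

def path_lengths(adj):
    """All-pairs minimum path lengths via Floyd-Warshall."""
    n = len(adj)
    d = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
         for i in range(n)]
    for k in range(n):          # relax through each intermediate atom
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

bonds = [(0, 1), (1, 2), (2, 3)]   # a 4-atom chain, e.g. a butane skeleton
adj = adjacency(4, bonds)
print(path_lengths(adj)[0][3])  # end atoms are 3 bonds apart -> 3
```

    Comparing two molecules' adjacency matrices emphasizes shared functional groups, while comparing their path-length matrices emphasizes overall shape, matching the combination described in the abstract.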

  10. Fast similarity search in peer-to-peer networks

    OpenAIRE

    Bocek, T; Hunt, E; Hausheer, D; Stiller, B.

    2008-01-01

    Peer-to-peer (P2P) systems show numerous advantages over centralized systems, such as load balancing, scalability, and fault tolerance, and they require certain functionality, such as search, repair, and message and data transfer. In particular, structured P2P networks perform an exact search in logarithmic time proportional to the number of peers. However, keyword similarity search in a structured P2P network remains a challenge. Similarity search for service discovery can significantly impr...

  11. The Time Course of Similarity Effects in Visual Search

    Science.gov (United States)

    Guest, Duncan; Lamberts, Koen

    2011-01-01

    It is well established that visual search becomes harder when the similarity between target and distractors is increased and the similarity between distractors is decreased. However, in models of visual search, similarity is typically treated as a static, time-invariant property of the relation between objects. Data from other perceptual tasks…

  12. Learning Style Similarity for Searching Infographics

    OpenAIRE

    Saleh, Babak; Dontcheva, Mira; Hertzmann, Aaron; Liu, Zhicheng

    2015-01-01

    Infographics are complex graphic designs integrating text, images, charts and sketches. Despite the increasing popularity of infographics and the rapid growth of online design portfolios, little research investigates how we can take advantage of these design resources. In this paper we present a method for measuring the style similarity between infographics. Based on human perception data collected from crowdsourced experiments, we use computer vision and machine learning algorithms to learn ...

  13. A Similarity Search Using Molecular Topological Graphs

    OpenAIRE

    2009-01-01

    A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix which represents the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable to compare the local structures of molecules such as functional groups, and the other matrix is suitable to compare the global structures of molecules. The comb...

  14. Visual similarity is stronger than semantic similarity in guiding visual search for numbers

    OpenAIRE

    Godwin, H.J.; Hout, M.C.; Menneer, T.

    2014-01-01

    Using a visual search task, we explored how behavior is influenced by both visual and semantic information. We recorded participants’ eye movements as they searched for a single target number in a search array of single-digit numbers (0–9). We examined the probability of fixating the various distractors as a function of two key dimensions: the visual similarity between the target and each distractor, and the semantic similarity (i.e., the numerical distance) between the target and each distra...

  15. Fast and secure similarity search in high dimensional space

    OpenAIRE

    Furon, Teddy; Jégou, Hervé; Amsaleg, Laurent; Mathon, Benjamin

    2013-01-01

Similarity search in high-dimensional database spaces is split into two worlds: i) fast, scalable, and approximate search algorithms which are not secure, and ii) search protocols based on secure computation which are not scalable. This paper presents a one-way privacy protocol that lies in between these two worlds. Approximate metrics for the cosine similarity allow speed. Elements of large random matrix theory provide security evidence if the size of the database is not too big with respe...

  16. Visual similarity is stronger than semantic similarity in guiding visual search for numbers.

    Science.gov (United States)

    Godwin, Hayward J; Hout, Michael C; Menneer, Tamaryn

    2014-06-01

    Using a visual search task, we explored how behavior is influenced by both visual and semantic information. We recorded participants' eye movements as they searched for a single target number in a search array of single-digit numbers (0-9). We examined the probability of fixating the various distractors as a function of two key dimensions: the visual similarity between the target and each distractor, and the semantic similarity (i.e., the numerical distance) between the target and each distractor. Visual similarity estimates were obtained using multidimensional scaling based on the independent observer similarity ratings. A linear mixed-effects model demonstrated that both visual and semantic similarity influenced the probability that distractors would be fixated. However, the visual similarity effect was substantially larger than the semantic similarity effect. We close by discussing the potential value of using this novel methodological approach and the implications for both simple and complex visual search displays.

  17. Distributed efficient similarity search mechanism in wireless sensor networks.

    Science.gov (United States)

    Ahmed, Khandakar; Gregory, Mark A

    2015-01-01

The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes its motivation from a vector distance index, called iDistance, to transform the issue of similarity searching into the problem of an interval search in one dimension. In addition, a sector based distance routing algorithm is used to efficiently route messages. Extensive simulation results reveal that DCSMSS is highly efficient and significantly outperforms previous approaches in processing similarity search queries. PMID:25751081
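
The iDistance idea the abstract refers to, reducing metric similarity search to one-dimensional interval search, can be sketched as follows. This is a minimal illustration assuming Euclidean data; the class and parameter names are ours, and DCSMSS itself adds sector-based routing and other machinery on top.

```python
import bisect
import math

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

class IDistanceIndex:
    """Toy iDistance: each point gets a 1-D key ref_index * c + distance to
    its nearest reference point; a range query becomes one interval scan
    per reference point over a sorted key list."""
    def __init__(self, refs, c=1000.0):
        self.refs = refs
        self.c = c          # c must exceed any distance that can occur
        self.keys = []      # sorted 1-D keys
        self.points = []    # points, kept parallel to self.keys

    def add(self, p):
        i = min(range(len(self.refs)), key=lambda r: euclid(p, self.refs[r]))
        key = i * self.c + euclid(p, self.refs[i])
        j = bisect.bisect(self.keys, key)
        self.keys.insert(j, key)
        self.points.insert(j, p)

    def range_query(self, q, radius):
        out = []
        for i, ref in enumerate(self.refs):
            d = euclid(q, ref)
            # Triangle inequality: any answer assigned to reference i has a
            # key inside [i*c + d - radius, i*c + d + radius].
            lo = i * self.c + max(0.0, d - radius)
            hi = i * self.c + d + radius
            a = bisect.bisect_left(self.keys, lo)
            b = bisect.bisect_right(self.keys, hi)
            out.extend(p for p in self.points[a:b] if euclid(p, q) <= radius)
        return out
```

The payoff is that the sorted key list can live in any one-dimensional structure (a B+-tree, or here a plain sorted array), which is what makes the approach attractive in a distributed storage setting.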

  18. Distributed Efficient Similarity Search Mechanism in Wireless Sensor Networks

    OpenAIRE

    Khandakar Ahmed; Gregory, Mark A.

    2015-01-01

    The Wireless Sensor Network similarity search problem has received considerable research attention due to sensor hardware imprecision and environmental parameter variations. Most of the state-of-the-art distributed data centric storage (DCS) schemes lack optimization for similarity queries of events. In this paper, a DCS scheme with metric based similarity searching (DCSMSS) is proposed. DCSMSS takes motivation from vector distance index, called iDistance, in order to transform the issue of s...

  19. CellMontage: similar expression profile search server.

    Science.gov (United States)

    Fujibuchi, Wataru; Kiseleva, Larisa; Taniguchi, Takeaki; Harada, Hajime; Horton, Paul

    2007-11-15

The establishment and rapid expansion of microarray databases has created a need for new search tools. Here we present CellMontage, the first server for expression profile similarity search over a large database of 69 000 microarray experiments derived from NCBI's GEO site. CellMontage provides a novel, content-based search engine for accessing gene expression data. Microarray experiments with similar overall expression to a user-provided expression profile (e.g. microarray experiment) are computed and displayed, usually within 20 s. The core search engine software is downloadable from the site. PMID:17895274

  20. Activity-relevant similarity values for fingerprints and implications for similarity searching

    OpenAIRE

    Swarit Jasial; Ye Hu; Martin Vogt; Jürgen Bajorath

    2016-01-01

    A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various att...

  1. Fast and accurate protein substructure searching with simulated annealing and GPUs

    Directory of Open Access Journals (Sweden)

    Stivala Alex D

    2010-09-01

Full Text Available Abstract Background Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy with, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.

  2. How Google Web Search copes with very similar documents

    NARCIS (Netherlands)

    Mettrop, W.; Nieuwenhuysen, P.; Smulders, H.

    2006-01-01

    A significant portion of the computer files that carry documents, multimedia, programs etc. on the Web are identical or very similar to other files on the Web. How do search engines cope with this? Do they perform some kind of “deduplication”? How should users take into account that web search resul

  3. Effective and Efficient Similarity Search in Scientific Workflow Repositories

    OpenAIRE

    Starlinger, Johannes; Cohen-Boulakia, Sarah; Khanna, Sanjeev; Davidson, Susan; Leser, Ulf

    2015-01-01

Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular for similarity search. Effective similarity search requires both high quality algorithms for the comparison of scientific workflows and efficient strategies for indexing,...

  4. Indexing schemes for similarity search: an illustrated paradigm

    OpenAIRE

    Pestov, Vladimir; Stojmirovic, Aleksandar

    2002-01-01

    We suggest a variation of the Hellerstein--Koutsoupias--Papadimitriou indexability model for datasets equipped with a similarity measure, with the aim of better understanding the structure of indexing schemes for similarity-based search and the geometry of similarity workloads. This in particular provides a unified approach to a great variety of schemes used to index into metric spaces and facilitates their transfer to more general similarity measures such as quasi-metrics. We discuss links b...

  5. Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    CERN Document Server

    Yuan, Ye; Chen, Lei; Wang, Haixun

    2012-01-01

Many studies have been conducted on seeking efficient solutions for subgraph similarity search over certain (deterministic) graphs due to its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Different from previous works assuming that edges in an uncertain graph are independent of each other, we study the uncertain graphs where edges' occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete, thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and u...

  6. The breakfast effect: dogs (Canis familiaris) search more accurately when they are less hungry.

    Science.gov (United States)

    Miller, Holly C; Bender, Charlotte

    2012-11-01

    We investigated whether the consumption of a morning meal (breakfast) by dogs (Canis familiaris) would affect search accuracy on a working memory task following the exertion of self-control. Dogs were tested either 30 or 90 min after consuming half of their daily resting energy requirements (RER). During testing dogs were initially required to sit still for 10 min before searching for hidden food in a visible displacement task. We found that 30 min following the consumption of breakfast, and 10 min after the behavioral inhibition task, dogs searched more accurately than they did in a fasted state. Similar differences were not observed when dogs were tested 90 min after meal consumption. This pattern of behavior suggests that breakfast enhanced search accuracy following a behavioral inhibition task by providing energy for cognitive processes, and that search accuracy decreased as a function of energy depletion. PMID:23032958

  7. Exact score distribution computation for ontological similarity searches

    Directory of Open Access Journals (Sweden)

    Schulz Marcel H

    2011-11-01

Full Text Available Abstract Background Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., finding functionally related proteins with the Gene Ontology or phenotypically similar diseases with the Human Phenotype Ontology (HPO). We have recently shown that the performance of semantic similarity searches can be improved by ranking results according to the probability of obtaining a given score at random rather than by the scores themselves. However, to date, there are no algorithms for computing the exact distribution of semantic similarity scores, which is necessary for computing the exact P-value of a given score. Results In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik's definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the HPO. It is shown that exact P-value calculation improves clinical diagnosis using the HPO compared to approaches based on sampling. Conclusions The new algorithm enables for the first time exact P-value calculation via exact score distribution computation for ontology similarity searches. The approach is applicable to any ontology for which the annotation-propagation rule holds and can improve any bioinformatic method that makes use only of the raw similarity scores. The algorithm was implemented in Java, supports any ontology in OBO format, and is available for non-commercial and academic usage under: https://compbio.charite.de/svn/hpo/trunk/src/tools/significance/
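
The naive approach that the paper's subgraph-collapsing algorithm accelerates can be stated directly: enumerate the full null distribution of scores and count. The toy below assumes a uniform random k-subset null, loosely following the abstract, with a generic score function standing in for Resnik similarity.

```python
from itertools import combinations

def exact_pvalue(observed, universe, k, score):
    """Exact P-value under a simple null model: the query is a uniformly
    random k-subset of `universe`. Enumerate every subset and count how
    often its score reaches the observed one (no sampling error)."""
    hits = total = 0
    for subset in combinations(universe, k):
        total += 1
        if score(subset) >= observed:
            hits += 1
    return hits / total
```

The enumeration is exponential in general, which is exactly why the paper's collapsing trick, several orders of magnitude faster, is needed for real ontologies.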

  8. SEARCH PROFILES BASED ON USER TO CLUSTER SIMILARITY

    Directory of Open Access Journals (Sweden)

    Ilija Subasic

    2007-12-01

Full Text Available Privacy of web users' query search logs has, since last year's AOL dataset release, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also attracted a lot of attention in the different communities surrounding search engines. The use of clustering methods to provide low-level contextual search while retaining high privacy/utility is examined in this paper. By using only the user's cluster membership, the search query terms need no longer be retained, reducing privacy concerns for both users and companies. The paper brings a lightweight framework for combining query words, user similarities and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous privacy-preserving approaches, which attempt to anonymize the queries instead of the users.

  9. Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision

    OpenAIRE

    Holliday John D; Kanoulas Evangelos; Malim Nurul; Willett Peter

    2011-01-01

    Abstract Background Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. Results Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that...

  10. RAPSearch: a fast protein similarity search tool for short reads

    Directory of Open Access Journals (Sweden)

    Choi Jeong-Hyeon

    2011-05-01

Full Text Available Abstract Background Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search--a key step to achieve annotation of protein-coding genes in these short reads, and identification of their biological functions--faces daunting challenges because of the very sizes of the short read datasets. Results We developed a fast protein similarity search tool RAPSearch that utilizes a reduced amino acid alphabet and suffix array to detect seeds of flexible length. For the short reads (translated in 6 frames) we tested, RAPSearch achieved ~20-90 times speedup as compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had significant loss of sensitivity as compared to RAPSearch and BLAST. Conclusions RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.
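
The reduced amino acid alphabet idea can be illustrated as below. The grouping shown is a hypothetical example for illustration only, not necessarily the alphabet RAPSearch actually uses.

```python
# A hypothetical 10-group reduction of the 20 amino acids (illustrative
# only; RAPSearch's actual grouping may differ).
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}

def reduce_sequence(seq):
    # Map a protein sequence onto the coarser alphabet; exact seed matches
    # in this space tolerate substitutions within a group, so more remote
    # homologs still share seeds.
    return "".join(REDUCE[aa] for aa in seq)
```

For example, two sequences that differ only by within-group substitutions (L/M, K/R, Y/F) collapse to the same reduced string and would be found by the same seed.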

  11. Online multiple kernel similarity learning for visual search.

    Science.gov (United States)

    Xia, Hao; Hoi, Steven C H; Jin, Rong; Zhao, Peilin

    2014-03-01

    Recent years have witnessed a number of studies on distance metric learning to improve visual similarity search in content-based image retrieval (CBIR). Despite their successes, most existing methods on distance metric learning are limited in two aspects. First, they usually assume the target proximity function follows the family of Mahalanobis distances, which limits their capacity of measuring similarity of complex patterns in real applications. Second, they often cannot effectively handle the similarity measure of multimodal data that may originate from multiple resources. To overcome these limitations, this paper investigates an online kernel similarity learning framework for learning kernel-based proximity functions which goes beyond the conventional linear distance metric learning approaches. Based on the framework, we propose a novel online multiple kernel similarity (OMKS) learning method which learns a flexible nonlinear proximity function with multiple kernels to improve visual similarity search in CBIR. We evaluate the proposed technique for CBIR on a variety of image data sets in which encouraging results show that OMKS outperforms the state-of-the-art techniques significantly. PMID:24457509

  12. Robust hashing with local models for approximate similarity search.

    Science.gov (United States)

    Song, Jingkuan; Yang, Yi; Li, Xuelong; Huang, Zi; Yang, Yang

    2014-07-01

Similarity search plays an important role in many applications involving high-dimensional data. Due to the known dimensionality curse, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing l2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.
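
For contrast with the learned hash functions of RHLM, here is a minimal sketch of the randomized hashing the abstract says such methods improve upon: random hyperplane LSH for cosine similarity. All function names are ours.

```python
import random

def make_planes(dim, n_bits, seed=0):
    # One random Gaussian hyperplane per hash bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(planes, x):
    # Each bit is the sign of the projection of x onto one hyperplane;
    # nearby points (small angle) tend to agree on most bits.
    return tuple(int(sum(w * v for w, v in zip(plane, x)) >= 0.0)
                 for plane in planes)

def hamming(a, b):
    # Bucket exploration compares codes by Hamming distance.
    return sum(u != v for u, v in zip(a, b))
```

Note that the hyperplanes here are drawn without looking at the data at all, which is precisely the "randomized algorithms ... without considering the specific structural information" limitation RHLM targets.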

  13. Similarity preserving snippet-based visualization of web search results.

    Science.gov (United States)

    Gomez-Nieto, Erick; San Roman, Frizzi; Pagliosa, Paulo; Casaca, Wallace; Helou, Elias S; de Oliveira, Maria Cristina F; Nonato, Luis Gustavo

    2014-03-01

Internet users are very familiar with the results of a search query displayed as a ranked list of snippets. Each textual snippet shows a content summary of the referred document (or webpage) and a link to it. This display has many advantages, for example, it affords easy navigation and is straightforward to interpret. Nonetheless, any user of search engines could possibly report some experience of disappointment with this metaphor. Indeed, it has limitations in particular situations, as it fails to provide an overview of the document collection retrieved. Moreover, depending on the nature of the query--for example, it may be too general, or ambiguous, or ill expressed--the desired information may be poorly ranked, or results may contemplate varied topics. Several search tasks would be easier if users were shown an overview of the returned documents, organized so as to reflect how related they are, content-wise. We propose a visualization technique to display the results of web queries aimed at overcoming such limitations. It combines the neighborhood preservation capability of multidimensional projections with the familiar snippet-based representation by employing a multidimensional projection to derive two-dimensional layouts of the query search results that preserve text similarity relations, or neighborhoods. Similarity is computed by applying the cosine similarity over a "bag-of-words" vector representation of the collection built from the snippets. If the snippets are displayed directly according to the derived layout, they will overlap considerably, producing a poor visualization. We overcome this problem by defining an energy functional that considers both the overlapping among snippets and the preservation of the neighborhood structure as given in the projected layout. Minimizing this energy functional provides a neighborhood preserving two-dimensional arrangement of the textual snippets with minimum overlap. The resulting visualization conveys both a global
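
The snippet-to-snippet similarity step can be sketched as plain cosine similarity over bag-of-words term-frequency vectors. This assumes simple whitespace tokenization; the paper's actual preprocessing (stemming, stop words, weighting) is not specified here.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    # Bag-of-words term-frequency vectors via Counter; a missing word
    # simply contributes 0 to the dot product.
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(count * b[word] for word, count in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

The pairwise similarities computed this way feed the multidimensional projection; the overlap-removal energy minimization is a separate step not shown here.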

  14. Self-Taught Hashing for Fast Similarity Search

    CERN Document Server

    Zhang, Dell; Cai, Deng; Lu, Jinsong

    2010-01-01

    The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains to be a very challenging problem. In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal $l$-bit binary codes for all documents in the given corpus via unsupervised learning, and then train $l$ classifiers via supervised learning to predict the $l$-bit code for any query document unseen before. Our experiments on three real-world text datasets show that the proposed approach using binarised Laplaci...

  15. Computing Semantic Similarity Measure Between Words Using Web Search Engine

    Directory of Open Access Journals (Sweden)

    Pushpa C N

    2013-05-01

Full Text Available Semantic similarity measures between words play an important role in information retrieval, natural language processing and various tasks on the web. In this paper, we have proposed a Modified Pattern Extraction Algorithm to compute the supervised semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page counts-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and non-synonymous word-pairs. The proposed Modified Pattern Extraction Algorithm outperforms existing methods, with a correlation value of 89.8 percent.
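
The abstract does not name its four page-count association measures. One commonly used measure of this kind, WebJaccard, illustrates the idea; it may or may not be among the four the paper uses.

```python
def web_jaccard(hits_p, hits_q, hits_pq, threshold=0):
    """WebJaccard: the page-count analogue of the Jaccard coefficient.
    hits_p and hits_q are the page counts for the two words, hits_pq the
    count for the conjunctive query "P AND Q". A threshold guards against
    noise when the co-occurrence count is tiny."""
    if hits_pq <= threshold:
        return 0.0
    return hits_pq / (hits_p + hits_q - hits_pq)
```

Analogous formulas exist for overlap, Dice, and PMI-style measures; in the paper's setup, an SVM learns how to weight several such scores against snippet patterns.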

  16. On optimizing distance-based similarity search for biological databases.

    Science.gov (United States)

    Mao, Rui; Xu, Weijia; Ramakrishnan, Smriti; Nuckolls, Glen; Miranker, Daniel P

    2005-01-01

Similarity search leveraging distance-based index structures is increasingly being used for both multimedia and biological database applications. We consider distance-based indexing for three important biological data types: protein k-mers with the metric PAM model, DNA k-mers with Hamming distance and peptide fragmentation spectra with a pseudo-metric derived from cosine distance. To date, the primary driver of this research has been multimedia applications, where similarity functions are often Euclidean norms on high dimensional feature vectors. We develop results showing that the character of these biological workloads is different from multimedia workloads. In particular, they are not intrinsically very high dimensional, and deserve different optimization heuristics. Based on MVP-trees, we develop a pivot selection heuristic seeking centers and show it outperforms the most widely used corner seeking heuristic. Similarly, we develop a data partitioning approach sensitive to the actual data distribution in lieu of median splits. PMID:16447992

  17. Quick and easy implementation of approximate similarity search with Lucene

    OpenAIRE

    Amato, Giuseppe; Bolettieri, Paolo; Gennaro, Claudio; Rabitti, Fausto

    2013-01-01

Similarity search techniques have proved to be an effective way of retrieving multimedia content. However, as the amount of available multimedia data increases, the cost of developing from scratch a robust and scalable system with content-based image retrieval facilities is quite prohibitive. In this paper, we propose to exploit an approach that allows us to convert low level features into a textual form. In this way, we are able to easily set up a retrieval system on top of the Lucene se...

  18. Query-dependent banding (QDB) for faster RNA similarity searches.

    Directory of Open Access Journals (Sweden)

    Eric P Nawrocki

    2007-03-01

Full Text Available When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.

  19. Rank-Based Similarity Search: Reducing the Dimensional Dependence.

    Science.gov (United States)

    Houle, Michael E; Nett, Michael

    2015-01-01

    This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values. PMID:26353214

  20. Fast and accurate database searches with MS-GF+Percolator

    Energy Technology Data Exchange (ETDEWEB)

    Granholm, Viktor; Kim, Sangtae; Navarro, José C.; Sjölund, Erik; Smith, Richard D.; Käll, Lukas

    2014-02-28

To identify peptides and proteins from the large number of fragmentation spectra in mass spectrometry-based proteomics, researchers commonly employ so-called database search engines. Additionally, post-processors like Percolator have been used on the results from such search engines to assess confidence, infer peptides and generally increase the number of identifications. A recent search engine, MS-GF+, has previously been shown to outperform these classical search engines in terms of the number of identified spectra. However, MS-GF+ generates only limited statistical estimates of the results, hence hampering the biological interpretation. Here, we enabled Percolator processing of MS-GF+ output, and observed an increased number of identified peptides for a wide variety of datasets. In addition, Percolator directly reports false discovery rate estimates, such as q values and posterior error probabilities, as well as p values, for peptide-spectrum matches, peptides and proteins, all useful for the whole proteomics community.

  1. Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases

    DEFF Research Database (Denmark)

    Assent, Ira; Wichterich, Marc; Meisen, Tobias;

    2008-01-01

Multimedia similarity search in large databases requires efficient query processing. The Earth mover's distance, introduced in computer vision, is successfully used as a similarity model in a number of small-scale applications. Its computational complexity hindered its adoption in large multimedia databases. We enable directly indexing the Earth mover's distance in structures such as the R-tree and the VA-file by providing the accurate 'MinDist' function to any bounding rectangle in the index. We exploit the computational structure of the new MinDist to derive a new lower bound for the EMD Min...
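
The Earth mover's distance is a transportation problem in general, which is why index-level lower bounds such as the MinDist above matter for query processing. The special case of two equal-mass 1-D histograms, however, reduces to prefix sums; the simplified sketch below illustrates what EMD measures.

```python
def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two 1-D histograms with equal total
    mass and unit ground distance between adjacent bins: equals the L1
    distance between their prefix (cumulative) sums."""
    cum = emd = 0.0
    for a, b in zip(hist_a, hist_b):
        cum += a - b        # surplus mass that must still be moved right
        emd += abs(cum)     # cost of carrying that surplus one bin over
    return emd
```

Moving one unit of mass across two bins costs 2, while identical histograms cost 0; in higher dimensions no such closed form exists and one solves (or bounds) a flow problem instead.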

  2. Activity-relevant similarity values for fingerprints and implications for similarity searching [version 1; referees: 3 approved]

    Directory of Open Access Journals (Sweden)

    Swarit Jasial

    2016-04-01

    Full Text Available A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various attempts have been made to establish relationships between calculated fingerprint similarity values and biological activities, none of these has yielded generally applicable rules for similarity searching. In this study, we have addressed the question of molecular versus activity similarity in a more fundamental way. First, we have evaluated if activity-relevant similarity value ranges could in principle be identified for standard fingerprints and distinguished from similarity resulting from random compound comparisons. Then, we have analyzed if activity-relevant similarity values could be used to guide typical similarity search calculations aiming to identify active compounds in databases. It was found that activity-relevant similarity values can be identified as a characteristic feature of fingerprints. However, it was also shown that such values cannot be reliably used as thresholds for practical similarity search calculations. In addition, the analysis presented herein helped to rationalize differences in fingerprint search performance.
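    Fingerprint similarity values of the kind analyzed above are typically Tanimoto coefficients over binary fingerprints. A minimal sketch, with hypothetical on-bit index sets rather than any standard fingerprint:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two binary fingerprints,
    represented as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0

# hypothetical fingerprints: 3 shared bits out of 6 distinct bits -> 0.5
query_fp = {1, 4, 9, 16, 25}
db_fp = {1, 4, 9, 36}
sim = tanimoto(query_fp, db_fp)
```

    Whether a given value of `sim` implies activity similarity is exactly the open question the study addresses.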

  3. Keyword search over data service integration for accurate results

    International Nuclear Information System (INIS)

    Virtual Data Integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn a new query language and to get acquainted with data organization which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  4. Keyword Search over Data Service Integration for Accurate Results

    CERN Document Server

    Zemleris, Vidmantas; Robert Gwadera

    2013-01-01

    Virtual data integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn the query language and to get acquainted with data organization, which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  5. Keyword Search over Data Service Integration for Accurate Results

    Science.gov (United States)

    Zemleris, Vidmantas; Kuznetsov, Valentin; Gwadera, Robert

    2014-06-01

    Virtual Data Integration provides a coherent interface for querying heterogeneous data sources (e.g., web services, proprietary systems) with minimum upfront effort. Still, this requires its users to learn a new query language and to get acquainted with data organization which may pose problems even to proficient users. We present a keyword search system, which proposes a ranked list of structured queries along with their explanations. It operates mainly on the metadata, such as the constraints on inputs accepted by services. It was developed as an integral part of the CMS data discovery service, and is currently available as open source.

  6. An accurate algorithm to calculate the Hurst exponent of self-similar processes

    International Nuclear Information System (INIS)

    In this paper, we introduce a new approach which generalizes the GM2 algorithm (introduced in Sánchez-Granero et al. (2008) [52]) as well as the fractal dimension algorithms FD1, FD2 and FD3 (which first appeared in Sánchez-Granero et al. (2012) [51]), providing an accurate algorithm to calculate the Hurst exponent of self-similar processes. We prove that this algorithm performs properly in the case of short time series when fractional Brownian motions and Lévy stable motions are considered. We conclude the paper with a dynamic study of the Hurst exponent evolution in the S&P 500 index stocks. - Highlights: • We provide a new approach to properly calculate the Hurst exponent. • This generalizes the FD algorithms and GM2, introduced previously by the authors. • The new method (FD4) is especially appropriate for short time series. • FD4 may be used in both unifractal and multifractal contexts. • As an empirical application, we show that S&P 500 stocks improved their efficiency
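    For context, the classical rescaled-range (R/S) estimator of the Hurst exponent can be sketched as follows; this is the textbook baseline, not the FD4 algorithm the paper proposes.

```python
import numpy as np

def hurst_rs(series, min_chunk=8):
    """Classical rescaled-range (R/S) Hurst exponent estimate: the slope
    of log(mean R/S) versus log(window size) over dyadic window sizes."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    sizes, rs_means = [], []
    size = min_chunk
    while size <= n // 2:
        rs = []
        for start in range(0, n - size + 1, size):
            chunk = x[start:start + size]
            dev = np.cumsum(chunk - chunk.mean())   # cumulative deviations
            s = chunk.std()
            if s > 0:
                rs.append((dev.max() - dev.min()) / s)
        sizes.append(size)
        rs_means.append(np.mean(rs))
        size *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return slope

rng = np.random.default_rng(0)
h_noise = hurst_rs(rng.standard_normal(4096))   # white noise: H near 0.5
```

    The paper's point is precisely that estimators like this degrade on short series, which FD4 is designed to handle.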

  7. Activity-relevant similarity values for fingerprints and implications for similarity searching [version 2; referees: 3 approved]

    OpenAIRE

    Swarit Jasial; Ye Hu; Martin Vogt; Jürgen Bajorath

    2016-01-01

    A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated similarity values. However, there is no solid scientific foundation to bridge between calculated molecular and observed activity similarity. Accordingly, the success rate of identifying new active compounds by similarity searching is limited. Although various att...

  8. Similarity between Grover's quantum search algorithm and classical two-body collisions

    OpenAIRE

    Zhang, Jingfu; Lu, Zhiheng

    2001-01-01

    By studying the properties of the inversion-about-average operation in Grover's quantum search algorithm, we find a similarity between quantum searching and the process of a collision between two rigid bodies. Some related questions are discussed on the basis of this similarity.

  9. SHOP: scaffold hopping by GRID-based similarity searches

    DEFF Research Database (Denmark)

    Bergmann, Rikke; Linusson, Anna; Zamora, Ismael

    2007-01-01

    A new GRID-based method for scaffold hopping (SHOP) is presented. In a fully automatic manner, scaffolds were identified in a database based on three types of 3D-descriptors. SHOP's ability to recover scaffolds was assessed and validated by searching a database spiked with fragments of known...

  10. Gene expression module-based chemical function similarity search

    OpenAIRE

    Li, Yun; Hao, Pei; Zheng, Siyuan; Tu, Kang; Fan, Haiwei; Zhu, Ruixin; Ding, Guohui; Dong, Changzheng; Wang, Chuan; Li, Xuan; Thiesen, H.-J.; Chen, Y. Eugene; Jiang, HuaLiang; Liu, Lei; Li, Yixue

    2008-01-01

    Investigation of biological processes using selective chemical interventions is generally applied in biomedical research and drug discovery. Many studies of this kind make use of gene expression experiments to explore cellular responses to chemical interventions. Recently, some research groups constructed libraries of chemical related expression profiles, and introduced similarity comparison into chemical induced transcriptome analysis. Resembling sequence similarity alignment, expression pat...

  11. Cognitive Residues of Similarity: 'After-Effects' of Similarity Computations in Visual Search

    OpenAIRE

    O'Toole, Stephanie; Keane, Mark T.

    2013-01-01

    What are the 'cognitive after-effects' of making a similarity judgement? What, cognitively, is left behind, and what effect might these residues have on subsequent processing? In this paper, we probe for such after-effects using a visual search task, performed after a task in which pictures of real-world objects were compared. So, target objects were first presented in a comparison task (e.g., rate the similarity of this object to another) thus, presumably, modifying some of their features bef...

  12. G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases

    OpenAIRE

    Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H.

    2009-01-01

    Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML docum...

  13. Ranking and clustering of search results: Analysis of Similarity graph

    OpenAIRE

    Shevchuk, Ksenia Alexander

    2008-01-01

    We evaluate the clustering of the similarity matrix and confirm that it is high. We compare the ranking results of the eigenvector ranking and the Link Popularity ranking, and confirm that for the highly clustered graph the correlation between them is larger than for the weakly clustered graph.

  14. Accurate estimation of influenza epidemics using Google search data via ARGO.

    Science.gov (United States)

    Yang, Shihao; Santillana, Mauricio; Kou, S C

    2015-11-24

    Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data as input from publicly available Google Trends and Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in people's online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.
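    The core of the approach described above is an autoregression augmented with search-frequency regressors. The sketch below illustrates that idea on synthetic data with a single search term and ordinary least squares; the real ARGO model uses L1-regularized regression over many Google Trends query terms, so everything here is an illustrative assumption.

```python
import numpy as np

# Synthetic "flu activity" as an AR(1) process, plus a noisy search-volume
# proxy that tracks it (both hypothetical).
rng = np.random.default_rng(42)
T = 200
flu = np.zeros(T)
for t in range(1, T):
    flu[t] = 0.8 * flu[t - 1] + rng.normal(scale=0.1)
search = flu + rng.normal(scale=0.05, size=T)

# Regress current flu activity on its own lag and current search volume.
X = np.column_stack([flu[:-1], search[1:]])
y = flu[1:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

    Because the search signal closely tracks the target in this toy setup, the fit explains most of the variance, which is the mechanism ARGO exploits with real search data.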

  15. Density-based similarity measures for content based search

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don R [Los Alamos National Laboratory; Porter, Reid B [Los Alamos National Laboratory; Ruggiero, Christy E [Los Alamos National Laboratory

    2009-01-01

    We consider the query-by-multiple-example problem, where the goal is to identify database samples whose content is similar to a collection of query samples. To assess the similarity we use a relative content density, which quantifies the relative concentration of the query distribution to the database distribution. If the database distribution is a mixture of the query distribution and a background distribution, then it can be shown that database samples whose relative content density is greater than a particular threshold ρ are more likely to have been generated by the query distribution than the background distribution. We describe an algorithm for predicting samples with relative content density greater than ρ that is computationally efficient and possesses strong performance guarantees. We also show empirical results for applications in computer network monitoring and image segmentation.
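    The relative content density idea can be sketched with simple kernel density estimates in one dimension; the distributions, bandwidth, and threshold below are hypothetical, and the paper's actual algorithm avoids explicit density estimation.

```python
import numpy as np

def kde_1d(samples, xs, bandwidth=0.3):
    """Plain 1-D Gaussian kernel density estimate at points xs."""
    k = np.exp(-0.5 * ((xs[None, :] - samples[:, None]) / bandwidth) ** 2)
    return k.mean(axis=0) / (bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
query_samples = rng.normal(0.0, 0.5, 500)                  # query distribution
database = np.concatenate([rng.normal(0.0, 0.5, 300),      # query-like content
                           rng.normal(4.0, 0.5, 700)])     # background content

# Relative content density: query density over database density at each
# database sample; samples above the threshold rho resemble the query.
ratio = kde_1d(query_samples, database) / kde_1d(database, database)
rho = 1.0
flagged = database[ratio > rho]
```

    In this toy mixture, the flagged samples are almost exclusively drawn from the query-like cluster near zero.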

  16. Content-Based Search on a Database of Geometric Models: Identifying Objects of Similar Shape

    Energy Technology Data Exchange (ETDEWEB)

    XAVIER, PATRICK G.; HENRY, TYSON R.; LAFARGE, ROBERT A.; MEIRANS, LILITA; RAY, LAWRENCE P.

    2001-11-01

    The Geometric Search Engine is a software system for storing and searching a database of geometric models. The database may be searched for modeled objects similar in shape to a target model supplied by the user. The database models are generally derived from CAD models, while the target model may be either a CAD model or a model generated from range data collected from a physical object. This document describes key generation, database layout, and search of the database.

  17. Perceptual Grouping in Haptic Search: The Influence of Proximity, Similarity, and Good Continuation

    Science.gov (United States)

    Overvliet, Krista E.; Krampe, Ralf Th.; Wagemans, Johan

    2012-01-01

    We conducted a haptic search experiment to investigate the influence of the Gestalt principles of proximity, similarity, and good continuation. We expected faster search when the distractors could be grouped. We chose edges at different orientations as stimuli because they are processed similarly in the haptic and visual modality. We therefore…

  18. Improving image similarity search effectiveness in a multimedia content management system

    OpenAIRE

    Amato, Giuseppe; Falchi, Fabrizio; Gennaro, Claudio; Rabitti, Fausto; Savino, Pasquale; Stanchev, Peter

    2004-01-01

    In this paper, a technique for making more effective the similarity search process of images in a Multimedia Content Management System is proposed. The content-based retrieval process integrates the search on different multimedia components, linked in XML structures. Depending on the specific characteristics of an image data set, some features can be more effective than others when performing similarity search. Starting from this observation, we propose a technique that predicts the effective...

  19. Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Directory of Open Access Journals (Sweden)

    Caprez Adam

    2011-01-01

    Full Text Available Abstract Background A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, where ~30% are explicitly annotated as "hypothetical" or "uncharacterized" proteins. Our Comparison of Protein Active-Site Structures (CPASS v.2) database and software compares the sequence and structural characteristics of experimentally determined ligand-binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability, and also enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to provide more user flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one-versus-all, one-versus-list, or list-versus-list comparisons. Solvent-accessible surface area, ligand root-mean-square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions CPASS v.2 is more than an order of magnitude faster than the original implementation, and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores

  20. On a Probabilistic Approach to Determining the Similarity between Boolean Search Request Formulations.

    Science.gov (United States)

    Radecki, Tadeusz

    1982-01-01

    Presents and discusses the results of research into similarity measures for search request formulations which employ Boolean combinations of index terms. The use of a weighting mechanism to indicate the importance of attributes in a search formulation is described. A 16-item reference list is included. (JL)

  1. Comparative study on Authenticated Sub Graph Similarity Search in Outsourced Graph Database

    OpenAIRE

    N. D. Dhamale; Prof. S. R. Durugkar

    2015-01-01

    Today, security is very important in database systems. Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. Subgraph similarity search is used in graph databases to retrieve graphs whose subgraphs...

  2. Development of an accurate 3D blood vessel searching system using NIR light

    Science.gov (United States)

    Mizuno, Yoshifumi; Katayama, Tsutao; Nakamachi, Eiji

    2010-02-01

    Health monitoring systems (HMS) and drug delivery systems (DDS) require an accurate needle puncture for automatic blood sampling. In this study, we develop a miniature, highly accurate automatic 3D blood vessel searching system. The size of the detecting system is 40x25x10 mm. Our searching system uses near-infrared (NIR) LEDs, CMOS camera modules, and image processing units. We employ the stereo method to determine the 3D blood vessel location. The blood vessel visualization exploits hemoglobin's absorption of NIR light. An NIR LED is set behind the finger and illuminates it with near-infrared light; CMOS camera modules set in front of the finger capture clear blood vessel images. The two-dimensional location of the blood vessel is detected from the luminance distribution of the image, and its depth is calculated by the stereo method, so the 3D blood vessel location is detected automatically by our image processing system. To examine the accuracy of our detecting system, we carried out experiments using finger phantoms with blood vessel diameters of 0.5, 0.75, and 1.0 mm at depths of 0.5-2.0 mm under the artificial tissue surface. The depths obtained by our detecting system showed good agreement with the given depths, confirming the system's feasibility.
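    The stereo method referred to above recovers depth from the disparity between the two camera images: with focal length f (in pixels) and camera baseline b, a feature seen at pixel columns xl and xr has depth Z = f * b / (xl - xr). The parameter values below are hypothetical, not those of the system in the paper.

```python
def stereo_depth(xl, xr, f=500.0, b=10.0):
    """Depth from stereo disparity: Z = f * b / (xl - xr).
    f is the focal length in pixels, b the camera baseline in mm."""
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("feature must appear further left in the left image")
    return f * b / disparity

z = stereo_depth(320.0, 300.0)   # 500 * 10 / 20 = 250 mm from the cameras
```

    Larger disparities correspond to nearer vessels, which is why localization accuracy depends on how precisely the vessel center is found in each image.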

  3. Effects of Part-based Similarity on Visual Search: The Frankenbear Experiment

    OpenAIRE

    Alexander, Robert G.; Zelinsky, Gregory J.

    2012-01-01

    Do the target-distractor and distractor-distractor similarity relationships known to exist for simple stimuli extend to real-world objects, and are these effects expressed in search guidance or target verification? Parts of photorealistic distractors were replaced with target parts to create four levels of target-distractor similarity under heterogenous and homogenous conditions. We found that increasing target-distractor similarity and decreasing distractor-distractor similarity impaired sea...

  4. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space

    KAUST Repository

    Tao, Yufei

    2010-07-01

    Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query cost, or abandon its theoretical guarantee on the quality of query results. Motivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms a LSB-forest that has strong quality guarantees, but dramatically improves the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space, supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (i) iDistance (a famous technique for exact NN search) by two orders of magnitude, and (ii) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results. As a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, which most of the existing solutions fail to do. We show that, using a LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, yet still ensuring very good quality. In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial
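    The basic LSH idea underlying the LSB-tree can be sketched with a single random-hyperplane hash table: points hashing to the same sign pattern become the candidate set for exact comparison. This shows only the bucketing idea; the paper's LSB-tree additionally orders such hashes in a B-tree, and all parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
dim, n = 32, 1000
data = rng.standard_normal((n, dim))

# One hash table of k random hyperplanes; the key is the sign pattern
# of the k projections of a point.
k = 12
planes = rng.standard_normal((k, dim))

def lsh_key(v):
    return tuple((planes @ v > 0).astype(int))

buckets = {}
for i, v in enumerate(data):
    buckets.setdefault(lsh_key(v), []).append(i)

# Candidate set for a query = its bucket; fall back to a linear scan on a miss.
query = data[0] + 0.01 * rng.standard_normal(dim)   # near-duplicate of point 0
candidates = buckets.get(lsh_key(query), range(n))
best = min(candidates, key=lambda i: float(np.linalg.norm(data[i] - query)))
```

    The exact distance computation runs only over one bucket, which is where the sublinear query cost comes from; multiple tables (or trees) are needed for the quality guarantees discussed in the abstract.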

  5. A Theoretical Framework for Defining Similarity Measures for Boolean Search Request Formulations, Including Some Experimental Results.

    Science.gov (United States)

    Radecki, Tadeusz

    1985-01-01

    Reports research results into a methodology for determining similarity between queries characterized by Boolean search request formulations and discusses similarity measures for Boolean combinations of index terms. Rationale behind these measures is outlined, and conditions ensuring their equivalence are identified. Results of an experiment…

  6. SAPIR - Executing complex similarity queries over multi layer P2P search structures

    OpenAIRE

    Falchi, Fabrizio; Batko, Michal

    2009-01-01

    This deliverable reports the activities conducted within Task 5.4 "Executing complex similarity queries over multi layer P2P search structures" of the SAPIR project. In particular the deliverable discusses complex similarity queries issues and the implementation of the query processing over the P2P indexing. The document is accompanied by a zip file containing the javadoc for MUFIN.

  7. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2004-10-01

    Full Text Available Abstract Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high-throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high-performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into
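    The query-segmentation strategy described above can be sketched in a few lines: split the query set into chunks, run the same search on each chunk in parallel, and merge the per-query results. A toy substring search stands in for a BLAST invocation here, and the sequences are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sequence database; in SS-Wrapper this would be a formatted
# BLAST/HMMPFAM database shared by all cluster nodes.
DATABASE = ["ACGTACGT", "TTGACCA", "GGGCGT", "ACGTTTT"]

def toy_search(query):
    """Stand-in for one search-tool invocation: exact substring hits."""
    return [seq for seq in DATABASE if query in seq]

def qs_search(queries, workers=4):
    """QS-search sketch: interleave queries into chunks, search chunks in
    parallel, and merge the results into one query -> hits mapping."""
    chunks = [queries[i::workers] for i in range(workers)]
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_hits = pool.map(lambda c: [toy_search(q) for q in c], chunks)
        for chunk, hits in zip(chunks, chunk_hits):
            results.update(dict(zip(chunk, hits)))
    return results

hits = qs_search(["ACGT", "GGG", "TTT", "CCA"])
```

    The interleaved chunking (`queries[i::workers]`) is one simple way to balance load when query lengths vary; the real wrapper uses cluster-aware scheduling.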

  8. MEASURING THE PERFORMANCE OF SIMILARITY PROPAGATION IN A SEMANTIC SEARCH ENGINE

    Directory of Open Access Journals (Sweden)

    S. K. Jayanthi

    2013-10-01

    Full Text Available In the current scenario, web page result personalization plays a vital role. Nearly 80% of users expect the best results on the first page itself, without the persistence to browse further. This research work focuses on two main themes: semantic web search online and domain-based search offline. The first part is to find an effective method which allows grouping similar results together using a BookShelf data structure and organizing the various clusters. The second is focused on academic domain-based search offline. This paper focuses on finding documents which are similar and on how a vector space model can be used to solve it. More weight is therefore given to the principles and working methodology of similarity propagation. The cosine similarity measure is used for finding the relevancy among the documents.

  9. Comparative study on Authenticated Sub Graph Similarity Search in Outsourced Graph Database

    Directory of Open Access Journals (Sweden)

    N. D. Dhamale

    2015-11-01

    Full Text Available Today, security is very important in database systems. Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. Subgraph similarity search is used in graph databases to retrieve graphs whose subgraphs are similar to a given query graph. It has been proven successful in a wide range of applications including bioinformatics and chem-informatics. Due to the cost of providing efficient similarity search services on ever-increasing graph data, database outsourcing is an appealing solution for database owners. In this paper, we study authentication techniques that follow the popular filtering-and-verification framework, built around an authentication-friendly metric index called GMTree. Specifically, we transform the similarity search into a search in a graph metric space and derive small verification objects (VOs) to be transmitted to query clients. To further optimize GMTree, we study a sampling-based pivot selection method and an authenticated version of MCS computation.

  10. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Directory of Open Access Journals (Sweden)

    Viswa S S

    2013-06-01

    Full Text Available Nowadays, web-scale image search engines (e.g., Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in the hope of finding a certain type of image, and the search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of the returned images are noisy, disorganized, or irrelevant; even Google and Microsoft use no visual information in image search. The idea is to use visual information to re-rank and improve text-based image search results. This improves the precision of the text-based image search ranking by incorporating the information conveyed by the visual modality. The typical assumption that the top-ranked images in the text-based search result are equally relevant is relaxed by linking the relevance of the images to their initial rank positions. Then, a number of images from the initial search result are employed as prototypes that serve to visually represent the query and that are subsequently used to construct meta re-rankers, i.e., the most relevant images are found by visual similarity and the average scores are calculated. By applying different meta re-rankers to an image from the initial result, re-ranking scores are generated, which are then used to find the new rank position for an image in the re-ranked search result. Human supervision is introduced to learn the model weights offline, prior to the online re-ranking process. While model learning requires manual labelling of the results for a few queries, the resulting model is query-independent and therefore applicable to any other query. The experimental results on a representative web image search dataset comprising 353 queries demonstrate that the proposed method outperforms the existing supervised and unsupervised re-ranking approaches. Moreover, it improves the performance over the text-based image search engine by more than 25.48%.

  11. Improving protein structure similarity searches using domain boundaries based on conserved sequence information

    OpenAIRE

    Madej Tom; Wang Yanli; Thompson Kenneth; Bryant Stephen H

    2009-01-01

    Abstract Background The identification of protein domains plays an important role in protein structure comparison. Domain query size and composition are critical to structure similarity search algorithms such as the Vector Alignment Search Tool (VAST), the method employed for computing related protein structures in NCBI Entrez system. Currently, domains identified on the basis of structural compactness are used for VAST computations. In this study, we have investigated how alternative definit...

  12. A comparison of field-based similarity searching methods: CatShape, FBSS, and ROCS.

    Science.gov (United States)

    Moffat, Kirstin; Gillet, Valerie J; Whittle, Martin; Bravi, Gianpaolo; Leach, Andrew R

    2008-04-01

    Three field-based similarity methods are compared in retrospective virtual screening experiments. The methods are the CatShape module of CATALYST, ROCS, and an in-house program developed at the University of Sheffield called FBSS. The programs are used in both rigid and flexible searches carried out in the MDL Drug Data Report. UNITY 2D fingerprints are also used to provide a comparison with a more traditional approach to similarity searching, and similarity based on simple whole-molecule properties is used to provide a baseline for the more sophisticated searches. Overall, UNITY 2D fingerprints and ROCS with the chemical force field option gave comparable performance and were superior to the shape-only 3D methods. When the flexible methods were compared with the rigid methods, it was generally found that the flexible methods gave slightly better results than their respective rigid methods; however, the increased performance did not justify the additional computational cost required.

  13. Similarity-based search of model organism, disease and drug effect phenotypes

    KAUST Repository

    Hoehndorf, Robert

    2015-02-19

    Background: Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity. Results: We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet. Conclusions: Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.
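    One simple form of the ontology-based phenotype similarity described above is Jaccard similarity over ancestor closures: two phenotype profiles are similar if the ontology terms they imply overlap. The toy ontology below is invented for illustration, and PhenomeNET uses richer semantic similarity measures than plain Jaccard.

```python
# Hypothetical mini phenotype ontology: term -> list of parent terms.
ONTOLOGY = {
    "abnormal gait": ["motor phenotype"],
    "tremor": ["motor phenotype"],
    "motor phenotype": ["phenotype"],
    "small eyes": ["eye phenotype"],
    "eye phenotype": ["phenotype"],
    "phenotype": [],
}

def ancestors(term):
    """A term together with its transitive closure of parents."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(ONTOLOGY[t])
    return seen

def profile_similarity(p1, p2):
    """Jaccard similarity of the ancestor closures of two phenotype profiles."""
    a = set().union(*(ancestors(t) for t in p1))
    b = set().union(*(ancestors(t) for t in p2))
    return len(a & b) / len(a | b)

# "abnormal gait" and "tremor" share the motor-phenotype lineage -> 0.5
sim = profile_similarity({"abnormal gait"}, {"tremor"})
```

    Subsumption in the ontology is what lets a mouse phenotype term match a different but related human phenotype term, which is the basis of the cross-species searches the system supports.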

  14. Twin Similarities in Holland Types as Shown by Scores on the Self-Directed Search

    Science.gov (United States)

    Chauvin, Ida; McDaniel, Janelle R.; Miller, Mark J.; King, James M.; Eddlemon, Ondie L. M.

    2012-01-01

    This study examined the degree of similarity between scores on the Self-Directed Search from one set of identical twins. Predictably, a high congruence score was found. Results from a biographical sheet are discussed as well as implications of the results for career counselors.

  15. RScan: fast searching structural similarities for structured RNAs in large databases

    Directory of Open Access Journals (Sweden)

    Liu Guo-Ping

    2007-07-01

    Abstract Background Many RNAs have evolutionarily conserved secondary structures rather than conserved primary sequences. Recently, an increasing number of methods have been developed that focus on structural alignment for finding conserved secondary structures, as well as common structural motifs, in pairwise or multiple sequences. A challenging task is to quickly search large genomic databases for structures similar to a structured RNA sequence, since existing methods are too slow to be used at that scale. Results An implementation of a fast structural alignment algorithm, RScan, is proposed to fulfill this task. RScan is developed by leveraging the advantages of both hashing algorithms and local alignment algorithms. In our experiments, the average times for searching for a tRNA and an rRNA in the randomized A. pernix genome are only 256 seconds and 832 seconds, respectively, with RScan, versus 3,178 seconds and 8,951 seconds with an existing method, RSEARCH. Remarkably, RScan can handle large database queries, taking less than 4 minutes to search for structures similar to a microRNA precursor in human chromosome 21. Conclusion These results indicate that RScan is a preferable choice for real-life applications of searching for structural similarities for structured RNAs in large databases. RScan software is freely available at http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm.

  16. Similarity and heterogeneity effects in visual search are mediated by "segmentability".

    Science.gov (United States)

    Utochkin, Igor S; Yurevich, Maria A

    2016-07-01

    The heterogeneity of our visual environment typically reduces the speed with which a singleton target can be found. Visual search theories explain this phenomenon via nontarget similarities and dissimilarities that affect grouping, perceptual noise, and so forth. In this study, we show that increasing the heterogeneity of a display can facilitate rather than inhibit visual search for size and orientation singletons when heterogeneous features smoothly fill the transition between highly distinguishable nontargets. We suggest that this smooth transition reduces the "segmentability" of dissimilar items into otherwise separate subsets, causing the visual system to treat them as a near-homogeneous set standing apart from a singleton. PMID:26784002

  17. Manifold Learning for Multivariate Variable-Length Sequences With an Application to Similarity Search.

    Science.gov (United States)

    Ho, Shen-Shyang; Dai, Peng; Rudzicz, Frank

    2016-06-01

    Multivariate variable-length sequence data are becoming ubiquitous with the technological advancement in mobile devices and sensor networks. Such data are difficult to compare, visualize, and analyze due to the nonmetric nature of data sequence similarity measures. In this paper, we propose a general manifold learning framework for arbitrary-length multivariate data sequences driven by similarity/distance (parameter) learning in both the original data sequence space and the learned manifold. Our proposed algorithm transforms the data sequences in a nonmetric data sequence space into feature vectors in a manifold that preserves the data sequence space structure. In particular, the feature vectors in the manifold representing similar data sequences remain close to one another and far from the feature points corresponding to dissimilar data sequences. To achieve this objective, we assume a semisupervised setting where we have knowledge about whether some of data sequences are similar or dissimilar, called the instance-level constraints. Using this information, one learns the similarity measure for the data sequence space and the distance measures for the manifold. Moreover, we describe an approach to handle the similarity search problem given user-defined instance level constraints in the learned manifold using a consensus voting scheme. Experimental results on both synthetic data and real tropical cyclone sequence data are presented to demonstrate the feasibility of our manifold learning framework and the robustness of performing similarity search in the learned manifold. PMID:25781959

  19. A genetic similarity algorithm for searching the Gene Ontology terms and annotating anonymous protein sequences.

    Science.gov (United States)

    Othman, Razib M; Deris, Safaai; Illias, Rosli M

    2008-02-01

    A genetic similarity algorithm is introduced in this study to find groups of semantically similar Gene Ontology terms. The genetic similarity algorithm combines a semantic similarity measure with a parallel genetic algorithm. The semantic similarity measure is used to compute the strength of similitude between Gene Ontology terms, and the parallel genetic algorithm is then employed to perform batch retrieval and to accelerate the search in the large search space of the Gene Ontology graph. The genetic similarity algorithm is implemented in the Gene Ontology browser named basic UTMGO to overcome the weaknesses of existing Gene Ontology browsers, which use a conventional approach based on keyword matching. To show the applicability of the basic UTMGO, we extend its structure to develop a Gene Ontology-based protein sequence annotation tool named extended UTMGO. The objective of developing the extended UTMGO is to provide a simple and practical tool that is capable of producing better results and requires a reasonable amount of running time with low computing cost, specifically for offline usage. Computational results and comparisons with other related tools are presented to show the effectiveness of the proposed algorithm and tools.
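    The abstract does not spell out the semantic similarity measure it combines with the genetic algorithm. As an illustration of the general idea, one simple graph-based proxy scores two ontology terms by the overlap of their ancestor sets in the DAG; the term names and edges below are hypothetical, not real GO identifiers:

```python
def ancestors(term, parents):
    """All ancestors of a term (including itself) in a DAG, given a
    child -> list-of-parents map."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def term_similarity(a, b, parents):
    """Jaccard overlap of ancestor sets: a simple structural proxy for
    semantic similarity between two ontology terms."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

# Tiny hypothetical ontology fragment:
parents = {"binding": ["molecular_function"],
           "protein_binding": ["binding"],
           "dna_binding": ["binding"]}
print(term_similarity("protein_binding", "dna_binding", parents))  # 0.5
```

    Measures used in practice (Resnik, Lin, Jiang-Conrath) additionally weight ancestors by information content, but share this ancestor-set structure.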

  20. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    Directory of Open Access Journals (Sweden)

    Matija Korpar

    In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced, and, as a result, the growth of publicly maintained sequence databases. This ever-increasing volume of data places high demands on protein similarity search algorithms, with two opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, aligning a query to the whole database is usually too slow. Therefore, most protein similarity search methods apply heuristics to reduce the number of candidate sequences in the database before performing the exact local alignment. However, there is still a need to align a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily intended for the exact local alignment phase, in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and, at the time of writing, was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW, using multiple queries on the Swiss-Prot and UniRef90 databases.
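    The Smith-Waterman step mentioned above fills a dynamic-programming matrix in time quadratic in the sequence lengths, which is exactly the cost that motivates GPU acceleration. A minimal score-only sketch with a linear gap penalty (SW#db itself uses optimized SIMD/GPU kernels and substitution matrices, not this toy scoring):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Score-only Smith-Waterman local alignment with a linear gap
    penalty, O(len(a) * len(b)) time and O(len(b)) memory."""
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman("xxACGTxx", "yyACGTyy"))  # 8: the shared ACGT core
```

    The zero floor is what distinguishes local (Smith-Waterman) from global (Needleman-Wunsch) alignment: low-scoring prefixes are simply discarded.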

  1. Wikipedia Chemical Structure Explorer: substructure and similarity searching of molecules from Wikipedia

    OpenAIRE

    Ertl, Peter; Patiny, Luc; Sander, Thomas; Rufener, Christian; Zasso, Michaël

    2015-01-01

    Background Wikipedia, the world's largest and most popular encyclopedia, is an indispensable source of chemistry information. Among other content, it contains entries for over 15,000 chemicals, including metabolites, drugs, agrochemicals and industrial chemicals. To provide easy access to this wealth of information, we decided to develop a substructure and similarity search tool for chemical structures referenced in Wikipedia. Results We extracted chemical structures from entries in Wikipedia an...

  2. Target enhanced 2D similarity search by using explicit biological activity annotations and profiles

    OpenAIRE

    Yu, Xiang; Geer, Lewis Y.; Han, Lianyi; Bryant, Stephen H

    2015-01-01

    Background The enriched biological activity information of compounds in large and freely accessible chemical databases like the PubChem BioAssay Database has become a powerful research resource for the scientific community. Currently, 2D fingerprint based conventional similarity search (CSS) is the most widely used approach for database screening, but it does not typically incorporate the relative importance of fingerprint bits to biological activity. Results In this study, a ...

  3. Protein similarity search with subset seeds on a dedicated reconfigurable hardware

    OpenAIRE

    Peterlongo, Pierre; Noé, Laurent; Lavenier, Dominique; Georges, Gilles; Jacques, Julien; Kucherov, Gregory; Giraud, Mathieu

    2007-01-01

    Genome sequencing of numerous species raises the need for complete genome comparisons with precise and fast similarity searches. Today, advanced seed-based techniques (spaced seeds, multiple seeds, subset seeds) provide better sensitivity/specificity ratios. We present an implementation of such a seed-based technique on parallel specialized hardware embedding a reconfigurable architecture (FPGA), where the FPGA is tightly connected to large-capacity Flash memories. This parallel system allows l...

  4. Applying Statistical Models and Parametric Distance Measures for Music Similarity Search

    Science.gov (United States)

    Lukashevich, Hanna; Dittmar, Christian; Bastuck, Christoph

    Automatically deriving similarity relations between music pieces is a core task in music information retrieval research. Due to the nearly unrestricted amount of musical data, real-world similarity search algorithms have to be highly efficient and scalable. One possible solution is to represent each music excerpt with a statistical model (e.g., a Gaussian mixture model) and thus reduce the computational cost by applying parametric distance measures between the models. In this paper we discuss combinations of different parametric modelling techniques and distance measures, and weigh the benefits of each against the others.
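    For the simplest case of one Gaussian per excerpt, a common parametric distance is the symmetrised Kullback-Leibler divergence, which has a closed form; full mixture models require approximations. The one-dimensional sketch below is purely illustrative of the idea, not the paper's specific method:

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL divergence KL( N(m1, s1^2) || N(m2, s2^2) )
    between two univariate Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def sym_kl(m1, s1, m2, s2):
    """Symmetrised KL: KL is asymmetric, so distance-like use sums
    both directions."""
    return kl_gauss(m1, s1, m2, s2) + kl_gauss(m2, s2, m1, s1)

# Two models with equal variance and means one standard deviation apart:
print(sym_kl(0.0, 1.0, 1.0, 1.0))  # 1.0
```

    The payoff is that comparing two excerpts costs a handful of arithmetic operations on model parameters, instead of a pass over the raw feature frames.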

  5. WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTERN RETRIEVAL ALGORITHM

    Directory of Open Access Journals (Sweden)

    Pushpa C N

    2013-02-01

    Semantic similarity measures play an important role in information retrieval, natural language processing and various web tasks such as relation extraction, community mining, document clustering, and automatic metadata extraction. In this paper, we propose a Pattern Retrieval Algorithm (PRA) to compute the semantic similarity between words by combining the page-count method and the web-snippets method. Four association measures are used to find the semantic similarity between words in the page-count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web-snippets method. The SVM is trained to classify synonymous and non-synonymous word pairs. The proposed approach aims to improve correlation values, precision, recall, and F-measure compared to existing methods, and achieves a correlation value of 89.8%.
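    A classic page-count association measure of the kind referred to here is WebJaccard, computed directly from search-engine hit counts for P, Q, and the conjunctive query "P AND Q". A sketch with hypothetical counts (the abstract does not name its four measures, and the paper combines such scores via the SMO-trained SVM):

```python
def web_jaccard(hits_p, hits_q, hits_pq, c=5):
    """WebJaccard association measure from page counts:
    hits(P AND Q) / (hits(P) + hits(Q) - hits(P AND Q)).
    Conjunctive counts at or below a small cutoff c are treated as
    search-engine noise and scored 0."""
    if hits_pq <= c:
        return 0.0
    return hits_pq / (hits_p + hits_q - hits_pq)

# Hypothetical page counts for two words and their conjunctive query:
print(web_jaccard(hits_p=1000, hits_q=2000, hits_pq=500))  # 0.2
```

    Related measures (WebOverlap, WebDice, WebPMI) differ only in how the three counts are combined, which is why learning their optimal combination with an SVM is attractive.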

  6. Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

    Directory of Open Access Journals (Sweden)

    Kim Sunghwan

    2012-11-01

    Abstract Background To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the efficient use of PubChem3D resources requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules. Results The present study investigated the effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically-tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). When the "best-conformer-pair" approach, in which a 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from a compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive–non-inactive (NN) pairs for a given assay by amounts comparable to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores compared to the average increase for the random compound pairs. Conclusion These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non-inactive compound pairs from random compound pairs.

  7. Semantic similarity measures in the biomedical domain by leveraging a web search engine.

    Science.gov (United States)

    Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching

    2013-07-01

    Various studies of web-related semantic similarity measures have been carried out. However, measuring semantic similarity between two terms remains a challenging task. Traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s); unfortunately, in practice, this assumption is not always applicable. On the other hand, if the corpus is sufficiently large, corpus-based methodologies can overcome the limitation, and the web is a continuously and enormously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, and P AND Q, and as the web search hit counts of the defined lexico-syntactic patterns. The similarity scores of the different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of the semantic similarity measures. Experimental results are presented and discussed for two datasets: dataset 1 provided by A. Hliaoutakis and dataset 2 provided by T. Pedersen. On dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. On dataset 2, the proposed method obtains the best correlation coefficients with physician scores (SNOMED-CT: 0.705; MeSH: 0.723) compared with other methods; however, the correlation coefficients with coder scores (SNOMED-CT: 0.496; MeSH: 0.539) showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.

  9. Gene network homology in prokaryotes using a similarity search approach: queries of quorum sensing signal transduction.

    Directory of Open Access Journals (Sweden)

    David N Quan

    Bacterial cell-cell communication is mediated by small signaling molecules known as autoinducers. Importantly, autoinducer-2 (AI-2) is synthesized via the enzyme LuxS in over 80 species, some of which mediate their pathogenicity by recognizing and transducing this signal in a cell-density-dependent manner. AI-2 mediated phenotypes are not well understood, however, as the means of signal transduction appears to vary among species, while AI-2 synthesis processes appear conserved. Approaches that reveal the recognition pathways of AI-2 will shed light on pathogenicity, as we believe recognition of the signal is likely as important as, if not more important than, its synthesis. LMNAST (Local Modular Network Alignment Similarity Tool) uses a local similarity search heuristic to study gene order, generating homology hits for the genomic arrangement of a query gene sequence. We develop and apply this tool for the E. coli lac and LuxS-regulated (Lsr) systems. Lsr is of great interest as it mediates AI-2 uptake and processing. Both test searches generated results that were subsequently analyzed through a number of different lenses, each with its own level of granularity, from a binary phylogenetic representation down to trackback plots that preserve genomic organizational information. Through a survey of these results, we demonstrate the identification of orthologs, paralogs, hitchhiking genes, gene loss, gene rearrangement within an operon context, and horizontal gene transfer (HGT). We found a variety of operon structures that are consistent with our hypothesis that the signal can be perceived and transduced by homologous protein complexes, while their regulation may be key to defining subsequent phenotypic behavior.

  10. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    Science.gov (United States)

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. PMID:25625550

  11. PHOG-BLAST – a new generation tool for fast similarity search of protein families

    Directory of Open Access Journals (Sweden)

    Mironov Andrey A

    2006-06-01

    Abstract Background The need to compare protein profiles frequently arises in various areas of protein research: comparison of protein families, domain searches, and resolution of orthology and paralogy. Existing fast algorithms can only compare a protein sequence with a protein sequence, or a profile with a sequence; algorithms that compare profiles use dynamic programming and complex scoring functions. Results We developed a new algorithm called PHOG-BLAST for fast similarity search of profiles. This algorithm uses profile discretization to convert a profile to a finite alphabet and utilizes hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best if its parameters are chosen so that 20 profile clusters are obtained, which can be interpreted as ancestral amino acid residues. With these clusters, fewer than 2% of columns in multiple alignments fall outside the clusters. We tested the performance of PHOG-BLAST vs. PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance; on PFAM and BALIBASE, PHOG-BLAST was substantially superior to PSI-BLAST. PHOG-BLAST required 10–20 times less computer memory and computation time than PSI-BLAST. Conclusion Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing protein complements of different species and orthologous groups of different taxons on a personal computer in reasonable time. When it is applied to detect weak similarity between protein families, PHOG-BLAST is less

  12. Quasi-metrics, Similarities and Searches: aspects of geometry of protein datasets

    CERN Document Server

    Stojmirovic, Aleksandar

    2008-01-01

    A quasi-metric is a distance function which satisfies the triangle inequality but is not symmetric: it can be thought of as an asymmetric metric. The central result of this thesis, developed in Chapter 3, is that a natural correspondence exists between similarity measures on biological (nucleotide or protein) sequences and quasi-metrics. Chapter 2 presents basic concepts of the theory of quasi-metric spaces and introduces new examples of them: the universal countable rational quasi-metric space and its bicompletion, the universal bicomplete separable quasi-metric space. Chapter 4 is dedicated to the development of a notion of quasi-metric space with Borel probability measure, or pq-space. The main result of this chapter indicates that a high-dimensional quasi-metric space is close to being a metric space. Chapter 5 investigates the geometric aspects of the theory of database similarity search in the context of quasi-metrics. The results about pq-spaces are used to produce novel theoretical bounds o...

  13. Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

    CERN Document Server

    Pestov, Vladimir

    2008-01-01

    Within a mathematically rigorous model borrowed from statistical learning theory, we analyse the curse of dimensionality for similarity-based information retrieval in the context of a wide class of popular indexing schemes. The datasets $X$ are sampled randomly from a domain $\Omega$, equipped with a distance, $\rho$, and an underlying probability distribution, $\mu$. The intrinsic dimension of the domain, $d$, is defined in terms of the concentration of measure phenomenon. For the purposes of asymptotic analysis, we send $d$ to infinity, and assume that the size of a dataset, $n$, grows faster than any polynomial function in $d$, yet slower than any exponential function in $d$. Exact similarity search refers to finding the nearest neighbour in the dataset $X$ to a query point $\omega\in\Omega$, where the query points are subject to the same probability distribution $\mu$ as datapoints. Let $\mathscr F$ denote a class of all 1-Lipschitz functions on $\Omega$ that can be used as decision functions in construct...

  14. Identification of protein biochemical functions by similarity search using the molecular surface database eF-site

    OpenAIRE

    Kinoshita, Kengo; Nakamura, Haruki

    2003-01-01

    The identification of protein biochemical functions based on their three-dimensional structures is strongly required in the post-genome-sequencing era. We have developed a new method to identify and predict protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials on the surfaces. Our prediction system consists of a similarity search method based on a clique search algorithm and the molecular surface database eF-site (electrost...

  15. PSimScan: algorithm and utility for fast protein similarity search.

    Directory of Open Access Journals (Sweden)

    Anna Kaznadzey

    In the era of metagenomics and diagnostic sequencing, the importance of high-performance protein comparison methods cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that can be further used by the community to improve the speed and lower the hardware requirements of bioinformatics software. The optimization starts at the lookup table construction; the initial lookup-table-based hits are then passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects 'similarity zones' aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan performs 5 to 100 times faster than the standard NCBI BLASTP, depending on the chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to NCBI BLASTP's and decrease as speed increases, yet stay at levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences. Comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with NCBI BLASTP takes over 66 hours. We describe the innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.

  16. The Cost of Search for Multiple Targets: Effects of Practice and Target Similarity

    Science.gov (United States)

    Menneer, Tamaryn; Cave, Kyle R.; Donnelly, Nick

    2009-01-01

    With the use of X-ray images, performance in the simultaneous search for two target categories was compared with performance in two independent searches, one for each category. In all cases, displays contained one target at most. Dual-target search, for both categories simultaneously, produced a cost in accuracy, although the magnitude of this…

  17. Unbounded Binary Search for a Fast and Accurate Maximum Power Point Tracking

    Science.gov (United States)

    Kim, Yong Sin; Winston, Roland

    2011-12-01

    This paper presents a technique for maximum power point tracking (MPPT) of a concentrating photovoltaic system using cell level power optimization. Perturb and observe (P&O) has been a standard for MPPT, but it introduces a tradeoff between the tracking speed and the accuracy of the maximum power delivered. The P&O algorithm is not suited to rapid environmental changes caused by partial shading and self-shading, because its tracking time is linear in the length of the voltage range. Some research has addressed fast tracking, but the resulting methods come with internal ad hoc parameters. In this paper, by using the proposed unbounded binary search algorithm for the MPPT, tracking time becomes a logarithmic function of the voltage search range without ad hoc parameters.
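
    As a rough illustration of the idea (an assumed sketch, not the paper's algorithm): for a unimodal power-voltage curve, an unbounded search doubles the probe step until the maximum is bracketed, then bisects on the sign of the local slope, so the total number of probes grows logarithmically with the voltage range.

```python
def mppt_search(power, v_step=0.1, tol=1e-3):
    """Locate the maximum of a unimodal power-voltage curve.

    Illustrative only: 'power' is a callable P(V); a real controller
    would sample the converter instead. Phase 1 doubles the step
    (unbounded search) until power stops increasing; phase 2 bisects
    on the sign of a finite-difference slope.
    """
    lo, step = 0.0, v_step
    while power(lo + step) > power(lo):   # double until bracketed
        lo, step = lo + step, step * 2
    hi = lo + step
    lo = max(0.0, lo - step / 2)          # maximum lies between last two probes
    while hi - lo > tol:                  # bisect on slope sign
        mid = (lo + hi) / 2
        if power(mid + tol / 2) > power(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```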

  18. Application of 3D Zernike descriptors to shape-based ligand similarity searching

    Directory of Open Access Journals (Sweden)

    Venkatraman Vishwesh

    2009-12-01

    Full Text Available Abstract Background The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. Results In this study, we have examined the efficacy of the alignment independent three-dimensional Zernike descriptor (3DZD for fast shape based similarity searching. Performance of this approach was compared with several other methods including the statistical moments based ultrafast shape recognition scheme (USR and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance under the situation that simulates actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparable to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape based methods for specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability. Conclusion The 3DZD has unique ability for fast comparison of three-dimensional shape of compounds. Examples analyzed illustrate the advantages and the room for improvements for the 3DZD.

  19. Early Visual Tagging: Effects of Target-Distractor Similarity and Old Age on Search, Subitization, and Counting

    Science.gov (United States)

    Watson, Derrick G.; Maylor, Elizabeth A.; Allen, Gareth E. J.; Bruce, Lucy A. M.

    2007-01-01

    Three experiments examined the effects of target-distractor (T-D) similarity and old age on the efficiency of searching for single targets and enumerating multiple targets. Experiment 1 showed that increasing T-D similarity selectively reduced the efficiency of enumerating small (less than 4) numerosities (subitizing) but had little effect on…

  20. Target-distractor similarity has a larger impact on visual search in school-age children than spacing.

    Science.gov (United States)

    Huurneman, Bianca; Boonstra, F Nienke

    2015-01-22

    In typically developing children, crowding decreases with increasing age. The influence of target-distractor similarity with respect to orientation and element spacing on visual search performance was investigated in 29 school-age children with normal vision (4- to 6-year-olds [N = 16], 7- to 8-year-olds [N = 13]). Children were instructed to search for a target E among distractor Es (feature search: all flanking Es pointing right; conjunction search: flankers in three orientations). Orientation of the target was manipulated in four directions: right (target absent), left (inversed), up, and down (vertical). Spacing was varied in four steps: 0.04°, 0.5°, 1°, and 2°. During feature search, high target-distractor similarity had a stronger impact on performance than spacing: Orientation affected accuracy until spacing was 1°, and spacing only influenced accuracy for identifying inversed targets. Spatial analyses showed that orientation affected oculomotor strategy: Children made more fixations in the "inversed" target area (4.6) than the vertical target areas (1.8 and 1.9). Furthermore, age groups differed in fixation duration: 4- to 6-year-old children showed longer fixation durations than 7- to 8-year-olds at the two largest element spacings (p = 0.039 and p = 0.027). Conjunction search performance was unaffected by spacing. Four conclusions can be drawn from this study: (a) Target-distractor similarity governs visual search performance in school-age children, (b) children make more fixations in target areas when target-distractor similarity is high, (c) 4- to 6-year-olds show longer fixation durations than 7- to 8-year-olds at 1° and 2° element spacing, and (d) spacing affects feature but not conjunction search-a finding that might indicate top-down control ameliorates crowding in children.

  1. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    Directory of Open Access Journals (Sweden)

    Viswa S S

    2013-06-01

    Full Text Available Nowadays, web-scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in the hope of finding a certain type of image; the search engine returns thousands of images ranked by keywords extracted from the surrounding text. However, many of the returned images are noisy, disorganized, or irrelevant, since even Google and Microsoft use no visual information when searching for images. The idea is to use visual information to re-rank and improve text-based image search results, improving the precision of the text-based ranking by incorporating the information conveyed by the visual modality. The typical assumption that the top images in the text-based search result are equally relevant is relaxed by linking the relevance of the images to their initial rank positions. Then, a number of images from the initial search result are employed as prototypes that serve to visually represent the query and that are subsequently used to construct meta re-rankers; i.e. the most relevant images are found by visual similarity and the average scores are calculated. By applying different meta re-rankers to an image from the initial result, re-ranking scores are generated, which are then used to find the new rank position for the image in the re-ranked search result. Human supervision is introduced to learn the model weights offline, prior to the online re-ranking process. While model learning requires manual labelling of the results for a few queries, the resulting model is query independent and therefore applicable to any other query. Experimental results on a representative web image search dataset comprising 353 queries demonstrate that the proposed method outperforms the existing supervised and unsupervised re-ranking approaches. Moreover, it improves the performance over the text-based image search engine by more than 25.48%.
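
    The prototype-based re-ranking step can be sketched as follows (an illustration, not the paper's meta re-ranker; `similarity` is an assumed precomputed visual-similarity lookup):

```python
def rerank(initial, similarity, prototypes):
    """Re-order a text-based result list by mean visual similarity to a
    set of prototype images drawn from the top of the initial result."""
    def score(img):
        return sum(similarity[img][p] for p in prototypes) / len(prototypes)
    return sorted(initial, key=score, reverse=True)
```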

  2. Accurate Image Search using Local Descriptors into a Compact Image Representation

    Directory of Open Access Journals (Sweden)

    Soumia Benkrama

    2013-01-01

    Full Text Available Despite progress in image retrieval using low-level features such as colors, textures, and shapes, performance remains unsatisfactory, as there are gaps between low-level features and high-level semantic concepts. In this work, we present an improved implementation of the bag of visual words approach. We propose an image retrieval system based on the bag-of-features (BoF) model using the scale invariant feature transform (SIFT) and speeded up robust features (SURF). In the literature, SIFT and SURF give good results. Based on this observation, we decided to use a bag-of-features approach over quaternion Zernike moments (QZM), and we compare the results of SIFT and SURF with those of QZM. We propose an indexing method for the content-based search task that retrieves a collection of images and returns a ranked list of objects in response to a query image. Experimental results with the Coil-100 and Corel-1000 image databases demonstrate that QZM produces better performance than the known representations (SIFT and SURF).
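
    A bag-of-features signature reduces a variable number of local descriptors to a fixed-length histogram over a visual vocabulary. A minimal sketch (assuming the vocabulary has already been learned, e.g. by k-means over SIFT or SURF descriptors):

```python
def bof_histogram(descriptors, vocabulary):
    """Count, for each visual word, how many local descriptors fall
    nearest to it; the histogram is the image's fixed-length signature."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        # squared Euclidean distance to every visual word
        dists = [sum((a - b) ** 2 for a, b in zip(d, w)) for w in vocabulary]
        hist[min(range(len(vocabulary)), key=dists.__getitem__)] += 1
    return hist
```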

  3. In Search of an Accurate Evaluation of Intrahepatic Cholestasis of Pregnancy

    Directory of Open Access Journals (Sweden)

    Manuela Martinefski

    2012-01-01

    Full Text Available Until now, the biochemical parameter most used for diagnosis of intrahepatic cholestasis of pregnancy (ICP) has been a rise of total serum bile acids (TSBA) above the upper normal limit of 11 μM. However, differential diagnosis is very difficult, since overlapping values of bile acid determinations are observed in different conditions of pregnancy, including the benign condition of pruritus gravidarum. The aim of this work was to determine better markers for a precise diagnosis of ICP, together with parameters associated with severity of symptoms and treatment evaluation. Serum bile acid profiles were evaluated using capillary electrophoresis in 38 healthy pregnant women and 32 ICP patients, and the sensitivity, specificity, accuracy, predictive values, and ratios of certain individual bile acids were calculated in order to replace TSBA determinations. The evaluation of the results shows that LCA and the UDCA/LCA ratio provided information for a more complete and accurate diagnosis and evaluation of ICP than calculation of TSBA levels alone in pregnant women.

  4. Target-distractor similarity has a larger impact on visual search in school-age children than spacing

    NARCIS (Netherlands)

    Huurneman, B.; Boonstra, F.N.

    2015-01-01

    In typically developing children, crowding decreases with increasing age. The influence of target-distractor similarity with respect to orientation and element spacing on visual search performance was investigated in 29 school-age children with normal vision (4- to 6-year-olds [N = 16], 7- to 8-year

  5. Breast cancer stories on the internet : improving search facilities to help patients find stories of similar others

    NARCIS (Netherlands)

    Overberg, Regina Ingrid

    2013-01-01

    The primary aim of this thesis is to gain insight into which search facilities for spontaneously published stories facilitate breast cancer patients in finding stories by other patients in a similar situation. According to the narrative approach, social comparison theory, and social cognitive theory

  6. Finding and Reusing Learning Materials with Multimedia Similarity Search and Social Networks

    Science.gov (United States)

    Little, Suzanne; Ferguson, Rebecca; Ruger, Stefan

    2012-01-01

    The authors describe how content-based multimedia search technologies can be used to help learners find new materials and learning pathways by identifying semantic relationships between educational resources in a social learning network. This helps users--both learners and educators--to explore and find material to support their learning aims.…

  7. FSim: A Novel Functional Similarity Search Algorithm and Tool for Discovering Functionally Related Gene Products

    OpenAIRE

    2014-01-01

    Background. During the analysis of genomics data, it is often required to quantify the functional similarity of genes and their products based on the annotation information from gene ontology (GO) with hierarchical structure. A flexible and user-friendly way to estimate the functional similarity of genes utilizing GO annotation is therefore highly desired. Results. We proposed a novel algorithm using a level coefficient-weighted model to measure the functional similarity of gene products base...

  8. Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction

    DEFF Research Database (Denmark)

    Wichterich, Marc; Assent, Ira; Kranen, Philipp

    2008-01-01

    The Earth Mover's Distance (EMD) was developed in computer vision as a flexible similarity model that utilizes similarities in feature space to define a high quality similarity measure in feature representation space. It has been successfully adopted in a multitude of applications with low to...... dimensionality reduction techniques for the EMD in a filter-and-refine architecture for efficient lossless retrieval. Thorough experimental evaluation on real world data sets demonstrates a substantial reduction of the number of expensive high-dimensional EMD computations and thus remarkably faster response...

  9. A Commodity Information Search Model of E-Commerce Search Engine Based on Semantic Similarity and Multi-Attribute Decision Method

    OpenAIRE

    Ziming Zeng

    2010-01-01

    The paper presented an intelligent commodity information search model, which integrates semantic retrieval and a multi-attribute decision method. First, semantic similarity is computed by constructing a semantic vector space, in order to realize semantic consistency between the retrieved result and the customer's query. Besides, the TOPSIS method is also utilized to construct the comparison mechanism of commodities by calculating the utility value of each retrieved commodity. Finally, the experiment is conduct...

  10. Proposal for a Similar Question Search System on a Q&A Site

    Directory of Open Access Journals (Sweden)

    Katsutoshi Kanamori

    2014-06-01

    Full Text Available There is a service to help Internet users obtain answers to specific questions when they visit a Q&A site. A Q&A site is very useful for the Internet user, but posted questions are often not answered immediately. This delay in answering occurs because in most cases another site user is answering the question manually. In this study, we propose a system that can present a question that is similar to a question posted by a user. An advantage of this system is that a user can refer to an answer to a similar question. This research measures the similarity of a candidate question based on word and dependency parsing. In an experiment, we examined the effectiveness of the proposed system for questions actually posted on the Q&A site. The result indicates that the system can show the questioner the answer to a similar question. However, the system still has a number of aspects that should be improved.
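
    The word-based part of such a similarity measure can be sketched with a simple Jaccard overlap (an assumption for illustration; the paper also uses dependency parsing, which is omitted here):

```python
def jaccard_similarity(a, b):
    """Word-overlap similarity between two questions, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def most_similar(query, candidates):
    """Return the posted question most similar to the user's query."""
    return max(candidates, key=lambda c: jaccard_similarity(query, c))
```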

  11. Integrating structure- and ligand-based virtual screening: comparison of individual, parallel, and fused molecular docking and similarity search calculations on multiple targets.

    Science.gov (United States)

    Tan, Lu; Geppert, Hanna; Sisay, Mihiret T; Gütschow, Michael; Bajorath, Jürgen

    2008-10-01

    Similarity searching is often used to preselect compounds for docking, thereby decreasing the size of screening databases. However, integrated structure- and ligand-based screening schemes are rare at present. Docking and similarity search calculations using 2D fingerprints were carried out in a comparative manner on nine target enzymes, for which significant numbers of diverse inhibitors could be obtained. In the absence of knowledge-based docking constraints and target-directed parameter optimisation, fingerprint searching displayed a clear preference over docking calculations. Alternative combinations of docking and similarity search results were investigated and found to further increase compound recall of individual methods in a number of instances. When the results of similarity searching and docking were combined, parallel selection of candidate compounds from individual rankings was generally superior to rank fusion. We suggest that complementary results from docking and similarity searching can be captured by integrated compound selection schemes. PMID:18651695
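
    The two combination strategies compared, parallel selection from individual rankings versus rank fusion, can be sketched as follows (a hedged illustration over toy compound identifiers, not the authors' exact protocol):

```python
def parallel_selection(rank_a, rank_b, n):
    """Alternately take the next unseen compound from each ranking."""
    picked = []
    for a, b in zip(rank_a, rank_b):
        for cand in (a, b):
            if cand not in picked:
                picked.append(cand)
            if len(picked) == n:
                return picked
    return picked

def rank_fusion(rank_a, rank_b, n):
    """Re-rank compounds by the sum of their positions in both rankings
    (assumes both rankings cover the same compound set)."""
    score = {c: rank_a.index(c) + rank_b.index(c) for c in rank_a}
    return sorted(score, key=score.get)[:n]
```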

  12. SIMILARITY SEARCH FOR TRAJECTORIES OF RFID TAGS IN SUPPLY CHAIN TRAFFIC

    Directory of Open Access Journals (Sweden)

    Sabu Augustine

    2016-06-01

    Full Text Available In this fast-developing period, the use of RFID has become more significant in many application domains due to the drastic cut in the price of RFID tags. This technology is evolving as a means of tracking objects and inventory items. One such application domain is supply chain management, where RFID is applied because manufacturers and distributors need to analyse product and logistic information in order to get the right quantity of products arriving at the right time to the right locations. Usually the RFID tag information collected from RFID readers is stored in a remote database, and the RFID data is analyzed by querying data from this database based on a path encoding method using the properties of prime numbers. In this paper we propose an improved encoding scheme that encodes the flows of objects in RFID tag movement. A trajectory of moving RFID tags consists of a sequence of tags that changes over time. With the integration of wireless communications and positioning technologies, the concept of a trajectory database has become increasingly important and has posed great challenges to the data mining community. The support of efficient trajectory similarity techniques is indisputably very important for the quality of data analysis tasks in supply chain traffic, enabling the discovery of similar product movements.
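
    The basic prime-number path encoding can be sketched as follows (a minimal illustration with hypothetical location names; this version captures only which locations were visited, while order-preserving variants, like the improved scheme above, encode more):

```python
# Each location in the supply chain is assigned a distinct prime.
PRIMES = {"factory": 2, "warehouse": 3, "distributor": 5, "retailer": 7}

def encode_path(path):
    """Encode a tag's location sequence as a product of location primes."""
    code = 1
    for loc in path:
        code *= PRIMES[loc]
    return code

def visited(code, loc):
    """A location was visited iff its prime divides the path code."""
    return code % PRIMES[loc] == 0
```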

  13. Web Similarity

    OpenAIRE

    Cohen, Andrew; Vitányi, Paul

    2015-01-01

    Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a similarity on a scale from 0 (identical) to 1 (completely different). The NWD approximates the similarity according to all (upper semi)computable properties. We develop the theory and give applications. The derivation of ...
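
    For a pair of terms, the NWD can be computed directly from aggregate page counts; a sketch of the standard formula, where f(x) and f(y) are hit counts for the individual terms, f(x,y) the count for both together, and N the total number of indexed pages:

```python
from math import log

def nwd(fx, fy, fxy, n):
    """Normalized web distance from page counts: 0 means the terms
    always co-occur, larger values mean more semantically distant."""
    num = log(max(fx, fy)) - log(fxy)
    den = log(n) - log(min(fx, fy))
    return num / den
```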

  14. SimSearch : a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences

    OpenAIRE

    Sérgio DEUSDADO; Carvalho, Paulo

    2009-01-01

    In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to filling the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is then applied when analysing the maximal extension of the pattern. Employing bit parallelism to analyse the global similar...

  15. Improving performance of content-based image retrieval schemes in searching for similar breast mass regions: an assessment

    International Nuclear Information System (INIS)

    This study aims to assess three methods commonly used in content-based image retrieval (CBIR) schemes and investigate approaches to improve scheme performance. A reference database involving 3000 regions of interest (ROIs) was established. Among them, 400 ROIs were randomly selected to form a testing dataset. Three methods, namely mutual information, Pearson's correlation and a multi-feature-based k-nearest neighbor (KNN) algorithm, were applied to search for the 15 'most similar' reference ROIs to each testing ROI. The clinical relevance and visual similarity of the search results were evaluated using the areas under receiver operating characteristic (ROC) curves (AZ) and the average mean square difference (MSD) of the mass boundary spiculation level ratings between testing and selected ROIs, respectively. The results showed that the AZ values were 0.893 ± 0.009, 0.606 ± 0.021 and 0.699 ± 0.026 for the use of KNN, mutual information and Pearson's correlation, respectively. The AZ values increased to 0.724 ± 0.017 and 0.787 ± 0.016 for mutual information and Pearson's correlation when using ROIs with the size adaptively adjusted based on actual mass size. The corresponding MSD values were 2.107 ± 0.718, 2.301 ± 0.733 and 2.298 ± 0.743. The study demonstrates that due to the diversity of medical images, CBIR schemes using multiple image features and mass size-based ROIs can achieve significantly improved performance.
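
    The KNN retrieval step can be sketched as a plain nearest-neighbour ranking over ROI feature vectors (illustrative only; the study's multi-feature scheme and its feature definitions are not reproduced here):

```python
def knn_retrieve(query_vec, reference, k=15):
    """Return names of the k reference ROIs whose feature vectors lie
    closest (Euclidean distance) to the query ROI's feature vector.

    'reference' is a list of (name, feature_vector) pairs.
    """
    def dist(item):
        _, vec = item
        return sum((q - r) ** 2 for q, r in zip(query_vec, vec)) ** 0.5
    return [name for name, _ in sorted(reference, key=dist)[:k]]
```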

  16. Efficient generation, storage, and manipulation of fully flexible pharmacophore multiplets and their use in 3-D similarity searching.

    Science.gov (United States)

    Abrahamian, Edmond; Fox, Peter C; Naerum, Lars; Christensen, Inge Thøger; Thøgersen, Henning; Clark, Robert D

    2003-01-01

    Pharmacophore triplets and quartets have been used by many groups in recent years, primarily as a tool for molecular diversity analysis. In most cases, slow processing speeds and the very large size of the bitsets generated have forced researchers to compromise in terms of how such multiplets were stored, manipulated, and compared, e.g., by using simple unions to represent multiplets for sets of molecules. Here we report using bitmaps in place of bitsets to reduce storage demands and to improve processing speed. Here, a bitset is taken to mean a fully enumerated string of zeros and ones, from which a compressed bitmap is obtained by replacing uniform blocks ("runs") of digits in the bitset with a pair of values identifying the content and length of the block (run-length encoding compression). High-resolution multiplets involving four features are enabled by using 64 bit executables to create and manipulate bitmaps, which "connect" to the 32 bit executables used for database access and feature identification via an extensible mark-up language (XML) data stream. The encoding system used supports simple pairs, triplets, and quartets; multiplets in which a privileged substructure is used as an anchor point; and augmented multiplets in which an additional vertex is added to represent a contingent feature such as a hydrogen bond extension point linked to a complementary feature (e.g., a donor or an acceptor atom) in a base pair or triplet. It can readily be extended to larger, more complex multiplets as well. Database searching is one particular potential application for this technology. Consensus bitmaps built up from active ligands identified in preliminary screening can be used to generate hypothesis bitmaps, a process which includes allowance for differential weighting to allow greater emphasis to be placed on bits arising from multiplets expected to be particularly discriminating. Such hypothesis bitmaps are shown to be useful queries for database searching
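
    The run-length encoding that turns a sparse, fully enumerated bitset into a compact bitmap can be sketched generically (an illustration of the compression idea, not the authors' exact storage format):

```python
def rle_compress(bits):
    """Run-length encode a 0/1 string into (bit, run_length) pairs."""
    runs, i = [], 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        runs.append((bits[i], j - i))
        i = j
    return runs

def rle_expand(runs):
    """Inverse of rle_compress: rebuild the fully enumerated bitset."""
    return "".join(bit * n for bit, n in runs)
```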

  17. Retrieval of very large numbers of items in the Web of Science: an exercise to develop accurate search strategies

    CERN Document Server

    Arencibia-Jorge, Ricardo; Chinchilla-Rodriguez, Zaida; Rousseau, Ronald; Paris, Soren W

    2009-01-01

    The current communication presents a simple exercise with the aim of solving a singular problem: the retrieval of extremely large numbers of items in the Web of Science interface. As is known, the Web of Science interface allows a user to obtain at most 100,000 items from a single query. But what about queries that return more than 100,000 items? The exercise developed one possible way to achieve this objective. The case study is the retrieval of the entire scientific production of the United States in a specific year. Different sections of items were retrieved using the Source field of the database. Then a simple Boolean statement was created with the aim of eliminating overlap and improving the accuracy of the search strategy. The importance of teamwork in the development of advanced search strategies was noted.

  18. eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape

    OpenAIRE

    Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki

    2007-01-01

    We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a ...

  19. Similarity search for multivariate time series based on B+-tree index

    Institute of Scientific and Technical Information of China (English)

    郭小芳; 李锋; 叶华

    2013-01-01

    To improve similarity search efficiency for multivariate time series (MTS) datasets, a distance-based index structure (Dbis) for similarity search is introduced. The dimensionality of the MTS database is reduced by principal component analysis (PCA), the principal components are clustered, and the MTS items are mapped into a one-dimensional space based on the clustering centre of each partition. Using a B+-tree index over these keys, the k MTS items most similar to a given query sequence are found. Experimental results show that the proposed algorithm detects similar MTS more accurately and efficiently.
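
    The mapping step, from cluster-partitioned vectors to one-dimensional keys suitable for an ordered index such as a B+-tree, can be sketched as follows (an illustrative reading of the approach; the key here is an assumed (cluster id, distance-to-centre) pair, which sorts lexicographically):

```python
def map_to_1d(vectors, centers):
    """Map each reduced MTS vector to an orderable key: the id of its
    nearest cluster centre and the Euclidean distance to that centre."""
    keys = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) ** 0.5
                 for c in centers]
        cid = min(range(len(centers)), key=dists.__getitem__)
        keys.append((cid, dists[cid]))
    return keys
```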

  20. Developing Molecular Interaction Database and Searching for Similar Pathways (MOLECULAR BIOLOGY AND INFORMATION-Biological Information Science)

    OpenAIRE

    Kawashima, Shuichi; Katayama, Toshiaki; Kanehisa, Minoru

    1998-01-01

    We have developed a database named BRITE, which contains knowledge of interacting molecules and/or genes concering cell cycle and early development. Here, we report an overview of the database and the method of automatic search for functionally common sub-pathways between two biological pathways in BRITE.

  1. Topology-based document similarity search algorithm

    Institute of Scientific and Technical Information of China (English)

    杨艳; 朱戈; 范文彬

    2011-01-01

    Searching for similar documents among a large number of documents quickly and efficiently is an important and time-consuming problem. Existing algorithms first find a candidate document set and then sort it based on a document-relevance evaluation to identify the most relevant documents. A topology-based document similarity search algorithm, Hub-N, is put forward, which transforms the document similarity search problem into a graph search problem; applying the corresponding pruning techniques reduces the range of documents scanned and significantly improves retrieval efficiency. Experiments prove the algorithm effective and feasible.
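
    The idea of casting similarity search as pruned graph search can be sketched with a threshold-pruned breadth-first traversal (a generic illustration; Hub-N's actual index and pruning rules are more elaborate):

```python
from collections import deque

def similar_docs(graph, start, threshold, limit):
    """Breadth-first search over a document-similarity graph, pruning
    edges whose similarity falls below the threshold.

    'graph' maps a document id to a list of (neighbour, similarity).
    """
    seen, out, queue = {start}, [], deque([start])
    while queue and len(out) < limit:
        for nbr, sim in graph.get(queue.popleft(), ()):
            if sim >= threshold and nbr not in seen:
                seen.add(nbr)
                out.append(nbr)
                queue.append(nbr)
    return out
```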

  2. eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape

    Science.gov (United States)

    Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki

    2007-01-01

    We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek). PMID:17567616

  3. Using argumentation to retrieve articles with similar citations: an inquiry into improving related articles search in the MEDLINE digital library.

    Science.gov (United States)

    Tbahriti, Imad; Chichester, Christine; Lisacek, Frédérique; Ruch, Patrick

    2006-06-01

    The aim of this study is to investigate the relationships between citations and the scientific argumentation found in abstracts. We design a related-article search task and observe how the argumentation can affect the search results. We extracted citation lists from a set of 3200 full-text papers originating from a narrow domain. In parallel, we recovered the corresponding MEDLINE records for analysis of the argumentative moves. Our argumentative model is founded on four classes: PURPOSE, METHODS, RESULTS and CONCLUSION. A Bayesian classifier trained on explicitly structured MEDLINE abstracts generates these argumentative categories. The categories are used to generate four different argumentative indexes. A fifth index contains the complete abstract, together with the title and the list of Medical Subject Headings (MeSH) terms. To appraise the relationship of the moves to the citations, the citation lists were used as the criterion for determining relatedness of articles, establishing a benchmark in which two articles are considered related if they share a significant set of co-citations. Our results show that the average precision of queries with the PURPOSE and CONCLUSION features is the highest, while the precision of the RESULTS and METHODS features was relatively low. A linear weighting combination of the moves is proposed, which significantly improves retrieval of related articles.
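
    The co-citation relatedness criterion can be sketched directly (the `min_shared` threshold is an assumption for illustration; the paper uses a significance criterion on shared citations):

```python
def co_citation_related(cites_a, cites_b, min_shared=3):
    """Two articles count as related when their reference lists share
    at least min_shared citations (threshold is an assumed parameter)."""
    return len(set(cites_a) & set(cites_b)) >= min_shared
```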

  4. PHASE-RESOLVED INFRARED SPECTROSCOPY AND PHOTOMETRY OF V1500 CYGNI, AND A SEARCH FOR SIMILAR OLD CLASSICAL NOVAE

    International Nuclear Information System (INIS)

    We present phase-resolved near-infrared photometry and spectroscopy of the classical nova (CN) V1500 Cyg to explore whether cyclotron emission is present in this system. While the spectroscopy does not indicate the presence of discrete cyclotron harmonic emission, the light curves suggest that a sizable fraction of its near-infrared fluxes are due to this component. The light curves of V1500 Cyg appear to remain dominated by emission from the heated face of the secondary star in this system. We have used infrared spectroscopy and photometry to search for other potential magnetic systems among old CNe. We have found that the infrared light curves of V1974 Cyg superficially resemble those of V1500 Cyg, suggesting a highly irradiated companion. The old novae V446 Her and QV Vul have light curves with large amplitude variations like those seen in polars, suggesting they might have magnetic primaries. We extract photometry for 79 old novae from the Two Micron All Sky Survey Point Source Catalog and use those data to derive the mean, un-reddened infrared colors of quiescent novae. We also extract WISE data for these objects and find that 45 of them were detected. Surprisingly, a number of these systems were detected in the WISE 22 μm band. While two of those objects produced significant dust shells (V705 Cas and V445 Pup), the others did not. It appears that line emission from their ionized ejected shells is the most likely explanation for those detections

  5. Improving gene expression similarity measurement using pathway-based analytic dimension

    OpenAIRE

    2009-01-01

Background Gene expression similarity measuring methods have been developed and applied to search rapidly growing public microarray databases. However, current similarity measuring methods need to be improved to accurately measure similarity between gene expression profiles from different platforms or different experiments. Results We devised a new gene expression similarity measuring method based on pathway information. In short, the newly devised method measures similarity between gene expre...

  6. QAR数据多维子序列的相似性搜索%Similarity search for multidimensional QAR data subsequence

    Institute of Scientific and Technical Information of China (English)

    杨慧; 张国振

    2013-01-01

The high dimensionality of QAR data, and the uncertain interrelations among the dimensions, mean that methods for measuring time-series similarity in low-dimensional spaces are no longer applicable. Moreover, given the particularities of the civil aviation industry, using similarity search over QAR data to identify flight faults places special requirements on the definition of similarity. In this paper, expert knowledge is combined with an analytic hierarchy process (AHP) algorithm to calculate the importance of the different dimensions for a given flight fault. The QAR data are represented symbolically, and a k-d tree index is built, which makes fast similarity search over multidimensional QAR subsequences possible. Shape and distance are used together to define and measure similarity. Experiments show high search speed and satisfactory accuracy.
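As a minimal illustration of weighting dimensions by importance before measuring subsequence similarity, the sketch below uses a weighted Euclidean distance. The weights would come from the expert/AHP step; all names here are assumptions for illustration, not the paper's implementation.

```python
import math

def weighted_euclidean(p, q, weights):
    """Weighted Euclidean distance between two multidimensional samples;
    the per-dimension weights stand in for the expert/AHP importance
    values described in the abstract."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights)))

def subsequence_distance(s1, s2, weights):
    """Distance between two equal-length multidimensional subsequences,
    summed point by point (a simplification of the shape-plus-distance
    similarity used in the paper)."""
    return sum(weighted_euclidean(p, q, weights) for p, q in zip(s1, s2))
```

Down-weighting a dimension judged irrelevant to a given fault shrinks its contribution to the distance, so the search is dominated by the dimensions experts consider diagnostic.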

  7. SPOT-Ligand: Fast and effective structure-based virtual screening by binding homology search according to ligand and receptor similarity.

    Science.gov (United States)

    Yang, Yuedong; Zhan, Jian; Zhou, Yaoqi

    2016-07-01

    Structure-based virtual screening usually involves docking of a library of chemical compounds onto the functional pocket of the target receptor so as to discover novel classes of ligands. However, the overall success rate remains low and screening a large library is computationally intensive. An alternative to this "ab initio" approach is virtual screening by binding homology search. In this approach, potential ligands are predicted based on similar interaction pairs (similarity in receptors and ligands). SPOT-Ligand is an approach that integrates ligand similarity by Tanimoto coefficient and receptor similarity by protein structure alignment program SPalign. The method was found to yield a consistent performance in DUD and DUD-E docking benchmarks even if model structures were employed. It improves over docking methods (DOCK6 and AUTODOCK Vina) and has a performance comparable to or better than other binding-homology methods (FINDsite and PoLi) with higher computational efficiency. The server is available at http://sparks-lab.org. © 2016 Wiley Periodicals, Inc. PMID:27074979

  8. Similarity Search in Document Collections

    OpenAIRE

    Jordanov, Dimitar Dimitrov

    2009-01-01

The main goal of this work is to evaluate the performance of the freely distributed Semantic Vectors package and the MoreLikeThis class from the Apache Lucene package. The work compares these two approaches and introduces methods that may lead to improved search quality.

  9. Compression-based similarity

    OpenAIRE

    Vitányi, Paul

    2011-01-01

    First we consider pair-wise distances for literal objects consisting of finite binary files. These files are taken to contain all of their meaning, like genomes or books. The distances are based on compression of the objects concerned, normalized, and can be viewed as similarity distances. Second, we consider pair-wise distances between names of objects, like "red" or "christianity." In this case the distances are based on searches of the Internet. Such a search can be performed by any search...
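The compression-based distance sketched in this abstract is commonly instantiated as the normalized compression distance (NCD); a minimal version, using zlib as the stand-in compressor, might look like this.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: approximates the (uncomputable)
    information distance with a real compressor. Values near 0 indicate
    similar objects; values near 1 indicate dissimilar ones."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# A file compared with itself scores much lower than against unrelated data.
text = b"the quick brown fox " * 100
noise = bytes(range(256)) * 8
```

Here `ncd(text, text)` stays close to 0 while `ncd(text, noise)` approaches 1, since concatenating unrelated data gives the compressor nothing to reuse.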

  10. Similarity Search in Data Stream with Adaptive Segmental Approximations%基于适应性分段估计的数据流相似性搜索

    Institute of Scientific and Technical Information of China (English)

    吴枫; 仲妍; 吴泉源; 贾焰; 杨树强

    2009-01-01

Similarity search has attracted researchers from many application areas (real-time stock quotes, network security, sensor networks). Because the data arising in these areas are infinite, continuous, fast, and real-time, a method for online similarity search over data streams is needed. This paper first proposes the lower-bound functions LB_seg_WF_global, for DTW (dynamic time warping) in the presence of global warping constraints, and LB_seg_WF, for DTW without global warping constraints, neither of which requires an index structure. They are segmented DTW techniques and can be applied to sequences and queries of varying lengths in a data stream. Next, several tighter lower bounds are proposed to improve the approximation quality of LB_seg_WF_global and LB_seg_WF. Finally, to handle the situation in which LB_seg_WF_global or LB_seg_WF may fail continuously on a stream, the lower bound LB_WF_global (with global warping constraints), together with the lower bound LB_WF and upper bound UB_WF (without global warping constraints), is introduced; computed incrementally, these bounds estimate DTW quickly and avoid much redundant computation. Theoretical analysis and statistical experiments confirm the validity of the proposed methods.
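The paper's LB_seg_WF bounds are not spelled out in this abstract. As a generic illustration of the same pruning idea, below is the widely used LB_Keogh lower bound for DTW under a Sakoe-Chiba band (equal-length sequences assumed): any candidate whose bound already exceeds the best distance found so far can be discarded without computing the full DTW recursion.

```python
def lb_keogh(query, candidate, r):
    """LB_Keogh: a lower bound on the squared-distance DTW between query
    and candidate under a Sakoe-Chiba band of half-width r. Only points
    falling outside the candidate's envelope contribute, so the bound is
    cheap (O(n*r)) compared with the full O(n^2) DTW recursion."""
    total = 0.0
    n = len(candidate)
    for i, q in enumerate(query):
        window = candidate[max(0, i - r):min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return total
```

In a stream setting, a candidate subsequence is fully aligned only when its lower bound beats the current best match, which is the source of the pruning gains the abstract describes.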

  11. Textual Spatial Cosine Similarity

    OpenAIRE

    Crocetti, Giancarlo

    2015-01-01

Many methods exist today for measuring document similarity, such as cosine similarity. More complex methods based on semantic analysis of textual information are also available, but they are computationally expensive and rarely used in the real-time feeding of content, as in enterprise-wide search environments. To address these real-time constraints, we developed a new measure of document similarity called Textual Spatial Cosine Similarity, which is able to detect similitude at the semantic ...
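For reference, the baseline cosine similarity that the abstract contrasts with its spatial variant can be computed over raw term-frequency vectors as follows (a minimal sketch, not the authors' implementation).

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity over raw term-frequency vectors: the dot product
    of the two frequency vectors divided by the product of their norms."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Because it ignores word order and position entirely, this baseline cannot distinguish documents that use the same vocabulary in different arrangements, which is the gap a spatial variant aims to close.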

  12. Concept Search

    OpenAIRE

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Zaihrayeu, Ilya

    2008-01-01

    In this paper we present a novel approach, called Concept Search, which extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between concepts. The key idea of Concept Search is to operate on complex concepts and to maximally exploit the semantic information available, reducing to syntactic search only when necessary, i.e., when no semantic information is available. ...

  13. Modal Similarity

    OpenAIRE

    Vigo , Dr. Ronaldo

    2009-01-01

    Just as Boolean rules define Boolean categories, the Boolean operators define higher-order Boolean categories referred to as modal categories. We examine the similarity order between these categories and the standard category of logical identity (i.e. the modal category defined by the biconditional or equivalence operator). Our goal is 4-fold: first, to introduce a similarity measure for determining this similarity order; second, to show that such a measure is a good predictor of the similari...

  14. An Accurate FOA and TOA Estimation Algorithm for Galileo Search and Rescue Signal%伽利略搜救信号FOA和TOA精确估计算法

    Institute of Scientific and Technical Information of China (English)

    王堃; 吴嗣亮; 韩月涛

    2011-01-01

To meet the high-precision requirements for Frequency of Arrival (FOA) and Time of Arrival (TOA) estimation in the Galileo search and rescue (SAR) system, and considering that the message bit width is unknown in real received beacons, a new FOA and TOA estimation algorithm is proposed that combines multi-dimensional joint maximum-likelihood estimation with a barycenter calculation algorithm. The principle of the algorithm is derived after the signal model is introduced, and its concrete realization is given. Monte Carlo simulations and measurement results show that at the processing threshold CNR of 34.8 dBHz, the root-mean-square errors of the FOA and TOA estimates are within 0.03 Hz and 9.5 μs respectively, better than the system requirements of 0.05 Hz and 11 μs. The algorithm has been applied in the Galileo Medium-altitude Earth Orbit Local User Terminal (MEOLUT station).

  15. Combination of 2D/3D Ligand-Based Similarity Search in Rapid Virtual Screening from Multimillion Compound Repositories. Selection and Biological Evaluation of Potential PDE4 and PDE5 Inhibitors

    Directory of Open Access Journals (Sweden)

    Krisztina Dobi

    2014-05-01

Full Text Available Rapid in silico selection of target-focused libraries from commercial repositories is an attractive and cost-effective approach. If structures of active compounds are available, a rapid 2D similarity search can be performed on multimillion-compound databases, but the generated library requires further focusing by various 2D/3D chemoinformatics tools. We report here a combination of the 2D approach with a ligand-based 3D method (Screen3D), which applies flexible matching to align reference and target compounds in a dynamic manner and thus to assess their structural and conformational similarity. In the first case study we compared the 2D and 3D similarity scores on an existing dataset derived from the biological evaluation of a PDE5-focused library. Based on the obtained similarity metrics, a fusion score was proposed. The fusion score was applied to refine the 2D similarity search in a second case study, where we aimed at selecting and evaluating a PDE4B-focused library. The application of this fused 2D/3D similarity measure increased the hit rate from 8.5% (1st round, 47% inhibition at 10 µM) to 28.5% (2nd round, 50% inhibition at 10 µM), and the best two hits had 53 nM inhibitory activities.
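The 2D similarity mentioned above is a Tanimoto coefficient over structural fingerprints. A minimal sketch follows; the equal weighting in the fusion function is an illustrative assumption, since the paper's exact fusion formula is not given in this abstract.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient over the on-bits of two 2D
    structural fingerprints, represented here as sets of bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def fusion_score(sim_2d, sim_3d, w=0.5):
    """Hypothetical fusion of a 2D Tanimoto score with a 3D (Screen3D-style)
    shape score; this equal weighting is an illustrative assumption."""
    return w * sim_2d + (1.0 - w) * sim_3d
```

Ranking candidates by such a fused score lets a cheap 2D screen be refined by 3D conformational information, which is the workflow the two case studies compare.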

  16. Active browsing using similarity pyramids

    Science.gov (United States)

    Chen, Jau-Yuen; Bouman, Charles A.; Dalton, John C.

    1998-12-01

    In this paper, we describe a new approach to managing large image databases, which we call active browsing. Active browsing integrates relevance feedback into the browsing environment, so that users can modify the database's organization to suit the desired task. Our method is based on a similarity pyramid data structure, which hierarchically organizes the database, so that it can be efficiently browsed. At coarse levels, the similarity pyramid allows users to view the database as large clusters of similar images. Alternatively, users can 'zoom into' finer levels to view individual images. We discuss relevance feedback for the browsing process, and argue that it is fundamentally different from relevance feedback for more traditional search-by-query tasks. We propose two fundamental operations for active browsing: pruning and reorganization. Both of these operations depend on a user-defined relevance set, which represents the image or set of images desired by the user. We present statistical methods for accurately pruning the database, and we propose a new 'worm hole' distance metric for reorganizing the database, so that members of the relevance set are grouped together.

  17. Cognitive residues of similarity

    OpenAIRE

    OToole, Stephanie; Keane, Mark T.

    2013-01-01

What are the cognitive after-effects of making a similarity judgement? What, cognitively, is left behind, and what effect might these residues have on subsequent processing? In this paper, we probe for such after-effects using a visual search task performed after a task in which pictures of real-world objects were compared. So, target objects were first presented in a comparison task (e.g., rate the similarity of this object to another), thus, presumably, modifying some of their features befor...

  18. Including Biological Literature Improves Homology Search

    OpenAIRE

    Chang, Jeffrey T.; Raychaudhuri, Soumya; Altman, Russ B

    2001-01-01

    Annotating the tremendous amount of sequence information being generated requires accurate automated methods for recognizing homology. Although sequence similarity is only one of many indicators of evolutionary homology, it is often the only one used. Here we find that supplementing sequence similarity with information from biomedical literature is successful in increasing the accuracy of homology search results. We modified the PSI-BLAST algorithm to use literature similarity in each iterati...

  19. A kind of efficient similarity search algorithm based on B+-tree index%一种基于B+-tree索引的有效相似查询算法

    Institute of Scientific and Technical Information of China (English)

    郭小芳; 叶华

    2012-01-01

A multivariate time series (MTS) similarity search algorithm is proposed. Building on a distance-based index structure for similarity search, principal component analysis (PCA) is used to reduce the dimensionality of the MTS; the principal components are clustered, and the MTS items are mapped into a one-dimensional space based on the cluster centre of each partition. A B+-tree index is built over this space, and a k-nearest-neighbour search then finds the k MTS items most similar to a given query sequence. Experimental results show that the candidate ratio and query time of this algorithm are significantly lower than those of the Muse algorithm, and that neither is strongly affected by the number of clusters, giving the algorithm an advantage over the alternatives.
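The mapping from cluster centres to a one-dimensional B+-tree key is not detailed in this abstract. The sketch below shows one common way to do it (an iDistance-style key of cluster id times a stride constant plus distance to the centre), with a sorted list standing in for the B+-tree; all names and the stride constant are assumptions of this sketch.

```python
import bisect
import math

def euclid(p, q):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def build_index(items, centres, c_stride=1000.0):
    """Map each (already dimensionality-reduced) item to a one-dimensional
    key: cluster_id * c_stride + distance to that cluster's centre. The
    sorted list stands in for the B+-tree; c_stride must exceed any
    in-cluster distance (an assumption of this sketch)."""
    keys = []
    for vec in items:
        cid = min(range(len(centres)), key=lambda c: euclid(vec, centres[c]))
        bisect.insort(keys, (cid * c_stride + euclid(vec, centres[cid]), tuple(vec)))
    return keys

centres = [(0.0, 0.0), (10.0, 10.0)]
idx = build_index([(9.0, 10.0), (1.0, 0.0), (0.0, 2.0)], centres)
```

A range scan around a query's key then yields candidates from the query's cluster, which are verified against the true multidimensional distance; that verification step is what the reported candidate ratio measures.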

  20. Gene functional similarity search tool (GFSST)

    OpenAIRE

    Russo James J; Sheng Huitao; Zhang Jinghui; Zhang Peisen; Osborne Brian; Buetow Kenneth

    2006-01-01

    Abstract Background With the completion of the genome sequences of human, mouse, and other species and the advent of high throughput functional genomic research technologies such as biomicroarray chips, more and more genes and their products have been discovered and their functions have begun to be understood. Increasing amounts of data about genes, gene products and their functions have been stored in databases. To facilitate selection of candidate genes for gene-disease research, genetic as...

  1. Personalized Search

    CERN Document Server

    AUTHOR|(SzGeCERN)749939

    2015-01-01

As the volume of electronically available information grows, relevant items become harder to find. This work presents an approach to personalizing search results in scientific publication databases, focusing on re-ranking results from existing search engines such as Solr or Elasticsearch. It also includes the development of Obelix, a new recommendation system used to re-rank search results. The project was proposed and performed at CERN, using the scientific publications available on the CERN Document Server (CDS). The work experiments with re-ranking using offline and online evaluation of users and documents in CDS. The experiments conclude that personalized search results outperform both latest-first and word-similarity ranking in terms of click position for global search in CDS.

  2. Applying ligands profiling using multiple extended electron distribution based field templates and feature trees similarity searching in the discovery of new generation of urea-based antineoplastic kinase inhibitors.

    Directory of Open Access Journals (Sweden)

    Eman M Dokla

Full Text Available This study provides a comprehensive computational procedure for the discovery of novel urea-based antineoplastic kinase inhibitors, focusing on diversification of both chemotype and selectivity pattern. It presents a systematic structural analysis of the different binding motifs of urea-based kinase inhibitors and the corresponding configurations of the kinase enzymes. The computational model depends on the simultaneous application of two protocols. The first protocol applies multiple consecutive validated virtual-screening filters, including SMARTS, a support vector machine model (ROC = 0.98), a Bayesian model (ROC = 0.86), and structure-based pharmacophore filters based on urea-based kinase inhibitor complexes retrieved from the literature. This is followed by profiling the hits against different extended electron distribution (XED) based field templates representing different kinase targets. The second protocol enables verification of cancericidal activity by using the feature trees (Ftrees) similarity-searching algorithm against the NCI database. Being a proof-of-concept study, this combined procedure was experimentally validated by its use in developing a novel series of urea-based derivatives with strong anticancer activity. The new series is based on a 3-benzylbenzo[d]thiazol-2(3H)-one scaffold, which has interesting chemical feasibility and wide diversification capability. The antineoplastic activity of this series was assayed in vitro against the NCI 60 tumor cell lines, showing very strong inhibition with GI50 values as low as 0.9 µM. Additionally, its mechanism was investigated using the KINEX™ protein kinase microarray-based small-molecule inhibitor profiling platform and cell cycle analysis, showing a peculiar selectivity pattern against the Zap70, c-src, Mink1, csk, and MeKK2 kinases. Interestingly, it showed activity on syk kinase, confirming recent findings of the high activity of diphenylurea-containing compounds against this kinase. Overall, the new series

  3. Speaking Fluently And Accurately

    Institute of Scientific and Technical Information of China (English)

Joseph DeVeto

    2004-01-01

Even after many years of study, students make frequent mistakes in English. In addition, many students still need a long time to think of what they want to say. For some reason, in spite of all the studying, students are still not quite fluent. When I teach, I use one technique that helps students speak not only more accurately but also more fluently. That technique is dictation.

  4. Accurate Finite Difference Algorithms

    Science.gov (United States)

    Goodrich, John W.

    1996-01-01

Two families of finite difference algorithms for computational aeroacoustics are presented and compared. All of the algorithms are single-step explicit methods; they have the same order of accuracy in both space and time, with examples up to eleventh order, and they have multidimensional extensions. One of the algorithm families has spectral-like high resolution. Propagation with high-order and high-resolution algorithms can produce accurate results after O(10^6) periods of propagation with eight grid points per wavelength.

  5. A cross-species analysis method to analyze animal models' similarity to human's disease state

    OpenAIRE

    Yu Shuhao; Zheng Lulu; Li Yun; Li Chunyan; Ma Chenchen; Li Yixue; Li Xuan; Hao Pei

    2012-01-01

Abstract Background Animal models are indispensable tools for studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human disease states and to assess drug molecules' efficiency in drug research. Therefore, we hoped to reveal that it is feasible an...

  6. Accurate backgrounds to Higgs production at the LHC

    CERN Document Server

    Kauer, N

    2007-01-01

Corrections of 10-30% for backgrounds to the H → WW → l⁺l⁻ + missing-p_T search in vector boson and gluon fusion at the LHC are reviewed to make the case for precise and accurate theoretical background predictions.

  7. Custom Search Engines: Tools & Tips

    Science.gov (United States)

    Notess, Greg R.

    2008-01-01

    Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…

  8. Persistent Homology and Partial Similarity of Shapes

    OpenAIRE

    Di Fabio, Barbara; Landi, Claudia

    2011-01-01

The ability to perform shape retrieval based not only on full similarity but also on partial similarity is a key property for any content-based search engine. We prove that persistence diagrams can reveal a partial similarity between two shapes by showing a common subset of points. This can be explained using the Mayer-Vietoris formulas that we develop for ordinary, relative, and extended persistent homology. An experiment outlines the potential of persistence diagrams as shape descriptors in re...

  9. Niche Genetic Algorithm with Accurate Optimization Performance

    Institute of Scientific and Technical Information of China (English)

    LIU Jian-hua; YAN De-kun

    2005-01-01

Based on a crowding mechanism, a novel niche genetic algorithm is proposed that dynamically records the evolutionary direction during evolution. After evolution, the solutions' precision can be greatly improved by local searching along the recorded direction. Simulation shows that this algorithm not only keeps population diversity but also finds accurate solutions. Although this method takes more time than the standard GA, it is well worth applying in cases that demand high solution precision.

  10. Are Defect Profile Similarity Criteria Different Than Velocity Profile Similarity Criteria for the Turbulent Boundary Layer?

    OpenAIRE

    Weyburne, David

    2015-01-01

The use of the defect profile instead of the experimentally observed velocity profile in the search for similarity parameters has become firmly embedded in the turbulent boundary layer literature. However, a search of the literature reveals no theoretical reason for preferring the defect profile over the more traditional velocity profile. In the report herein, we use the flow-governing-equation approach to develop similarity criteria for the two profiles. Results show that t...

  11. Image Tracking for the High Similarity Drug Tablets Based on Light Intensity Reflective Energy and Artificial Neural Network

    OpenAIRE

    Zhongwei Liang; Liang Zhou; Xiaochu Liu; Xiaogang Wang

    2014-01-01

Tablet image tracking exerts a notable influence on the efficiency and reliability of high-speed mass production of drugs, and it has also emerged as a difficult problem and a targeted focus of production monitoring in recent years, owing to the highly similar shapes and random positional distribution of the objects to be searched for. For the purpose of tracking randomly distributed tablets accurately, through using a surface fitting approach and transition...

  12. Similarity transformations of MAPs

    Directory of Open Access Journals (Sweden)

    Andersen Allan T.

    1999-01-01

Full Text Available We introduce the notion of similar Markovian Arrival Processes (MAPs) and show that the event-stationary point processes related to two similar MAPs are stochastically equivalent. This holds true for the time-stationary point processes too. We show that several well-known stochastic equivalences, e.g. that between the H2 renewal process and the Interrupted Poisson Process (IPP), can be expressed by similarity transformations of MAPs. In the appendix, the valid region of similarity transformations for two-state MAPs is characterized.

  13. Finding Protein and Nucleotide Similarities with FASTA.

    Science.gov (United States)

    Pearson, William R

    2016-01-01

    The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local, and global similarity searches (ssearch36, ggsearch36), and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity. The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases. The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. © 2016 by John Wiley & Sons, Inc. PMID:27010337

  14. Clustering by Pattern Similarity

    Institute of Scientific and Technical Information of China (English)

    Hai-xun Wang; Jian Pei

    2008-01-01

The task of clustering is to identify classes of similar objects among a set of objects. The definition of similarity varies from one clustering model to another. However, in most of these models the concept of similarity is based on metrics such as Manhattan distance, Euclidean distance, or other Lp distances. In other words, similar objects must have close values in at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we propose, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. The new similarity concept models a wide range of applications. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, because it is able to capture not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. In addition to the novel similarity model, this paper also introduces an effective and efficient algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its performance.
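The coherence idea behind the pCluster model can be made concrete with the 2x2 pScore: two objects are similar on a subset of dimensions when every 2x2 submatrix they form has a pScore below a threshold. The function below is an illustrative sketch consistent with the description above, not the authors' code.

```python
def p_score(matrix, objs, dims):
    """pScore of the 2x2 submatrix defined by two objects and two
    dimensions: |(a_x - a_y) - (b_x - b_y)|. A coherent (shifting)
    pattern on a dimension subset keeps every such pScore below a
    threshold delta."""
    (a, b), (x, y) = objs, dims
    return abs((matrix[a][x] - matrix[a][y]) - (matrix[b][x] - matrix[b][y]))

# Two genes whose expression rises and falls synchronously (rows differing
# by a constant) score 0, however far apart their absolute levels are.
expr = {"g1": {"c1": 1.0, "c2": 3.0}, "g2": {"c1": 5.0, "c2": 7.0}}
```

This is exactly why Lp distances miss such clusters: the rows of `expr` are far apart in Euclidean terms yet perfectly coherent under the pScore.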

  15. New Similarity Functions

    DEFF Research Database (Denmark)

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Kwasnicka, Halina

    2016-01-01

    In data science, there are important parameters that affect the accuracy of the algorithms used. Some of these parameters are: the type of data objects, the membership assignments, and distance or similarity functions. This paper discusses similarity functions as fundamental elements in membership...

  16. Similar component analysis

    Institute of Scientific and Technical Information of China (English)

    ZHANG Hong; WANG Xin; LI Junwei; CAO Xianguang

    2006-01-01

A new unsupervised feature extraction method called similar component analysis (SCA) is proposed in this paper. SCA has a self-aggregation property: data objects move towards each other to form clusters, which can theoretically reveal the inherent pattern of similarity hidden in the dataset. The inputs of SCA are just the pairwise similarities of the dataset, which makes it easier to use for time-series analysis given the variable lengths of time series. Our experimental results on many problems have verified the effectiveness of SCA in engineering applications.

  17. Gender similarities and differences.

    Science.gov (United States)

    Hyde, Janet Shibley

    2014-01-01

    Whether men and women are fundamentally different or similar has been debated for more than a century. This review summarizes major theories designed to explain gender differences: evolutionary theories, cognitive social learning theory, sociocultural theory, and expectancy-value theory. The gender similarities hypothesis raises the possibility of theorizing gender similarities. Statistical methods for the analysis of gender differences and similarities are reviewed, including effect sizes, meta-analysis, taxometric analysis, and equivalence testing. Then, relying mainly on evidence from meta-analyses, gender differences are reviewed in cognitive performance (e.g., math performance), personality and social behaviors (e.g., temperament, emotions, aggression, and leadership), and psychological well-being. The evidence on gender differences in variance is summarized. The final sections explore applications of intersectionality and directions for future research.

  19. Cluster Tree Based Hybrid Document Similarity Measure

    Directory of Open Access Journals (Sweden)

    M. Varshana Devi

    2015-10-01

Full Text Available A cluster-tree-based similarity measure is established to measure hybrid similarity. In a cluster tree, the hybrid similarity measure can be calculated even for random data that may not co-occur, generating different views. The different views of the tree can be combined, choosing the one that is most significant in cost. A method is proposed to combine the multiple views, in which each view is represented by a different distance measure within a single cluster. Compared with traditional statistical methods, the cluster-tree-based hybrid similarity offers better feasibility for intelligence-based search, and it helps improve dimensionality reduction and semantic analysis.

  20. Information Extraction Using Distant Supervision and Semantic Similarities

    Directory of Open Access Journals (Sweden)

    PARK, Y.

    2016-02-01

Full Text Available Information extraction is one of the main research tasks in natural language processing and text mining; it extracts useful information from unstructured sentences. Information extraction techniques include named entity recognition, relation extraction, and co-reference resolution. Among these, relation extraction is the task of extracting semantic relations between entities, such as personal and geographic names, in documents. It is an important research area, used in knowledge base construction and question answering systems. This study presents relation extraction using distant supervision, a semi-supervised learning technique that has been spotlighted in recent years for reducing the manual work and costs required for supervised learning. Specifically, it proposes a method that improves distant supervision by applying a clustering method to create the learning corpus and by applying semantic analysis for relations that are difficult to identify using existing distant supervision. Through comparison experiments with various semantic similarity measures, similarity calculation methods useful for distant-supervised relation extraction are identified, and a large number of accurate relation triples can be extracted using the proposed structural advantages and semantic similarity comparison.

  1. Music Retrieval based on Melodic Similarity

    NARCIS (Netherlands)

    Typke, R.

    2007-01-01

    This thesis introduces a method for measuring melodic similarity for notated music such as MIDI files. This music search algorithm views music as sets of notes that are represented as weighted points in the two-dimensional space of time and pitch. Two point sets can be compared by calculating how mu

  2. The Qualitative Similarity Hypothesis

    Science.gov (United States)

    Paul, Peter V.; Lee, Chongmin

    2010-01-01

    Evidence is presented for the qualitative similarity hypothesis (QSH) with respect to children and adolescents who are d/Deaf or hard of hearing. The primary focus is on the development of English language and literacy skills, and some information is provided on the acquisition of English as a second language. The QSH is briefly discussed within…

  3. Similarity of molecular shape.

    Science.gov (United States)

    Meyer, A Y; Richards, W G

    1991-10-01

    The similarity of one molecule to another has usually been defined in terms of electron densities or electrostatic potentials or fields. Here it is expressed as a function of the molecular shape. Formulations of similarity (S) reduce to very simple forms, thus rendering the computerised calculation straightforward and fast. 'Elements of similarity' are identified, in the same spirit as 'elements of chirality', except that the former are understood to be variable rather than present-or-absent. Methods are presented which bypass the time-consuming mathematical optimisation of the relative orientation of the molecules. Numerical results are presented and examined, with emphasis on the similarity of isomers. At the extreme, enantiomeric pairs are considered, where it is the dissimilarity (D = 1 - S) that is of consequence. We argue that chiral molecules can be graded by dissimilarity, and show that D is the shape-analog of the 'chirality coefficient', with the simple form of the former opening up numerical access to the latter. PMID:1770379
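The record above does not reproduce the paper's formulation of S; as a rough illustration of the idea that S measures shape overlap and that D = 1 - S grades enantiomeric pairs, here is a minimal sketch using a Tanimoto-style overlap of voxelised occupancy grids (the grids, their size, and the random "molecules" are invented for the example):

```python
import numpy as np

def shape_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto-style shape similarity between two boolean occupancy grids."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union else 1.0

# Toy "molecules": occupancy grids on a 4x4x4 lattice.
rng = np.random.default_rng(0)
a = rng.random((4, 4, 4)) > 0.5
b = a.copy()
b[0, 0, 0] ^= True            # perturb one voxel: a near-identical isomer
mirror = a[::-1, :, :]        # mirror image along x: an "enantiomer"

S = shape_similarity(a, b)
D = 1.0 - shape_similarity(a, mirror)   # dissimilarity of the enantiomeric pair
print(round(S, 3), round(D, 3))
```

A real implementation would compare continuous shape functions and optimise the relative orientation of the two molecules, which this sketch deliberately omits.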

  4. Limiting Similarity Revisited

    OpenAIRE

    Szabo, P; Meszena, G.

    2005-01-01

    We reinvestigate the validity of the limiting similarity principle via numerical simulations of the Lotka-Volterra model. A Gaussian competition kernel is employed to describe decreasing competition with increasing difference in a one-dimensional phenotype variable. The simulations are initiated by a large number of species, evenly distributed along the phenotype axis. Exceptionally, the Gaussian carrying capacity supports coexistence of all species, initially present. In case of any other, d...
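The setup described above, a Lotka-Volterra community with a Gaussian competition kernel over a one-dimensional phenotype axis, initialised with many evenly spaced species, can be sketched numerically as follows (the parameter values, grid resolution, and forward-Euler integrator are illustrative choices, not those of the paper):

```python
import numpy as np

# Species phenotypes evenly distributed along a one-dimensional axis.
x = np.linspace(-2.0, 2.0, 41)
K = np.exp(-x**2 / 2.0)                                        # Gaussian carrying capacity
alpha = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.3**2))   # Gaussian competition kernel

n = np.full_like(x, 0.01)      # every species initially present at low density
dt = 0.1
for _ in range(20000):         # crude forward-Euler integration of dn/dt = n(K - alpha@n)/K
    n = np.maximum(n + dt * n * (K - alpha @ n) / K, 0.0)

survivors = int((n > 1e-6).sum())
print(survivors, "of", x.size, "species persist")
```

A Gaussian carrying capacity is the exceptional case the abstract mentions; swapping K for a non-Gaussian form is what produces extinctions and limits to similarity in such simulations.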

  5. An efficient and accurate 3D displacements tracking strategy for digital volume correlation

    KAUST Repository

    Pan, Bing

    2014-07-01

Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating the Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure that the 3D IC-GN algorithm converges accurately and rapidly and to avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer an accurate and complete initial guess of deformation to each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexities of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost. © 2014 Elsevier Ltd.
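The reliability-guided tracking idea, processing calculation points in order of correlation quality and handing each solved point's displacement to its unsolved neighbours as their initial guess, can be sketched independently of the IC-GN solver. The sketch below works on a 2D grid and stubs out the sub-voxel registration step with a hypothetical `refine` callback; only the propagation logic mirrors the strategy described:

```python
import heapq
import numpy as np

def neighbours(p, shape):
    i, j = p
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= i + di < shape[0] and 0 <= j + dj < shape[1]:
            yield (i + di, j + dj)

def reliability_guided_scan(shape, seed, refine):
    """Process grid points in order of correlation quality, propagating each
    solved point's displacement to its 4-neighbours as their initial guess."""
    disp = np.full(shape + (2,), np.nan)       # per-point displacement
    done = np.zeros(shape, dtype=bool)
    u0, c0 = refine(seed, (0.0, 0.0))          # the seed needs its own initial guess
    disp[seed], done[seed] = u0, True
    heap = []                                   # max-heap on correlation coefficient
    for q in neighbours(seed, shape):
        heapq.heappush(heap, (-c0, q, u0))
    while heap:
        _, p, guess = heapq.heappop(heap)
        if done[p]:
            continue
        u, c = refine(p, tuple(guess))          # stand-in for sub-voxel registration
        disp[p], done[p] = u, True
        for q in neighbours(p, shape):
            if not done[q]:
                heapq.heappush(heap, (-c, q, u))
    return disp

# Stub correlation step: the "true" displacement is a linear field and the
# correlation coefficient is constant -- purely to exercise the scan order.
refine = lambda p, guess: ((0.1 * p[0], -0.05 * p[1]), 0.99)
field = reliability_guided_scan((5, 5), (2, 2), refine)
```

In the paper's setting `refine` would be the 3D IC-GN iteration and `c` the correlation coefficient at the solved point; the heap ensures the most reliable results seed their neighbours first.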

  6. The application of similar image retrieval in electronic commerce.

    Science.gov (United States)

    Hu, YuPing; Yin, Hua; Han, Dezhi; Yu, Fei

    2014-01-01

    Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers' experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system. PMID:24883411

  7. The application of similar image retrieval in electronic commerce.

    Science.gov (United States)

    Hu, YuPing; Yin, Hua; Han, Dezhi; Yu, Fei

    2014-01-01

    Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers' experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system.

  8. The Application of Similar Image Retrieval in Electronic Commerce

    Directory of Open Access Journals (Sweden)

    YuPing Hu

    2014-01-01

Full Text Available Traditional online shopping platform (OSP), which searches product information by keywords, faces three problems: indirect search mode, large search space, and inaccuracy in search results. For solving these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving the network customers’ experience and providing merchants with the accuracy of advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: image search display subsystem, image search subsystem, and product information collecting subsystem. This system can provide seamless connection between information platform and OSP, on which consumers can automatically and directly search similar images according to the pictures from information platform. At the same time, it can be used to provide accuracy of internet marketing for enterprises. The experiment shows the efficiency of constructing the system.

  9. Aiming for Efficiency by Detecting Structural Similarity

    Science.gov (United States)

    Winter, Judith; Jeliazkov, Nikolay; Kühne, Gerold

When applying XML-Retrieval in a distributed setting, efficiency issues have to be considered, e.g. reducing the network traffic involved in answering a given query. The new Efficiency Track of INEX gave us the opportunity to explore the possibility of improving both effectiveness and efficiency by exploiting structural similarity. We ran some of the track’s highly structured queries on our top-k search engine to analyze the impact of various structural similarity functions. We applied those functions first to the ranking and, based on that, to the query routing process. Our results indicate that detection of structural similarity can be used in order to reduce the amount of messages sent between distributed nodes and thus lead to more efficiency of the search.

  10. A new approach for finding semantic similar scientific articles

    OpenAIRE

    Masumeh Islami Nasab; Reza Javidan

    2015-01-01

    Calculating article similarities enables users to find similar articles and documents in a collection of articles. Two similar documents are extremely helpful for text applications such as document-to-document similarity search, plagiarism checker, text mining for repetition, and text filtering. This paper proposes a new method for calculating the semantic similarities of articles. WordNet is used to find word semantic associations. The proposed technique first compares the similarity of each...

  11. Accurate guitar tuning by cochlear implant musicians.

    Directory of Open Access Journals (Sweden)

    Thomas Lu

    Full Text Available Modern cochlear implant (CI users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with CI than his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼ 30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.
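Tuning by beats, as described above, exploits the amplitude modulation at |f1 − f2| Hz that arises when two nearly equal tones are summed. A minimal sketch of recovering that beat rate from a synthetic signal (the frequencies are illustrative, not the study's stimuli):

```python
import numpy as np

fs = 2000                          # sample rate (Hz)
t = np.arange(0, 20, 1 / fs)       # a 20 s window resolves 0.05 Hz bins
f_ref, f_string = 110.0, 110.4     # reference tone vs. a slightly sharp string
y = np.sin(2 * np.pi * f_ref * t) + np.sin(2 * np.pi * f_string * t)

# Squaring the summed signal produces a spectral line at the beat rate
# |f_ref - f_string| -- the slow loudness fluctuation a listener tracks.
spec = np.abs(np.fft.rfft(y ** 2))
freqs = np.fft.rfftfreq(len(y), 1 / fs)
low = (freqs > 0.05) & (freqs < 5.0)           # search below 5 Hz, excluding DC
beat_hz = freqs[low][np.argmax(spec[low])]
print(round(beat_hz, 2))                       # ~0.4 Hz: one beat every 2.5 s
```

Matching the tones drives the beat rate toward zero, which requires only temporal sensitivity to slow amplitude fluctuations, not fine spectral pitch discrimination.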

  12. Towards accurate emergency response behavior

    International Nuclear Information System (INIS)

    Nuclear reactor operator emergency response behavior has persisted as a training problem through lack of information. The industry needs an accurate definition of operator behavior in adverse stress conditions, and training methods which will produce the desired behavior. Newly assembled information from fifty years of research into human behavior in both high and low stress provides a more accurate definition of appropriate operator response, and supports training methods which will produce the needed control room behavior. The research indicates that operator response in emergencies is divided into two modes, conditioned behavior and knowledge based behavior. Methods which assure accurate conditioned behavior, and provide for the recovery of knowledge based behavior, are described in detail

  13. Concept Search: Semantics Enabled Information Retrieval

    OpenAIRE

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Zaihrayeu, Ilya

    2010-01-01

    In this paper we present a novel approach, called Concept Search, which extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between concepts. The key idea of Concept Search is to operate on complex concepts and to maximally exploit the semantic information available, reducing to syntactic search only when necessary, i.e., when no semantic information is available. ...

  14. Similar dissection of sets

    CERN Document Server

    Akiyama, Shigeki; Okazaki, Ryotaro; Steiner, Wolfgang; Thuswaldner, Jörg

    2010-01-01

In 1994, Martin Gardner stated a set of questions concerning the dissection of a square or an equilateral triangle in three similar parts. Meanwhile, Gardner's questions have been generalized and some of them are already solved. In the present paper, we solve more of his questions and treat them in a much more general context. Let $D \subset \mathbb{R}^d$ be a given set and let $f_1,\dots,f_k$ be injective continuous mappings. Does there exist a set $X$ such that $D = X \cup f_1(X) \cup \dots \cup f_k(X)$ is satisfied with a non-overlapping union? We prove that such a set $X$ exists for certain choices of $D$ and $\{f_1,\dots,f_k\}$. The solutions $X$ often turn out to be attractors of iterated function systems with condensation in the sense of Barnsley. Coming back to Gardner's setting, we use our theory to prove that an equilateral triangle can be dissected in three similar copies whose areas have ratio $1:1:a$ for $a \ge (3+\sqrt{5})/2$.

  15. Accurate Modeling of Advanced Reflectarrays

    DEFF Research Database (Denmark)

    Zhou, Min

Analysis and optimization methods for the design of advanced printed reflectarrays have been investigated, and the study is focused on developing an accurate and efficient simulation tool. For the analysis, a good compromise between accuracy and efficiency can be obtained using the spectral domain...

  16. Self Similar Optical Fiber

    Science.gov (United States)

    Lai, Zheng-Xuan

This research proposes the Self Similar optical fiber (SSF) as a new type of optical fiber. It has a special core that consists of a self similar structure. Such a structure is obtained by following the formula for generating iterated function systems (IFS) in fractal theory. The resulting SSF can be viewed as a true fractal object among optical fibers. In addition, the method of fabricating SSF makes it possible to generate desired structures exponentially in number, while also allowing lower-scale units in the structure to be reduced in size exponentially. The invention of SSF is expected to greatly ease the production of optical fiber when a large number of small hollow structures are needed in the core of the optical fiber. This dissertation analyzes the core structure of SSF based on fractal theory. Possible properties arising from the structural characteristics and the corresponding applications are explained. Four SSF samples were obtained through actual fabrication in a laboratory environment. Different from traditional conductive-heating fabrication systems, I used an in-house designed furnace that incorporated a radiation heating method and was equipped with an automated temperature control system. The obtained samples were examined through spectrum tests. Results from the tests showed that SSF does have the optical property of delivering light in a certain wavelength range. However, SSF as a new type of optical fiber requires systematic research to find out the theory that explains its structure and the associated optical properties. The fabrication and quality of SSF also need to be improved for product deployment. As a start of this extensive research, this dissertation work opens the door to a very promising new area in optical fiber research.
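The core-generation idea, iterating an IFS to produce self-similar structure, can be illustrated with the classic "chaos game" (the three contractions below generate a Sierpinski gasket; the actual SSF cross-section geometry would use contractions chosen for the fiber and is not reproduced here):

```python
import numpy as np

# Three affine contractions w_i(p) = (p + v_i) / 2 -- the classic Sierpinski
# gasket IFS; each iteration halves the scale of the structure, mirroring how
# lower-scale units shrink exponentially with each generation.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

rng = np.random.default_rng(1)
p = np.array([0.2, 0.2])
points = []
for _ in range(5000):                 # "chaos game" iteration of the IFS
    v = vertices[rng.integers(3)]
    p = (p + v) / 2                   # apply a randomly chosen contraction
    points.append(p)
points = np.array(points)             # points accumulate on the attractor
```

Scatter-plotting `points` reveals the fractal attractor; in the fiber context the analogous attractor describes the arrangement of hollow structures in the core.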

  17. Professional Microsoft search fast search, Sharepoint search, and search server

    CERN Document Server

    Bennett, Mark; Kehoe, Miles; Voskresenskaya, Natalya

    2010-01-01

    Use Microsoft's latest search-based technology-FAST search-to plan, customize, and deploy your search solutionFAST is Microsoft's latest intelligent search-based technology that boasts robustness and an ability to integrate business intelligence with Search. This in-depth guide provides you with advanced coverage on FAST search and shows you how to use it to plan, customize, and deploy your search solution, with an emphasis on SharePoint 2010 and Internet-based search solutions.With a particular appeal for anyone responsible for implementing and managing enterprise search, this book presents t

  18. Profitable capitation requires accurate costing.

    Science.gov (United States)

    West, D A; Hicks, L L; Balas, E A; West, T D

    1996-01-01

In the name of costing accuracy, nurses are asked to track inventory use on a per-treatment basis, while more significant costs, such as general overhead and nursing salaries, are usually allocated to patients or treatments on an average-cost basis. Accurate treatment costing and financial viability require analysis of all resources actually consumed in treatment delivery, including nursing services and inventory. More precise costing information enables more profitable decisions, as is demonstrated by comparing the ratio-of-cost-to-treatment method (aggregate costing) with alternative activity-based costing (ABC) methods. Nurses must participate in this costing process to ensure that capitation bids are based upon accurate costs rather than simple averages. PMID:8788799

  19. Search Cloud

    Science.gov (United States)

MedlinePlus Search Cloud (https://medlineplus.gov/cloud.html): an embeddable cloud of top MedlinePlus search terms that site owners can share with their users.

  20. Accurate pose estimation for forensic identification

    Science.gov (United States)

    Merckx, Gert; Hermans, Jeroen; Vandermeulen, Dirk

    2010-04-01

In forensic authentication, one aims to identify the perpetrator among a series of suspects or distractors. A fundamental problem in any recognition system that aims for identification of subjects in a natural scene is the lack of constraints on viewing and imaging conditions. In forensic applications, identification proves even more challenging, since most surveillance footage is of abysmal quality. In this context, robust methods for pose estimation are paramount. In this paper we will therefore present a new pose estimation strategy for very low quality footage. Our approach uses 3D-2D registration of a textured 3D face model with the surveillance image to obtain accurate far field pose alignment. Starting from an inaccurate initial estimate, the technique uses novel similarity measures based on the monogenic signal to guide a pose optimization process. We will illustrate the descriptive strength of the introduced similarity measures by using them directly as a recognition metric. Through validation, using both real and synthetic surveillance footage, our pose estimation method is shown to be accurate, and robust to lighting changes and image degradation.

  1. SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY

    Directory of Open Access Journals (Sweden)

    Yoshihisa Udagawa

    2013-07-01

    Full Text Available Duplicate code adversely affects the quality of software systems and hence should be detected. We discuss an approach that improves source code retrieval using structural information of source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. We propose a similarity measure that is defined by the ratio of the number of sequential fully matching statements to the number of sequential partially matching statements. The defined similarity measure is an extension of the set-based Sorensen-Dice similarity index. This research primarily contributes to the development of a similarity retrieval algorithm that derives meaningful search conditions from a given sequence, and then performs retrieval using all derived conditions. Experiments show that our retrieval model shows an improvement of up to 90.9% over other retrieval models relative to the number of retrieved methods.
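For reference, the set-based Sorensen-Dice index that the paper extends is straightforward to compute; the toy control-statement sets below are invented, and the example also shows the order-insensitivity that motivates a sequence-based extension:

```python
def dice(a: set, b: set) -> float:
    """Set-based Sorensen-Dice index: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

# Control statements extracted from two hypothetical Java methods. The
# set-based index ignores the *order* of statements -- two methods with the
# same statements in a different sequence score 1.0 -- which is the
# limitation a sequence-based similarity measure addresses.
m1 = {"for", "if", "if-else", "return"}
m2 = {"for", "if", "return", "while"}
print(dice(m1, m2))   # 2*3 / (4+4) = 0.75
```

The paper's measure instead scores runs of sequentially matching statements, so reordered clones are penalised rather than treated as identical.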

  2. Integrated Semantic Similarity Model Based on Ontology

    Institute of Scientific and Technical Information of China (English)

    LIU Ya-Jun; ZHAO Yun

    2004-01-01

To solve the problem of the inadequacy of semantic processing in intelligent question answering systems, an integrated semantic similarity model that calculates semantic similarity using geometric distance and information content is presented in this paper. With the help of the interrelationship between concepts, the information content of concepts, and the strength of the edges in the ontology network, we can calculate the semantic similarity between two concepts and provide information for the further calculation of the semantic similarity between a user's question and the answers in the knowledge base. The results of the experiments on the prototype have shown that the semantic problem in natural language processing can also be solved with the help of the knowledge and the abundant semantic information in the ontology. More than 90% accuracy with less than 50 ms average search time has been reached in the intelligent question answering prototype system based on ontology. The result is very satisfactory.
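A generic blend of geometric (path-based) distance and information content over a toy is-a taxonomy can illustrate the kind of integrated measure described (the taxonomy, counts, and equal weighting below are invented; the paper's exact formula is not reproduced):

```python
import math

# Toy is-a taxonomy (child -> parent) with corpus counts used to derive
# information content, IC(c) = -log p(c). Names and counts are illustrative.
parent = {"dog": "mammal", "cat": "mammal", "mammal": "animal",
          "sparrow": "bird", "bird": "animal", "animal": None}
count = {"dog": 30, "cat": 30, "sparrow": 20, "mammal": 60, "bird": 20,
         "animal": 100}
total = count["animal"]

def ancestors(c):
    out = []
    while c is not None:
        out.append(c)
        c = parent[c]
    return out

def ic(c):
    return -math.log(count[c] / total)

ic_max = max(ic(c) for c in count)   # for normalising IC into [0, 1]

def similarity(a, b, w=0.5):
    """Blend path-distance similarity with the (normalised) Resnik-style IC
    of the lowest common ancestor; w weights the geometric component."""
    pa, pb = ancestors(a), ancestors(b)
    lca = next(c for c in pa if c in pb)       # lowest common ancestor
    dist = pa.index(lca) + pb.index(lca)       # edge count through the LCA
    geo = 1.0 / (1.0 + dist)
    info = ic(lca) / ic_max
    return w * geo + (1 - w) * info

print(similarity("dog", "cat") > similarity("dog", "sparrow"))  # True
```

Siblings under a specific concept (dog/cat under mammal) score higher than concepts that meet only at the root, combining both the path and the informativeness of the shared ancestor.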

  3. The similarity principle - on using models correctly

    DEFF Research Database (Denmark)

    Landberg, L.; Mortensen, N.G.; Rathmann, O.;

    2003-01-01

This paper will present some guiding principles on the most accurate use of the WAsP program in particular, but the principle can be applied to the use of any linear model which predicts some quantity at one location based on another. We have felt a need to lay these principles out explicitly, due to the many users and uses (and misuses) of the WAsP program. Put simply, the similarity principle states that one should choose a predictor site which – in as many ways as possible – is similar to the predicted site.

  4. Accurate determination of antenna directivity

    DEFF Research Database (Denmark)

    Dich, Mikael

    1997-01-01

The derivation of a formula for accurate estimation of the total radiated power from a transmitting antenna, for which the radiated power density is known in a finite number of points on the far-field sphere, is presented. The main application of the formula is determination of directivity from power-pattern measurements. The derivation is based on the theory of spherical wave expansion of electromagnetic fields, which also establishes a simple criterion for the required number of samples of the power density. An array antenna consisting of Hertzian dipoles is used to test the accuracy and rate of convergence...
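The underlying relation is D = 4π U_max / P_rad, with P_rad estimated from power-density samples on the far-field sphere. The sketch below checks a simple midpoint-rule quadrature against a Hertzian-dipole pattern, whose directivity is known analytically to be 1.5 (the sampling grid is an illustrative choice, not the paper's spherical-wave sampling criterion):

```python
import numpy as np

# Hertzian-dipole power pattern U(theta) ~ sin^2(theta); directivity = 1.5.
n_theta, n_phi = 64, 128
theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta      # midpoint samples
U = np.sin(theta)[:, None] ** 2 * np.ones(n_phi)[None, :]  # pattern on the grid

# Solid-angle weight of each sample cell: sin(theta) dtheta dphi.
dOmega = np.sin(theta)[:, None] * (np.pi / n_theta) * (2 * np.pi / n_phi)
P_rad = (U * dOmega).sum()             # total radiated power, ∮ U dΩ
D = 4 * np.pi * U.max() / P_rad        # directivity from the sampled pattern
print(round(float(D), 3))              # close to the analytic value 1.5
```

The quadrature error here is set by the grid density; the paper's contribution is a spherical-wave criterion for how many samples are actually required.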

  5. Parameters for accurate genome alignment

    Directory of Open Access Journals (Sweden)

    Hamada Michiaki

    2010-02-01

Full Text Available Background Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed. Results We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases. Conclusions These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours http://last.cbrc.jp/.

  6. Measuring Personalization of Web Search

    DEFF Research Database (Denmark)

    Hannak, Aniko; Sapiezynski, Piotr; Kakhki, Arash Molavi;

    2013-01-01

Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing personalization is leading to concerns about Filter Bubble effects, where certain users are simply unable to access information that the search engines’ algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or of the user attributes that cause it. In light of this situation, we make three contributions. First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users...
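One simple ingredient of such a measurement methodology is comparing the result lists different users receive for the same query, e.g. with the Jaccard index over the top-k results (the URLs below are invented; rank-aware measures such as Kendall's tau would additionally capture reordering of shared results):

```python
def jaccard(a, b):
    """Jaccard index between two result lists, treated as sets of URLs."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Top-10 result URLs for the same query as seen by two hypothetical users;
# issuing the queries in parallel controls for temporal churn in the index.
user_a = [f"r{i}" for i in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
user_b = [f"r{i}" for i in [1, 2, 4, 3, 5, 6, 7, 11, 9, 10]]

overlap = jaccard(user_a, user_b)   # 9 shared results out of 11 distinct
print(round(overlap, 3))
```

A low overlap alone does not prove personalization; attributing the difference requires controlling for noise, which is exactly the "numerous details" the abstract alludes to.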

  7. Search Patterns

    CERN Document Server

    Morville, Peter

    2010-01-01

    What people are saying about Search Patterns "Search Patterns is a delight to read -- very thoughtful and thought provoking. It's the most comprehensive survey of designing effective search experiences I've seen." --Irene Au, Director of User Experience, Google "I love this book! Thanks to Peter and Jeffery, I now know that search (yes, boring old yucky who cares search) is one of the coolest ways around of looking at the world." --Dan Roam, author, The Back of the Napkin (Portfolio Hardcover) "Search Patterns is a playful guide to the practical concerns of search interface design. It cont

  8. Notions of similarity for computational biology models

    KAUST Repository

    Waltemath, Dagmar

    2016-03-21

Computational models used in biology are rapidly increasing in complexity, size, and numbers. To build such large models, researchers need to rely on software tools for model retrieval, model combination, and version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of similarity may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here, we introduce a general notion of quantitative model similarities, survey the use of existing model comparison methods in model building and management, and discuss potential applications of model comparison. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on different model aspects. Potentially relevant aspects of a model comprise its references to biological entities, network structure, mathematical equations and parameters, and dynamic behaviour. Future similarity measures could combine these model aspects in flexible, problem-specific ways in order to mimic users' intuition about model similarity, and to support complex model searches in databases.

  9. Accurate ab initio spin densities

    CERN Document Server

    Boguslawski, Katharina; Legeza, Örs; Reiher, Markus

    2012-01-01

    We present an approach for the calculation of spin density distributions for molecules that require very large active spaces for a qualitatively correct description of their electronic structure. Our approach is based on the density-matrix renormalization group (DMRG) algorithm to calculate the spin density matrix elements as basic quantity for the spatially resolved spin density distribution. The spin density matrix elements are directly determined from the second-quantized elementary operators optimized by the DMRG algorithm. As an analytic convergence criterion for the spin density distribution, we employ our recently developed sampling-reconstruction scheme [J. Chem. Phys. 2011, 134, 224101] to build an accurate complete-active-space configuration-interaction (CASCI) wave function from the optimized matrix product states. The spin density matrix elements can then also be determined as an expectation value employing the reconstructed wave function expansion. Furthermore, the explicit reconstruction of a CA...

  10. The Accurate Particle Tracer Code

    CERN Document Server

    Wang, Yulei; Qin, Hong; Yu, Zhi

    2016-01-01

    The Accurate Particle Tracer (APT) code is designed for large-scale particle simulations on dynamical systems. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and non-linear problems. Under the well-designed integrated and modularized framework, APT serves as a universal platform for researchers from different fields, such as plasma physics, accelerator physics, space science, fusion energy research, computational mathematics, software engineering, and high-performance computation. The APT code consists of seven main modules, including the I/O module, the initialization module, the particle pusher module, the parallelization module, the field configuration module, the external force-field module, and the extendible module. The I/O module, supported by Lua and Hdf5 projects, provides a user-friendly interface for both numerical simulation and data analysis. A series of new geometric numerical methods...

  11. Accurate thickness measurement of graphene

    Science.gov (United States)

    Shearer, Cameron J.; Slattery, Ashley D.; Stapleton, Andrew J.; Shapter, Joseph G.; Gibson, Christopher T.

    2016-03-01

    Graphene has emerged as a material with a vast variety of applications. The electronic, optical and mechanical properties of graphene are strongly influenced by the number of layers present in a sample. As a result, the dimensional characterization of graphene films is crucial, especially with the continued development of new synthesis methods and applications. A number of techniques exist to determine the thickness of graphene films including optical contrast, Raman scattering and scanning probe microscopy techniques. Atomic force microscopy (AFM), in particular, is used extensively since it provides three-dimensional images that enable the measurement of the lateral dimensions of graphene films as well as the thickness, and by extension the number of layers present. However, in the literature AFM has proven to be inaccurate with a wide range of measured values for single layer graphene thickness reported (between 0.4 and 1.7 nm). This discrepancy has been attributed to tip-surface interactions, image feedback settings and surface chemistry. In this work, we use standard and carbon nanotube modified AFM probes and a relatively new AFM imaging mode known as PeakForce tapping mode to establish a protocol that will allow users to accurately determine the thickness of graphene films. In particular, the error in measuring the first layer is reduced from 0.1-1.3 nm to 0.1-0.3 nm. Furthermore, in the process we establish that the graphene-substrate adsorbate layer and imaging force, in particular the pressure the tip exerts on the surface, are crucial components in the accurate measurement of graphene using AFM. These findings can be applied to other 2D materials.

  13. SProt: sphere-based protein structure similarity algorithm

    OpenAIRE

    2011-01-01

    Background Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus has shifted from sequences to structures. Structure similarity poses a major challenge, since no standard definition of optimal structure similarity exists in the field. Results We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of lo...

  14. Estimating similarity of XML Schemas using path similarity measure

    Directory of Open Access Journals (Sweden)

    Veena Trivedi

    2012-07-01

    Full Text Available In this paper, an attempt has been made to develop an algorithm that estimates the similarity of XML Schemas using multiple similarity measures. To perform the task, the XML Schema element information is represented in the form of strings, and four different similarity measure approaches are employed. To further improve the similarity estimate, an overall similarity measure is also calculated. The approach used in this paper is a distinctive one, as it calculates the similarity between two XML Schemas using four approaches and gives an integrated value for the similarity measure.
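
The record above combines several per-path similarity measures into one overall score but does not name the four measures it uses. As a rough illustration only (the three measures and the unweighted averaging below are my own choices, not the paper's), an overall score can be an average of simple path-string measures:

```python
from difflib import SequenceMatcher

def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity over path components."""
    sa, sb = set(a.split("/")), set(b.split("/"))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def char_ratio(a: str, b: str) -> float:
    """Character-level similarity via difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()

def prefix_overlap(a: str, b: str) -> float:
    """Fraction of leading path components the two paths share."""
    pa, pb = a.split("/"), b.split("/")
    k = 0
    while k < min(len(pa), len(pb)) and pa[k] == pb[k]:
        k += 1
    return k / max(len(pa), len(pb))

def overall_similarity(a: str, b: str) -> float:
    """Integrated value: unweighted mean of the individual measures."""
    measures = [jaccard, char_ratio, prefix_overlap]
    return sum(m(a, b) for m in measures) / len(measures)

print(round(overall_similarity("order/customer/name", "order/client/name"), 3))
```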

  15. Accurate guitar tuning by cochlear implant musicians.

    Science.gov (United States)

    Lu, Thomas; Huang, Juan; Zeng, Fan-Gang

    2014-01-01

    Modern cochlear implant (CI) users understand speech but have difficulty appreciating music due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show the unexpected result that CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones more accurately with the CI than with the normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger, at ∼30 Hz, for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between CI and normal-hearing ears but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task. PMID:24651081

  16. Stochastic Self-Similar and Fractal Universe

    CERN Document Server

    Iovane, G; Tortoriello, F S

    2004-01-01

    The formation of structures in the Universe appears to be a classically self-similar random process at all astrophysical scales. An agreement is demonstrated for the present hypotheses of segregation with the size of astrophysical structures by using a comparison between quantum quantities and astrophysical ones. We present the observed segregated Universe as the result of a fundamental self-similar law, which generalizes the Compton wavelength relation. It appears that the Universe has a memory of its quantum origin, as suggested by R. Penrose with respect to quasi-crystals. A more accurate analysis shows that the present theory can be extended from the astrophysical to the nuclear scale by using a generalized (stochastically) self-similar random process. This transition is connected to the relevant presence of the electromagnetic and nuclear interactions inside matter. In this sense, the presented rule is correct from the subatomic scale to the astrophysical one. We discuss the near full agreement at organic...

  17. P2P Concept Search: Some Preliminary Results

    OpenAIRE

    Giunchiglia, Fausto; Kharkevich, Uladzimir; Noori, S.R.H

    2009-01-01

    Concept Search extends syntactic search, i.e., search based on the computation of string similarity between words, with semantic search, i.e., search based on the computation of semantic relations between complex concepts. It allows us to deal with ambiguity of natural language. P2P Concept Search extends Concept Search by allowing distributed semantic search over structured P2P network. The key idea is to exploit distributed, rather than centralized, background knowledge and indices.

  18. A More Accurate Fourier Transform

    CERN Document Server

    Courtney, Elya

    2015-01-01

    Fourier transform methods are used to analyze functions and data sets to provide frequencies, amplitudes, and phases of underlying oscillatory components. Fast Fourier transform (FFT) methods offer speed advantages over evaluation of explicit integrals (EI) that define Fourier transforms. This paper compares frequency, amplitude, and phase accuracy of the two methods for well resolved peaks over a wide array of data sets including cosine series with and without random noise and a variety of physical data sets, including atmospheric $\\mathrm{CO_2}$ concentrations, tides, temperatures, sound waveforms, and atomic spectra. The FFT uses MIT's FFTW3 library. The EI method uses the rectangle method to compute the areas under the curve via complex math. Results support the hypothesis that EI methods are more accurate than FFT methods. Errors range from 5 to 10 times higher when determining peak frequency by FFT, 1.4 to 60 times higher for peak amplitude, and 6 to 10 times higher for phase under a peak. The ability t...
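
The FFT-versus-explicit-integral comparison above can be illustrated in a few lines: an FFT peak-frequency estimate is quantized to the bin spacing 1/T, while evaluating the Fourier integral explicitly by the rectangle rule (as the paper does) on an arbitrarily fine frequency grid locates the peak more precisely. A minimal sketch assuming NumPy; the test signal and grid choices are illustrative, not the paper's data:

```python
import numpy as np

fs, T, f0 = 100.0, 8.0, 2.3          # sample rate (Hz), duration (s), true frequency (Hz)
t = np.arange(0, T, 1 / fs)
x = np.cos(2 * np.pi * f0 * t)

# FFT estimate: the peak frequency is quantized to the bin spacing 1/T = 0.125 Hz.
spec = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
f_fft = freqs[np.argmax(spec)]

# "Explicit integral" estimate: evaluate the Fourier integral by the rectangle
# rule on a fine frequency grid, not just at the FFT bin frequencies.
fine = np.arange(2.0, 2.6, 0.001)
amps = [abs(np.sum(x * np.exp(-2j * np.pi * f * t)) / fs) for f in fine]
f_ei = fine[int(np.argmax(amps))]

print(f_fft, round(f_ei, 3))
```

With these parameters the FFT estimate is off by the bin quantization (tens of millihertz), while the fine-grid integral lands within a few millihertz of the true 2.3 Hz.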

  19. Predicting user click behaviour in search engine advertisements

    Science.gov (United States)

    Daryaie Zanjani, Mohammad; Khadivi, Shahram

    2015-10-01

    According to the specific requirements and interests of users, search engines select and display advertisements that match user needs and have a higher probability of attracting users' attention based on their previous search history. New objects such as a user, advertisement or query cause a deterioration of precision in targeted advertising due to their lack of history. This article addresses this challenge. In the case of new objects, we first extract observed objects similar to the new object and then use their history as the history of the new object. Similarity between objects is measured based on correlation, which is a relation between user and advertisement when the advertisement is displayed to the user. This method is used for all objects, so it has helped us to accurately select relevant advertisements for users' queries. In our proposed model, we assume that similar users behave in a similar manner. We find that users with few queries are similar to new users. We will show that the correlation between users and advertisements' keywords is high. Thus, users who pay attention to advertisements' keywords click similar advertisements. In addition, users who pay attention to specific brand names might have similar behaviours too.

  20. Analytical Searching.

    Science.gov (United States)

    Pappas, Marjorie L.

    1995-01-01

    Discusses analytical searching, a process that enables searchers of electronic resources to develop a planned strategy by combining words or phrases with Boolean operators. Defines simple and complex searching, and describes search strategies developed with Boolean logic and truncation. Provides guidelines for teaching students analytical…
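
The Boolean-with-truncation strategy described above can be sketched against a toy inverted index. The documents, terms, and the `*` truncation convention below are illustrative assumptions, not taken from the article:

```python
# Toy inverted index over four "documents".
docs = {
    1: "teaching library search skills",
    2: "boolean logic in database searching",
    3: "teaching boolean search strategies",
    4: "library catalog systems",
}
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def match(term):
    """A trailing * truncates: 'search*' matches search, searching, ..."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, ids in index.items():
            if word.startswith(prefix):
                hits |= ids
        return hits
    return index.get(term, set())

# "teaching AND search*"  -> intersection of posting lists
print(match("teaching") & match("search*"))   # {1, 3}
# "boolean OR library"    -> union of posting lists
print(match("boolean") | match("library"))    # {1, 2, 3, 4}
```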

  1. Improved Search Techniques

    Science.gov (United States)

    Albornoz, Caleb Ronald

    2012-01-01

    Billions of documents are stored and updated daily on the World Wide Web, yet most of this information is not organized efficiently enough to build knowledge from the stored data. Nowadays, search engines are mainly used by users who rely on their own skills to look for the information they need. This paper presents different techniques that search engine users can apply in Google Search to improve the relevancy of search results. According to the Pew Research Center, the average person spends eight hours a month searching for the right information. For instance, a company that employs 1000 employees wastes $2.5 million looking for nonexistent and/or not-found information. The cost is very high because decisions are made based on the information that is readily available to use. Whenever the information necessary to formulate an argument is not available or found, poor decisions may be made and mistakes will be more likely to occur. The survey also indicates that only 56% of Google users feel confident in their current search skills. Moreover, just 76% of the information that is available on the Internet is accurate.

  2. Acoustic Similarity and Dichotic Listening.

    Science.gov (United States)

    Benson, Peter

    1978-01-01

    An experiment tests conjectures that right ear advantage (REA) has an auditory origin in competition or interference between acoustically similar stimuli and that feature-sharing effect (FSE) has its origin in assignment of features of phonetically similar stimuli. No effect on the REA for acoustic similarity, and a clear effect of acoustic…

  3. Functional Similarity and Interpersonal Attraction.

    Science.gov (United States)

    Neimeyer, Greg J.; Neimeyer, Robert A.

    1981-01-01

    Students participated in dyadic disclosure exercises over a five-week period. Results indicated members of high functional similarity dyads evidenced greater attraction to one another than did members of low functional similarity dyads. "Friendship" pairs of male undergraduates displayed greater functional similarity than did "nominal" pairs from…

  4. Perceived and actual similarities in biological and adoptive families: does perceived similarity bias genetic inferences?

    Science.gov (United States)

    Scarr, S; Scarf, E; Weinberg, R A

    1980-09-01

    Critics of the adoption method to estimate the relative effects of genetic and environmental differences on behavioral development claim that important biases are created by the knowledge of biological relatedness or adoptive status. Since the 1950s, agency policy has led to nearly all adopted children knowing that they are adopted. To test the hypothesis that knowledge of biological or adoptive status influences actual similarity, we correlated absolute differences in objective test scores with ratings of similarity by adolescents and their parents in adoptive and biological families. Although biological family members see themselves as more similar than adoptive family members, there are also important generational and gender differences in perceived similarity that cut across family type. There is moderate agreement among family members on the degree of perceived similarity, but there is no correlation between perceived and actual similarity in intelligence or temperament. However, family members are more accurate about shared social attitudes. Knowledge of adoptive or biological relatedness is related to the degree of perceived similarity, but perceptions of similarity are not related to objective similarities and thus do not constitute a bias in comparisons of measured differences in intelligence or temperament in adoptive and biological families.

  5. Source Code Retrieval Using Sequence Based Similarity

    Directory of Open Access Journals (Sweden)

    Yoshihisa Udagawa

    2013-07-01

    Full Text Available Duplicate code adversely affects the quality of software systems and hence should be detected. We discuss an approach that improves source code retrieval using structural information of source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. We propose a similarity measure that is defined by the ratio of the number of sequential fully matching statements to the number of sequential partially matching statements. The defined similarity measure is an extension of the set-based Sorensen-Dice similarity index. This research primarily contributes to the development of a similarity retrieval algorithm that derives meaningful search conditions from a given sequence, and then performs retrieval using all derived conditions. Experiments show that our retrieval model shows an improvement of up to 90.9% over other retrieval models relative to the number of retrieved methods.
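
The abstract describes a sequence-aware extension of the set-based Sorensen-Dice index over extracted statements. The paper's exact formula is not reproduced here, so the following is only a hedged approximation: classic Dice over statement sets, plus a sequence-aware variant that credits order-preserving matching runs (via difflib's matching blocks):

```python
from difflib import SequenceMatcher

def dice(a, b):
    """Classic set-based Sorensen-Dice index over statement sets."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 1.0

def sequential_similarity(a, b):
    """Sequence-aware variant: fraction of statements participating in
    order-preserving matching runs between the two statement sequences."""
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    matched = sum(size for _, _, size in blocks)
    return 2 * matched / (len(a) + len(b)) if a or b else 1.0

# Control-statement sequences extracted from two hypothetical Java methods.
m1 = ["if", "for", "if", "return"]
m2 = ["if", "for", "return"]
print(round(dice(m1, m2), 3), round(sequential_similarity(m1, m2), 3))  # 1.0 0.857
```

Note how the set-based score saturates at 1.0 while the sequential variant still distinguishes the two methods, which is the motivation the abstract gives for going beyond the plain set-based index.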

  6. Aggregated search: a new information retrieval paradigm

    OpenAIRE

    Kopliku, Arlind; Pinel-Sauvagnat, Karen; Boughanem, Mohand

    2014-01-01

    Traditional search engines return ranked lists of search results. It is up to the user to scroll this list, scan different documents, and assemble the information that fulfills his/her information need. Aggregated search represents a new class of approaches where the information is not only retrieved but also assembled. This is the current evolution in Web search, where diverse content (images, videos, ...) and relational content (similar entities, features) are included in search results. I...

  7. 38 CFR 4.46 - Accurate measurement.

    Science.gov (United States)

    2010-07-01

    ... 38 Pensions, Bonuses, and Veterans' Relief 1 2010-07-01 2010-07-01 false Accurate measurement. 4... RATING DISABILITIES Disability Ratings The Musculoskeletal System § 4.46 Accurate measurement. Accurate measurement of the length of stumps, excursion of joints, dimensions and location of scars with respect...

  8. A new adaptive fast motion estimation algorithm based on local motion similarity degree (LMSD)

    Institute of Scientific and Technical Information of China (English)

    LIU Long; HAN Chongzhao; BAI Yan

    2005-01-01

    In the motion vector field adaptive search technique (MVFAST) and the predictive motion vector field adaptive search technique (PMVFAST), the size of the largest motion vector from the three adjacent blocks (left, top, top-right) is compared with a threshold to select among different search schemes. However, a suitable search center and search pattern will not be selected by the adaptive search technique when the adjacent motion vectors are not coherent within the local region. This paper presents an efficient adaptive search algorithm. The motion vector variation degree (MVVD) is considered a reasonable factor for adaptive search selection. Based on the relationship between local motion similarity degree (LMSD) and motion vector variation degree (MVVD), motion vectors are classified into three categories according to their corresponding LMSD; different proposed search schemes are then adopted for motion estimation. The experimental results show that the proposed algorithm achieves a significant computational speedup compared with the MVFAST and PMVFAST algorithms, while offering similar or even better performance.

  9. Learning Multi-modal Similarity

    CERN Document Server

    McFee, Brian

    2010-01-01

    In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multimedia similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.

  10. Roget's Thesaurus and Semantic Similarity

    OpenAIRE

    Jarmasz, Mario; Szpakowicz, Stan

    2012-01-01

    We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein ...
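
Evaluating a similarity measure against the Miller and Charles benchmark, as described above, amounts to correlating system scores with human judgments over the word pairs. The five value pairs below are invented placeholders in the spirit of that benchmark (human ratings on a 0-4 scale, system scores on 0-1), not the actual data:

```python
import math

# Hypothetical scores for five word pairs: human judgments vs. a system's output.
human  = [3.92, 3.84, 3.05, 1.66, 0.42]
system = [0.96, 0.91, 0.70, 0.45, 0.10]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson(human, system), 3))
```

A correlation near 1.0 indicates the system ranks and spaces the pairs much as the human judges did, which is exactly the comparison the abstract reports across WordNet-based and Roget-based measures.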

  11. Search Fatigue

    OpenAIRE

    Bruce Ian Carlin; Florian Ederer

    2012-01-01

    Consumer search is not only costly but also tiring. We characterize the intertemporal effects that search fatigue has on oligopoly prices, product proliferation, and the provision of consumer assistance (i.e., advice). These effects vary based on whether search is all-or-nothing or sequential in nature, whether learning takes place, and whether consumers exhibit brand loyalty. We perform welfare analysis and highlight the novel empirical implications that our analysis generates.

  12. Faceted Search

    CERN Document Server

    Tunkelang, Daniel

    2009-01-01

    We live in an information age that requires us, more than ever, to represent, access, and use information. Over the last several decades, we have developed a modern science and technology for information retrieval, relentlessly pursuing the vision of a "memex" that Vannevar Bush proposed in his seminal article, "As We May Think." Faceted search plays a key role in this program. Faceted search addresses weaknesses of conventional search approaches and has emerged as a foundation for interactive information retrieval. User studies demonstrate that faceted search provides more

  13. Dynamic similarity in erosional processes

    Science.gov (United States)

    Scheidegger, A.E.

    1963-01-01

    A study is made of the dynamic similarity conditions obtaining in a variety of erosional processes. The pertinent equations for each type of process are written in dimensionless form; the similarity conditions can then easily be deduced. The processes treated are: raindrop action, slope evolution and river erosion. © 1963 Istituto Geofisico Italiano.

  14. Personalized recommendation with corrected similarity

    International Nuclear Information System (INIS)

    Personalized recommendation has attracted a surge of interdisciplinary research. Especially, similarity-based methods in applications of real recommendation systems have achieved great success. However, the computations of similarities are overestimated or underestimated, in particular because of the defective strategy of unidirectional similarity estimation. In this paper, we solve this drawback by leveraging mutual correction of forward and backward similarity estimations, and propose a new personalized recommendation index, i.e., corrected similarity based inference (CSI). Through extensive experiments on four benchmark datasets, the results show a greater improvement of CSI in comparison with these mainstream baselines. And a detailed analysis is presented to unveil and understand the origin of such difference between CSI and mainstream indices. (paper)
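
One simple way to realize the "mutual correction of forward and backward similarity estimations" idea is to combine the two asymmetric overlap estimates symmetrically, e.g. by a geometric mean. This is a generic sketch under that assumption, not the paper's actual CSI formula; the user-item data is a toy example:

```python
import math

# Toy user-item data: the set of items each user has collected.
items = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d", "e"},
    "u3": {"a"},
}

def forward(u, v):
    """Unidirectional estimate: overlap normalized by u's degree only.
    This is the kind of one-sided estimate the abstract says over- or
    underestimates similarity."""
    return len(items[u] & items[v]) / len(items[u])

def corrected(u, v):
    """Mutually corrected estimate: geometric mean of the forward and
    backward directions, so neither side's degree dominates."""
    return math.sqrt(forward(u, v) * forward(v, u))

print(round(forward("u3", "u1"), 3), round(forward("u1", "u3"), 3),
      round(corrected("u1", "u3"), 3))  # 1.0 0.333 0.577
```

The single-item user u3 looks maximally similar to u1 in one direction and weakly similar in the other; the corrected score lands in between and is symmetric.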

  15. Hash: a program to accurately predict protein Hα shifts from neighboring backbone shifts

    Energy Technology Data Exchange (ETDEWEB)

    Zeng Jianyang, E-mail: zengjy@gmail.com [Tsinghua University, Institute for Interdisciplinary Information Sciences (China); Zhou Pei [Duke University Medical Center, Department of Biochemistry (United States); Donald, Bruce Randall [Duke University, Department of Computer Science (United States)

    2013-01-15

    Chemical shifts provide not only peak identities for analyzing nuclear magnetic resonance (NMR) data, but also an important source of conformational information for studying protein structures. Current structural studies requiring Hα chemical shifts suffer from the following limitations. (1) For large proteins, the Hα chemical shifts can be difficult to assign using conventional NMR triple-resonance experiments, mainly due to the fast transverse relaxation rate of Cα that restricts the signal sensitivity. (2) Previous chemical shift prediction approaches either require homologous models with high sequence similarity or rely heavily on accurate backbone and side-chain structural coordinates. When neither sequence homologues nor structural coordinates are available, we must resort to other information to predict Hα chemical shifts. Predicting accurate Hα chemical shifts using other obtainable information, such as the chemical shifts of nearby backbone atoms (i.e., adjacent atoms in the sequence), can remedy the above dilemmas, and hence advance NMR-based structural studies of proteins. By specifically exploiting the dependencies on chemical shifts of nearby backbone atoms, we propose a novel machine learning algorithm, called Hash, to predict Hα chemical shifts. Hash combines a new fragment-based chemical shift search approach with a non-parametric regression model, called the generalized additive model, to effectively solve the prediction problem. We demonstrate that the chemical shifts of nearby backbone atoms provide a reliable source of information for predicting accurate Hα chemical shifts. Our testing results on different possible combinations of input data indicate that Hash has a wide range of potential NMR applications in structural and biological studies of proteins.

  16. Similarity of samples and trimming

    CERN Document Server

    Álvarez-Esteban, Pedro C; Cuesta-Albertos, Juan A; Matrán, Carlos; 10.3150/11-BEJ351

    2012-01-01

    We say that two probabilities are similar at level $\alpha$ if they are contaminated versions (up to an $\alpha$ fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.

  17. Capacity Planning for Vertical Search Engines

    CERN Document Server

    Badue, Claudine; Almeida, Virgilio; Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier; Ziviani, Artur; Ziviani, Nivio

    2010-01-01

    Vertical search engines focus on specific slices of content, such as the Web of a single country or the document collection of a large corporation. Despite this, like general open web search engines, they are expensive to maintain, expensive to operate, and hard to design. Because of this, predicting the response time of a vertical search engine is usually done empirically through experimentation, requiring a costly setup. An alternative is to develop a model of the search engine for predicting performance. However, this alternative is of interest only if its predictions are accurate. In this paper we propose a methodology for analyzing the performance of vertical search engines. Applying the proposed methodology, we present a capacity planning model based on a queueing network for search engines with a scale typically suitable for the needs of large corporations. The model is simple and yet reasonably accurate and, in contrast to previous work, considers the imbalance in query service times among homogeneous...
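
As a toy illustration of model-based (rather than purely empirical) response-time prediction, consider the simplest single-server queueing building block. The paper itself builds a full queueing network, so this M/M/1 formula is only the most basic ingredient of such a model, with illustrative numbers:

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda).
    Both rates are in requests per second; the system must be stable."""
    if arrival_rate >= service_rate:
        raise ValueError("system is unstable: lambda must be < mu")
    return 1.0 / (service_rate - arrival_rate)

# Example: a search server handling 80 queries/s with a mean service time of
# 10 ms (mu = 100 queries/s), i.e. utilization 0.8.
print(round(mm1_response_time(80.0, 100.0) * 1000, 1))  # prints 50.0 (ms)
```

Even this crude model captures the key capacity-planning effect: at 80% utilization the mean response time (50 ms) is already five times the bare 10 ms service time, and it diverges as load approaches capacity.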

  18. Self-similar aftershock rates

    CERN Document Server

    Davidsen, Jörn

    2016-01-01

    In many important systems exhibiting crackling noise --- intermittent avalanche-like relaxation response with power-law and, thus, self-similar distributed event sizes --- the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is in particular true for the case of seismicity and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high resolution earthquake data from Southern California we find excellent agreement, providing in particular clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved way of time-dependent seismic hazard assessment and earthquake forecasting.

  19. Self-similar aftershock rates

    Science.gov (United States)

    Davidsen, Jörn; Baiesi, Marco

    2016-08-01

    In many important systems exhibiting crackling noise—an intermittent avalanchelike relaxation response with power-law and, thus, self-similar distributed event sizes—the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is particularly true for the case of seismicity, and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high-resolution earthquake data from Southern California we find excellent agreement, providing particularly clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved framework for time-dependent seismic hazard assessment and earthquake forecasting.

  20. A Novel Personalized Web Search Model

    Institute of Scientific and Technical Information of China (English)

    ZHU Zhengyu; XU Jingqiu; TIAN Yunyan; REN Xiang

    2007-01-01

    A novel personalized Web search model is proposed.The new system, as a middleware between a user and a Web search engine, is set up on the client machine. It can learn a user's preference implicitly and then generate the user profile automatically. When the user inputs query keywords, the system can automatically generate a few personalized expansion words by computing the term-term associations according to the current user profile, and then these words together with the query keywords are submitted to a popular search engine such as Yahoo or Google.These expansion words help to express accurately the user's search intention. The new Web search model can make a common search engine personalized, that is, the search engine can return different search results to different users who input the same keywords. The experimental results show the feasibility and applicability of the presented work.

  1. Community Detection by Neighborhood Similarity

    Institute of Scientific and Technical Information of China (English)

    LIU Xu; XIE Zheng; YI Dong-Yun

    2012-01-01

    Detection of the community structure in a network is important for understanding the structure and dynamics of the network. By exploring the neighborhood of vertices, a local similarity metric is proposed, which can be quickly computed. The resulting similarity matrix retains the same support as the adjacency matrix. Based on local similarity, an agglomerative hierarchical clustering algorithm is proposed for community detection. The algorithm is implemented by an efficient max-heap data structure and runs in nearly linear time, thus is capable of dealing with large sparse networks with tens of thousands of nodes. Experiments on synthesized and real-world networks demonstrate that our method is efficient to detect community structures, and the proposed metric is the most suitable one among all the tested similarity indices.
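
A minimal sketch of a neighborhood-based local similarity restricted to edges, so that the similarity matrix keeps the same support as the adjacency matrix. The Jaccard index over closed neighborhoods used here is a common choice for such metrics, but it is an assumption, not necessarily the paper's exact definition:

```python
# Toy graph as adjacency sets; two obvious communities {0, 1, 2} and {3, 4, 5}
# joined by the single edge (2, 3).
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

def neighborhood_similarity(u, v):
    """Jaccard index over closed neighborhoods N(u) ∪ {u} and N(v) ∪ {v}."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

# Same support as the adjacency matrix: score only pairs joined by an edge.
scores = {(u, v): neighborhood_similarity(u, v)
          for u in adj for v in adj[u] if u < v}
for edge, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(edge, round(s, 3))  # the inter-community edge (2, 3) scores lowest
```

Agglomerative clustering on these scores would merge the high-similarity pairs first and leave the weakest edge (2, 3) for last, recovering the two communities.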

  2. SELF-SIMILAR TRAFFIC GENERATOR

    OpenAIRE

    Linawati Linawati; I Made Suartika

    2009-01-01

    A network traffic generator can be produced using OPNET. OPNET generates the traffic either as explicit traffic or as background traffic. This paper demonstrates generating traffic in OPNET 7.0 as background traffic. The traffic generator that was simulated is self-similar traffic with different Hurst parameters. The simulation results proved that OPNET with the background traffic function can serve as a qualified self-similar traffic generator. These results can help in investigating and analysing network perfor...

  3. Similarity measures for protein ensembles

    DEFF Research Database (Denmark)

    Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper

    2009-01-01

    Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations...... a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single...

  4. Self-Similar Isentropic Implosions

    Energy Technology Data Exchange (ETDEWEB)

    Rodriguez, M.; Amable, L.

    1980-07-01

    The self-similar compression of an isentropic spherical gas pellet is analyzed for large values of the ratio of the final to initial densities. An asymptotic analysis provides the solution corresponding to a prescribed value of the final density when it is high. In addition, an approximate solution is given when the specific heat ratio is not constant. The time evolution of the pressure on the outer surface leading to the self-similar solutions is calculated for large density ratios. (Author)

  5. Molecular similarity of MDR inhibitors

    OpenAIRE

    Simon Gibbons; Mire Zloh

    2004-01-01

    Abstract: The molecular similarity of multidrug resistance (MDR) inhibitors was evaluated using the point centred atom charge approach in an attempt to find some common features of structurally unrelated inhibitors. A series of inhibitors of bacterial MDR were studied and there is a high similarity between these in terms of their shape, presence and orientation of aromatic ring moieties. A comparison of the lipophilic properties of these molecules has also been conducted suggesting that this ...

  6. Representation is representation of similarities.

    Science.gov (United States)

    Edelman, S

    1998-08-01

    Advanced perceptual systems are faced with the problem of securing a principled (ideally, veridical) relationship between the world and its internal representation. I propose a unified approach to visual representation, addressing the need for superordinate and basic-level categorization and for the identification of specific instances of familiar categories. According to the proposed theory, a shape is represented internally by the responses of a small number of tuned modules, each broadly selective for some reference shape, whose similarity to the stimulus it measures. This amounts to embedding the stimulus in a low-dimensional proximal shape space spanned by the outputs of the active modules. This shape space supports representations of distal shape similarities that are veridical as Shepard's (1968) second-order isomorphisms (i.e., correspondence between distal and proximal similarities among shapes, rather than between distal shapes and their proximal representations). Representation in terms of similarities to reference shapes supports processing (e.g., discrimination) of shapes that are radically different from the reference ones, without the need for the computationally problematic decomposition into parts required by other theories. Furthermore, a general expression for similarity between two stimuli, based on comparisons to reference shapes, can be used to derive models of perceived similarity ranging from continuous, symmetric, and hierarchical ones, as in multidimensional scaling (Shepard 1980), to discrete and nonhierarchical ones, as in the general contrast models (Shepard & Arabie 1979; Tversky 1977). PMID:10097019

  7. Contextual Factors for Finding Similar Experts

    DEFF Research Database (Denmark)

    Hofmann, Katja; Balog, Krisztian; Bogers, Toine;

    2010-01-01

    Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. An important outcome are models that identify factors that influence expert finding. Expertise retrieval addresses the same problem, expert finding, but from a system......-seeking models, are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts......, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content...

  8. Practical fulltext search in medical records

    Directory of Open Access Journals (Sweden)

    Vít Volšička

    2015-09-01

Full Text Available Performing a search through previously existing documents, including medical reports, is an integral part of acquiring new information and of educational processes. Unfortunately, finding relevant information is not always easy, since many documents are saved in free-text formats, making it difficult to search through them. A full-text search is a viable solution for searching through documents. The full-text search makes it possible to efficiently search through large numbers of documents and to find those that contain specific search phrases in a short time. All leading database systems currently offer full-text search, but some do not support the complex morphology of the Czech language. Apache Solr provides full support options and some full-text libraries. This programme provides good support for the Czech language in the basic installation, and a wide range of settings and options for its deployment over any platform. The library was satisfactorily tested using real data from the hospitals. Solr provided useful, fast, and accurate searches. However, there is still a need to make adjustments in order to receive effective search results, particularly by correcting typographical errors made not only in the text, but also when entering words in the search box, and by creating a list of frequently used abbreviations and synonyms for more accurate results.

  9. Accurate hydrocarbon estimates attained with radioactive isotope

    International Nuclear Information System (INIS)

    To make accurate economic evaluations of new discoveries, an oil company needs to know how much gas and oil a reservoir contains. The porous rocks of these reservoirs are not completely filled with gas or oil, but contain a mixture of gas, oil and water. It is extremely important to know what volume percentage of this water--called connate water--is contained in the reservoir rock. The percentage of connate water can be calculated from electrical resistivity measurements made downhole. The accuracy of this method can be improved if a pure sample of connate water can be analyzed or if the chemistry of the water can be determined by conventional logging methods. Because of the similarity of the mud filtrate--the water in a water-based drilling fluid--and the connate water, this is not always possible. If the oil company cannot distinguish between connate water and mud filtrate, its oil-in-place calculations could be incorrect by ten percent or more. It is clear that unless an oil company can be sure that a sample of connate water is pure, or at the very least knows exactly how much mud filtrate it contains, its assessment of the reservoir's water content--and consequently its oil or gas content--will be distorted. The oil companies have opted for the Repeat Formation Tester (RFT) method. Label the drilling fluid with small doses of tritium--a radioactive isotope of hydrogen--and it will be easy to detect and quantify in the sample

  10. Combinatorial Approaches to Accurate Identification of Orthologous Genes

    OpenAIRE

    Shi, Guanqun

    2011-01-01

    The accurate identification of orthologous genes across different species is a critical and challenging problem in comparative genomics and has a wide spectrum of biological applications including gene function inference, evolutionary studies and systems biology. During the past several years, many methods have been proposed for ortholog assignment based on sequence similarity, phylogenetic approaches, synteny information, and genome rearrangement. Although these methods share many commonly a...

  11. Similarity measures for face recognition

    CERN Document Server

    Vezzetti, Enrico

    2015-01-01

Face recognition has several applications, including security (authentication and identification of device users and criminal suspects) and medicine (corrective surgery and diagnosis). Facial recognition programs rely on algorithms that can compare and compute the similarity between two sets of images. This eBook explains some of the similarity measures used in facial recognition systems in a single volume. Readers will learn about various measures, including Minkowski distances, Mahalanobis distances, Hausdorff distances, and cosine-based distances, among other methods. The book also summarizes errors that may occur in face recognition methods. Computer scientists "facing face" and looking to select and test different methods of computing similarities will benefit from this book. The book is also a useful tool for students undertaking computer vision courses.
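Three of the distance families the book names can be sketched briefly. The following is an illustrative sketch using NumPy, not the book's own implementations:

```python
import numpy as np

def minkowski(a, b, p=2):
    """Minkowski distance between feature vectors; p=1 is Manhattan, p=2 is Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def cosine_distance(a, b):
    """1 minus the cosine similarity of two feature vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets (rows are points)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Each function takes dense NumPy arrays; in a face-recognition pipeline the inputs would be the extracted feature vectors (or landmark point sets, for the Hausdorff case) of the two images being compared.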

  12. Search Engines Selection Based on Relevance Terms%基于相关术语集的搜索引擎选择

    Institute of Scientific and Technical Information of China (English)

    欧洁

    2003-01-01

Metasearch can effectively search immense distributed electronic resources. It is built on top of several search engines, providing the user with uniform access to these engines. Metasearch first passes the user's query to the underlying useful search engines, and then collects and reorganizes the results from the search engines used. Selecting the underlying useful search engines is called search engine selection. In this paper, we present a statistical method based on relevance terms to estimate the usefulness of a search engine for any given query, which is suitable for both Boolean queries and vector queries. Experimental results indicate that the proposed estimation method is quite accurate, especially when the critical similarity between the query and the results is high.

  13. Interfacial Molecular Searching Using Forager Dynamics

    Science.gov (United States)

    Monserud, Jon H.; Schwartz, Daniel K.

    2016-03-01

    Many biological and technological systems employ efficient non-Brownian intermittent search strategies where localized searches alternate with long flights. Coincidentally, molecular species exhibit intermittent behavior at the solid-liquid interface, where periods of slow motion are punctuated by fast flights through the liquid phase. Single-molecule tracking was used here to observe the interfacial search process of DNA for complementary DNA. Measured search times were qualitatively consistent with an intermittent-flight model, and ˜10 times faster than equivalent Brownian searches, suggesting that molecular searches for reactive sites benefit from similar efficiencies as biological organisms.

  14. Similarity Measures for Comparing Biclusterings.

    Science.gov (United States)

    Horta, Danilo; Campello, Ricardo J G B

    2014-01-01

    The comparison of ordinary partitions of a set of objects is well established in the clustering literature, which comprehends several studies on the analysis of the properties of similarity measures for comparing partitions. However, similarity measures for clusterings are not readily applicable to biclusterings, since each bicluster is a tuple of two sets (of rows and columns), whereas a cluster is only a single set (of rows). Some biclustering similarity measures have been defined as minor contributions in papers which primarily report on proposals and evaluation of biclustering algorithms or comparative analyses of biclustering algorithms. The consequence is that some desirable properties of such measures have been overlooked in the literature. We review 14 biclustering similarity measures. We define eight desirable properties of a biclustering measure, discuss their importance, and prove which properties each of the reviewed measures has. We show examples drawn and inspired from important studies in which several biclustering measures convey misleading evaluations due to the absence of one or more of the discussed properties. We also advocate the use of a more general comparison approach that is based on the idea of transforming the original problem of comparing biclusterings into an equivalent problem of comparing clustering partitions with overlapping clusters. PMID:26356865
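As an illustration of the kind of measure being reviewed, one simple cell-based score treats each bicluster as the set of matrix cells (row, column pairs) it covers and matches biclusters by Jaccard overlap. This is a generic sketch for intuition, not one of the 14 measures analysed in the paper:

```python
def cells(bicluster):
    """A bicluster is a (rows, cols) pair; it covers the cells rows x cols."""
    rows, cols = bicluster
    return {(r, c) for r in rows for c in cols}

def jaccard(s, t):
    return len(s & t) / len(s | t) if s | t else 1.0

def bicluster_similarity(B1, B2):
    """Average, over biclusters in B1, of the best Jaccard match in B2
    (an asymmetric cell-based score; symmetrize by averaging both directions)."""
    return sum(max(jaccard(cells(a), cells(b)) for b in B2) for a in B1) / len(B1)
```

Measures of this shape are exactly where the properties the paper discusses matter: for example, a best-match average can reward a biclustering that covers another without penalizing extra, spurious biclusters in the second argument.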

  15. Sparse Similarity-Based Fisherfaces

    DEFF Research Database (Denmark)

    Fagertun, Jens; Gomez, David Delgado; Hansen, Mads Fogtmann;

    2011-01-01

    intensities are used by Sparse Principal Component Analysis and Fisher Linear Discriminant Analysis to assign a one dimensional subspace projection to each person belonging to a reference data set. Experimental results performed in the AR dataset show that Similarity-based Fisherfaces in a sparse version can...

  16. HOW DISSIMILARLY SIMILAR ARE BIOSIMILARS?

    Directory of Open Access Journals (Sweden)

    Ramshankar Vijayalakshmi

    2012-05-01

Full Text Available Biopharmaceuticals are new chemotherapeutic agents called "biosimilars" or "follow-on protein products" by the European Medicines Agency (EMA) and the American regulatory agency (Food and Drug Administration), respectively. Biosimilars are extremely similar to the reference molecule but not identical, however close their similarities may be. A regulatory framework is therefore in place to assess applications for marketing authorisation of biosimilars. When a biosimilar is similar to the reference biopharmaceutical in terms of safety, quality, and efficacy, it can be registered. It is important to document data from clinical trials with a view to demonstrating similar safety and efficacy. While the development time for a generic medicine is around 3 years, a biosimilar takes about 6-9 years. Generic medicines need to demonstrate bioequivalence only, unlike biosimilars, which need to conduct Phase I and Phase III clinical trials. In this review, different biosimilars that are already being used successfully in the field of oncology are discussed: their similarities, their differences, and the guidelines to be followed before a clinically informed decision is taken. More importantly, the regulatory guidelines that are operational in India are discussed, along with a workflow for making a biosimilar and the relevant dos and don'ts. For a large, populous country like India, where the ageing population is increasing with improved treatments in all sectors, including oncology, we need more new, cheaper, and effective biosimilars in the market for the health care of this sector. It therefore becomes important to understand the regulatory guidelines and the steps to come up with more biosimilars for the existing population; more information is also essential for practicing clinicians to translate these effectively into clinical practice.

  17. AN FFT-BASED SELF-SIMILAR TRAFFIC GENERATOR

    Institute of Scientific and Technical Information of China (English)

    施建俊; 薛质; 诸鸿文

    2001-01-01

The self-similarity of network traffic has a great influence on performance, but there are few analytical or even numerical solutions for such a model, so simulation becomes the most efficient method for research. Fractional Gaussian noise (FGN) is the most popularly used self-similar model. This paper presents an FGN generator based on the fast Fourier transform (FFT). The study indicates that this algorithm is fairly fast and accurate.
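The abstract does not give the algorithm's details; a simplified spectral-synthesis sketch conveys the idea behind FFT-based FGN generation: shape a white spectrum by the asymptotic FGN power law S(f) ~ f^(1-2H), randomize phases, and inverse-transform. This is an illustrative approximation, not the paper's exact method:

```python
import numpy as np

def fgn_spectral(n, hurst, rng=None):
    """Approximate fractional Gaussian noise of length n with Hurst parameter H,
    via spectral synthesis: amplitude ~ f^((1-2H)/2), random phases, inverse FFT."""
    rng = rng or np.random.default_rng()
    f = np.fft.rfftfreq(n)[1:]                 # positive frequencies only
    amp = f ** ((1 - 2 * hurst) / 2)           # square root of the power spectrum
    phase = rng.uniform(0, 2 * np.pi, f.size)
    spectrum = np.concatenate(([0], amp * np.exp(1j * phase)))  # zero DC term
    x = np.fft.irfft(spectrum, n)
    return x / x.std()                         # normalize to unit variance
```

For H > 0.5 the output exhibits the long-range dependence characteristic of self-similar traffic; H = 0.5 degenerates to approximately white noise.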

  18. Turning Search into Knowledge Management.

    Science.gov (United States)

    Kaufman, David

    2002-01-01

    Discussion of knowledge management for electronic data focuses on creating a high quality similarity ranking algorithm. Topics include similarity ranking and unstructured data management; searching, categorization, and summarization of documents; query evaluation; considering sentences in addition to keywords; and vector models. (LRW)

  19. Intelligent Search Technology Combining Semantic Grid and Clustering

    Directory of Open Access Journals (Sweden)

    Cuncun Wei

    2013-08-01

Full Text Available How to efficiently and accurately search for resources is a critical problem in P2P networks. This thesis mainly starts from improving query efficiency to establish an intelligent search framework. On the basis of Gnutella flooding-search technology, it applies theories of semantic ontology search combined with semantic hash routing table technology, and searches for accurate answers from the resource library through problem-traversal queries over the network and the nodes provided in the routing table. Meanwhile, in view of the flaws of current structured and unstructured P2P networks, it applies a hierarchical clustering method to form a hierarchical semantic web through semantic clustering at the node, domain and global levels. Experiments show that this framework can improve search and query efficiency and fault tolerance in a P2P environment, extend and understand user query demands, and reach the aim of accurate search.

  20. Retrieval of similar chess positions

    OpenAIRE

    Ganguly, Debasis; LEVELING, JOHANNES; Jones, Gareth J.F.

    2014-01-01

We address the problem of retrieving chess game positions similar to a given query position from a collection of archived chess games. We investigate this problem from an information retrieval (IR) perspective. The advantage of our proposed IR-based approach is that it allows using the standard inverted organization of stored chess positions, leading to efficient retrieval. Moreover, in contrast to retrieving exactly identical board positions, the IR-based approach is able to provide approxim...

  1. Internet Search Engines

    OpenAIRE

    Fatmaa El Zahraa Mohamed Abdou

    2004-01-01

A general study of internet search engines. The study deals with 7 main points: the difference between search engines and search directories, the components of search engines, the percentage of sites covered by search engines, the cataloging of sites, the time needed for sites to appear in search engines, search capabilities, and types of search engines.

  2. Internet Search Engines

    Directory of Open Access Journals (Sweden)

    Fatmaa El Zahraa Mohamed Abdou

    2004-09-01

Full Text Available A general study of internet search engines. The study deals with 7 main points: the difference between search engines and search directories, the components of search engines, the percentage of sites covered by search engines, the cataloging of sites, the time needed for sites to appear in search engines, search capabilities, and types of search engines.

  3. Landscape similarity, retrieval, and machine mapping of physiographic units

    Science.gov (United States)

    Jasiewicz, Jaroslaw; Netzel, Pawel; Stepinski, Tomasz F.

    2014-09-01

    We introduce landscape similarity - a numerical measure that assesses affinity between two landscapes on the basis of similarity between the patterns of their constituent landform elements. Such a similarity function provides core technology for a landscape search engine - an algorithm that parses the topography of a study area and finds all places with landscapes broadly similar to a landscape template. A landscape search can yield answers to a query in real time, enabling a highly effective means to explore large topographic datasets. In turn, a landscape search facilitates auto-mapping of physiographic units within a study area. The country of Poland serves as a test bed for these novel concepts. The topography of Poland is given by a 30 m resolution DEM. The geomorphons method is applied to this DEM to classify the topography into ten common types of landform elements. A local landscape is represented by a square tile cut out of a map of landform elements. A histogram of cell-pair features is used to succinctly encode the composition and texture of a pattern within a local landscape. The affinity between two local landscapes is assessed using the Wave-Hedges similarity function applied to the two corresponding histograms. For a landscape search the study area is organized into a lattice of local landscapes. During the search the algorithm calculates the similarity between each local landscape and a given query. Our landscape search for Poland is implemented as a GeoWeb application called TerraEx-Pl and is available at http://sil.uc.edu/. Given a sample, or a number of samples, from a target physiographic unit the landscape search delineates this unit using the principles of supervised machine learning. Repeating this procedure for all units yields a complete physiographic map. The application of this methodology to topographic data of Poland results in the delineation of nine physiographic units. The resultant map bears a close resemblance to a conventional
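The Wave-Hedges comparison of two landform-element histograms can be sketched as follows. This is one common formulation of the measure; the paper's exact normalization may differ:

```python
def wave_hedges_similarity(h1, h2):
    """Wave-Hedges similarity of two equal-length histograms:
    1 - (1/n) * sum(|a - b| / max(a, b)), with 0/0 terms treated as 0."""
    terms = [abs(a - b) / max(a, b) if max(a, b) > 0 else 0.0
             for a, b in zip(h1, h2)]
    return 1.0 - sum(terms) / len(terms)
```

In the landscape-search setting, h1 and h2 would be the cell-pair feature histograms encoding the composition and texture of two local landscapes; identical histograms score 1, fully disjoint ones score 0.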

  4. SPATIO-TEXTUAL SIMILARITY JOIN

    Directory of Open Access Journals (Sweden)

    Ch Shylaja and Supreethi K.P

    2015-07-01

Full Text Available Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. Spatial databases store large amounts of space-related data, such as maps and preprocessed remote sensing or medical imaging data. Modern mobile phones and mobile devices are equipped with GPS devices, which is why location-based services have gained significant attention. These location-based services generate large amounts of spatio-textual data which contain both a spatial location and a textual description. The same spatio-textual objects can have different representations because of deviations in GPS readings or differing user descriptions. This calls for efficient methods to integrate spatio-textual data. The spatio-textual similarity join meets this need: given two sets of spatio-textual objects, it finds all similar pairs. A filter-and-refine framework will be developed to devise the algorithms. The prefix filter technique will be extended to generate spatial and textual signatures, and inverted indexes will be built on top of these signatures. Candidate pairs will be found using these indexes, and finally the candidate pairs will be refined to get the result. MBR-prefix based signatures will be used to prune dissimilar objects, and hybrid signatures will be used to support spatial and textual pruning simultaneously.
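The prefix-filter step for the textual side can be sketched for a Jaccard threshold. This is a generic illustration of prefix filtering with an inverted index, not the paper's extended spatial or hybrid signatures:

```python
import math
from collections import defaultdict

def prefix_filter_join(records, theta):
    """Candidate generation for a Jaccard similarity join via prefix filtering.
    Tokens are globally ordered (here: rarest first); two records can reach
    Jaccard >= theta only if their prefixes share at least one token."""
    freq = defaultdict(int)
    for r in records:
        for t in r:
            freq[t] += 1
    order = lambda t: (freq[t], t)             # rare tokens first, ties by value
    index, candidates = defaultdict(set), set()
    for i, r in enumerate(records):
        toks = sorted(r, key=order)
        plen = len(toks) - math.ceil(theta * len(toks)) + 1
        for t in toks[:plen]:
            for j in index[t]:
                candidates.add((j, i))
            index[t].add(i)
    return candidates  # must still be verified against the real threshold
```

The returned pairs are only candidates; the refine step computes the exact similarity for each pair and discards those below theta.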

  5. Roget's Thesaurus and Semantic Similarity

    CERN Document Server

    Jarmasz, Mario

    2012-01-01

    We have implemented a system that measures semantic similarity using a computerized 1987 Roget's Thesaurus, and evaluated it by performing a few typical tests. We compare the results of these tests with those produced by WordNet-based similarity measures. One of the benchmarks is Miller and Charles' list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those computed by several NLP systems. The 30 pairs can be traced back to Rubenstein and Goodenough's 65 pairs, which we have also studied. Our Roget's-based system gets correlations of .878 for the smaller and .818 for the larger list of noun pairs; this is quite close to the .885 that Resnik obtained when he employed humans to replicate the Miller and Charles experiment. We further evaluate our measure by using Roget's and WordNet to answer 80 TOEFL, 50 ESL and 300 Reader's Digest questions: the correct synonym must be selected amongst a group of four words. Our system gets 78.75%, 82.00% and 74.33% of ...

  6. Gait Recognition Using Image Self-Similarity

    Directory of Open Access Journals (Sweden)

    Cutler Ross G

    2004-01-01

Full Text Available Gait is one of the few biometrics that can be measured at a distance, and is hence useful for passive surveillance as well as biometric applications. Gait recognition research is still in its infancy, however, and we have yet to solve the fundamental issue of finding gait features which at once have sufficient discrimination power and can be extracted robustly and accurately from low-resolution video. This paper describes a novel gait recognition technique based on the image self-similarity of a walking person. We contend that the similarity plot encodes a projection of gait dynamics. It is also correspondence-free, robust to segmentation noise, and works well with low-resolution video. The method is tested on multiple data sets of varying sizes and degrees of difficulty. Performance is best for fronto-parallel viewpoints, whereby a recognition rate of 98% is achieved for a data set of 6 people, and 70% for a data set of 54 people.
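A minimal sketch of a self-similarity plot follows, using mean absolute frame difference as the pairwise measure; the paper's exact correlation measure may differ:

```python
import numpy as np

def self_similarity_plot(frames):
    """Self-similarity matrix of a frame sequence: entry (i, j) is the mean
    absolute pixel difference between frames i and j (smaller = more similar)."""
    n = len(frames)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            d = np.mean(np.abs(frames[i].astype(float) - frames[j].astype(float)))
            S[i, j] = S[j, i] = d
    return S
```

For a walking person, the matrix shows a periodic diagonal-stripe pattern whose spacing reflects the gait period, which is the structure such techniques exploit for recognition.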

  7. Laboratory Building for Accurate Determination of Plutonium

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

The accurate determination of plutonium is one of the most important assay techniques for nuclear fuel; it is also the key to chemical measurement transfer and the basis of the nuclear material balance. An

  8. Rotational invariant similarity measurement for content-based image indexing

    Science.gov (United States)

    Ro, Yong M.; Yoo, Kiwon

    2000-04-01

We propose a similarity matching technique for content-based image retrieval. The proposed technique is invariant to image rotation. Since image contents for indexing and retrieval may be arbitrarily extracted from a still image or a key frame of video, a rotation-invariant feature description is important for general application of content-based image indexing and retrieval. In this paper, we propose a rotation-invariant similarity measurement incorporating texture features based on the human visual system (HVS). To reduce computational complexity, we employ hierarchical similarity distance searching. To verify the method, experiments with the MPEG-7 data set are performed.

  9. A Short Survey of Document Structure Similarity Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Buttler, D

    2004-02-27

This paper provides a brief survey of document structural similarity algorithms, including the optimal Tree Edit Distance algorithm and various approximation algorithms. The approximation algorithms include the simple weighted tag similarity algorithm, Fourier transforms of the structure, and a new application of the shingle technique to structural similarity. We show three surprising results. First, the Fourier transform technique proves to be the least accurate of the approximation algorithms, while also being the slowest. Second, optimal Tree Edit Distance algorithms may not be the best technique for clustering pages from different sites. Third, the simplest approximation to structure may be the most effective and efficient mechanism for many applications.
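A shingle-based structural comparison of the kind surveyed can be sketched by taking k-shingles over a document's tag sequence and comparing the shingle sets with Jaccard similarity. This is a generic illustration, not the paper's exact weighting:

```python
def tag_shingles(tags, k=3):
    """All k-shingles (contiguous k-tuples) over a document's tag sequence."""
    return {tuple(tags[i:i + k]) for i in range(len(tags) - k + 1)}

def structural_similarity(tags_a, tags_b, k=3):
    """Jaccard similarity of the two documents' tag-shingle sets."""
    A, B = tag_shingles(tags_a, k), tag_shingles(tags_b, k)
    return len(A & B) / len(A | B) if A | B else 1.0
```

The tag sequence would typically be the opening tags of the HTML/XML document in document order; unlike tree edit distance, this runs in near-linear time, which is why such approximations are attractive for large-scale clustering.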

  10. Mechanisms for similarity based cooperation

    Science.gov (United States)

    Traulsen, A.

    2008-06-01

Cooperation based on similarity has been discussed since Richard Dawkins introduced the term "green beard" effect. In these models, individuals cooperate based on an arbitrary signal (or tag) such as the famous green beard. Here, two different models for such tag-based cooperation are analysed. As neutral drift is important in both models, a finite population framework is applied. The first model, which we term "cooperative tags", considers a situation in which groups of cooperators are formed by some joint signal. Defectors adopting the signal and exploiting the group can lead to a breakdown of cooperation. In this case, conditions are derived under which the average abundance of the more cooperative strategy exceeds 50%. The second model considers a situation in which individuals start defecting towards others that are not similar to them. This situation is termed "defective tags". It is shown that in this case, individuals using tags to cooperate exclusively with their own kind dominate over unconditional cooperators.

  11. Invariant Image Watermarking Using Accurate Zernike Moments

    Directory of Open Access Journals (Sweden)

    Ismail A. Ismail

    2010-01-01

Full Text Available Problem statement: Digital image watermarking is the most popular method for image authentication, copyright protection and content description. Zernike moments are the most widely used moments in image processing and pattern recognition. The magnitudes of Zernike moments are rotation invariant, so they can be used directly as a watermark signal or be further modified to carry embedded data. Zernike moments computed in Cartesian coordinates are not accurate due to geometric and numerical errors. Approach: In this study, we employed a robust image-watermarking algorithm using accurate Zernike moments. These moments are computed in polar coordinates, where both approximation and geometric errors are removed. Accurate Zernike moments are used in image watermarking and proved to be robust against different kinds of geometric attacks. The performance of the proposed algorithm is evaluated using standard images. Results: Experimental results show that accurate Zernike moments achieve a higher degree of robustness than approximated ones against rotation, scaling, flipping, shearing and affine transformation. Conclusion: By computing accurate Zernike moments, the embedded watermark bits can be extracted at a low error rate.
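For reference, the polar-coordinate form of the Zernike moments the abstract refers to can be written as follows (the standard textbook definition, not taken from the paper itself):

```latex
% Zernike moment of order n and repetition m over the unit disk, in polar
% coordinates -- the formulation that avoids the geometric error incurred by
% approximating the disk on a Cartesian pixel grid:
Z_{nm} = \frac{n+1}{\pi} \int_{0}^{2\pi}\!\!\int_{0}^{1}
         f(\rho,\theta)\, R_{nm}(\rho)\, e^{-jm\theta}\, \rho \,d\rho \,d\theta
```

Here R_{nm} is the radial Zernike polynomial; rotating the image by an angle only multiplies Z_{nm} by a unit-magnitude phase factor, so the magnitude |Z_{nm}| is the rotation-invariant quantity used as the watermark carrier.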

  12. Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

    Directory of Open Access Journals (Sweden)

    Pesole Graziano

    2009-09-01

Full Text Available Abstract Background The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. In the context of recent findings that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it should be considered desirable to discriminate between coding and non-coding conserved sequences without recourse to sequence similarity searches of protein databases, as such approaches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by the presence in databases of sequences which are erroneously annotated as coding. Results Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine which designates the alignment as coding or non-coding with an associated probability score. Conclusion We show that our approach is both sensitive and accurate with respect to comparable methods and illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.

  13. Performance Indexes: Similarities and Differences

    Directory of Open Access Journals (Sweden)

    André Machado Caldeira

    2013-06-01

Full Text Available The investor of today is more rigorous about monitoring a financial asset portfolio. He no longer thinks only in terms of the expected return (one dimension), but in terms of risk and return (two dimensions). This new perception is more complex, since the risk measurement can vary according to one's perception; some use the standard deviation for it, while others disagree with this measure and propose alternatives. In addition to this difficulty, there is the problem of how to consider these two dimensions together. The objective of this essay is to study the main performance indexes through an empirical study in order to verify the differences and similarities for some selected assets. One performance index proposed in Caldeira (2005) shall be included in this analysis.

  14. Features Based Text Similarity Detection

    CERN Document Server

    Kent, Chow Kok

    2010-01-01

As the Internet helps us cross cultural borders by providing access to different information, the issue of plagiarism is bound to arise, and plagiarism detection becomes more demanding in overcoming it. Different plagiarism detection tools have been developed based on various detection techniques, and nowadays the fingerprint matching technique plays an important role in those tools. However, in handling large articles, fingerprint matching has some weaknesses, especially regarding space and time consumption. In this paper, we propose a new approach to detect plagiarism which integrates the fingerprint matching technique with four key features to assist in the detection process. These proposed features are capable of choosing the main points or key sentences in the articles to be compared. The selected sentences then undergo the fingerprint matching process in order to detect the similarity between the sentences. Hence, time and space usage for the comparison process is r...
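Fingerprint matching of the kind the paper builds on can be sketched with k-gram hashing and "mod p" selection. This is a generic illustration; the paper's actual fingerprint scheme and parameters are not specified in the abstract:

```python
import hashlib

def fingerprint(text, k=5, mod=4):
    """k-gram fingerprint: hash every k-character gram of the text and keep
    only hashes that are 0 modulo `mod` (the classic 'mod p' selection)."""
    grams = (text[i:i + k] for i in range(len(text) - k + 1))
    hashes = {int(hashlib.md5(g.encode()).hexdigest(), 16) for g in grams}
    return {h for h in hashes if h % mod == 0}

def fingerprint_similarity(a, b, k=5, mod=4):
    """Jaccard overlap of the two texts' fingerprint sets."""
    fa, fb = fingerprint(a, k, mod), fingerprint(b, k, mod)
    return len(fa & fb) / len(fa | fb) if fa | fb else 1.0
```

The selection step is what makes fingerprints compact: only a fraction of the k-gram hashes is stored per document, which is precisely the space/time trade-off the paper's sentence-selection features aim to improve.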

  15. Fast and accurate determination of modularity and its effect size

    CERN Document Server

    Treviño, Santiago; Del Genio, Charo I; Bassler, Kevin E

    2014-01-01

We present a fast spectral algorithm for community detection in complex networks. Our method searches for the partition with the maximum value of the modularity via the interplay of several refinement steps that include both agglomeration and division. We validate the accuracy of the algorithm by applying it to several real-world benchmark networks. On all these, our algorithm performs as well or better than any other known polynomial scheme. This allows us to extensively study the modularity distribution in ensembles of Erdős-Rényi networks, producing theoretical predictions for means and variances inclusive of finite-size corrections. Our work provides a way to accurately estimate the effect size of modularity, providing a z-score measure of it and enabling a more informative comparison of networks with different numbers of nodes and links.
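For context, modularity itself can be computed directly from its definition; the sketch below evaluates the Newman-Girvan Q for a fixed partition (the paper's contribution, the spectral search over partitions, is not reproduced here, and the example graph is invented).

```python
def modularity(edges, community):
    """Q = (1/2m) * sum_ij (A_ij - k_i*k_j/(2m)) * delta(c_i, c_j)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # fraction of edges falling inside communities
    e_in = sum(1 for u, v in edges if community[u] == community[v]) / m
    # expected fraction under the configuration (random-rewiring) model
    exp = sum(
        degree[u] * degree[v]
        for u in degree for v in degree
        if community[u] == community[v]
    ) / (4 * m * m)
    return e_in - exp

# two triangles joined by one edge, split into their natural communities
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
comm = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, comm), 4))  # 0.3571
```

A maximization algorithm such as the paper's would search over partitions `comm` to drive this value up.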

  16. Accurate numerical solution of compressible, linear stability equations

    Science.gov (United States)

    Malik, M. R.; Chuang, S.; Hussaini, M. Y.

    1982-01-01

    The present investigation is concerned with a fourth order accurate finite difference method and its application to the study of the temporal and spatial stability of the three-dimensional compressible boundary layer flow on a swept wing. This method belongs to the class of compact two-point difference schemes discussed by White (1974) and Keller (1974). The method was apparently first used for solving the two-dimensional boundary layer equations. Attention is given to the governing equations, the solution technique, and the search for eigenvalues. A general purpose subroutine is employed for solving a block tridiagonal system of equations. The computer time can be reduced significantly by exploiting the special structure of two matrices.

  17. Relativistic mergers of black hole binaries have large, similar masses, low spins and are circular

    Science.gov (United States)

    Amaro-Seoane, Pau; Chen, Xian

    2016-05-01

Gravitational waves are a prediction of general relativity, and with ground-based detectors now running in their advanced configuration, we will soon be able to measure them directly for the first time. Binaries of stellar-mass black holes are among the most interesting sources for these detectors. Unfortunately, the many different parameters associated with the problem make it difficult to promptly produce a large set of waveforms for the search in the data stream. To reduce the number of templates to develop, one must restrict some of the physical parameters to a certain range of values predicted by either (electromagnetic) observations or theoretical modelling. In this work, we show that `hyperstellar' black holes (HSBs) with masses 30 ≲ MBH/M⊙ ≲ 100, i.e. black holes significantly larger than the nominal 10 M⊙, will have an associated low value for the spin, i.e. a < 0.5. We prove that this is true regardless of the formation channel, and that when two HSBs build a binary, each of the spin magnitudes is also low, and the binary members have similar masses. We also address the distribution of the eccentricities of HSB binaries in dense stellar systems using a large suite of three-body scattering experiments that include binary-single interactions and long-lived hierarchical systems with a highly accurate integrator, including relativistic corrections up to O(1/c^5). We find that most sources in the detector band will have nearly zero eccentricities. This correlation between large, similar masses, low spin and low eccentricity will help to accelerate the searches for gravitational-wave signals.

  18. Accurate tracking control in LOM application

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

The fabrication of an accurate prototype directly from a CAD model in a short time depends on accurate tracking control and reference trajectory planning in the Laminated Object Manufacturing (LOM) application. An improvement in contour accuracy is achieved by the introduction of a tracking controller and a trajectory generation policy. A model of the X-Y positioning system of the LOM machine is developed as the design basis of the tracking controller. The ZPETC (Zero Phase Error Tracking Controller) is used to eliminate single-axis following error, and thus reduce the contour error. The simulation is developed on a Matlab model based on a retrofitted LOM machine, and satisfactory results are obtained.

  19. A similarity-based data warehousing environment for medical images.

    Science.gov (United States)

    Teixeira, Jefferson William; Annibal, Luana Peixoto; Felipe, Joaquim Cezar; Ciferri, Ricardo Rodrigues; Ciferri, Cristina Dutra de Aguiar

    2015-11-01

    A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To comply with this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing.
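An OLAP similarity query of the kind described, a conventional predicate combined with a k-NN similarity predicate over a perceptual layer's feature vectors, might be sketched as follows; the table layout, feature descriptor, and function names are assumptions for illustration, not the imageDWE schema.

```python
from math import dist  # Euclidean distance, Python 3.8+

def knn_similarity_query(rows, query_features, k=2, modality=None):
    """Apply a conventional predicate (modality), then rank the survivors
    by Euclidean distance of their feature vector to the query image."""
    survivors = [r for r in rows if modality is None or r["modality"] == modality]
    survivors.sort(key=lambda r: dist(r["features"], query_features))
    return [r["image_id"] for r in survivors[:k]]

rows = [
    {"image_id": 1, "modality": "MRI", "features": (0.1, 0.9)},
    {"image_id": 2, "modality": "CT",  "features": (0.1, 0.8)},
    {"image_id": 3, "modality": "MRI", "features": (0.9, 0.1)},
]
print(knn_similarity_query(rows, (0.0, 1.0), k=1, modality="MRI"))  # -> [1]
```

The index technique proposed in the paper would replace the linear scan and sort here with a pruned search over the stored feature vectors.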

  20. Pentaquark searches with ALICE

    CERN Document Server

    Bobulska, Dana

    2016-01-01

In this report we present the results of the data analysis for searching for possible invariant mass signals from pentaquarks in the ALICE data. The analysis was based on filtered data from real p-Pb events at √sNN = 5.02 TeV collected in 2013. The motivation for this project was the recent discovery of pentaquark states by the LHCb collaboration (the c̄cuud resonance P_c^+) [1]. The search for similar, not yet observed pentaquarks is an interesting research topic [2]. In this analysis we searched for an s̄suud pentaquark resonance P_s^+ and its possible decay channel to a φ meson and a proton. The ALICE detector is well suited for the search of certain candidates thanks to its low material budget and strong PID capabilities. Additionally, we might expect the production of such particles in ALICE, as in heavy-ion and proton-ion collisions the thermal models describe the particle yields and ratios well [3]. Therefore it is reasonable to expect other species of hadrons, including also possible pentaquarks, to be produced w...

  1. Alaska, Gulf spills share similarities

    International Nuclear Information System (INIS)

The accidental Exxon Valdez oil spill in Alaska and the deliberate dumping of crude oil into the Persian Gulf as a tactic of war contain both glaring differences and surprising similarities. Public reaction and response were much greater to the Exxon Valdez spill in pristine Prince William Sound than to the war-related tragedy in the Persian Gulf. More than 12,000 workers helped in the Alaskan cleanup; only 350 have been involved in Kuwait. But in both instances, environmental damages appear to be less than anticipated. Nature's highly effective self-cleansing action is primarily responsible for minimizing the damages. One positive action growing out of the two incidents is increased international cooperation and participation in oil-spill clean-up efforts. In 1990, in the aftermath of the Exxon Valdez spill, 94 nations signed an international accord on cooperation in future spills. The spills can be historic environmental landmarks leading to creation of more sophisticated response systems worldwide

  2. Relativistic Self-similar Disks

    CERN Document Server

    Cai, M J; Cai, Mike J.; Shu, Frank H.

    2002-01-01

    We formulate and solve by semi-analytic means the axisymmetric equilibria of relativistic self-similar disks of infinitesimal vertical thickness. These disks are supported in the horizontal directions against their self-gravity by a combination of isothermal (two-dimensional) pressure and a flat rotation curve. The dragging of inertial frames restricts possible solutions to rotation speeds that are always less than 0.438 times the speed of light, a result first obtained by Lynden-Bell and Pineault in 1978 for a cold disk. We show that prograde circular orbits of massive test particles exist and are stable for all of our model disks, but retrograde circular orbits cannot be maintained with particle velocities less than the speed of light once the disk develops an ergoregion. We also compute photon trajectories, planar and non-planar, in the resulting spacetime, for disks with and without ergoregions. We find that all photon orbits, except for a set of measure zero, tend to be focused by the gravity of the flat...

  3. A study of Consistency in the Selection of Search Terms and Search Concepts: A Case Study in National Taiwan University

    Directory of Open Access Journals (Sweden)

    Mu-hsuan Huang

    2001-12-01

Full Text Available This article analyzes the consistency in the selection of search terms and search concepts of college and graduate students at National Taiwan University when they are using the PsycLIT CD-ROM database. 31 students conducted pre-assigned searches, performing 59 searches that generated 609 search terms. The study finds that the consistency in the selection of first-level search terms is 22.14% and of second-level terms is 35%. These results are similar to those of other studies. Regarding consistency in search concepts, both the overlap of retrieved articles and the overlap of articles judged relevant are lower than in other studies. [Article content in Chinese]
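A common way to quantify inter-searcher consistency is the size of the overlap of two searchers' term sets divided by the size of their union; the study's exact formula is not reproduced here, so the sketch below is an assumption for illustration.

```python
def consistency(terms_a, terms_b):
    """Percent consistency between two searchers' sets of search terms."""
    a, b = set(terms_a), set(terms_b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b) * 100  # percent overlap

# two searchers tackling the same query choose one term in common
print(round(consistency(["anxiety", "children"], ["anxiety", "adolescents"]), 2))
```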

  4. Accurate source location from P waves scattered by surface topography

    Science.gov (United States)

    Wang, N.; Shen, Y.

    2015-12-01

Accurate source locations of earthquakes and other seismic events are fundamental in seismology. The location accuracy is limited by several factors, including velocity models, which are often poorly known. In contrast, surface topography, the largest velocity contrast in the Earth, is often precisely mapped at the seismic wavelength (> 100 m). In this study, we explore the use of P-coda waves generated by scattering at surface topography to obtain high-resolution locations of near-surface seismic events. The Pacific Northwest region is chosen as an example. The grid search method is combined with the 3D strain Green's tensor database type method to improve the search efficiency as well as the quality of hypocenter solution. The strain Green's tensor is calculated by the 3D collocated-grid finite difference method on curvilinear grids. Solutions in the search volume are then obtained based on the least-square misfit between the 'observed' and predicted P and P-coda waves. A 95% confidence interval of the solution is also provided as a posterior error estimation. We find that the scattered waves are mainly due to topography in comparison with random velocity heterogeneity characterized by the von Kármán-type power spectral density function. When only P wave data is used, the 'best' solution is offset from the real source location mostly in the vertical direction. The incorporation of P coda significantly improves solution accuracy and reduces its uncertainty. The solution remains robust with a range of random noises in data, un-modeled random velocity heterogeneities, and uncertainties in moment tensors that we tested.

  5. Accurate source location from waves scattered by surface topography

    Science.gov (United States)

    Wang, Nian; Shen, Yang; Flinders, Ashton; Zhang, Wei

    2016-06-01

    Accurate source locations of earthquakes and other seismic events are fundamental in seismology. The location accuracy is limited by several factors, including velocity models, which are often poorly known. In contrast, surface topography, the largest velocity contrast in the Earth, is often precisely mapped at the seismic wavelength (>100 m). In this study, we explore the use of P coda waves generated by scattering at surface topography to obtain high-resolution locations of near-surface seismic events. The Pacific Northwest region is chosen as an example to provide realistic topography. A grid search algorithm is combined with the 3-D strain Green's tensor database to improve search efficiency as well as the quality of hypocenter solutions. The strain Green's tensor is calculated using a 3-D collocated-grid finite difference method on curvilinear grids. Solutions in the search volume are obtained based on the least squares misfit between the "observed" and predicted P and P coda waves. The 95% confidence interval of the solution is provided as an a posteriori error estimation. For shallow events tested in the study, scattering is mainly due to topography in comparison with stochastic lateral velocity heterogeneity. The incorporation of P coda significantly improves solution accuracy and reduces solution uncertainty. The solution remains robust with wide ranges of random noises in data, unmodeled random velocity heterogeneities, and uncertainties in moment tensors. The method can be extended to locate pairs of sources in close proximity by differential waveforms using source-receiver reciprocity, further reducing errors caused by unmodeled velocity structures.
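The grid search over candidate hypocenters can be illustrated in miniature: pick the candidate that minimizes a least-squares misfit. The sketch below substitutes simple travel-time residuals in a homogeneous medium for the paper's full P and P-coda waveform misfit; the station geometry and velocity are invented.

```python
from math import dist  # Euclidean distance, Python 3.8+

def locate(stations, observed_times, candidates, v=5.0):
    """Return the candidate (x, y) minimizing the sum of squared
    travel-time residuals, assuming a uniform velocity v (km/s)."""
    def misfit(src):
        return sum(
            (dist(src, sta) / v - t_obs) ** 2
            for sta, t_obs in zip(stations, observed_times)
        )
    return min(candidates, key=misfit)

stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_src = (4.0, 6.0)
obs = [dist(true_src, s) / 5.0 for s in stations]   # noise-free "observations"
grid = [(x, y) for x in range(11) for y in range(11)]
print(locate(stations, obs, grid))  # recovers (4, 6)
```

The papers replace the travel-time misfit with a waveform misfit built from precomputed strain Green's tensors, which is what lets the P coda carry topographic scattering information into the solution.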

  6. Accurate atomic data for industrial plasma applications

    Energy Technology Data Exchange (ETDEWEB)

    Griesmann, U.; Bridges, J.M.; Roberts, J.R.; Wiese, W.L.; Fuhr, J.R. [National Inst. of Standards and Technology, Gaithersburg, MD (United States)

    1997-12-31

    Reliable branching fraction, transition probability and transition wavelength data for radiative dipole transitions of atoms and ions in plasma are important in many industrial applications. Optical plasma diagnostics and modeling of the radiation transport in electrical discharge plasmas (e.g. in electrical lighting) depend on accurate basic atomic data. NIST has an ongoing experimental research program to provide accurate atomic data for radiative transitions. The new NIST UV-vis-IR high resolution Fourier transform spectrometer has become an excellent tool for accurate and efficient measurements of numerous transition wavelengths and branching fractions in a wide wavelength range. Recently, the authors have also begun to employ photon counting techniques for very accurate measurements of branching fractions of weaker spectral lines with the intent to improve the overall accuracy for experimental branching fractions to better than 5%. They have now completed their studies of transition probabilities of Ne I and Ne II. The results agree well with recent calculations and for the first time provide reliable transition probabilities for many weak intercombination lines.

  7. A New Generalized Similarity-Based Topic Distillation Algorithm

    Institute of Scientific and Technical Information of China (English)

    ZHOU Hongfang; DANG Xiaohui

    2007-01-01

The procedure of hypertext induced topic search based on a semantic relation model is analyzed, and the reason for the topic drift of the HITS algorithm is found to be that Web pages are projected onto a wrong latent semantic basis. A new concept, generalized similarity, is introduced and, based on this, a new topic distillation algorithm GSTDA (generalized similarity based topic distillation algorithm) is presented to improve the quality of topic distillation. GSTDA is applied not only to avoid topic drift, but also to explore topics related to the user query. The experimental results on 10 queries show that GSTDA reduces the topic drift rate by 10% to 58% compared to that of the HITS (hypertext induced topic search) algorithm, and discovers several related topics for queries that have multiple meanings.
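For reference, the original HITS update that GSTDA aims to improve, alternating hub and authority scores with normalization, can be sketched as follows; the toy graph and iteration count are illustrative.

```python
def hits(links, n_iter=50):
    """links: dict mapping each node to the list of nodes it points to."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(n_iter):
        # authority: sum of hub scores of pages linking to n
        auth = {n: sum(hub[u] for u in links if n in links[u]) for n in nodes}
        # hub: sum of authority scores of pages n links to
        hub = {n: sum(auth[v] for v in links.get(n, [])) for n in nodes}
        for d in (auth, hub):  # normalize to unit sum
            s = sum(d.values()) or 1.0
            for n in d:
                d[n] /= s
    return hub, auth

hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
print(max(auth, key=auth.get))  # "c" is the top authority
```

Topic drift arises because these scores follow the densest link community rather than the query topic; GSTDA's generalized similarity reweights this projection.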

  8. Improved Scatter Search Using Cuckoo Search

    OpenAIRE

    Ahmed T.Sadiq Al-Obaidi

    2013-01-01

The Scatter Search (SS) is a deterministic strategy that has been applied successfully to some combinatorial and continuous optimization problems. Cuckoo Search (CS) is a heuristic search algorithm inspired by the reproduction strategy of cuckoos. This paper presents an enhanced scatter search algorithm using the CS algorithm. The improvement provides Scatter Search with random exploration of the problem's search space and with more diversity and intensification for promising solutions. The origi...
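The Cuckoo Search component can be illustrated with a toy one-dimensional minimization; the Gaussian step in place of a full Lévy flight, and all parameter values, are simplifying assumptions rather than the paper's settings.

```python
import random

def cuckoo_search(f, lo, hi, n_nests=15, n_iter=200, pa=0.25, seed=1):
    """Minimize f on [lo, hi]: cuckoos replace worse nests; a fraction
    pa of the worst nests is abandoned and re-seeded each generation."""
    rng = random.Random(seed)
    nests = [rng.uniform(lo, hi) for _ in range(n_nests)]
    for _ in range(n_iter):
        # generate a cuckoo by a random step from a random nest
        i = rng.randrange(n_nests)
        cand = min(hi, max(lo, nests[i] + rng.gauss(0, 0.5)))
        j = rng.randrange(n_nests)
        if f(cand) < f(nests[j]):
            nests[j] = cand
        # abandon the worst fraction pa of nests
        nests.sort(key=f)
        for k in range(int(n_nests * (1 - pa)), n_nests):
            nests[k] = rng.uniform(lo, hi)
    return min(nests, key=f)

best = cuckoo_search(lambda x: (x - 2.0) ** 2, -10, 10)
print(round(best, 1))  # near 2.0
```

The random re-seeding of abandoned nests is the diversification the paper grafts onto Scatter Search's otherwise deterministic reference set.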

  9. An assessment of orthographic similarity measures for several African languages

    OpenAIRE

    Keet, C. Maria

    2016-01-01

    Natural Language Interfaces and tools such as spellcheckers and Web search in one's own language are known to be useful in ICT-mediated communication. Most languages in Southern Africa are under-resourced, however. Therefore, it would be very useful if both the generic and the few language-specific NLP tools could be reused or easily adapted across languages. This depends on the notion, and extent, of similarity between the languages. We assess this from the angle of orthography and corpora. ...

  10. Feedback about more accurate versus less accurate trials: differential effects on self-confidence and activation.

    Science.gov (United States)

    Badami, Rokhsareh; VaezMousavi, Mohammad; Wulf, Gabriele; Namazizadeh, Mahdi

    2012-06-01

One purpose of the present study was to examine whether self-confidence or anxiety would be differentially affected by feedback from more accurate rather than less accurate trials. The second purpose was to determine whether arousal variations (activation) would predict performance. On day 1, participants performed a golf putting task under one of two conditions: one group received feedback on the most accurate trials, whereas another group received feedback on the least accurate trials. On day 2, participants completed an anxiety questionnaire and performed a retention test. Skin conductance level, as a measure of arousal, was determined. The results indicated that feedback about more accurate trials resulted in more effective learning as well as increased self-confidence. Also, activation was a predictor of performance. PMID:22808705

  11. Optimal directed searches for continuous gravitational waves

    OpenAIRE

    Ming, J.; Krishnan, B.; Papa, M.; Aulbert, C.; Fehrmann, H.

    2016-01-01

    Wide parameter space searches for long lived continuous gravitational wave signals are computationally limited. It is therefore critically important that available computational resources are used rationally. In this paper we consider directed searches, i.e. targets for which the sky position is known accurately but the frequency and spindown parameters are completely unknown. Given a list of such potential astrophysical targets, we therefore need to prioritize. On which target(s) should we s...

  12. Visualizing Search Behavior with Adaptive Discriminations

    OpenAIRE

    Cook, Robert G.; Qadri, Muhammad A. J.

    2013-01-01

    We examined different aspects of the visual search behavior of a pigeon using an open-ended, adaptive testing procedure controlled by a genetic algorithm. The animal had to accurately search for and peck a gray target element randomly located from among a variable number of surrounding darker and lighter distractor elements. Display composition was controlled by a genetic algorithm involving the multivariate configuration of different parameters or genes (number of distractors, element size, ...

  13. Accurate estimation of indoor travel times

    DEFF Research Database (Denmark)

    Prentow, Thor Siiger; Blunck, Henrik; Stisen, Allan;

    2014-01-01

The ability to accurately estimate indoor travel times is crucial for enabling improvements within application areas such as indoor navigation, logistics for mobile workers, and facility management. In this paper, we study the challenges inherent in indoor travel time estimation, and we propose... the InTraTime method for accurately estimating indoor travel times via mining of historical and real-time indoor position traces. The method learns during operation both travel routes, travel times and their respective likelihood---both for routes traveled as well as for sub-routes thereof. In... are collected within the building complex. Results indicate that InTraTime is superior with respect to metrics such as deployment cost, maintenance cost and estimation accuracy, yielding an average deviation from actual travel times of 11.7%. This accuracy was achieved despite using a minimal-effort setup...
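One ingredient of such trace mining, learning mean per-edge travel times from historical position traces and summing them along a route, might look like this; the trace format and route model are invented for illustration and are not the InTraTime data model.

```python
def learn_edge_times(traces):
    """traces: list of [(location, timestamp), ...] sequences.
    Returns the mean observed travel time for each traversed edge."""
    totals, counts = {}, {}
    for trace in traces:
        for (a, ta), (b, tb) in zip(trace, trace[1:]):
            totals[(a, b)] = totals.get((a, b), 0.0) + (tb - ta)
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return {e: totals[e] / counts[e] for e in totals}

def estimate(route, edge_times):
    """Estimated travel time along a route as the sum of its edge means."""
    return sum(edge_times[(a, b)] for a, b in zip(route, route[1:]))

traces = [
    [("lobby", 0), ("hall", 30), ("lab", 80)],
    [("lobby", 0), ("hall", 34), ("lab", 76)],
]
times = learn_edge_times(traces)
print(estimate(["lobby", "hall", "lab"], times))  # 78.0
```

The paper's method additionally learns route likelihoods and folds in real-time traces, which this static sketch omits.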

  14. Accurate Finite Difference Methods for Option Pricing

    OpenAIRE

    Persson, Jonas

    2006-01-01

    Stock options are priced numerically using space- and time-adaptive finite difference methods. European options on one and several underlying assets are considered. These are priced with adaptive numerical algorithms including a second order method and a more accurate method. For American options we use the adaptive technique to price options on one stock with and without stochastic volatility. In all these methods emphasis is put on the control of errors to fulfill predefined tolerance level...

  15. Accurate variational forms for multiskyrmion configurations

    Energy Technology Data Exchange (ETDEWEB)

    Jackson, A.D.; Weiss, C.; Wirzba, A.; Lande, A.

    1989-04-17

Simple variational forms are suggested for the fields of a single skyrmion on a hypersphere, S_3(L), and of a face-centered cubic array of skyrmions in flat space, R_3. The resulting energies are accurate at the level of 0.2%. These approximate field configurations provide a useful alternative to brute-force solutions of the corresponding Euler equations.

  16. Efficient Accurate Context-Sensitive Anomaly Detection

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

For program behavior-based anomaly detection, the only way to ensure accurate monitoring is to construct an efficient and precise program behavior model. A new program behavior-based anomaly detection model, called the combined pushdown automaton (CPDA) model, was proposed, which is based on static binary executable analysis. The CPDA model incorporates the optimized call stack walk and code instrumentation technique to gain complete context information. Thereby the proposed method can detect more attacks, while retaining good performance.

  17. Accurate phase-shift velocimetry in rock

    Science.gov (United States)

    Shukla, Matsyendra Nath; Vallatos, Antoine; Phoenix, Vernon R.; Holmes, William M.

    2016-06-01

    Spatially resolved Pulsed Field Gradient (PFG) velocimetry techniques can provide precious information concerning flow through opaque systems, including rocks. This velocimetry data is used to enhance flow models in a wide range of systems, from oil behaviour in reservoir rocks to contaminant transport in aquifers. Phase-shift velocimetry is the fastest way to produce velocity maps but critical issues have been reported when studying flow through rocks and porous media, leading to inaccurate results. Combining PFG measurements for flow through Bentheimer sandstone with simulations, we demonstrate that asymmetries in the molecular displacement distributions within each voxel are the main source of phase-shift velocimetry errors. We show that when flow-related average molecular displacements are negligible compared to self-diffusion ones, symmetric displacement distributions can be obtained while phase measurement noise is minimised. We elaborate a complete method for the production of accurate phase-shift velocimetry maps in rocks and low porosity media and demonstrate its validity for a range of flow rates. This development of accurate phase-shift velocimetry now enables more rapid and accurate velocity analysis, potentially helping to inform both industrial applications and theoretical models.

  18. Accurate structural correlations from maximum likelihood superpositions.

    Directory of Open Access Journals (Sweden)

    Douglas L Theobald

    2008-02-01

Full Text Available The cores of globular proteins are densely packed, resulting in complicated networks of structural interactions. These interactions in turn give rise to dynamic structural correlations over a wide range of time scales. Accurate analysis of these complex correlations is crucial for understanding biomolecular mechanisms and for relating structure to function. Here we report a highly accurate technique for inferring the major modes of structural correlation in macromolecules using likelihood-based statistical analysis of sets of structures. This method is generally applicable to any ensemble of related molecules, including families of nuclear magnetic resonance (NMR) models, different crystal forms of a protein, and structural alignments of homologous proteins, as well as molecular dynamics trajectories. Dominant modes of structural correlation are determined using principal components analysis (PCA) of the maximum likelihood estimate of the correlation matrix. The correlations we identify are inherently independent of the statistical uncertainty and dynamic heterogeneity associated with the structural coordinates. We additionally present an easily interpretable method ("PCA plots") for displaying these positional correlations by color-coding them onto a macromolecular structure. Maximum likelihood PCA of structural superpositions, and the structural PCA plots that illustrate the results, will facilitate the accurate determination of dynamic structural correlations analyzed in diverse fields of structural biology.
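The core numerical step, extracting the dominant mode of a correlation matrix, can be sketched with power iteration; a real analysis would apply a full eigendecomposition (e.g. numpy's `eigh`) to the maximum likelihood correlation estimate described above, so this 2x2 pure-Python example is illustrative only.

```python
import math

def power_iteration(M, n_iter=100):
    """Dominant eigenvalue/eigenvector of a symmetric matrix M."""
    v = [1.0] * len(M)
    for _ in range(n_iter):
        w = [sum(M[i][j] * v[j] for j in range(len(M))) for i in range(len(M))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the corresponding eigenvalue
    lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(len(M)))
              for i in range(len(M)))
    return lam, v

# two positions moving together with correlation 0.8
corr = [[1.0, 0.8], [0.8, 1.0]]
lam, v = power_iteration(corr)
print(round(lam, 3))  # 1.8: the dominant correlated mode
```

The eigenvector components would then be color-coded onto the structure to produce the "PCA plots" the paper describes.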

  19. Cerebral fat embolism: Use of MR spectroscopy for accurate diagnosis

    Directory of Open Access Journals (Sweden)

    Laxmi Kokatnur

    2015-01-01

Full Text Available Cerebral fat embolism (CFE) is an uncommon but serious complication following orthopedic procedures. It usually presents with altered mental status, and can be a part of fat embolism syndrome (FES) if associated with cutaneous and respiratory manifestations. Because of the presence of other common factors affecting the mental status, particularly in the postoperative period, the diagnosis of CFE can be challenging. Magnetic resonance imaging (MRI) of the brain typically shows multiple lesions distributed predominantly in the subcortical region, which appear as hyperintense lesions on T2 and diffusion weighted images. Although the location offers a clue, the MRI findings are not specific for CFE. Watershed infarcts, hypoxic encephalopathy, disseminated infections, demyelinating disorders, and diffuse axonal injury can also show similar changes on MRI of the brain. The presence of fat in these hyperintense lesions, identified by MR spectroscopy as raised lipid peaks, will help in an accurate diagnosis of CFE. Normal brain tissue or conditions producing similar MRI changes will not show any lipid peak on MR spectroscopy. We present a case of CFE initially misdiagnosed as brain stem stroke based on clinical presentation and cranial computed tomography (CT) scan, and later, MR spectroscopy elucidated the accurate diagnosis.

  20. Similarity-based denoising of point-sampled surface

    Institute of Scientific and Technical Information of China (English)

    Ren-fang WANG; Wen-zhi CHEN; San-yuan ZHANG; Yin ZHANG; Xiu-zi YE

    2008-01-01

A non-local denoising (NLD) algorithm for point-sampled surfaces (PSSs) is presented based on similarities, including the geometry intensity and features of sample points. By using the trilateral filtering operator, the differential signal of each sample point is determined and called "geometry intensity". Based on covariance analysis, a regular grid of geometry intensity of a sample point is constructed, and the geometry-intensity similarity of two points is measured according to their grids. Based on mean shift clustering, the PSSs are clustered in terms of local geometry-feature similarity. The smoothed geometry intensity, i.e., offset distance, of each sample point is estimated according to the two similarities. Using the resulting intensity, the noise component of the PSSs is finally removed by adjusting the position of each sample point along its own normal direction. Experimental results demonstrate that the algorithm is robust and can produce a more accurate denoising result while having better feature preservation.

  1. Autonomous Search

    CERN Document Server

    Hamadi, Youssef; Saubion, Frédéric

    2012-01-01

    Decades of innovations in combinatorial problem solving have produced better and more complex algorithms. These new methods are better since they can solve larger problems and address new application domains. They are also more complex which means that they are hard to reproduce and often harder to fine-tune to the peculiarities of a given problem. This last point has created a paradox where efficient tools are out of reach of practitioners. Autonomous search (AS) represents a new research field defined to precisely address the above challenge. Its major strength and originality consist in the

  2. Reading and visual search: a developmental study in normal children.

    Directory of Open Access Journals (Sweden)

    Magali Seassau

Full Text Available Studies dealing with developmental aspects of binocular eye movement behaviour during reading are scarce. In this study we have explored binocular strategies during reading and during visual search tasks in a large population of normal young readers. Binocular eye movements were recorded using an infrared video-oculography system in sixty-nine children (aged 6 to 15) and in a group of 10 adults (aged 24 to 39). The main findings are (i) in both tasks the number of progressive saccades (to the right) and regressive saccades (to the left) decreases with age; (ii) the amplitude of progressive saccades increases with age in the reading task only; (iii) in both tasks, the duration of fixations as well as the total duration of the task decreases with age; (iv) in both tasks, the amplitude of disconjugacy recorded during and after the saccades decreases with age; (v) children are significantly more accurate in reading than in visual search after 10 years of age. Data reported here confirms and expands previous studies on children's reading. The new finding is that younger children show poorer coordination than adults, both while reading and while performing a visual search task. Both reading skills and binocular saccades coordination improve with age and children reach a similar level to adults after the age of 10. This finding is most likely related to the fact that learning mechanisms responsible for saccade yoking develop during childhood until adolescence.

  3. Searching chemical space with the Bayesian Idea Generator.

    Science.gov (United States)

    van Hoorn, Willem P; Bell, Andrew S

    2009-10-01

    The Pfizer Global Virtual Library (PGVL) is defined as a set of compounds that could be synthesized using validated protocols and monomers. However, it is too large (10^12 compounds) to search by brute-force methods for close analogues of a given input structure. In this paper the Bayesian Idea Generator is described, which is based on a novel application of Bayesian statistics to narrow down the search space to a prioritized set of existing library arrays (the default is 16). For each of these libraries the 6 closest neighbors are retrieved from the existing compound file, resulting in a screenable hypothesis of 96 compounds. Using the Bayesian models for library space, the Pfizer file of singleton compounds has been mapped to library space and is optionally searched as well. The method is >99% accurate in retrieving known library provenance from an independent test set. The compounds retrieved strike a balance between similarity and diversity, resulting in frequent scaffold hops. Four examples of how the Bayesian Idea Generator has been successfully used in drug discovery are provided. The methodology of the Bayesian Idea Generator can be used for any collection of compounds containing distinct clusters, and an example using compound vendor catalogues has been included.
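    The library-ranking step described in this record can be illustrated with a toy Bernoulli naive-Bayes model over binary structural fingerprints. This is only a hedged sketch: the paper does not disclose its descriptors or model details, and every function name, fingerprint, and library below is hypothetical.

```python
import math

def train(libraries, n_bits):
    """Per-library Bernoulli naive Bayes over binary fingerprints:
    estimate P(bit = 1 | library) with Laplace smoothing."""
    models = {}
    for name, fingerprints in libraries.items():
        counts = [0] * n_bits
        for fp in fingerprints:
            for bit in fp:
                counts[bit] += 1
        models[name] = [(c + 1) / (len(fingerprints) + 2) for c in counts]
    return models

def rank_libraries(models, query, n_bits, top=2):
    """Score a query fingerprint against every library model and
    return the names of the best-scoring libraries."""
    scores = {}
    for name, p in models.items():
        loglik = 0.0
        for i in range(n_bits):
            loglik += math.log(p[i] if i in query else 1.0 - p[i])
        scores[name] = loglik
    return sorted(scores, key=scores.get, reverse=True)[:top]

# Toy fingerprints: sets of "on" bits out of 8 (entirely invented)
libs = {
    "amide_library":       [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}],
    "sulfonamide_library": [{4, 5, 6}, {4, 5, 7}, {5, 6, 7}],
}
models = train(libs, 8)
best = rank_libraries(models, query={0, 1, 3}, n_bits=8, top=1)
```

    In the real system the top-ranked libraries (16 by default) would then each contribute their 6 nearest neighbors to form the 96-compound screenable hypothesis.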

  4. Reading and visual search: a developmental study in normal children.

    Science.gov (United States)

    Seassau, Magali; Bucci, Maria-Pia

    2013-01-01

    Studies dealing with developmental aspects of binocular eye movement behaviour during reading are scarce. In this study we have explored binocular strategies during reading and during visual search tasks in a large population of normal young readers. Binocular eye movements were recorded using an infrared video-oculography system in sixty-nine children (aged 6 to 15) and in a group of 10 adults (aged 24 to 39). The main findings are (i) in both tasks the number of progressive saccades (to the right) and regressive saccades (to the left) decreases with age; (ii) the amplitude of progressive saccades increases with age in the reading task only; (iii) in both tasks, the duration of fixations as well as the total duration of the task decreases with age; (iv) in both tasks, the amplitude of disconjugacy recorded during and after the saccades decreases with age; (v) children are significantly more accurate in reading than in visual search after 10 years of age. Data reported here confirms and expands previous studies on children's reading. The new finding is that younger children show poorer coordination than adults, both while reading and while performing a visual search task. Both reading skills and binocular saccades coordination improve with age and children reach a similar level to adults after the age of 10. This finding is most likely related to the fact that learning mechanisms responsible for saccade yoking develop during childhood until adolescence.

  5. High Frequency QRS ECG Accurately Detects Cardiomyopathy

    Science.gov (United States)

    Schlegel, Todd T.; Arenare, Brian; Poulin, Gregory; Moser, Daniel R.; Delgado, Reynolds

    2005-01-01

    High frequency (HF, 150-250 Hz) analysis over the entire QRS interval of the ECG is more sensitive than conventional ECG for detecting myocardial ischemia. However, the accuracy of HF QRS ECG for detecting cardiomyopathy is unknown. We obtained simultaneous resting conventional and HF QRS 12-lead ECGs in 66 patients with cardiomyopathy (EF = 23.2 plus or minus 6.1%, mean plus or minus SD) and in 66 age- and gender-matched healthy controls using PC-based ECG software recently developed at NASA. The single most accurate ECG parameter for detecting cardiomyopathy was an HF QRS morphological score that takes into consideration the total number and severity of reduced amplitude zones (RAZs) present plus the clustering of RAZs together in contiguous leads. This RAZ score had an area under the receiver operating characteristic (ROC) curve of 0.91, and was 88% sensitive, 82% specific and 85% accurate for identifying cardiomyopathy at an optimum score cut-off of 140 points. Although conventional ECG parameters such as the QRS and QTc intervals were also significantly longer in patients than controls (P less than 0.001, BBBs excluded), these conventional parameters were less accurate (area under the ROC = 0.77 and 0.77, respectively) than HF QRS morphological parameters for identifying underlying cardiomyopathy. The total amplitude of the HF QRS complexes, as measured by summed root mean square voltages (RMSVs), also differed between patients and controls (33.8 plus or minus 11.5 vs. 41.5 plus or minus 13.6 mV, respectively, P less than 0.003), but this parameter was even less accurate in distinguishing the two groups (area under ROC = 0.67) than the HF QRS morphologic and conventional ECG parameters. Diagnostic accuracy was optimal (86%) when the RAZ score from the HF QRS ECG and the QTc interval from the conventional ECG were used simultaneously with cut-offs of greater than or equal to 40 points and greater than or equal to 445 ms, respectively. In conclusion 12-lead HF QRS ECG employing

  6. The Search for Directed Intelligence

    CERN Document Server

    Lubin, Philip

    2016-01-01

    We propose a search for sources of directed energy systems such as those now becoming technologically feasible on Earth. Recent advances allow us to foresee our own capability to radically change our ability to broadcast our presence. We show that systems of this type can be detected at vast distances, and indeed across the entire horizon. This profoundly changes the possibilities for searches for technologically advanced extra-terrestrial civilizations. We show that even modest searches can be extremely effective at detecting or limiting many civilization classes. We propose a search strategy that will observe more than 10^12 stellar and planetary systems, with possible extensions to more than 10^20 systems, allowing us to test the hypothesis that other, similarly or more advanced, civilizations with this same capability exist and are broadcasting.

  7. Mapping of VSG similarities in Trypanosoma brucei.

    Science.gov (United States)

    Weirather, Jason L; Wilson, Mary E; Donelson, John E

    2012-02-01

    The protozoan parasite Trypanosoma brucei switches its variant surface glycoprotein (VSG) to subvert its mammalian hosts' immune responses. The T. brucei genome contains as many as 1600 VSG genes (VSGs), but most are silent noncoding pseudogenes. Only one functional VSG, located in a telomere-linked expression site, is transcribed at a time. Silent VSGs are copied into a VSG expression site through gene conversion. Truncated gene conversion events can generate new mosaic VSGs with segments of sequence identity to other VSGs. To examine the VSG family sub-structure within which these events occur, we combined the available VSG sequences and annotations with scripted BLAST searches to map the relationships among VSGs in the T. brucei genome. Clusters of related VSGs were visualized in 2- and 3-dimensions for different N- and C-terminal regions. Five types of N-termini (N1-N5) were observed, within which gene recombinational events are likely to occur, often with fully-coding 'functional' or 'atypical' VSGs centrally located between more dissimilar VSGs. Members of types N1, N3 and N4 are most closely related in the middle of the N-terminal region, whereas type N2 members are more similar near the N-terminus. Some preference occurs in pairing between specific N- and C-terminal types. Statistical analyses indicated no overall tendency for more related VSGs to be located closer in the genome than less related VSGs, although exceptions were noted. Many potential mosaic gene formation events within each N-terminal type were identified, contrasted by only one possible mosaic gene formation between N-terminal types (N1 and N2). These data suggest that mosaic gene formation is a major contributor to the overall VSG diversity, even though gene recombinational events between members of different N-terminal types occur only rarely. PMID:22079099

  8. A toolbox for representational similarity analysis.

    Directory of Open Access Journals (Sweden)

    Hamed Nili

    2014-04-01

    Full Text Available Neuronal population codes are increasingly being investigated with multivariate pattern-information analyses. A key challenge is to use measured brain-activity patterns to test computational models of brain information processing. One approach to this problem is representational similarity analysis (RSA), which characterizes a representation in a brain or computational model by the distance matrix of the response patterns elicited by a set of stimuli. The representational distance matrix encapsulates what distinctions between stimuli are emphasized and what distinctions are de-emphasized in the representation. A model is tested by comparing the representational distance matrix it predicts to that of a measured brain region. RSA also enables us to compare representations between stages of processing within a given brain or model, between brain and behavioral data, and between individuals and species. Here, we introduce a Matlab toolbox for RSA. The toolbox supports an analysis approach that is simultaneously data- and hypothesis-driven. It is designed to help integrate a wide range of computational models into the analysis of multichannel brain-activity measurements as provided by modern functional imaging and neuronal recording techniques. Tools for visualization and inference enable the user to relate sets of models to sets of brain regions and to statistically test and compare the models using nonparametric inference methods. The toolbox supports searchlight-based RSA, to continuously map a measured brain volume in search of a neuronal population code with a specific geometry. Finally, we introduce the linear-discriminant t value as a measure of representational discriminability that bridges the gap between linear decoding analyses and RSA. In order to demonstrate the capabilities of the toolbox, we apply it to both simulated and real fMRI data. The key functions are equally applicable to other modalities of brain-activity measurement. The
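    As a rough illustration of the core RSA computation (the toolbox itself is written in Matlab and is far richer), the following Python sketch builds a representational distance matrix from toy response patterns and compares two such matrices with a Spearman rank correlation. All names and data are hypothetical, and no tie handling is implemented.

```python
import math
from itertools import combinations

def rdm(patterns):
    """Representational distance matrix: pairwise Euclidean distances
    between response patterns (one list of channel values per stimulus)."""
    n = len(patterns)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(patterns[i], patterns[j])
    return d

def upper(d):
    """Flatten the upper triangle (the unique pairwise distances)."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]

def spearman(x, y):
    """Spearman rank correlation (no tie handling) between two
    equal-length vectors -- used to compare model and brain RDMs."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) *
                    sum((b - my) ** 2 for b in ry))
    return num / den

# Toy "brain" patterns: 4 stimuli x 2 channels; comparing a geometry
# with itself yields a correlation of exactly 1.0.
patterns = [[0, 0], [1, 0], [0, 2], [3, 3]]
score = spearman(upper(rdm(patterns)), upper(rdm(patterns)))
```

    In a real analysis the two RDMs would come from a measured brain region and a candidate model, and the rank correlation would score their agreement.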

  9. Enhancing Solution Similarity in Multi-Objective Vehicle Routing Problems with Different Demand Periods

    OpenAIRE

    Murata, Tadahiko; Itai, Ryota

    2008-01-01

    In this chapter, we proposed a local search that can be used in a two-fold EMO algorithm for multiple-objective VRPs with different demands. The simulation results show that the proposed method is effective in enhancing the similarity of the obtained vehicle routes. Although the local search slightly deteriorates the maximum duration, it improves the similarity of the routes, which may reduce the chance of drivers losing their way. If drivers lose their way duri...

  10. Web Search Engines: Search Syntax and Features.

    Science.gov (United States)

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  11. Database search of time series, with an application at the company Medtronic

    OpenAIRE

    KELLENS, Tom

    2006-01-01

    In this thesis we focus on the problem of similarity search in time series. Similarity search can be divided into two categories, namely whole matching and subsequence matching. The latter can be regarded as a generalization of whole matching. In this work we will see how this generalization can be realized using sliding window techniques. Similarity search is essentially based on a distance function. Based on the type of dis...
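    The sliding-window subsequence matching described in this record can be sketched as follows. This is a hedged illustration only: the thesis's actual distance functions and indexing are not reproduced, and the function name and data are invented.

```python
import math

def best_subsequence_match(series, query):
    """Subsequence matching by sliding a window of len(query) over the
    longer series and scoring each window with Euclidean distance.
    Returns (best_offset, best_distance).  Whole matching is the
    special case len(series) == len(query)."""
    w = len(query)
    best_off, best_d = -1, math.inf
    for off in range(len(series) - w + 1):
        d = math.dist(series[off:off + w], query)
        if d < best_d:
            best_off, best_d = off, d
    return best_off, best_d

series = [0.0, 0.1, 1.0, 2.0, 3.1, 0.2, 0.0]
query = [1.0, 2.0, 3.0]
off, d = best_subsequence_match(series, query)
```

    Real systems avoid this brute-force scan with index structures, but the window-by-window distance comparison is the underlying operation.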

  12. INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences

    OpenAIRE

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Reddy, Rachamalla Maheedhar; Reddy, Chennareddy Venkata Siva Kumar; Singh, Nitin Kumar; Sharmila S Mande

    2011-01-01

    Background Taxonomic classification of metagenomic sequences is the first step in metagenomic analysis. Existing taxonomic classification approaches are of two types, similarity-based and composition-based. Similarity-based approaches, though accurate and specific, are extremely slow. Since metagenomic projects generate millions of sequences, adopting similarity-based approaches becomes virtually infeasible for research groups with modest computational resources. In this study, we present ...

  13. Accurate measurement of unsteady state fluid temperature

    Science.gov (United States)

    Jaremkiewicz, Magdalena

    2016-07-01

    In this paper, two accurate methods for determining the transient fluid temperature were presented. Measurements were conducted for boiling water since its temperature is known. At the beginning the thermometers are at the ambient temperature and next they are immediately immersed into saturated water. The measurements were carried out with two thermometers of different construction but with the same housing outer diameter equal to 15 mm. One of them is a K-type industrial thermometer widely available commercially. The temperature indicated by the thermometer was corrected considering the thermometers as the first or second order inertia devices. The new design of a thermometer was proposed and also used to measure the temperature of boiling water. Its characteristic feature is a cylinder-shaped housing with the sheath thermocouple located in its center. The temperature of the fluid was determined based on measurements taken in the axis of the solid cylindrical element (housing) using the inverse space marching method. Measurements of the transient temperature of the air flowing through the wind tunnel using the same thermometers were also carried out. The proposed measurement technique provides more accurate results compared with measurements using industrial thermometers in conjunction with simple temperature correction using the inertial thermometer model of the first or second order. By comparing the results, it was demonstrated that the new thermometer allows obtaining the fluid temperature much faster and with higher accuracy in comparison to the industrial thermometer. Accurate measurements of the fast changing fluid temperature are possible due to the low inertia thermometer and fast space marching method applied for solving the inverse heat conduction problem.
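    The first-order inertia correction mentioned above amounts to T_fluid ≈ T_ind + τ·dT_ind/dt, where τ is the thermometer's time constant. The following sketch applies it to synthetic data; the time constant and temperatures are invented, and the paper's inverse space marching method is not reproduced here.

```python
import math

def correct_first_order(t, T_ind, tau):
    """First-order inertia correction: T_fluid ~= T_ind + tau * dT_ind/dt,
    with the derivative estimated by finite differences (central in the
    interior, one-sided at the ends)."""
    out = []
    for i in range(len(t)):
        j0, j1 = max(i - 1, 0), min(i + 1, len(t) - 1)
        dTdt = (T_ind[j1] - T_ind[j0]) / (t[j1] - t[j0])
        out.append(T_ind[i] + tau * dTdt)
    return out

# Synthetic experiment: a thermometer with time constant tau = 5 s
# plunged from 20 degC into 100 degC water responds as
# T_ind(t) = 100 - 80 * exp(-t / tau).
tau = 5.0
t = [0.5 * k for k in range(60)]
T_ind = [100 - 80 * math.exp(-tk / tau) for tk in t]
T_rec = correct_first_order(t, T_ind, tau)
```

    For this idealized first-order response the corrected interior values recover the true 100 degC fluid temperature almost immediately, long before the raw indication settles; real signals require filtering of the noisy derivative.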

  14. New law requires 'medically accurate' lesson plans.

    Science.gov (United States)

    1999-09-17

    The California Legislature has passed a bill requiring all textbooks and materials used to teach about AIDS be medically accurate and objective. Statements made within the curriculum must be supported by research conducted in compliance with scientific methods, and published in peer-reviewed journals. Some of the current lesson plans were found to contain scientifically unsupported and biased information. In addition, the bill requires material to be "free of racial, ethnic, or gender biases." The legislation is supported by a wide range of interests, but opposed by the California Right to Life Education Fund, because they believe it discredits abstinence-only material.

  15. Investigations on Accurate Analysis of Microstrip Reflectarrays

    DEFF Research Database (Denmark)

    Zhou, Min; Sørensen, S. B.; Kim, Oleksiy S.;

    2011-01-01

    An investigation on accurate analysis of microstrip reflectarrays is presented. Sources of error in reflectarray analysis are examined and solutions to these issues are proposed. The focus is on two sources of error, namely the determination of the equivalent currents to calculate the radiation pattern, and the inaccurate mutual coupling between array elements due to the lack of periodicity. To serve as reference, two offset reflectarray antennas have been designed, manufactured and measured at the DTU-ESA Spherical Near-Field Antenna Test Facility. Comparisons of simulated and measured data are...

  16. Accurate diagnosis is essential for amebiasis

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    Amebiasis is one of the three most common causes of death from parasitic disease, and Entamoeba histolytica is the most widely distributed parasite in the world. In particular, Entamoeba histolytica infection in the developing countries is a significant health problem in amebiasis-endemic areas, with a significant impact on infant mortality[1]. In recent years a worldwide increase in the number of patients with amebiasis has refocused attention on this important infection. On the other hand, improvements in the quality of parasitological methods and the widespread use of accurate techniques have improved our knowledge about the disease.

  17. Universality: Accurate Checks in Dyson's Hierarchical Model

    Science.gov (United States)

    Godina, J. J.; Meurice, Y.; Oktay, M. B.

    2003-06-01

    In this talk we present high-accuracy calculations of the susceptibility near βc for Dyson's hierarchical model in D = 3. Using linear fitting, we estimate the leading (γ) and subleading (Δ) exponents. Independent estimates are obtained by calculating the first two eigenvalues of the linearized renormalization group transformation. We found γ = 1.29914073 ± 10^-8 and Δ = 0.4259469 ± 10^-7, independently of the choice of local integration measure (Ising or Landau-Ginzburg). After a suitable rescaling, the approximate fixed points for a large class of local measures coincide accurately with a fixed point constructed by Koch and Wittwer.

  18. Multimode Process Fault Detection Using Local Neighborhood Similarity Analysis☆

    Institute of Scientific and Technical Information of China (English)

    Xiaogang Deng; Xuemin Tian

    2014-01-01

    Traditional data-driven fault detection methods assume a unimodal distribution of process data, so they often perform poorly in chemical processes with multiple operating modes. In order to monitor multimode chemical processes effectively, this paper presents a novel fault detection method based on local neighborhood similarity analysis (LNSA). In the proposed method, prior process knowledge is not required and only the multimode normal operation data are used to construct a reference dataset. For online monitoring of the process state, LNSA applies a moving window technique to obtain a current snapshot data window. Then a neighborhood searching technique is used to acquire the corresponding local neighborhood data window from the reference dataset. Similarity analysis between the snapshot and neighborhood data windows is performed, which includes the calculation of a principal component analysis (PCA) similarity factor and a distance similarity factor. The PCA similarity factor captures the change of data direction while the distance similarity factor monitors the shift of the data center position. Based on these similarity factors, two monitoring statistics are built for multimode process fault detection. Finally, a simulated continuous stirred tank system is used to demonstrate the effectiveness of the proposed method. The simulation results show that LNSA can detect multimode process changes effectively and performs better than traditional fault detection methods.
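    The abstract does not give the exact formulas for the two similarity factors, so the sketch below uses the standard Krzanowski PCA similarity factor for the "data direction" part and an illustrative exponential score for the "center shift" part; all data are synthetic and the scaling choices are assumptions.

```python
import numpy as np

def pca_similarity(X1, X2, k=2):
    """Krzanowski PCA similarity factor between two data windows
    (rows = samples, columns = variables):
    S = trace(U1.T @ U2 @ U2.T @ U1) / k, where U holds the k leading
    principal directions.  S = 1 when the k-dimensional subspaces
    coincide and 0 when they are orthogonal -- it captures changes in
    the direction of the data."""
    def top_dirs(X):
        Xc = X - X.mean(axis=0)
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        return vt[:k].T                          # variables x k
    U1, U2 = top_dirs(X1), top_dirs(X2)
    return float(np.trace(U1.T @ U2 @ U2.T @ U1) / k)

def distance_similarity(X1, X2):
    """Distance similarity factor (illustrative scaling): a monotone
    score of the shift between the window centers; 1 = same center."""
    return float(np.exp(-np.linalg.norm(X1.mean(axis=0) - X2.mean(axis=0))))

rng = np.random.default_rng(1)
reference = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
snapshot_ok = reference[:100]                            # same mode
snapshot_shifted = reference[:100] + np.array([5.0, 0.0, 0.0])

s_pca = pca_similarity(reference, snapshot_ok)           # near 1
s_dist = distance_similarity(reference, snapshot_shifted)  # near 0
```

    A monitoring statistic would threshold these factors: the shifted snapshot keeps a high PCA similarity (same directions) but a low distance similarity (moved center), flagging a mean-shift fault.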

  19. Dependency Similarity, Attraction and Perceived Happiness.

    Science.gov (United States)

    Pandey, Janak

    1978-01-01

    Subjects were asked to evaluate either a similar personality or a dissimilar personality. Subjects rated similar others more positively than dissimilar others and, additionally, perceived similar others as more helpful and sympathetic than dissimilar others. (Author)

  20. A New Efficient Method for Calculating Similarity Between Web Services

    Directory of Open Access Journals (Sweden)

    T. RACHAD

    2014-08-01

    Full Text Available Web services allow communication between heterogeneous systems in a distributed environment. Their enormous success and increased use mean that thousands of Web services are now present on the Internet. This large and ever-growing number of Web services makes them difficult to locate and classify, problems encountered mainly during web service discovery and substitution. Traditional keyword-based search is not successful in this context: its results do not take the structure of Web services into account, and it considers only the identifiers of the web service description language (WSDL) interface elements. Methods based on semantics (WSDL-S, OWL-S, SAWSDL, ...), which augment the WSDL description of a Web service with a semantic description, partially address this problem, but their complexity and difficulty delay their adoption in real cases. Measuring the similarity between web service interfaces is the most suitable solution for this kind of problem: it classifies the available web services so as to distinguish those that best match the searched profile from those that do not. Thus, the main goal of this work is to study the degree of similarity between any two web services by offering a new method that is more effective than existing works.
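    As a minimal illustration of interface-level similarity (not the authors' method, which the abstract does not specify), one can compare two services by the overlap of their operation names and of the parameter names of the operations they share. Service descriptions, weights, and names below are all invented.

```python
def jaccard(a, b):
    """Set overlap in [0, 1]; 1.0 for two empty sets by convention."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def service_similarity(svc_a, svc_b, w_ops=0.6, w_params=0.4):
    """Interface-level similarity of two services, each described as
    {operation_name: [parameter_names]}.  Combines operation-name
    overlap with parameter-name overlap of the shared operations."""
    ops = jaccard(svc_a, svc_b)          # set(dict) gives the keys
    shared = set(svc_a) & set(svc_b)
    if shared:
        params = sum(jaccard(svc_a[op], svc_b[op]) for op in shared) / len(shared)
    else:
        params = 0.0
    return w_ops * ops + w_params * params

weather_a = {"getForecast": ["city", "days"], "getAlerts": ["region"]}
weather_b = {"getForecast": ["city", "days"], "getHistory": ["city", "date"]}
sim = service_similarity(weather_a, weather_b)
```

    Richer methods would also compare XML schema types and tokenize camelCase identifiers, but the ranking principle, structure-aware overlap instead of bag-of-keywords, is the same.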

  1. Marriage Matters Spousal Similarity in Life Satisfaction

    OpenAIRE

    Schimmack, Ulrich; Richard E. Lucas

    2006-01-01

    Examined the concurrent and cross-lagged spousal similarity in life satisfaction over a 21-year period. Analyses were based on married couples (N = 847) in the German Socio-Economic Panel (SOEP). Concurrent spousal similarity was considerably higher than one-year retest similarity, revealing spousal similarity in the variable component of life satisfaction. Spousal similarity systematically decreased with length of retest interval, revealing similarity in the changing component of life sati...

  2. Generating personalized web search using semantic context.

    Science.gov (United States)

    Xu, Zheng; Chen, Hai-Yan; Yu, Jie

    2015-01-01

    The "one size fits all" criticism of search engines is that when queries are submitted, the same results are returned to different users. In order to solve this problem, personalized search is proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on the long-term and independent user profile, and thus reduce the effectiveness of personalized search. In this paper, the method captures the user context to provide accurate preferences of users for effectively personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click-through data of users. Finally, a forgetting factor is introduced to merge the independent user context in a user session, which maintains the evolution of user preferences. Experimental results fully confirm that our approach can successfully represent user context according to individual user information needs.
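    The forgetting-factor merge described in this record can be sketched as an exponentially decayed combination of per-query concept weights. This is a hedged illustration only: the paper's representation and decay constant are not given, so the dictionary format, the factor 0.5, and the example concepts are all assumptions.

```python
def merge_context(contexts, forget=0.5):
    """Merge per-query concept-weight dicts into one session context.
    Older contexts are down-weighted geometrically by the forgetting
    factor, so the merged profile tracks the evolution of the user's
    preferences within the session."""
    merged = {}
    for age, ctx in enumerate(reversed(contexts)):   # age 0 = newest
        w = forget ** age
        for concept, weight in ctx.items():
            merged[concept] = merged.get(concept, 0.0) + w * weight
    total = sum(merged.values()) or 1.0
    return {c: v / total for c, v in merged.items()}   # normalized

session = [
    {"jaguar car": 1.0},                         # oldest query context
    {"jaguar car": 0.6, "jaguar cat": 0.4},
    {"jaguar cat": 1.0},                         # newest: interest drifted
]
profile = merge_context(session)
```

    Because recent contexts dominate, the merged profile now favors the "jaguar cat" sense even though the session started with car queries, which is the drift-tracking behavior the forgetting factor is meant to provide.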

  3. Accurate radiative transfer calculations for layered media.

    Science.gov (United States)

    Selden, Adrian C

    2016-07-01

    Simple yet accurate results for radiative transfer in layered media with discontinuous refractive index are obtained by the method of K-integrals. These are certain weighted integrals applied to the angular intensity distribution at the refracting boundaries. The radiative intensity is expressed as the sum of the asymptotic angular intensity distribution valid in the depth of the scattering medium and a transient term valid near the boundary. Integrated boundary equations are obtained, yielding simple linear equations for the intensity coefficients, enabling the angular emission intensity and the diffuse reflectance (albedo) and transmittance of the scattering layer to be calculated without solving the radiative transfer equation directly. Examples are given of half-space, slab, interface, and double-layer calculations, and extensions to multilayer systems are indicated. The K-integral method is orders of magnitude more accurate than diffusion theory and can be applied to layered scattering media with a wide range of scattering albedos, with potential applications to biomedical and ocean optics. PMID:27409700

  4. How Accurately can we Calculate Thermal Systems?

    Energy Technology Data Exchange (ETDEWEB)

    Cullen, D; Blomquist, R N; Dean, C; Heinrichs, D; Kalugin, M A; Lee, M; Lee, Y; MacFarlan, R; Nagaya, Y; Trkov, A

    2004-04-20

    I would like to determine how accurately a variety of neutron transport code packages (code and cross section libraries) can calculate simple integral parameters, such as K_eff, for systems that are sensitive to thermal neutron scattering. Since we will only consider theoretical systems, we cannot really determine absolute accuracy compared to any real system. Therefore rather than accuracy, it would be more precise to say that I would like to determine the spread in answers that we obtain from a variety of code packages. This spread should serve as an excellent indicator of how accurately we can really model and calculate such systems today. Hopefully, eventually this will lead to improvements in both our codes and the thermal scattering models that they use in the future. In order to accomplish this I propose a number of extremely simple systems that involve thermal neutron scattering that can be easily modeled and calculated by a variety of neutron transport codes. These are theoretical systems designed to emphasize the effects of thermal scattering, since that is what we are interested in studying. I have attempted to keep these systems very simple, and yet at the same time they include most, if not all, of the important thermal scattering effects encountered in a large, water-moderated, uranium fueled thermal system, i.e., our typical thermal reactors.

  5. Accurate basis set truncation for wavefunction embedding

    Science.gov (United States)

    Barnes, Taylor A.; Goodpaster, Jason D.; Manby, Frederick R.; Miller, Thomas F.

    2013-07-01

    Density functional theory (DFT) provides a formally exact framework for performing embedded subsystem electronic structure calculations, including DFT-in-DFT and wavefunction theory-in-DFT descriptions. In the interest of efficiency, it is desirable to truncate the atomic orbital basis set in which the subsystem calculation is performed, thus avoiding high-order scaling with respect to the size of the MO virtual space. In this study, we extend a recently introduced projection-based embedding method [F. R. Manby, M. Stella, J. D. Goodpaster, and T. F. Miller III, J. Chem. Theory Comput. 8, 2564 (2012); doi:10.1021/ct300544e] to allow for the systematic and accurate truncation of the embedded subsystem basis set. The approach is applied to both covalently and non-covalently bound test cases, including water clusters and polypeptide chains, and it is demonstrated that errors associated with basis set truncation are controllable to well within chemical accuracy. Furthermore, we show that this approach allows for switching between accurate projection-based embedding and DFT embedding with approximate kinetic energy (KE) functionals; in this sense, the approach provides a means of systematically improving upon the use of approximate KE functionals in DFT embedding.

  6. Accurate pattern registration for integrated circuit tomography

    Energy Technology Data Exchange (ETDEWEB)

    Levine, Zachary H.; Grantham, Steven; Neogi, Suneeta; Frigo, Sean P.; McNulty, Ian; Retsch, Cornelia C.; Wang, Yuxin; Lucatorto, Thomas B.

    2001-07-15

    As part of an effort to develop high resolution microtomography for engineered structures, a two-level copper integrated circuit interconnect was imaged using 1.83 keV x rays at 14 angles employing a full-field Fresnel zone plate microscope. A major requirement for high resolution microtomography is the accurate registration of the reference axes in each of the many views needed for a reconstruction. A reconstruction with 100 nm resolution would require registration accuracy of 30 nm or better. This work demonstrates that even images that have strong interference fringes can be used to obtain accurate fiducials through the use of Radon transforms. We show that we are able to locate the coordinates of the rectilinear circuit patterns to 28 nm. The procedure is validated by agreement between an x-ray parallax measurement of 1.41 ± 0.17 µm and a measurement of 1.58 ± 0.08 µm from a scanning electron microscope image of a cross section.

  7. Accurate determination of characteristic relative permeability curves

    Science.gov (United States)

    Krause, Michael H.; Benson, Sally M.

    2015-09-01

    A recently developed technique to accurately characterize sub-core scale heterogeneity is applied to investigate the factors responsible for flowrate-dependent effective relative permeability curves measured on core samples in the laboratory. The dependency of laboratory measured relative permeability on flowrate has long been both supported and challenged by a number of investigators. Studies have shown that this apparent flowrate dependency is a result of both sub-core scale heterogeneity and outlet boundary effects. However this has only been demonstrated numerically for highly simplified models of porous media. In this paper, flowrate dependency of effective relative permeability is demonstrated using two rock cores, a Berea Sandstone and a heterogeneous sandstone from the Otway Basin Pilot Project in Australia. Numerical simulations of steady-state coreflooding experiments are conducted at a number of injection rates using a single set of input characteristic relative permeability curves. Effective relative permeability is then calculated from the simulation data using standard interpretation methods for calculating relative permeability from steady-state tests. Results show that simplified approaches may be used to determine flowrate-independent characteristic relative permeability provided flow rate is sufficiently high, and the core heterogeneity is relatively low. It is also shown that characteristic relative permeability can be determined at any typical flowrate, and even for geologically complex models, when using accurate three-dimensional models.

  8. The Hofmethode: Computing Semantic Similarities between E-Learning Products

    Directory of Open Access Journals (Sweden)

    Oliver Michel

    2009-11-01

    Full Text Available The key task in building useful e-learning repositories is to develop a system with an algorithm allowing users to retrieve information that corresponds to their specific requirements. To achieve this, products (or their verbal descriptions, i.e. presented in metadata) need to be compared and structured according to the results of this comparison. Such structuring is crucial insofar as there are many search results that correspond to the entered keyword. The Hofmethode is an algorithm (based on psychological considerations) to compute semantic similarities between texts and therefore offer a way to compare e-learning products. The computed similarity values are used to build semantic maps in which the products are visually arranged according to their similarities. The paper describes how the Hofmethode is implemented in the online database edulap, and how it contributes to help the user to explore the data in which he is interested.

  9. Improved Scatter Search Using Cuckoo Search

    Directory of Open Access Journals (Sweden)

    Ahmed T.Sadiq Al-Obaidi

    2013-02-01

    Full Text Available The Scatter Search (SS) is a deterministic strategy that has been applied successfully to some combinatorial and continuous optimization problems. Cuckoo Search (CS) is a heuristic search algorithm inspired by the reproduction strategy of cuckoos. This paper presents an enhanced Scatter Search algorithm that uses the CS algorithm. The improvement provides Scatter Search with random exploration of the problem's search space and with greater diversity and intensification around promising solutions. The original and improved Scatter Search have been tested on the Traveling Salesman Problem. A computational experiment with benchmark instances is reported. The results demonstrate that the improved Scatter Search algorithm performs better than the original Scatter Search algorithm, improving the average fitness value by 23.2% compared with the original SS. The developed algorithm has also been compared with other algorithms for the same problem; it was competitive with some and weaker than others.
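
    As a hedged sketch of the cuckoo-search ingredient described above (heavy-tailed perturbations of candidate solutions plus abandonment of the worst nests), applied to a toy continuous function rather than the Traveling Salesman Problem used in the paper; all parameter values below are illustrative assumptions:

```python
import random

def cuckoo_search(f, dim=2, n_nests=15, iters=200, pa=0.25, seed=1):
    """Minimal cuckoo search: perturb each nest with a heavy-tailed step,
    keep improvements, then abandon the worst pa fraction of nests each
    generation and rebuild them at random positions."""
    rng = random.Random(seed)
    nests = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_nests)]
    best = min(nests, key=f)
    for _ in range(iters):
        for i, nest in enumerate(nests):
            # Cauchy-like step: a crude stand-in for a true Levy flight
            step = [0.1 * rng.gauss(0, 1) / max(abs(rng.gauss(0, 1)), 1e-9)
                    for _ in range(dim)]
            cand = [x + s for x, s in zip(nest, step)]
            if f(cand) < f(nest):
                nests[i] = cand
        nests.sort(key=f)
        for i in range(int(n_nests * (1 - pa)), n_nests):
            nests[i] = [rng.uniform(-5, 5) for _ in range(dim)]
        if f(nests[0]) < f(best):
            best = list(nests[0])
    return best

sphere = lambda x: sum(v * v for v in x)  # toy objective, minimum at origin
best = cuckoo_search(sphere)
```

    In the hybrid described by the paper, this random exploration supplements the deterministic combination steps of Scatter Search.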

  10. An intelligent method for geographic Web search

    Science.gov (United States)

    Mei, Kun; Yuan, Ying

    2008-10-01

    While the electronically available information on the World-Wide Web is growing explosively, it is becoming increasingly difficult for search engine users to find relevant information. In this paper we discuss how to constrain web queries geographically. A number of search queries are associated with geographical locations, either explicitly or implicitly. Accurately and effectively detecting the locations that search queries are truly about has huge potential impact on increasing search relevance, bringing better targeted search results, and improving search user satisfaction. Our approach is novel both in the way geographic information is extracted from the web and, as far as we can tell, in the way it is integrated into query processing. This paper gives an overview of a spatially aware search engine for semantic querying of web documents. It also illustrates algorithms for extracting locations from web documents and query requests, using location ontologies to encode and reason about the formal semantics of geographic web search. Based on a real-world scenario of tourism guide search, the application of our approach shows that geographic information retrieval can be efficiently supported.
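
    A minimal, hypothetical sketch of the query-side idea (splitting a query into non-spatial terms and recognised locations). The paper uses location ontologies for this; the hard-coded gazetteer below is only an illustrative stand-in:

```python
# Toy gazetteer: illustrative stand-in for the location ontologies
# described in the record, not part of the actual system.
GAZETTEER = {
    "paris": ("Paris", "France"),
    "melbourne": ("Melbourne", "Australia"),
    "kyoto": ("Kyoto", "Japan"),
}

def extract_locations(query):
    """Split a search query into non-spatial terms and recognised locations."""
    terms, locations = [], []
    for token in query.lower().split():
        if token in GAZETTEER:
            locations.append(GAZETTEER[token])
        else:
            terms.append(token)
    return terms, locations

terms, locs = extract_locations("budget hotels Kyoto")
```

    A real system would also resolve ambiguity (several places sharing one name) and implicit locality, which a flat lookup cannot do.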

  11. Accurate Classification of RNA Structures Using Topological Fingerprints

    Science.gov (United States)

    Li, Kejie; Gribskov, Michael

    2016-01-01

    While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. Although the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families with diverse sizes and functions, containing pseudoknots, and with little sequence similarity, an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Due to the inclusion of pseudoknots, the RNA fingerprint approach covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint. PMID:27755571

  12. Fast and accurate fitting of relaxation dispersion data using the flexible software package GLOVE

    Energy Technology Data Exchange (ETDEWEB)

    Sugase, Kenji; Konuma, Tsuyoshi [Suntory Foundation for Life Sciences, Bioorganic Research Institute (Japan); Lansing, Jonathan C. [Momenta Pharmaceuticals, Inc. (United States); Wright, Peter E., E-mail: wright@scripps.edu [Scripps Research Institute, Department of Integrative Structural and Computational Biology and Skaggs Institute of Chemical Biology (United States)

    2013-07-15

    Relaxation dispersion spectroscopy is one of the most widely used techniques for the analysis of protein dynamics. To obtain a detailed understanding of protein function from the viewpoint of dynamics, it is essential to fit relaxation dispersion data accurately. The grid search method is commonly used for relaxation dispersion curve fits, but it does not always find the global minimum that provides the best-fit parameter set. Also, the fitting quality does not always improve as the grid size increases, although the computational time becomes longer. This is because relaxation dispersion curve fitting suffers from a local minimum problem, which is a general problem in non-linear least squares curve fitting. Therefore, in order to fit relaxation dispersion data rapidly and accurately, we developed a new fitting program called GLOVE that minimizes global and local parameters alternately, and incorporates a Monte-Carlo minimization method that enables fitting parameters to pass through local minima at low computational cost. GLOVE also implements a random search method, which sets up initial parameter values randomly within user-defined ranges. We demonstrate here that the combined use of the three methods can find the global minimum more rapidly and more accurately than grid search alone.
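
    A hedged sketch of the general random-search idea mentioned above (not GLOVE itself, and fitting a toy sinusoidal model rather than a relaxation dispersion model): many random starting points, each refined by downhill random steps, so that at least one start escapes the local minima that can trap a single deterministic fit.

```python
import math
import random

def sse(params, xs, ys):
    """Sum of squared errors for the toy model y = a * sin(b * x)."""
    a, b = params
    return sum((y - a * math.sin(b * x)) ** 2 for x, y in zip(xs, ys))

def monte_carlo_fit(xs, ys, trials=150, steps=200, seed=0):
    """Random multi-start search: each trial starts from random parameters
    and keeps only downhill random moves; the best trial overall is kept."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        p = [rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)]
        for _ in range(steps):
            q = [v + rng.gauss(0.0, 0.1) for v in p]
            if sse(q, xs, ys) < sse(p, xs, ys):
                p = q
        if best is None or sse(p, xs, ys) < sse(best, xs, ys):
            best = p
    return best

xs = [i * 0.2 for i in range(40)]
ys = [2.0 * math.sin(1.5 * x) for x in xs]  # noiseless synthetic data
a_fit, b_fit = monte_carlo_fit(xs, ys)
```

    The objective here has aliasing local minima in b, which is why the multi-start strategy matters; a single gradient-style descent from a bad start would stall.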

  13. Studying dream content using the archive and search engine on DreamBank.net.

    Science.gov (United States)

    Domhoff, G William; Schneider, Adam

    2008-12-01

    This paper shows how the dream archive and search engine on DreamBank.net, a Web site containing over 22,000 dream reports, can be used to generate new findings on dream content, some of which raise interesting questions about the relationship between dreaming and various forms of waking thought. It begins with studies that draw dream reports from DreamBank.net to examine social networks in dreams, and then demonstrates the usefulness of the search engine by employing word strings relating to religious and sexual elements. Examples from two lengthy individual dream series are used to show how the dreams of one person can be studied for characters, activities, and emotions. A final example shows that accurate inferences about a person's religious beliefs can be made on the basis of reading through dreams retrieved with a few keywords. The overall findings are similar to those in studies using traditional forms of content analysis. PMID:18682331

  14. Toward Accurate and Quantitative Comparative Metagenomics

    Science.gov (United States)

    Nayfach, Stephen; Pollard, Katherine S.

    2016-01-01

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. PMID:27565341

  15. Toward Accurate and Quantitative Comparative Metagenomics.

    Science.gov (United States)

    Nayfach, Stephen; Pollard, Katherine S

    2016-08-25

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. PMID:27565341

  16. How accurate are SuperCOSMOS positions?

    CERN Document Server

    Schaefer, Adam; Johnston, Helen

    2014-01-01

    Optical positions from the SuperCOSMOS Sky Survey have been compared in detail with accurate radio positions that define the second realisation of the International Celestial Reference Frame (ICRF2). The comparison was limited to the IIIaJ plates from the UK/AAO and Oschin (Palomar) Schmidt telescopes. A total of 1373 ICRF2 sources was used, with the sample restricted to stellar objects brighter than $B_J=20$ and Galactic latitudes $|b|>10^{\\circ}$. Position differences showed an rms scatter of $0.16''$ in right ascension and declination. While overall systematic offsets were $<0.1''$ in each hemisphere, both the systematics and scatter were greater in the north.

  17. Accurate renormalization group analyses in neutrino sector

    Energy Technology Data Exchange (ETDEWEB)

    Haba, Naoyuki [Graduate School of Science and Engineering, Shimane University, Matsue 690-8504 (Japan); Kaneta, Kunio [Kavli IPMU (WPI), The University of Tokyo, Kashiwa, Chiba 277-8568 (Japan); Takahashi, Ryo [Graduate School of Science and Engineering, Shimane University, Matsue 690-8504 (Japan); Yamaguchi, Yuya [Department of Physics, Faculty of Science, Hokkaido University, Sapporo 060-0810 (Japan)

    2014-08-15

    We investigate accurate renormalization group analyses in the neutrino sector between the ν-oscillation and seesaw energy scales. We consider decoupling effects of the top quark and Higgs boson on the renormalization group equations of the light neutrino mass matrix. Since the decoupling effects arise at the standard model scale and are independent of high energy physics, our method can be applied to essentially any model beyond the standard model. We find that the decoupling effects of the Higgs boson are negligible, while those of the top quark are not. In particular, the decoupling effects of the top quark affect the neutrino mass eigenvalues, which are important for analyzing predictions such as mass squared differences and neutrinoless double beta decay in an underlying theory existing at a high energy scale.

  18. Accurate Telescope Mount Positioning with MEMS Accelerometers

    CERN Document Server

    Mészáros, László; Pál, András; Csépány, Gergely

    2014-01-01

    This paper describes the advantages and challenges of applying microelectromechanical accelerometer systems (MEMS accelerometers) in order to attain precise, accurate and stateless positioning of telescope mounts. This provides a completely independent method from other forms of electronic, optical, mechanical or magnetic feedback or real-time astrometry. Our goal is to reach the sub-arcminute range, which is well below the field-of-view of conventional imaging telescope systems. Here we present how this sub-arcminute accuracy can be achieved with very cheap MEMS sensors, and we also detail how our procedures can be extended in order to attain even finer measurements. In addition, our paper discusses how a complete system design can be implemented in order to form part of a telescope control system.

  19. Accurate Weather Forecasting for Radio Astronomy

    Science.gov (United States)

    Maddalena, Ronald J.

    2010-01-01

    The NRAO Green Bank Telescope routinely observes at wavelengths from 3 mm to 1 m. As with all mm-wave telescopes, observing conditions depend upon the variable atmospheric water content. The site provides over 100 days/yr when opacities are low enough for good observing at 3 mm, but winds on the open-air structure reduce the time suitable for 3-mm observing where pointing is critical. Thus, to maximize productivity, the observing wavelength needs to match weather conditions. For 6 years the telescope has used a dynamic scheduling system (recently upgraded; www.gb.nrao.edu/DSS) that requires accurate multi-day forecasts for winds and opacities. Since opacity forecasts are not provided by the National Weather Service (NWS), I have developed an automated system that takes available forecasts, derives forecasted opacities, and deploys the results on the web in user-friendly graphical overviews (www.gb.nrao.edu/ rmaddale/Weather). The system relies on the "North American Mesoscale" models, which are updated by the NWS every 6 hrs, have a 12 km horizontal resolution, 1 hr temporal resolution, run to 84 hrs, and have 60 vertical layers that extend to 20 km. Each forecast consists of a time series of ground conditions, cloud coverage, etc., and, most importantly, temperature, pressure, and humidity as a function of height. I use Liebe's MWP model (Radio Science, 20, 1069, 1985) to determine the absorption in each layer for each hour for 30 observing wavelengths. Radiative transfer provides, for each hour and wavelength, the total opacity and the radio brightness of the atmosphere, which contributes substantially at some wavelengths to Tsys and the observational noise. Comparisons of measured and forecasted Tsys at 22.2 and 44 GHz imply that the forecasted opacities are good to about 0.01 Nepers, which is sufficient for forecasting and accurate calibration. Reliability is high out to 2 days and degrades slowly for longer-range forecasts.

  20. The FLUKA Code: An Accurate Simulation Tool for Particle Therapy.

    Science.gov (United States)

    Battistoni, Giuseppe; Bauer, Julia; Boehlen, Till T; Cerutti, Francesco; Chin, Mary P W; Dos Santos Augusto, Ricardo; Ferrari, Alfredo; Ortega, Pablo G; Kozłowska, Wioletta; Magro, Giuseppe; Mairani, Andrea; Parodi, Katia; Sala, Paola R; Schoofs, Philippe; Tessonnier, Thomas; Vlachoudis, Vasilis

    2016-01-01

    Monte Carlo (MC) codes are increasingly spreading in the hadrontherapy community due to their detailed description of radiation transport and interaction with matter. The suitability of a MC code for application to hadrontherapy demands accurate and reliable physical models capable of handling all components of the expected radiation field. This becomes extremely important for correctly performing not only physical but also biologically based dose calculations, especially in cases where ions heavier than protons are involved. In addition, accurate prediction of emerging secondary radiation is of utmost importance in innovative areas of research aiming at in vivo treatment verification. This contribution will address the recent developments of the FLUKA MC code and its practical applications in this field. Refinements of the FLUKA nuclear models in the therapeutic energy interval lead to an improved description of the mixed radiation field as shown in the presented benchmarks against experimental data with both (4)He and (12)C ion beams. Accurate description of ionization energy losses and of particle scattering and interactions lead to the excellent agreement of calculated depth-dose profiles with those measured at leading European hadron therapy centers, both with proton and ion beams. In order to support the application of FLUKA in hospital-based environments, Flair, the FLUKA graphical interface, has been enhanced with the capability of translating CT DICOM images into voxel-based computational phantoms in a fast and well-structured way. The interface is capable of importing also radiotherapy treatment data described in DICOM RT standard. In addition, the interface is equipped with an intuitive PET scanner geometry generator and automatic recording of coincidence events. Clinically, similar cases will be presented both in terms of absorbed dose and biological dose calculations describing the various available features. PMID:27242956

  1. Partial Recurrent Laryngeal Nerve Paralysis or Paresis? In Search for the Accurate Diagnosis

    Directory of Open Access Journals (Sweden)

    Alexander Delides

    2015-01-01

    Full Text Available “Partial paralysis” of the larynx is a term often used to describe a hypomobile vocal fold as is the term “paresis.” We present a case of a dysphonic patient with a mobility disorder of the vocal fold, for whom idiopathic “partial paralysis” was the diagnosis made after laryngeal electromyography, and discuss a proposition for a different implementation of the term.

  2. A Distributed Weighted Voting Approach for Accurate Eye Center Estimation

    Directory of Open Access Journals (Sweden)

    Gagandeep Singh

    2013-05-01

    Full Text Available This paper proposes a novel approach for accurate estimation of the eye center in face images. A distributed voting based approach, in which every pixel votes, is adopted for potential eye center candidates. The votes are distributed over a subset of pixels which lie in the direction opposite to the gradient direction, and the weight of the votes is distributed according to a novel mechanism. First, the image is normalized to eliminate illumination variations and its edge map is generated using the Canny edge detector. Distributed voting is applied on the edge image to generate different eye center candidates. Morphological closing and local maxima search are used to reduce the number of candidates. A classifier based on spatial and intensity information is used to choose the correct candidates for the locations of the eye center. The proposed approach was tested on the BioID face database and resulted in a better iris detection rate than the state-of-the-art. The proposed approach is robust against illumination variation, small pose variations, presence of eye glasses and partial occlusion of the eyes. Defence Science Journal, 2013, 63(3), pp. 292-297, DOI: http://dx.doi.org/10.14429/dsj.63.2763

  3. Accurate measurement of streamwise vortices using dual-plane PIV

    Energy Technology Data Exchange (ETDEWEB)

    Waldman, Rye M.; Breuer, Kenneth S. [Brown University, School of Engineering, Providence, RI (United States)

    2012-11-15

    Low Reynolds number aerodynamic experiments with flapping animals (such as bats and small birds) are of particular interest due to their application to micro air vehicles which operate in a similar parameter space. Previous PIV wake measurements described the structures left by bats and birds and provided insight into the time history of their aerodynamic force generation; however, these studies have faced difficulty drawing quantitative conclusions based on said measurements. The highly three-dimensional and unsteady nature of the flows associated with flapping flight are major challenges for accurate measurements. The challenge of animal flight measurements is finding small flow features in a large field of view at high speed with limited laser energy and camera resolution. Cross-stream measurement is further complicated by the predominately out-of-plane flow that requires thick laser sheets and short inter-frame times, which increase noise and measurement uncertainty. Choosing appropriate experimental parameters requires compromise between the spatial and temporal resolution and the dynamic range of the measurement. To explore these challenges, we do a case study on the wake of a fixed wing. The fixed model simplifies the experiment and allows direct measurements of the aerodynamic forces via load cell. We present a detailed analysis of the wake measurements, discuss the criteria for making accurate measurements, and present a solution for making quantitative aerodynamic load measurements behind free-flyers. (orig.)

  4. Approaching system equilibrium with accurate or not accurate feedback information in a two-route system

    Science.gov (United States)

    Zhao, Xiao-mei; Xie, Dong-fan; Li, Qi

    2015-02-01

    With the development of intelligent transport systems, advanced information feedback strategies have been developed to reduce traffic congestion and enhance capacity. However, previous strategies provide accurate information to travelers, and our simulation results show that accurate information brings negative effects, especially in the delayed case: travelers prefer the route reported to be in the best condition, but delayed information reflects past rather than current traffic conditions. Travelers therefore make wrong routing decisions, decreasing capacity, increasing oscillations, and driving the system away from equilibrium. To avoid this negative effect, bounded rationality is taken into account by introducing a boundedly rational threshold BR. When the difference between the two routes is less than BR, the routes are chosen with equal probability. Bounded rationality helps improve efficiency in terms of capacity, oscillation, and the gap from system equilibrium.
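
    The threshold mechanism can be sketched in a toy day-to-day simulation (the congestion function, parameter values, and threshold below are illustrative assumptions, not the model of the paper):

```python
import random

def simulate(br, days=500, n=1000, seed=42):
    """Two-route day-to-day choice with a boundedly rational threshold:
    if the reported travel-time difference is below br, each traveler
    picks a route at random; otherwise everyone takes the faster-reported
    route. Returns the mean absolute flow imbalance over the last half."""
    rng = random.Random(seed)
    flow_a = n // 2
    imbalances = []
    for day in range(days):
        # toy linear congestion: travel time grows with route load
        t_a = 10.0 + 0.05 * flow_a
        t_b = 10.0 + 0.05 * (n - flow_a)
        new_a = 0
        for _ in range(n):
            if abs(t_a - t_b) < br:
                new_a += rng.random() < 0.5   # indifference band: coin flip
            else:
                new_a += t_a < t_b            # chase the faster route
        flow_a = new_a
        if day >= days // 2:
            imbalances.append(abs(flow_a - n / 2))
    return sum(imbalances) / len(imbalances)

osc_accurate = simulate(br=0.0)    # everyone chases the faster route
osc_bounded = simulate(br=10.0)    # indifference band damps oscillation
```

    With br=0 the flows flip-flop between the two routes each day, while a sufficiently wide indifference band keeps the system near the even split, mirroring the qualitative effect the record describes.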

  5. Dynamic Search and Working Memory in Social Recall

    Science.gov (United States)

    Hills, Thomas T.; Pachur, Thorsten

    2012-01-01

    What are the mechanisms underlying search in social memory (e.g., remembering the people one knows)? Do the search mechanisms involve dynamic local-to-global transitions similar to semantic search, and are these transitions governed by the general control of attention, associated with working memory span? To find out, we asked participants to…

  6. Learning-Based Video Superresolution Reconstruction Using Spatiotemporal Nonlocal Similarity

    Directory of Open Access Journals (Sweden)

    Meiyu Liang

    2015-01-01

    Full Text Available Aiming at improving video resolution quality and detail clarity, a novel learning-based video superresolution reconstruction algorithm using spatiotemporal nonlocal similarity is proposed in this paper. Objective high-resolution (HR) estimations of low-resolution (LR) video frames can be obtained by learning the LR-HR correlation mapping and fusing spatiotemporal nonlocal similarities between video frames. With the objective of improving algorithm efficiency while guaranteeing superresolution quality, a novel visual saliency-based LR-HR correlation mapping strategy between LR and HR patches is proposed based on semicoupled dictionary learning. Moreover, aiming at improving the performance and efficiency of spatiotemporal similarity matching and fusion, an improved spatiotemporal nonlocal fuzzy registration scheme is established using a similarity weighting strategy based on pseudo-Zernike moment feature similarity and structural similarity, and a self-adaptive regional correlation evaluation strategy. The proposed spatiotemporal fuzzy registration scheme does not rely on accurate estimation of subpixel motion, and therefore it can be adapted to complex motion patterns and is robust to noise and rotation. Experimental results demonstrate that the proposed algorithm achieves competitive superresolution quality compared to other state-of-the-art algorithms in terms of both subjective and objective evaluations.

  7. Data mining technique for fast retrieval of similar waveforms in Fusion massive databases

    Energy Technology Data Exchange (ETDEWEB)

    Vega, J. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain)], E-mail: jesus.vega@ciemat.es; Pereira, A.; Portas, A. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain); Dormido-Canto, S.; Farias, G.; Dormido, R.; Sanchez, J.; Duro, N. [Departamento de Informatica y Automatica, UNED, Madrid (Spain); Santos, M. [Departamento de Arquitectura de Computadores y Automatica, UCM, Madrid (Spain); Sanchez, E. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain); Pajares, G. [Departamento de Arquitectura de Computadores y Automatica, UCM, Madrid (Spain)

    2008-01-15

    Fusion measurement systems generate similar waveforms for reproducible behavior. A major difficulty related to data analysis is the identification, in a rapid and automated way, of a set of discharges with comparable behaviour, i.e. discharges with 'similar' waveforms. Here we introduce a new technique for rapid searching and retrieval of 'similar' signals. The approach consists of building a classification system that avoids traversing the whole database looking for similarities. The classification system diminishes the problem dimensionality (by means of waveform feature extraction) and reduces the searching space to just the most probable 'similar' waveforms (clustering techniques). In the searching procedure, the input waveform is classified in any of the existing clusters. Then, a similarity measure is computed between the input signal and all cluster elements in order to identify the most similar waveforms. The inner product of normalized vectors is used as the similarity measure as it allows the searching process to be independent of signal gain and polarity. This development has been applied recently to TJ-II stellarator databases and has been integrated into its remote participation system.
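
    The similarity measure described above (inner product of normalized vectors) can be sketched directly; taking the absolute value of the cosine similarity is one way to obtain the gain and polarity independence mentioned in the record:

```python
import math

def waveform_similarity(u, v):
    """Inner product of unit-normalized vectors (cosine similarity); the
    absolute value makes the measure independent of gain and polarity."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return abs(sum(a * b for a, b in zip(u, v)) / (nu * nv))

w = [0.0, 1.0, 2.0, 1.0, 0.0]
same_gain = waveform_similarity(w, [3.0 * x for x in w])  # scaled copy
flipped = waveform_similarity(w, [-x for x in w])         # inverted copy
```

    In the described system this comparison is applied only within the cluster a query waveform falls into, rather than across the whole database.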

  8. Data mining technique for fast retrieval of similar waveforms in Fusion massive databases

    International Nuclear Information System (INIS)

    Fusion measurement systems generate similar waveforms for reproducible behavior. A major difficulty related to data analysis is the identification, in a rapid and automated way, of a set of discharges with comparable behaviour, i.e. discharges with 'similar' waveforms. Here we introduce a new technique for rapid searching and retrieval of 'similar' signals. The approach consists of building a classification system that avoids traversing the whole database looking for similarities. The classification system diminishes the problem dimensionality (by means of waveform feature extraction) and reduces the searching space to just the most probable 'similar' waveforms (clustering techniques). In the searching procedure, the input waveform is classified in any of the existing clusters. Then, a similarity measure is computed between the input signal and all cluster elements in order to identify the most similar waveforms. The inner product of normalized vectors is used as the similarity measure as it allows the searching process to be independent of signal gain and polarity. This development has been applied recently to TJ-II stellarator databases and has been integrated into its remote participation system

  9. Faceted Semantic Search for Personalized Social Search

    OpenAIRE

    Mas, Massimiliano Dal

    2012-01-01

    Today's social networks (like Facebook, Twitter, LinkedIn, ...) need to deal with vagueness and ontological indeterminacy. In this paper the prototyping of a faceted semantic search for personalized social search using the "joint meaning" in a community environment is analyzed. User searches in a "collaborative" environment defined by folksonomies can be supported by the most common features of the faceted semantic search. A solution for the context-aware personalized search is based on "join...

  10. Google Ajax Search API

    CERN Document Server

    Fitzgerald, Michael

    2007-01-01

    Use the Google Ajax Search API to integrate web search, image search, local search, and other types of search into your web site by embedding a simple, dynamic search box to display search results in your own web pages using a few lines of JavaScript. For those who do not want to write code, the search wizards and solutions built with the Google Ajax Search API generate code to accomplish common tasks like adding local search results to a Google Maps API mashup, adding video search thumbnails to your web site, or adding a news reel with the latest up-to-date stories to your blog. More advanced users can

  11. Accurate free energy calculation along optimized paths.

    Science.gov (United States)

    Chen, Changjun; Xiao, Yi

    2010-05-01

    The path-based methods of free energy calculation, such as thermodynamic integration and free energy perturbation, are simple in theory, but difficult in practice because in most cases smooth paths do not exist, especially for large molecules. In this article, we present a novel method to build the transition path of a peptide. We use harmonic potentials to restrain its nonhydrogen atom dihedrals in the initial state and set the equilibrium angles of the potentials as those in the final state. Through a series of steps of geometrical optimization, we can construct a smooth and short path from the initial state to the final state. This path can be used to calculate free energy difference. To validate this method, we apply it to a small 10-ALA peptide and find that the calculated free energy changes in helix-helix and helix-hairpin transitions are both self-convergent and cross-convergent. We also calculate the free energy differences between different stable states of beta-hairpin trpzip2, and the results show that this method is more efficient than the conventional molecular dynamics method in accurate free energy calculation.
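
    As a generic illustration of the path-based idea mentioned above (plain thermodynamic integration over a coupling parameter, with a toy integrand; this is not the dihedral-restraint path construction of the paper):

```python
def thermodynamic_integration(du_dlambda, n=100):
    """Free energy difference as the integral of <dU/d(lambda)> over the
    coupling parameter from 0 to 1, via the trapezoidal rule."""
    h = 1.0 / n
    vals = [du_dlambda(i * h) for i in range(n + 1)]
    return h * (vals[0] / 2.0 + sum(vals[1:-1]) + vals[-1] / 2.0)

# toy integrand: if <dU/d(lambda)> = 2*lambda, then Delta F = 1
delta_f = thermodynamic_integration(lambda lam: 2.0 * lam)
```

    In practice each integrand value is itself an ensemble average from simulation, which is why a smooth, short path between end states matters: fewer, better-converged quadrature points.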

  12. Accurate fission data for nuclear safety

    CERN Document Server

    Solders, A; Jokinen, A; Kolhinen, V S; Lantz, M; Mattera, A; Penttila, H; Pomp, S; Rakopoulos, V; Rinta-Antila, S

    2013-01-01

    The Accurate fission data for nuclear safety (AlFONS) project aims at high precision measurements of fission yields, using the renewed IGISOL mass separator facility in combination with a new high current light ion cyclotron at the University of Jyvaskyla. The 30 MeV proton beam will be used to create fast and thermal neutron spectra for the study of neutron induced fission yields. Thanks to a series of mass separating elements, culminating with the JYFLTRAP Penning trap, it is possible to achieve a mass resolving power on the order of a few hundred thousand. In this paper we present the experimental setup and the design of a neutron converter target for IGISOL. The goal is to have a flexible design. For studies of exotic nuclei far from stability a high neutron flux (10^12 neutrons/s) at energies 1 - 30 MeV is desired, while for reactor applications neutron spectra that resemble those of thermal and fast nuclear reactors are preferred. It is also desirable to be able to produce (semi-)monoenergetic neutrons...

  13. Fast and Provably Accurate Bilateral Filtering.

    Science.gov (United States)

    Chaudhury, Kunal N; Dabhade, Swapnil D

    2016-06-01

    The bilateral filter is a non-linear filter that uses a range filter along with a spatial filter to perform edge-preserving smoothing of images. A direct computation of the bilateral filter requires O(S) operations per pixel, where S is the size of the support of the spatial filter. In this paper, we present a fast and provably accurate algorithm for approximating the bilateral filter when the range kernel is Gaussian. In particular, for box and Gaussian spatial filters, the proposed algorithm can cut down the complexity to O(1) per pixel for any arbitrary S. The algorithm has a simple implementation involving N+1 spatial filterings, where N is the approximation order. We give a detailed analysis of the filtering accuracy that can be achieved by the proposed approximation in relation to the target bilateral filter. This allows us to estimate the order N required to obtain a given accuracy. We also present comprehensive numerical results to demonstrate that the proposed algorithm is competitive with the state-of-the-art methods in terms of speed and accuracy. PMID:27093722
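
    For reference, the direct bilateral filter that the paper accelerates can be sketched in one dimension (the fast O(1) approximation itself is not reproduced here; the sigma values and radius are illustrative):

```python
import math

def bilateral_filter_1d(signal, sigma_s=2.0, sigma_r=0.1, radius=4):
    """Direct bilateral filter: each output sample is a weighted average of
    its neighbours, weight = spatial Gaussian * range Gaussian, so sharp
    intensity jumps (edges) are preserved while flat regions are smoothed."""
    out = []
    for i, x in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
                 * math.exp(-((x - signal[j]) ** 2) / (2.0 * sigma_r ** 2)))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

step = [0.0] * 10 + [1.0] * 10   # a clean edge
filtered = bilateral_filter_1d(step)
```

    The inner loop is the O(S)-per-pixel cost the paper targets: samples across the edge get a near-zero range weight, so the step survives filtering.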

  14. Asthma and COPD: Differences and Similarities

    Science.gov (United States)

    ... and COPD: differences and similarities Asthma and COPD: Differences and Similarities This article has been reviewed ... or you could have Chronic Obstructive Pulmonary Disease (COPD), such as emphysema or chronic bronchitis. Because asthma ...

  15. A Quantum-Based Similarity Method in Virtual Screening

    Directory of Open Access Journals (Sweden)

    Mohammed Mumtaz Al-Dabbagh

    2015-10-01

    Full Text Available One of the most widely used techniques for ligand-based virtual screening is similarity searching. This study adopted concepts from quantum mechanics to present a state-of-the-art similarity method for molecules inspired by quantum theory. The representation of molecular compounds in a mathematical quantum space plays a vital role in the development of the quantum-based similarity approach. One of the key concepts of quantum theory is the use of complex numbers. Hence, this study proposed three techniques to embed and re-represent molecular compounds in complex-number format. The quantum-based similarity method developed in this study, which depends on a complex pure Hilbert space of molecules, is called Standard Quantum-Based (SQB). The recall of retrieved active molecules was measured at the top 1% and top 5%, and a significance test was used to evaluate the proposed methods. The MDL Drug Data Report (MDDR), Maximum Unbiased Validation (MUV) and Directory of Useful Decoys (DUD) data sets were used for the experiments and were represented by 2D fingerprints. Simulated virtual screening experiments show that the effectiveness of the SQB method was significantly increased, owing to the representational power of molecular compounds in complex-number form, compared with the Tanimoto benchmark similarity measure.
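
    The Tanimoto coefficient used here as the benchmark measure is straightforward to compute on binary 2D fingerprints. A minimal sketch, with hypothetical molecule names and bit indices:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two binary fingerprints given as
    sets of on-bit indices: |A & B| / |A | B|."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Rank candidate molecules against a query fingerprint.
query = {1, 4, 7, 9}
candidates = {"mol_x": {1, 4, 7, 8}, "mol_y": {2, 3, 5}}
ranked = sorted(candidates, key=lambda m: tanimoto(query, candidates[m]),
                reverse=True)
```

    Here mol_x shares three of five total on-bits with the query (coefficient 0.6) and is ranked first.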

  16. Learning Good Edit Similarities with Generalization Guarantees

    OpenAIRE

    Bellet, Aurélien; Habrard, Amaury; Sebban, Marc

    2011-01-01

    Similarity and distance functions are essential to many learning algorithms, so training them has attracted a lot of interest. When it comes to dealing with structured data (e.g., strings or trees), edit similarities are widely used, and there exist a few methods for learning them. However, these methods offer no theoretical guarantee as to the generalization performance and discriminative power of the resulting similarities. Recently, a theory of learning with good similarity functions wa...

  17. Shape Similarity Measures of Linear Entities

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The essence of feature matching technology lies in how to measure the similarity of spatial entities. Among all the possible similarity measures, the shape similarity measure is one of the most important because it is easy to collect the necessary parameters and it matches human intuition well. In this paper a new shape similarity measure of linear entities, based on the differences of direction change along each line, is presented and its effectiveness is illustrated.
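
    The idea of comparing direction changes along two polylines can be sketched simply. This is an illustrative toy version, not the paper's exact formula; `shape_similarity` and its normalization are assumptions, and it requires both polylines to have the same number of vertices.

```python
import math

def direction_changes(line):
    """Sequence of heading changes (radians) along a polyline."""
    headings = [math.atan2(y2 - y1, x2 - x1)
                for (x1, y1), (x2, y2) in zip(line, line[1:])]
    return [h2 - h1 for h1, h2 in zip(headings, headings[1:])]

def shape_similarity(line_a, line_b):
    """Crude similarity from summed differences of direction changes;
    1.0 means identical turn sequences."""
    da, db = direction_changes(line_a), direction_changes(line_b)
    diff = sum(abs(a - b) for a, b in zip(da, db))
    return 1.0 / (1.0 + diff)

l_shape = [(0, 0), (1, 0), (1, 1)]
l_shifted = [(5, 5), (6, 5), (6, 6)]   # same shape, translated
straight = [(0, 0), (1, 0), (2, 0)]    # no turn at all
```

    Because only direction *changes* enter the measure, the translated copy scores a perfect 1.0, while the straight line scores lower; this invariance to position is one reason such measures match human intuition.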

  18. How doctors search

    DEFF Research Database (Denmark)

    Lykke, Marianne; Price, Susan; Delcambre, Lois

    2012-01-01

    Professional, workplace searching is different from general searching, because it is typically limited to specific facets and targeted to a single answer. We have developed the semantic component (SC) model, which is a search feature that allows searchers to structure and specify the search ... to context-specific aspects of the main topic of the documents. We have tested the model in an interactive searching study with family doctors, with the purpose of exploring doctors' querying behaviour, how they applied the means for specifying a search, and how these features contributed to the search outcome. ... In general, the doctors were capable of exploiting system features and search tactics during the searching. Most searchers produced well-structured queries that contained appropriate search facets. When searches failed it was not due to query structure or query length. Failures were mostly caused by the well...

  19. Web authentic and similar texts detection using AR digital signature

    OpenAIRE

    Πούλος, Μάριος; Σκιαδόπουλος, Σπύρος; Μπώκος, Γιώργος Δ.

    2010-01-01

    In this paper, we propose a new identification technique, based on an AR model with O(n) time complexity in web form, with the aim of creating a unique serial number for texts and detecting authentic or similar texts. For the implementation of this purpose, we used a 15th-order Autoregressive (AR) model, and for the identification procedure we employed the cross-correlation algorithm. Empirical investigation showed that the proposed method may be used as an accurate method for id...

  20. A fingerprint based metric for measuring similarities of crystalline structures

    Energy Technology Data Exchange (ETDEWEB)

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Grauzinyte, Migle; Goedecker, Stefan, E-mail: stefan.goedecker@unibas.ch [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Amsler, Maximilian [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States); Faraji, Somayeh; Rostami, Samare; Ghasemi, S. Alireza [Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan (Iran, Islamic Republic of); Sadeghi, Ali [Physics Department, Shahid Beheshti University, G. C., Evin, 19839 Tehran (Iran, Islamic Republic of); Wolverton, Chris [Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States)

    2016-01-21

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.
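
    The key requirement stated here is a configurational distance that satisfies the properties of a metric while being independent of how the atoms in a cell happen to be listed. A toy sketch of that idea follows; the per-atom fingerprint vectors are placeholders (the paper builds them from local atomic environments), and the brute-force assignment over permutations is only practical for very small cells.

```python
import itertools
import math

def config_distance(fps_a, fps_b):
    """Configurational distance between two structures given as lists of
    per-atom fingerprint vectors: minimize the root-sum-square of per-atom
    fingerprint distances over all atom assignments (brute force, O(n!))."""
    n = len(fps_a)
    best = float("inf")
    for perm in itertools.permutations(range(n)):
        d = math.sqrt(sum(
            sum((x - y) ** 2 for x, y in zip(fps_a[i], fps_b[perm[i]]))
            for i in range(n)))
        best = min(best, d)
    return best

s1 = [(0.0, 1.0), (1.0, 0.0)]
s2 = [(1.0, 0.0), (0.0, 1.0)]   # same atoms, listed in a different order
s3 = [(0.5, 0.5), (0.5, 0.5)]

d12 = config_distance(s1, s2)   # 0: reordering atoms does not matter
d13 = config_distance(s1, s3)
d23 = config_distance(s2, s3)
```

    Minimizing over assignments is what makes the distance insensitive to atom ordering, and the triangle inequality can be spot-checked numerically on examples like the one above.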

  1. A fingerprint based metric for measuring similarities of crystalline structures

    International Nuclear Information System (INIS)

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings

  2. A fingerprint based metric for measuring similarities of crystalline structures

    CERN Document Server

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samara; Ghasemi, S Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Christopher; Goedecker, Stefan

    2015-01-01

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and make it possible to define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures. The new method is a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms and high-throughput screenings.

  3. Initial Experiences with Retrieving Similar Objects in Simulation Data

    Energy Technology Data Exchange (ETDEWEB)

    Cheung, S-C S; Kamath, C

    2003-02-21

    Comparing the output of a physics simulation with an experiment, referred to as 'code validation,' is often done by visually comparing the two outputs. In order to determine which simulation is a closer match to the experiment, more quantitative measures are needed. In this paper, we describe our early experiences with this problem by considering the slightly simpler problem of finding objects in an image that are similar to a given query object. Focusing on a dataset from a fluid mixing problem, we report on our experiments with different features that are used to represent the objects of interest in the data. These early results indicate that the features must be chosen carefully to correctly represent the query object and the goal of the similarity search.

  4. A fingerprint based metric for measuring similarities of crystalline structures

    Science.gov (United States)

    Zhu, Li; Amsler, Maximilian; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samare; Ghasemi, S. Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Chris; Goedecker, Stefan

    2016-01-01

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and it allows in particular to distinguish structures. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  5. Accurate ionization potential of semiconductors from efficient density functional calculations

    Science.gov (United States)

    Ye, Lin-Hui

    2016-07-01

    Despite its huge successes in total-energy-related applications, the Kohn-Sham scheme of density functional theory cannot get reliable single-particle excitation energies for solids. In particular, it has not been able to calculate the ionization potential (IP), one of the most important material parameters, for semiconductors. We illustrate that an approximate exact-exchange optimized effective potential (EXX-OEP), the Becke-Johnson exchange, can be used to largely solve this long-standing problem. For a group of 17 semiconductors, we have obtained the IPs to an accuracy similar to that of the much more sophisticated GW approximation (GWA), at the computational cost of only the local-density approximation/generalized gradient approximation. The EXX-OEP, therefore, is likely as useful for solids as for finite systems. For solid surfaces, the asymptotic behavior of the exchange-correlation potential v_xc has effects similar to those in finite systems which, when neglected, typically cause the semiconductor IPs to be underestimated. This may partially explain why the standard GWA systematically underestimates the IPs and why the same GWA procedures have not been able to get an accurate IP and band gap at the same time.

  6. Accurate paleointensities - the multi-method approach

    Science.gov (United States)

    de Groot, Lennart

    2016-04-01

    The accuracy of models describing rapid changes in the geomagnetic field over the past millennia critically depends on the availability of reliable paleointensity estimates. Over the past decade, methods to derive paleointensities from lavas (the only recorder of the geomagnetic field that is available all over the globe and through geologic times) have seen significant improvements, and various alternative techniques were proposed. The 'classical' Thellier-style approach was optimized and selection criteria were defined in the 'Standard Paleointensity Definitions' (Paterson et al, 2014). The Multispecimen approach was validated, and the importance of additional tests and criteria to assess Multispecimen results must be emphasized. Recently, a non-heating, relative paleointensity technique was proposed - the pseudo-Thellier protocol - which shows great potential in both accuracy and efficiency, but currently lacks a solid theoretical underpinning. Here I present work using all three of the aforementioned paleointensity methods on suites of young lavas taken from the volcanic islands of Hawaii, La Palma, Gran Canaria, Tenerife, and Terceira. Many of the sampled cooling units are <100 years old; the actual field strength at the time of cooling is therefore reasonably well known. Rather intuitively, flows that produce coherent results from two or more different paleointensity methods yield the most accurate estimates of the paleofield. Furthermore, the results for some flows pass the selection criteria for one method, but fail in other techniques. Scrutinizing and combining all acceptable results yielded reliable paleointensity estimates for 60-70% of all sampled cooling units - an exceptionally high success rate. This 'multi-method paleointensity approach' therefore has high potential to provide the much-needed paleointensities to improve geomagnetic field models for the Holocene.

  7. Towards Accurate Application Characterization for Exascale (APEX)

    Energy Technology Data Exchange (ETDEWEB)

    Hammond, Simon David [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

    2015-09-01

    Sandia National Laboratories has been engaged in hardware and software codesign activities for a number of years; indeed, it might be argued that prototype clusters as far back as the CPLANT machines, and many large capability resources including ASCI Red and RedStorm, were examples of codesigned solutions. As the research supporting our codesign activities has moved closer to investigating on-node runtime behavior, a natural hunger has grown for detailed analysis of both hardware and algorithm performance from the perspective of low-level operations. The Application Characterization for Exascale (APEX) LDRD was a project conceived to address some of these concerns. Primarily, the research was intended to focus on generating accurate and reproducible low-level performance metrics using tools that could scale to production-class code bases. Alongside this research was an advocacy and analysis role associated with evaluating tools for production use, working with leading industry vendors to develop and refine solutions required by our code teams, and directly engaging with production code developers to form a context for the application analysis and a bridge to the research community within Sandia. On each of these accounts significant progress has been made, particularly, as this report will cover, in the low-level analysis of operations for important classes of algorithms. This report summarizes the development of a collection of tools under the APEX research program and leaves to other SAND and L2 milestone reports the description of codesign progress with Sandia’s production users/developers.

  8. How flatbed scanners upset accurate film dosimetry.

    Science.gov (United States)

    van Battum, L J; Huizenga, H; Verdaasdonk, R M; Heukelom, S

    2016-01-21

    Film is an excellent dosimeter for verification of dose distributions due to its high spatial resolution. Irradiated film can be digitized with low-cost, transmission, flatbed scanners. However, a disadvantage is their lateral scan effect (LSE): a scanner readout change over its lateral scan axis. Although anisotropic light scattering was presented as the origin of the LSE, this paper presents an alternative cause. To this end, the LSE for two flatbed scanners (Epson 1680 Expression Pro and Epson 10000XL) and Gafchromic film (EBT, EBT2, EBT3) was investigated, focused on three effects: cross talk, optical path length, and polarization. Cross talk was examined using triangular sheets of various optical densities. The optical path length effect was studied using absorptive and reflective neutral density filters with well-defined optical characteristics (OD range 0.2-2.0). Linear polarizer sheets were used to investigate the effect of light polarization on the CCD signal in the absence and presence of (un)irradiated Gafchromic film. Film dose values ranged from 0.2 to 9 Gy, i.e. an optical density range from 0.25 to 1.1. Measurements were performed in the scanner's transmission mode, with red-green-blue channels. The LSE was found to depend on scanner construction and film type. Its magnitude depends on dose: for 9 Gy it increases up to 14% at the maximum lateral position. Cross talk was only significant in high contrast regions, up to 2% for very small fields. The optical path length effect introduced by film on the scanner causes a 3% effect for pixels in the extreme lateral position. Light polarization due to film and the scanner's optical mirror system is the main contributor, different in magnitude for the red, green and blue channels. We conclude that any Gafchromic EBT type film scanned with a flatbed scanner will face these optical effects. Accurate dosimetry requires correction of the LSE and, therefore, determination of the LSE per color channel and per dose delivered to the film.

  9. AREAL FEATURE MATCHING BASED ON SIMILARITY USING CRITIC METHOD

    Directory of Open Access Journals (Sweden)

    J. Kim

    2015-10-01

    Full Text Available In this paper, we propose an areal feature matching method that can be applied to many-to-many matching, which involves matching a simple entity with an aggregate of several polygons, or two aggregates of several polygons, with less user intervention. To this end, an affine transformation is applied to the two datasets using polygon pairs for which the building name is the same. Then, the two datasets are overlaid, and intersecting polygon pairs are selected as candidate matching pairs. If many polygons intersect, we calculate the inclusion function between such polygons; when the value is more than 0.4, the polygons are aggregated into single polygons by using a convex hull. Finally, the shape similarity is calculated between the candidate pairs as the linear sum of the position similarity, shape ratio similarity, and overlap similarity, with weights computed by the CRITIC method. The candidate pairs for which the value of the shape similarity is more than 0.7 are determined to be matching pairs. We applied the method to two geospatial datasets: the digital topographic map and the KAIS map of South Korea. The visual evaluation showed that polygons were well detected by the proposed method. The statistical evaluation indicates that the proposed method is accurate when using our test dataset, with a high F-measure of 0.91.

  10. Areal Feature Matching Based on Similarity Using Critic Method

    Science.gov (United States)

    Kim, J.; Yu, K.

    2015-10-01

    In this paper, we propose an areal feature matching method that can be applied to many-to-many matching, which involves matching a simple entity with an aggregate of several polygons, or two aggregates of several polygons, with less user intervention. To this end, an affine transformation is applied to the two datasets using polygon pairs for which the building name is the same. Then, the two datasets are overlaid, and intersecting polygon pairs are selected as candidate matching pairs. If many polygons intersect, we calculate the inclusion function between such polygons; when the value is more than 0.4, the polygons are aggregated into single polygons by using a convex hull. Finally, the shape similarity is calculated between the candidate pairs as the linear sum of the position similarity, shape ratio similarity, and overlap similarity, with weights computed by the CRITIC method. The candidate pairs for which the value of the shape similarity is more than 0.7 are determined to be matching pairs. We applied the method to two geospatial datasets: the digital topographic map and the KAIS map of South Korea. The visual evaluation showed that polygons were well detected by the proposed method. The statistical evaluation indicates that the proposed method is accurate when using our test dataset, with a high F-measure of 0.91.
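
    The CRITIC method used here derives objective criterion weights from each criterion's contrast (its standard deviation) and its decorrelation from the other criteria. A minimal sketch with made-up similarity scores; the decision matrix values are illustrative, not from the paper:

```python
import math

def critic_weights(matrix):
    """CRITIC weights for a decision matrix (rows = candidate pairs,
    columns = criteria): weight_j ∝ std_j * sum_k (1 - corr(j, k))."""
    cols = list(zip(*matrix))
    n = len(matrix)

    def std(c):
        m = sum(c) / n
        return math.sqrt(sum((x - m) ** 2 for x in c) / n)

    def corr(c1, c2):
        m1, m2 = sum(c1) / n, sum(c2) / n
        cov = sum((a - m1) * (b - m2) for a, b in zip(c1, c2)) / n
        return cov / (std(c1) * std(c2))

    raw = [std(cj) * sum(1 - corr(cj, ck) for ck in cols) for cj in cols]
    total = sum(raw)
    return [w / total for w in raw]

# Columns: position similarity, shape-ratio similarity, overlap similarity.
scores = [
    [0.9, 0.8, 0.7],
    [0.4, 0.9, 0.3],
    [0.7, 0.2, 0.8],
    [0.1, 0.5, 0.2],
]
weights = critic_weights(scores)
```

    A criterion that varies a lot and is weakly correlated with the others carries more information, so it receives a larger share of the (normalized) weight vector.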

  11. Code Similarity on High Level Programs

    CERN Document Server

    Bernal, M Miron; Nazuno, J Figueroa

    2007-01-01

    This paper presents a new approach to code similarity for high-level programs. Our technique is based on Fast Dynamic Time Warping, which builds a warp path, i.e. a relation between points, with local restrictions. The source code is represented as time series using the operators of the programming language, which makes the comparison possible. This enables subsequence detection, where subsequences represent similar code instructions. In contrast with other code similarity algorithms, we do not perform feature extraction. The experiments show that two source codes are similar when their respective time series are similar.
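
    Plain dynamic time warping, of which Fast DTW is a locality-restricted speedup, can be sketched directly from this description. The mapping of source-code operators to numeric codes below is an illustrative assumption, not the paper's encoding:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric time series
    (plain O(n*m) DTW; FastDTW restricts the warp path to a band)."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of the three admissible moves.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Two "operator streams"; the second is a time-stretched copy of the first,
# so warping aligns them at zero cost.
seq_a = [1, 2, 3, 2, 1]
seq_b = [1, 1, 2, 3, 3, 2, 1]
similar = dtw_distance(seq_a, seq_b)            # 0.0
different = dtw_distance(seq_a, [5, 5, 5, 5, 5])
```

    Because the warp path may repeat elements, duplicated or stretched instruction runs do not penalize the match, which is exactly why DTW suits code comparison better than a rigid element-by-element distance.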

  12. Phase Coherent Observations and Millisecond Pulsar Searches

    Science.gov (United States)

    Shrauner, Jay Arthur

    1997-07-01

    We have built a new radio astronomical receiving system designed specifically for very high precision timing and polarimetry of fast pulsars. Unlike most detectors currently used to study pulsars, this instrument does not square the received signal at the time of observation. Instead, voltages proportional to the instantaneous electric vectors of incoming signals are digitized, time-tagged, and recorded on high speed magnetic media. During processing, the data streams are convolved with an inverse 'chirp' function that completely removes the phase retardation introduced by interstellar dispersion. The intrinsic time resolution of this system is the inverse of the system bandwidth, typically well under 1 μs. We have tested this and another phase-coherent observing system in observations using the Arecibo 305 m and Green Bank 140 foot telescopes. With these two sets of observations we have studied giant pulses, performed high precision timing, and obtained high-resolution polarization profiles and accurate dispersion measures. We have verified the existence of pulses with intensities hundreds of times the mean for both the main pulse and interpulse of PSR B1937+21, and have established that the amplitudes of both types of giant pulses have similar power-law distributions. The giant pulses are narrower than the average pulses, systematically delayed by 40-50 μs, and many are nearly 100% circularly polarized. We have also conducted two searches of the Northern hemisphere for pulsars. The first used the original pulsar discovery telescope in Cambridge, England to search the entire Northern hemisphere at 81.5 MHz, with an average sensitivity to slow pulsars of 230 mJy. Although we obtained flux densities and pulse profiles of 20 known pulsars, no new pulsars were discovered. The second search effort covered a total of 384 deg2 of previously unsearched sky at 430 MHz using the Arecibo telescope, with an average sensitivity to slow pulsars of 0.83 mJy. We discovered 7

  13. Interactions of visual odometry and landmark guidance during food search in honeybees

    NARCIS (Netherlands)

    Vladusich, T; Hemmi, JM; Srinivasan, MV; Zeil, J

    2005-01-01

    How do honeybees use visual odometry and goal-defining landmarks to guide food search? In one experiment, bees were trained to forage in an optic-flow-rich tunnel with a landmark positioned directly above the feeder. Subsequent food-search tests indicated that bees searched much more accurately when

  14. K-Means Clustering For Segment Web Search Results

    OpenAIRE

    Hasitha Indika Arumawadu; R. M. Kapila Tharanga Rathnayaka; S. K. Illangarathne

    2015-01-01

    Clustering is a powerful technique for segmenting relevant data into different levels. This study proposes a K-means clustering method to cluster web search results for search engines. To represent documents we used the vector space model, and we used the cosine similarity method to measure the similarity between the user query and the search results. As an improvement to K-means clustering, we used the distortion curve method to identify the optimal initial number of clusters.
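
    The vector space model with cosine similarity described here can be sketched in a few lines; the term-frequency vectors and document names below are toy placeholders:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy vector space model: rows are term-frequency vectors over a shared vocabulary.
query = [1, 1, 0, 0]
results = {
    "doc_a": [2, 1, 0, 0],   # shares both query terms
    "doc_b": [0, 0, 3, 1],   # no overlap with the query
    "doc_c": [1, 0, 1, 0],   # partial overlap
}
ranked = sorted(results, key=lambda d: cosine_similarity(query, results[d]),
                reverse=True)
```

    Because cosine similarity normalizes by vector length, a long document is not favored over a short one that matches the query terms equally well; the same measure can then feed the K-means assignment step.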

  15. Similarity boosted quantitative structure-activity relationship--a systematic study of enhancing structural descriptors by molecular similarity.

    Science.gov (United States)

    Girschick, Tobias; Almeida, Pedro R; Kramer, Stefan; Stålring, Jonna

    2013-05-24

    The concept of molecular similarity is one of the most central in the fields of predictive toxicology and quantitative structure-activity relationship (QSAR) research. Many toxicological responses result from a multimechanistic process and, consequently, structural diversity among the active compounds is likely. Combining this knowledge, we introduce similarity boosted QSAR modeling, where we calculate molecular descriptors using similarities with respect to representative reference compounds to aid a statistical learning algorithm in distinguishing between different structural classes. We present three approaches for the selection of reference compounds, one by literature search and two by clustering. Our experimental evaluation on seven publicly available data sets shows that the similarity descriptors used on their own perform quite well compared to structural descriptors. We show that the combination of similarity and structural descriptors enhances the performance and that a simple stacking approach is able to use the complementary information encoded by the different descriptor sets to further improve predictive results. All software necessary for our experiments is available within the cheminformatics software framework AZOrange.

  16. Semantic Features for Classifying Referring Search Terms

    Energy Technology Data Exchange (ETDEWEB)

    May, Chandler J.; Henry, Michael J.; McGrath, Liam R.; Bell, Eric B.; Marshall, Eric J.; Gregory, Michelle L.

    2012-05-11

    When an internet user clicks on a result in a search engine, a request is submitted to the destination web server that includes a referrer field containing the search terms given by the user. Using this information, website owners can analyze the search terms leading to their websites to better understand their visitors' needs. This work explores some of the features that can be used for classification-based analysis of such referring search terms. We present initial results for the example task of classifying HTTP requests by country of origin. A system that can accurately predict the country of origin from query text may be a valuable complement to IP lookup methods, which are susceptible to obfuscation by dereferrers or proxies. We suggest that the addition of semantic features improves classifier performance in this example application. We begin by looking at related work and presenting our approach. After describing initial experiments and results, we discuss paths forward for this work.

  17. Optimal directed searches for continuous gravitational waves

    CERN Document Server

    Ming, Jing; Papa, Maria Alessandra; Aulbert, Carsten; Fehrmann, Henning

    2015-01-01

    Wide parameter space searches for long lived continuous gravitational wave signals are computationally limited. It is therefore critically important that available computational resources are used rationally. In this paper we consider directed searches, i.e. targets for which the sky position is known accurately but the frequency and spindown parameters are completely unknown. Given a list of such potential astrophysical targets, we therefore need to prioritize. On which target(s) should we spend scarce computing resources? What parameter space region in frequency and spindown should we search? Finally, what is the optimal search set-up that we should use? In this paper we present a general framework that allows us to solve all three of these problems. This framework is based on maximizing the probability of making a detection subject to a constraint on the maximum available computational cost. We illustrate the method for a simplified problem.

  18. Optimal directed searches for continuous gravitational waves

    Science.gov (United States)

    Ming, Jing; Krishnan, Badri; Papa, Maria Alessandra; Aulbert, Carsten; Fehrmann, Henning

    2016-03-01

    Wide parameter space searches for long-lived continuous gravitational wave signals are computationally limited. It is therefore critically important that the available computational resources are used rationally. In this paper we consider directed searches, i.e., targets for which the sky position is known accurately but the frequency and spin-down parameters are completely unknown. Given a list of such potential astrophysical targets, we therefore need to prioritize. On which target(s) should we spend scarce computing resources? What parameter space region in frequency and spin-down should we search through? Finally, what is the optimal search setup that we should use? In this paper we present a general framework that allows us to solve all three of these problems. This framework is based on maximizing the probability of making a detection subject to a constraint on the maximum available computational cost. We illustrate the method for a simplified problem.

  19. Contextual factors for finding similar experts

    NARCIS (Netherlands)

    K. Hofmann; K. Balog; T. Bogers; M. de Rijke

    2010-01-01

    Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. An important outcome is models that identify factors that influence expert finding. Expertise retrieval addresses the same problem, expert finding, but from a system-cent

  20. Stability of similarity measurements for bipartite networks

    CERN Document Server

    Liu, Jian-Guo; Pan, Xue; Guo, Qiang; Zhou, Tao

    2015-01-01

    Similarity is a fundamental measure in network analyses and machine learning algorithms, with wide applications ranging from personalized recommendation to socio-economic dynamics. We argue that an effective similarity measurement should guarantee stability even under some information loss. With six bipartite networks, we investigate the stabilities of fifteen similarity measurements by comparing the similarity matrices of two data samples that are randomly divided from the original data sets. Results show that the fifteen measurements can be well classified into three clusters according to their stabilities, and measurements in the same cluster have similar mathematical definitions. In addition, we develop a top-$n$-stability method for personalized recommendation, and find that the unstable similarities would recommend false information to users, and that the performance of recommendation would be largely improved by using stable similarity measurements. This work provides a novel dimension to analyze and eval...

  1. The Search for Another Earth

    Indian Academy of Sciences (India)

    2016-07-01

Is there life anywhere else in the vast cosmos? Are there planets similar to the Earth? For centuries, these questions baffled curious minds. Either a positive or negative answer, if found one day, would carry a deep philosophical significance for our very existence in the universe. Although the search for extra-terrestrial intelligence was initiated decades ago, a systematic scientific and global quest towards achieving a convincing answer began in 1995 with the discovery of the first confirmed planet orbiting around the solar-type star 51 Pegasi. Since then, astronomers have discovered many exoplanets using two main techniques, radial velocity and transit measurements. In the first part of this article, we shall describe the different astronomical methods through which the extrasolar planets of various kinds are discovered. In the second part of the article we shall discuss the various kinds of exoplanets, in particular the habitable planets discovered till date and the present status of our search for a habitable planet similar to the Earth.

  2. Sound Search Engine Concept

    DEFF Research Database (Denmark)

    2006-01-01

Sound search is provided by the major search engines; however, indexing is text based, not sound based. We will establish a dedicated sound search service based on sound feature indexing. The current demo shows the concept of the sound search engine. The first engine will be released June...

  3. Semantic Web Based Efficient Search Using Ontology and Mathematical Model

    Directory of Open Access Journals (Sweden)

    K.Palaniammal

    2014-01-01

Full Text Available The semantic web is the forthcoming technology in the world of search engines. It is mainly focused on search that is meaningful rather than the syntactic search prevailing now. This proposed work concerns semantic search with respect to the educational domain. In this paper, we propose a semantic web based efficient search using an ontology and a mathematical model that takes into account misleading or unmatched service information, lack of relevant domain knowledge, and wrong service queries. To solve these issues, the framework is designed to make three major contributions: an ontology knowledge base, Natural Language Processing (NLP) techniques, and a search model. The ontology knowledge base stores domain-specific service ontologies and service description entity (SDE) metadata. The search model, which includes the mathematical model, retrieves SDE metadata efficiently for education lenders. The NLP techniques provide spell-check and synonym-based search. The results are retrieved and stored in an ontology, which in turn prevents data redundancy. The results are more accurate to search, sensitive to spell check and synonymous context. This approach reduces the user's time and complexity in finding the correct results for his/her search text, and our model provides more accurate results. A series of experiments are conducted in order to evaluate the mechanism and the employed mathematical model, respectively.

  4. A Survey of Meta Search Engines (元搜索引擎研究)

    Institute of Scientific and Technical Information of China (English)

    张卫丰; 徐宝文; 周晓宇; 李东; 许蕾

    2001-01-01

With the explosive increase of network information, it is more and more difficult for people to look up information. The advent of Web search engines overcomes this problem to some degree. However, because different search engines use different mechanisms, scopes and algorithms, the overlap of the search results for the same query is no more than 34%. To get relatively full-scale, accurate search results, multiple search engines should be used, and thus meta search engines arose. In this paper, meta search engines are surveyed. First, the history, principles and elements of meta search engines are discussed. Then, the related criteria of meta search engines are analyzed and several typical meta search engines are compared. Finally, on this basis, the trend of meta search engines is introduced.

  5. Lie algebraic similarity transformed Hamiltonians for lattice model systems

    Science.gov (United States)

    Wahlen-Strothman, Jacob M.; Jiménez-Hoyos, Carlos A.; Henderson, Thomas M.; Scuseria, Gustavo E.

    2015-01-01

We present a class of Lie algebraic similarity transformations generated by exponentials of two-body on-site Hermitian operators whose Hausdorff series can be summed exactly without truncation. The correlators are defined over the entire lattice and include the Gutzwiller factor $n_{i\uparrow}n_{i\downarrow}$, and two-site products of density $(n_{i\uparrow}+n_{i\downarrow})$ and spin $(n_{i\uparrow}-n_{i\downarrow})$ operators. The resulting non-Hermitian many-body Hamiltonian can be solved in a biorthogonal mean-field approach with polynomial computational cost. The proposed similarity transformation generates locally weighted orbital transformations of the reference determinant. Although the energy of the model is unbound, projective equations in the spirit of coupled cluster theory lead to well-defined solutions. The theory is tested on the one- and two-dimensional repulsive Hubbard model where it yields accurate results for small and medium sized interaction strengths.
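The transformation described can be written schematically as follows (a sketch inferred from the abstract; the coefficient names and index ranges are assumptions, not taken from the paper):

```latex
% Similarity-transformed (non-Hermitian) Hamiltonian and its Hausdorff series
\bar{H} \;=\; e^{-\hat{J}}\,\hat{H}\,e^{\hat{J}}
        \;=\; \hat{H} + [\hat{H},\hat{J}]
              + \tfrac{1}{2!}\,[[\hat{H},\hat{J}],\hat{J}] + \dots
% Schematic on-site correlator: Gutzwiller plus two-site density and spin
% products; \alpha, \beta, \gamma are variational parameters (assumed names)
\hat{J} \;=\; \sum_{i}\alpha_{i}\, n_{i\uparrow}n_{i\downarrow}
        \;+\; \sum_{i\neq j}\beta_{ij}\,(n_{i\uparrow}+n_{i\downarrow})(n_{j\uparrow}+n_{j\downarrow})
        \;+\; \sum_{i\neq j}\gamma_{ij}\,(n_{i\uparrow}-n_{i\downarrow})(n_{j\uparrow}-n_{j\downarrow})
```

Because $\hat{J}$ is built from commuting on-site number operators, the Hausdorff series terminates in a closed form, which is the "summed exactly without truncation" property the abstract refers to.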

  6. Efficient Proposed Framework for Semantic Search Engine using New Semantic Ranking Algorithm

    Directory of Open Access Journals (Sweden)

    M. M. El-gayar

    2015-08-01

Full Text Available The amount of information grows by billions of database entries every year, and there is an urgent need to search for that information with a specialized tool called a search engine. There are many search engines available today, but the main challenge is that most of them cannot retrieve meaningful information intelligently. Semantic web technology is a solution that keeps data in a machine-readable format, helping machines to smartly match this data with related information based on meanings. In this paper, we introduce a proposed semantic framework that includes four phases: crawling, indexing, ranking and retrieval. This semantic framework operates over sorted RDF by using an efficient proposed ranking algorithm and an enhanced crawling algorithm. The enhanced crawling algorithm crawls relevant forum content from the web with minimal overhead. The proposed ranking algorithm orders and evaluates similar meaningful data in order to make the retrieval process faster, easier and more accurate. We applied our work on a standard database and achieved 99 percent semantic-performance effectiveness in minimum time, with less than a 1 percent error rate compared with the other semantic systems.

  7. Large Neighborhood Search

    DEFF Research Database (Denmark)

    Pisinger, David; Røpke, Stefan

    2010-01-01

Heuristics based on large neighborhood search have recently shown outstanding results in solving various transportation and scheduling problems. Large neighborhood search methods explore a complex neighborhood by use of heuristics. Using large neighborhoods makes it possible to find better candidate solutions in each iteration and hence traverse a more promising search path. Starting from the large neighborhood search method, we give an overview of very large scale neighborhood search methods and discuss recent variants and extensions like variable depth search and adaptive large neighborhood search.

  8. Monad Transformers for Backtracking Search

    OpenAIRE

    Hedges, Jules

    2014-01-01

    This paper extends Escardo and Oliva's selection monad to the selection monad transformer, a general monadic framework for expressing backtracking search algorithms in Haskell. The use of the closely related continuation monad transformer for similar purposes is also discussed, including an implementation of a DPLL-like SAT solver with no explicit recursion. Continuing a line of work exploring connections between selection functions and game theory, we use the selection monad transformer with...

  9. Testing Self-Similarity Through Lamperti Transformations

    KAUST Repository

    Lee, Myoungji

    2016-07-14

    Self-similar processes have been widely used in modeling real-world phenomena occurring in environmetrics, network traffic, image processing, and stock pricing, to name but a few. The estimation of the degree of self-similarity has been studied extensively, while statistical tests for self-similarity are scarce and limited to processes indexed in one dimension. This paper proposes a statistical hypothesis test procedure for self-similarity of a stochastic process indexed in one dimension and multi-self-similarity for a random field indexed in higher dimensions. If self-similarity is not rejected, our test provides a set of estimated self-similarity indexes. The key is to test stationarity of the inverse Lamperti transformations of the process. The inverse Lamperti transformation of a self-similar process is a strongly stationary process, revealing a theoretical connection between the two processes. To demonstrate the capability of our test, we test self-similarity of fractional Brownian motions and sheets, their time deformations and mixtures with Gaussian white noise, and the generalized Cauchy family. We also apply the self-similarity test to real data: annual minimum water levels of the Nile River, network traffic records, and surface heights of food wrappings. © 2016, International Biometric Society.
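The stationarity idea behind the test can be illustrated with ordinary Brownian motion, which is self-similar with index H = 1/2 (a numerical sketch of the inverse Lamperti transformation, not the paper's test procedure):

```python
import numpy as np

# Brownian motion B is self-similar with index H = 1/2: B(at) ~ a^H B(t).
# Its inverse Lamperti transform Y(t) = e^{-Ht} B(e^t) is strictly stationary,
# so Var[Y(t)] should not depend on t.
rng = np.random.default_rng(1)
H = 0.5
t_grid = np.linspace(0.0, 3.0, 4)     # times for the transformed process
n_paths = 20000

# Simulate B at the times e^t by cumulating independent Gaussian increments.
s = np.exp(t_grid)                    # times for the self-similar process
increments = rng.normal(0.0,
                        np.sqrt(np.diff(np.concatenate(([0.0], s)))),
                        size=(n_paths, len(s)))
B = np.cumsum(increments, axis=1)     # B(e^t) for each path
Y = np.exp(-H * t_grid) * B           # inverse Lamperti transform

variances = Y.var(axis=0)
print(np.round(variances, 2))         # all close to 1 if Y is stationary
```

For a process with a different (or misspecified) H, the variances would drift with t, which is what a stationarity test detects.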

  10. Efficient Similarity Retrieval in Music Databases

    DEFF Research Database (Denmark)

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard

    2006-01-01

    , the Vector Approximation file is adapted to the indexing of time sequences and to use a lower bound on the DTW distance. Using these techniques, the paper exploits the lack of a ground truth for queries to efficiently compute query results that differ only slightly from results that may be more accurate...

  11. The determination of accurate dipole polarizabilities alpha and gamma for the noble gases

    Science.gov (United States)

    Rice, Julia E.; Taylor, Peter R.; Lee, Timothy J.; Almlof, Jan

    1991-01-01

Accurate static dipole polarizabilities alpha and gamma of the noble gases He through Xe were determined using wave functions of similar quality for each system. Good agreement with experimental data for the static polarizability gamma was obtained for Ne and Xe, but not for Ar and Kr. Calculations suggest that the experimental values for these latter two gases are too low.

  12. Learning music similarity from relative user ratings

    OpenAIRE

    Wolff, D.; Weyde, T.

    2013-01-01

    Computational modelling of music similarity is an increasingly important part of personalisation and optimisation in music information retrieval and research in music perception and cognition. The use of relative similarity ratings is a new and promising approach to modelling similarity that avoids well known problems with absolute ratings. In this article, we use relative ratings from the MagnaTagATune dataset with new and existing variants of state-of-the-art algorithms and provide the firs...

  13. Molecular quantum similarity using conceptual DFT descriptors

    Indian Academy of Sciences (India)

    Patrick Bultinck; Ramon Carbó-Dorca

    2005-09-01

This paper reports a Molecular Quantum Similarity study for a set of congeneric steroid molecules, using as basic similarity descriptors the electron density ρ(r), the shape function σ(r), the Fukui functions f+(r) and f-(r), and the local softnesses s+(r) and s-(r). Correlations are investigated between similarity indices for each pair of descriptors used and compared to assess whether these different descriptors sample different information and to investigate what information is revealed by each descriptor.

  14. Good edit similarity learning by loss minimization

    OpenAIRE

    Bellet, Aurélien; Habrard, Amaury; Sebban, Marc

    2012-01-01

Similarity functions are a fundamental component of many learning algorithms. When dealing with string or tree-structured data, edit distance-based measures are widely used, and there exist a few methods for learning them from data. However, these methods offer no theoretical guarantee as to the generalization ability and discriminative power of the learned similarities. In this paper, we propose a loss minimization-based edit similarity learning approach, called GESL. It is driven by the not...

  15. Quadruplet-Wise Image Similarity Learning

    OpenAIRE

    Law M.T.; Thome N.; Cord M.

    2013-01-01

This paper introduces a novel similarity learning framework. Working with inequality constraints involving quadruplets of images, our approach aims at efficiently modeling similarity from rich or complex semantic label relationships. From these quadruplet-wise constraints, we propose a similarity learning framework relying on a convex optimization scheme. We then study how our metric learning scheme can exploit specific class relationships, such as class ranking (r...

  16. Dark matter halos and self-similarity

    OpenAIRE

    Alard, C.

    2012-01-01

This paper explores the self-similar solutions of the Vlasov-Poisson system and their relation to the gravitational collapse of dynamically cold systems. Analytic solutions are derived for power law potentials in one dimension, and extensions of these solutions in three dimensions are proposed. Next the self-similarity of the collapse of cold dynamical systems is investigated numerically. The fold system in phase space is consistent with analytic self-similar solutions, the solutions present ...

  17. Similarity and a Duality for Fullerenes

    OpenAIRE

    Jennifer J. Edmond; Graver, Jack E.

    2015-01-01

    Fullerenes are molecules of carbon that are modeled by trivalent plane graphs with only pentagonal and hexagonal faces. Scaling up a fullerene gives a notion of similarity, and fullerenes are partitioned into similarity classes. In this expository article, we illustrate how the values of two important fullerene parameters can be deduced for all fullerenes in a similarity class by computing the values of these parameters for just the three smallest representatives of that class. In addition, i...

  18. Ontology-based prior art search

    OpenAIRE

    Bondarenok, A.

    2003-01-01

    This article describes a method of prior art document search based on semantic similarities of a user query and indexed documents. The ontology-based technology of knowledge extraction and representation is used to build document and query images, which are compared using the semantic similarity technique. Documents are ranked according to their semantic similarities to the query, and the top results are shown to the user.

  19. Collaborative Search Trails for Video Search

    OpenAIRE

    Hopfgartner, Frank; Vallet, David; Halvey, Martin; Jose, Joemon

    2009-01-01

    In this paper we present an approach for supporting users in the difficult task of searching for video. We use collaborative feedback mined from the interactions of earlier users of a video search system to help users in their current search tasks. Our objective is to improve the quality of the results that users find, and in doing so also assist users to explore a large and complex information space. It is hoped that this will lead to them considering search options that they may not have co...

  20. Accurate, low-cost 3D-models of gullies

    Science.gov (United States)

    Onnen, Nils; Gronz, Oliver; Ries, Johannes B.; Brings, Christine

    2015-04-01

Soil erosion is a widespread problem in arid and semi-arid areas. The most severe form is gully erosion: gullies often cut into agricultural farmland and can make a certain area completely unproductive. To understand the development and processes inside and around gullies, we calculated detailed 3D models of gullies in the Souss Valley in South Morocco. Near Taroudant, we had four study areas with five gullies differing in size, volume and activity. Using a Canon HF G30 camcorder, we made varying series of Full HD videos at 25 fps. Afterwards, we used the Structure from Motion (SfM) method to create the models. To generate accurate models while maintaining feasible runtimes, it is necessary to select around 1500-1700 images from the video, and the overlap of neighboring images should be at least 80%. In addition, it is very important to avoid selecting photos that are blurry or out of focus. Nearby pixels of a blurry image tend to have similar color values. That is why we used a MATLAB script to compare the derivatives of the images: the higher the sum of the derivatives, the sharper the image for similar objects. MATLAB subdivides the video into image intervals, and from each interval the image with the highest sum is selected. For example, a 20 min video at 25 fps equals 30,000 single images; the program inspects the first 20 images, saves the sharpest, moves on to the next 20 images, and so on. Using this algorithm, we selected 1500 images for our modeling. With VisualSFM, we calculated features and the matches between all images and produced a point cloud. Then, MeshLab was used to build a surface out of it using the Poisson surface reconstruction approach. Afterwards we are able to calculate the size and the volume of the gullies. It is also possible to determine soil erosion rates if we compare the data with old recordings. The final step would be the combination of the terrestrial data with the data from our aerial photography. So far, the method works well and we
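The interval-based sharpest-frame selection described above can be sketched as follows (a minimal Python sketch with toy stand-ins for video frames; the authors used a MATLAB script):

```python
import numpy as np

def sharpness(image):
    """Sum of absolute finite differences; blurrier frames score lower
    because neighboring pixels have similar values."""
    img = image.astype(float)
    return np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()

def select_sharpest(frames, interval):
    """Keep the sharpest frame out of every `interval` consecutive frames."""
    selected = []
    for start in range(0, len(frames), interval):
        chunk = frames[start:start + interval]
        selected.append(max(chunk, key=sharpness))
    return selected

# Toy demo: a sharp checkerboard vs. flat (maximally blurred) frames.
sharp = np.indices((8, 8)).sum(axis=0) % 2 * 255.0
blurry = np.full((8, 8), 127.0)
frames = [blurry, sharp, blurry, blurry, sharp, blurry]
picked = select_sharpest(frames, 3)
print(len(picked))  # → 2
```

With a 30,000-frame video and a target of 1500 images, the interval would be 20 frames, matching the procedure in the abstract.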

  1. Keep Searching and You’ll Find

    DEFF Research Database (Denmark)

    Laursen, Keld

    2012-01-01

This article critically reviews and synthesizes the contributions found in theoretical and empirical studies of firm-level innovation search processes. It explores the advantages and disadvantages of local and non-local search, discusses organizational responses, and identifies potential exogenous triggers for different kinds of search. It argues that the initial focus on local search was a consequence, in part, of the attention in evolutionary economics to path-dependent behavior, but that as localized behavior was increasingly accepted as the standard mode, studies began to question whether local... different search strategies, but end up with very similar technological profiles in fast-growing technologies. The article concludes by highlighting what we have learnt from the literature and suggesting some new avenues for research.

  2. Similarity indices I: what do they measure

    International Nuclear Information System (INIS)

    A method for estimating the effects of environmental effusions on ecosystems is described. The characteristics of 25 similarity indices used in studies of ecological communities were investigated. The type of data structure, to which these indices are frequently applied, was described as consisting of vectors of measurements on attributes (species) observed in a set of samples. A general similarity index was characterized as the result of a two-step process defined on a pair of vectors. In the first step an attribute similarity score is obtained for each attribute by comparing the attribute values observed in the pair of vectors. The result is a vector of attribute similarity scores. These are combined in the second step to arrive at the similarity index. The operation in the first step was characterized as a function, g, defined on pairs of attribute values. The second operation was characterized as a function, F, defined on the vector of attribute similarity scores from the first step. Usually, F was a simple sum or weighted sum of the attribute similarity scores. It is concluded that similarity indices should not be used as the test statistic to discriminate between two ecological communities

  3. Perceived Similarity, Proactive Adjustment, and Organizational Socialization

    Science.gov (United States)

    Kammeyer-Mueller, John D.; Livingston, Beth A.; Liao, Hui

    2011-01-01

    The present study explores how perceived demographic and attitudinal similarity can influence proactive behavior among organizational newcomers. We propose that newcomers who perceive themselves as similar to their co-workers will be more willing to seek new information or build relationships, which in turn will lead to better long-term…

  4. Similarity indices I: what do they measure.

    Energy Technology Data Exchange (ETDEWEB)

    Johnston, J.W.

    1976-11-01

    A method for estimating the effects of environmental effusions on ecosystems is described. The characteristics of 25 similarity indices used in studies of ecological communities were investigated. The type of data structure, to which these indices are frequently applied, was described as consisting of vectors of measurements on attributes (species) observed in a set of samples. A general similarity index was characterized as the result of a two-step process defined on a pair of vectors. In the first step an attribute similarity score is obtained for each attribute by comparing the attribute values observed in the pair of vectors. The result is a vector of attribute similarity scores. These are combined in the second step to arrive at the similarity index. The operation in the first step was characterized as a function, g, defined on pairs of attribute values. The second operation was characterized as a function, F, defined on the vector of attribute similarity scores from the first step. Usually, F was a simple sum or weighted sum of the attribute similarity scores. It is concluded that similarity indices should not be used as the test statistic to discriminate between two ecological communities.
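The two-step g/F construction described above can be made concrete: Bray-Curtis similarity, for instance, arises with g(x, y) = min(x, y) and F a normalized sum (a sketch with hypothetical species counts; the choice of Bray-Curtis as the example is mine, not the report's):

```python
def general_similarity(x, y, g, F):
    """Two-step similarity index: apply g to each attribute pair to get
    attribute similarity scores, then combine the scores with F."""
    scores = [g(a, b) for a, b in zip(x, y)]
    return F(scores, x, y)

# Bray-Curtis similarity in this framework: g = min, F = normalized sum.
g_min = min

def f_bray_curtis(scores, x, y):
    return 2.0 * sum(scores) / (sum(x) + sum(y))

a = [4, 0, 3, 1]   # species counts in sample A (hypothetical)
b = [2, 1, 3, 0]   # species counts in sample B (hypothetical)
print(general_similarity(a, b, g_min, f_bray_curtis))  # → 0.7142857142857143
```

Other indices in the family differ only in their g (e.g., presence/absence indicators) and F (e.g., weighted sums), which is exactly the report's characterization.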

  5. Interleaving Helps Students Distinguish among Similar Concepts

    Science.gov (United States)

    Rohrer, Doug

    2012-01-01

    When students encounter a set of concepts (or terms or principles) that are similar in some way, they often confuse one with another. For instance, they might mistake one word for another word with a similar spelling (e.g., allusion instead of illusion) or choose the wrong strategy for a mathematics problem because it resembles a different kind of…

  6. Similarity Structure of Wave-Collapse

    DEFF Research Database (Denmark)

    Rypdal, Kristoffer; Juul Rasmussen, Jens; Thomsen, Kenneth

    1985-01-01

    Similarity transformations of the cubic Schrödinger equation (CSE) are investigated. The transformations are used to remove the explicit time variation in the CSE and reduce it to differential equations in the spatial variables only. Two different methods for similarity reduction are employed and...

  7. Some Effects of Similarity Self-Disclosure

    Science.gov (United States)

    Murphy, Kevin C.; Strong, Stanley R.

    1972-01-01

    College males were interviewed about how college had altered their friendships, values, and plans. The interviewers diclosed experiences and feelings similar to those revealed by the students. Results support Byrne's Law of Similarity in generating interpersonal attraction in the interview and suggest that the timing of self-disclosures is…

  8. Similar methodological analysis involving the user experience.

    Science.gov (United States)

    Almeida e Silva, Caio Márcio; Okimoto, Maria Lúcia R L; Tanure, Raffaela Leane Zenni

    2012-01-01

This article deals with the use of a protocol for the analysis of similar methodologies related to user experience. To that end, articles recounting experiments in the area were selected. They were analyzed based on the similar-analysis protocol and, finally, synthesized and associated.

  9. Measure of Node Similarity in Multilayer Networks

    CERN Document Server

    Mollgaard, Anders; Dammeyer, Jesper; Jensen, Mogens H; Lehmann, Sune; Mathiesen, Joachim

    2016-01-01

The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographic variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a...

  10. Semantic Similarity Calculation of Chinese Word

    Directory of Open Access Journals (Sweden)

    Liqiang Pan

    2014-08-01

Full Text Available This paper puts forward a two-layer computing method to calculate the semantic similarity of Chinese words. First, a Latent Dirichlet Allocation (LDA) topic model is used to generate the topic space. Then words are mapped into the topic space, forming topic distributions that are used to calculate the semantic similarity of words (the first-layer computation). Finally, the semantic dictionary "HowNet" is used to excavate the semantic similarity of words more deeply (the second-layer computation). This method not only overcomes the problem that using LDA alone to calculate semantic similarity is not specific enough, but also solves problems such as new words (not yet added to the dictionary) and the lack of specific context when calculating semantic similarity based on the semantic dictionary "HowNet" alone. Experimental comparison demonstrates the feasibility, availability and advantages of the calculation method.
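The first-layer computation (similarity of topic distributions) can be sketched as a cosine similarity over topic vectors (the distributions below are hypothetical stand-ins, not vectors learned by LDA):

```python
import numpy as np

def topic_similarity(p, q):
    """Cosine similarity between two words' topic distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

# Hypothetical distributions over 4 LDA topics (illustrative values only).
dist_car = [0.70, 0.10, 0.10, 0.10]
dist_automobile = [0.60, 0.20, 0.10, 0.10]
dist_banana = [0.05, 0.05, 0.80, 0.10]

# Words sharing dominant topics come out more similar.
print(topic_similarity(dist_car, dist_automobile) >
      topic_similarity(dist_car, dist_banana))  # → True
```

The paper's second layer would then refine such scores with HowNet's sememe-based dictionary similarity, which this sketch does not attempt.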

  11. Similarity and a Duality for Fullerenes

    Directory of Open Access Journals (Sweden)

    Jennifer J. Edmond

    2015-11-01

    Full Text Available Fullerenes are molecules of carbon that are modeled by trivalent plane graphs with only pentagonal and hexagonal faces. Scaling up a fullerene gives a notion of similarity, and fullerenes are partitioned into similarity classes. In this expository article, we illustrate how the values of two important fullerene parameters can be deduced for all fullerenes in a similarity class by computing the values of these parameters for just the three smallest representatives of that class. In addition, it turns out that there is a natural duality theory for similarity classes of fullerenes based on one of the most important fullerene construction techniques: leapfrog construction. The literature on fullerenes is very extensive, and since this is a general interest journal, we will summarize and illustrate the fundamental results that we will need to develop similarity and this duality.

  12. Searching and Indexing Genomic Databases via Kernelization

    Directory of Open Access Journals (Sweden)

    Travis eGagie

    2015-02-01

    Full Text Available The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper we survey the twenty-year history of this idea and discuss its relation to kernelization in parameterized complexity.

  13. Fast and accurate hashing via iterative nearest neighbors expansion.

    Science.gov (United States)

    Jin, Zhongming; Zhang, Debing; Hu, Yao; Lin, Shiding; Cai, Deng; He, Xiaofei

    2014-11-01

Recently, hashing techniques have been widely applied to the approximate nearest neighbor search problem in many real applications. The basic idea of these approaches is to generate binary codes for data points which can preserve the similarity between any two of them. Given a query, instead of performing a linear scan of the entire database, the hashing method can perform a linear scan of the points whose Hamming distance to the query is not greater than $r_h$, where $r_h$ is a constant. However, in order to find the true nearest neighbors, both the locating time and the linear scan time are proportional to $O\left(\sum_{i=0}^{r_h} \binom{c}{i}\right)$ ($c$ is the code length), which increases exponentially as $r_h$ increases. To address this limitation, we propose a novel algorithm named iterative expanding hashing in this paper, which builds an auxiliary index based on an offline constructed nearest neighbor table to avoid large $r_h$. This auxiliary index can be easily combined with all the traditional hashing methods. Extensive experimental results over various real large-scale datasets demonstrate the superiority of the proposed approach.
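The binomial growth of the probe count with the Hamming radius can be illustrated directly (a sketch of generic hash-bucket enumeration, not the paper's auxiliary index):

```python
from itertools import combinations
from math import comb

def buckets_within_radius(code, r):
    """All hash buckets whose Hamming distance to `code` is at most r:
    the buckets a hash-table lookup must probe."""
    c = len(code)
    out = []
    for dist in range(r + 1):
        for flips in combinations(range(c), dist):
            probe = list(code)
            for i in flips:                      # flip the chosen bits
                probe[i] = '1' if probe[i] == '0' else '0'
            out.append(''.join(probe))
    return out

code = '101100'                                  # a 6-bit hash code
for r in range(4):
    n = len(buckets_within_radius(code, r))
    # Probe count equals sum_{i=0}^{r} C(c, i), which grows quickly with r.
    assert n == sum(comb(len(code), i) for i in range(r + 1))
    print(r, n)  # → 0 1, 1 7, 2 22, 3 42
```

Even for a short 6-bit code the count climbs steeply; for realistic code lengths (e.g., c = 64) the sum explodes, which is the motivation for keeping $r_h$ small via an auxiliary index.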

  14. Automatic Planning of External Search Engine Optimization

    Directory of Open Access Journals (Sweden)

    Vita Jasevičiūtė

    2015-07-01

Full Text Available This paper describes an investigation of an external search engine optimization (SEO) action planning tool, dedicated to automatically extracting a small set of the most important keywords for each month over a whole-year period. The keywords in the set are extracted according to externally measured parameters, such as the average number of searches during the year and for every month individually. Additionally, the position of the optimized web site for each keyword is taken into account. The generated optimization plan is similar to the optimization plans prepared manually by SEO professionals and can be successfully used as a support tool for web site search engine optimization.
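A toy version of such monthly keyword planning might look like this (the scoring rule and all keyword statistics are hypothetical and purely illustrative, not the paper's algorithm):

```python
# Hypothetical keyword statistics: average monthly searches and the
# site's current ranking position for each keyword.
keywords = {
    "soil erosion":  {"monthly": [900, 950, 1200], "position": 14},
    "gully mapping": {"monthly": [300, 800, 400],  "position": 35},
    "3d models":     {"monthly": [2000, 1900, 2100], "position": 3},
}

def plan(month, top_n=2):
    """Score = search volume weighted by room for improvement in position;
    keywords already ranked near the top gain little from optimization."""
    def score(name):
        stats = keywords[name]
        room = max(stats["position"] - 1, 0)   # distance from rank 1
        return stats["monthly"][month] * room
    return sorted(keywords, key=score, reverse=True)[:top_n]

print(plan(month=1))  # → ['gully mapping', 'soil erosion']
```

The plan shifts month by month as search volumes change, mirroring the paper's idea of a per-month keyword set driven by externally measured parameters.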

  15. Using the Dual-Target Cost to Explore the Nature of Search Target Representations

    Science.gov (United States)

    Stroud, Michael J.; Menneer, Tamaryn; Cave, Kyle R.; Donnelly, Nick

    2012-01-01

    Eye movements were monitored to examine search efficiency and infer how color is mentally represented to guide search for multiple targets. Observers located a single color target very efficiently by fixating colors similar to the target. However, simultaneous search for 2 colors produced a dual-target cost. In addition, as the similarity between…

  16. Adaptive Levy processes and area-restricted search in human foraging.

    Directory of Open Access Journals (Sweden)

    Thomas T Hills

Full Text Available A considerable amount of research has claimed that animals' foraging behaviors display movement lengths with power-law distributed tails, characteristic of Lévy flights and Lévy walks. Though these claims have recently come into question, the proposal that many animals forage using Lévy processes nonetheless remains. A Lévy process does not consider when or where resources are encountered, and samples movement lengths independently of past experience. However, Lévy processes too have come into question based on the observation that in patchy resource environments resource-sensitive foraging strategies, like area-restricted search, perform better than Lévy flights yet can still generate heavy-tailed distributions of movement lengths. To investigate these questions further, we tracked humans as they searched for hidden resources in an open-field virtual environment, with either patchy or dispersed resource distributions. Supporting previous research, for both conditions logarithmic binning methods were consistent with Lévy flights, and rank-frequency methods (comparing alternative distributions using maximum likelihood methods) showed the strongest support for bounded power-law distributions (truncated Lévy flights). However, goodness-of-fit tests found that even bounded power-law distributions only accurately characterized movement behavior for 4 out of 32 participants. Moreover, paths in the patchy environment (but not the dispersed environment) showed a transition to intensive search following resource encounters, characteristic of area-restricted search. Transferring paths between environments revealed that paths generated in the patchy environment were adapted to that environment. Our results suggest that though power-law distributions do not accurately reflect human search, Lévy processes may still describe movement in dispersed environments, but not in patchy environments, where search was area-restricted. Furthermore, our results
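The bounded power-law (truncated Lévy) step-length distribution compared in the study can be sampled by inverse-CDF, as in this sketch (the exponent and bounds are illustrative choices, not fitted values from the paper):

```python
import numpy as np

def truncated_pareto(rng, mu, xmin, xmax, size):
    """Inverse-CDF sampling of step lengths with density ~ x^(-mu)
    on [xmin, xmax]: a bounded power law / truncated Levy step kernel."""
    u = rng.random(size)
    a, b = xmin ** (1.0 - mu), xmax ** (1.0 - mu)
    return (a + u * (b - a)) ** (1.0 / (1.0 - mu))

rng = np.random.default_rng(2)
steps = truncated_pareto(rng, mu=2.0, xmin=1.0, xmax=100.0, size=100_000)
print(round(steps.min(), 2), round(steps.max(), 2))  # bounded within [1, 100]
print(round(float(np.mean(steps)), 2))               # heavy tail: mean >> median
```

Goodness-of-fit testing would then compare empirical movement lengths against such samples, which is where most participants' paths failed to match in the study.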

  17. TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING

    Directory of Open Access Journals (Sweden)

    Khadoudja Ghanem

    2013-03-01

    Full Text Available An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increases. For this reason, it is important to have efficient computational methods and algorithms that can be applied to large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this context, we present in this paper a more accurate simple process to speed up ML methods. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to develop an efficient Hidden Markov Model (HMM) training. The idea of the proposed process consists of two steps. In the first step, training instances with similar inputs are clustered and a weight factor which represents the frequency of these instances is assigned to each representative cluster. The Dynamic Time Warping technique is used as a dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) associated with the number of training instances are modified to include the weight factor in the appropriate terms. This process significantly accelerates HMM training while maintaining the same initial, transition, and emission probability matrices as those obtained with the classical HMM training algorithm. Accordingly, the classification accuracy is preserved. Depending on the size of the training set, speedups of up to 2,200 times are possible when the size is about 100,000 instances. The proposed approach is not limited to training HMMs, but can be employed for a large variety of ML methods.
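The clustering step hinges on Dynamic Time Warping as the dissimilarity function between training sequences. A minimal textbook DTW (a sketch, not the paper's implementation) fills a dynamic-programming table of cumulative alignment costs:

```python
def dtw(s, t):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(s), len(t)
    INF = float("inf")
    # D[i][j] = cost of the best warping path aligning s[:i] with t[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # → 0.0 (warping absorbs the repeated 2)
print(dtw([0, 0], [1, 1]))           # → 2.0
```

Unlike Euclidean distance, DTW tolerates local stretching and compression along the time axis, which is why it can group training instances whose inputs are "similar" despite differing lengths or speeds.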

  18. Towards More Accurate Clustering Method by Using Dynamic Time Warping

    Directory of Open Access Journals (Sweden)

    Khadoudja Ghanem

    2013-04-01

    Full Text Available An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increases. For this reason, it is important to have efficient computational methods and algorithms that can be applied to large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this context, we present in this paper a more accurate simple process to speed up ML methods. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to develop an efficient Hidden Markov Model (HMM) training. The idea of the proposed process consists of two steps. In the first step, training instances with similar inputs are clustered and a weight factor which represents the frequency of these instances is assigned to each representative cluster. The Dynamic Time Warping technique is used as a dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) associated with the number of training instances are modified to include the weight factor in the appropriate terms. This process significantly accelerates HMM training while maintaining the same initial, transition, and emission probability matrices as those obtained with the classical HMM training algorithm. Accordingly, the classification accuracy is preserved. Depending on the size of the training set, speedups of up to 2,200 times are possible when the size is about 100,000 instances. The proposed approach is not limited to training HMMs, but can be employed for a large variety of ML methods.

  19. Measure of Node Similarity in Multilayer Networks.

    Science.gov (United States)

    Mollgaard, Anders; Zettler, Ingo; Dammeyer, Jesper; Jensen, Mogens H; Lehmann, Sune; Mathiesen, Joachim

    2016-01-01

    The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographic variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dis-similarity for the nodes connected by the strongest links. We finally analyze the overlap between layers in the network for different levels of acquaintanceships. PMID:27300084

  20. The baryonic self similarity of dark matter

    Energy Technology Data Exchange (ETDEWEB)

    Alard, C., E-mail: alard@iap.fr [Institut d' Astrophysique de Paris, 98bis boulevard Arago, F-75014 Paris (France)

    2014-06-20

    Cosmological simulations indicate that dark matter halos have specific self-similar properties. However, the halo similarity is affected by the baryonic feedback. By using momentum-driven winds as a model to represent the baryon feedback, an equilibrium condition is derived which directly implies the emergence of a new type of similarity. The new self-similar solution has constant acceleration at a reference radius for both dark matter and baryons. This model receives strong support from the observations of galaxies. The new self-similar properties imply that the total acceleration at larger distances is scale-free, the transition between the dark-matter- and baryon-dominated regimes occurs at a constant acceleration, and the maximum amplitude of the velocity curve at larger distances is proportional to M {sup 1/4}. These results demonstrate that this self-similar model is consistent with the basics of modified Newtonian dynamics (MOND) phenomenology. In agreement with the observations, the coincidence between the self-similar model and MOND breaks down at the scale of clusters of galaxies. Numerical experiments show that the behavior of the density near the origin is closely approximated by an Einasto profile.

  1. Efficient Privacy Preserving Protocols for Similarity Join

    Directory of Open Access Journals (Sweden)

    Bilal Hawashin

    2012-04-01

    Full Text Available During the similarity join process, one or more sources may not allow sharing their data with other sources. In this case, a privacy-preserving similarity join is required. We showed in our previous work [4] that using long attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, could improve the similarity join accuracy using supervised learning. However, the existing secure protocols for similarity join methods cannot be used to join sources using these long attributes. Moreover, the majority of the existing privacy-preserving protocols do not consider the semantic similarities during the similarity join process. In this paper, we introduce a secure efficient protocol to semantically join sources when the join attributes are long attributes. We provide two secure protocols, for the scenario in which a training set exists and for the one in which no training set is available. Furthermore, we introduce the multi-label supervised secure protocol and the expandable supervised secure protocol. Results show that our protocols can efficiently join sources using the long attributes by considering the semantic relationships among the long string values, improving the overall secure similarity join performance.

  2. Measure of Node Similarity in Multilayer Networks

    Science.gov (United States)

    Mollgaard, Anders; Zettler, Ingo; Dammeyer, Jesper; Jensen, Mogens H.; Lehmann, Sune; Mathiesen, Joachim

    2016-01-01

    The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographic variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dis-similarity for the nodes connected by the strongest links. We finally analyze the overlap between layers in the network for different levels of acquaintanceships. PMID:27300084

  3. Genetic algorithms as global random search methods

    Science.gov (United States)

    Peck, Charles C.; Dhawan, Atam P.

    1995-01-01

    Genetic algorithm behavior is described in terms of the construction and evolution of the sampling distributions over the space of candidate solutions. This novel perspective is motivated by analysis indicating that the schema theory is inadequate for completely and properly explaining genetic algorithm behavior. Based on the proposed theory, it is argued that the similarities of candidate solutions should be exploited directly, rather than encoding candidate solutions and then exploiting their similarities. Proportional selection is characterized as a global search operator, and recombination is characterized as the search process that exploits similarities. Sequential algorithms and many deletion methods are also analyzed. It is shown that by properly constraining the search breadth of recombination operators, convergence of genetic algorithms to a global optimum can be ensured.
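To make the two operators named above concrete, here is a toy genetic algorithm on a bit-string "OneMax" problem, using proportional (roulette-wheel) selection as the global search operator and one-point recombination to exploit similarities between parents. The problem, population size, and fitness offset are illustrative choices, not from the paper.

```python
import random

def roulette_select(pop, fits, rng):
    """Proportional selection: pick one individual with probability
    proportional to its (positive) fitness."""
    r = rng.random() * sum(fits)
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind
    return pop[-1]

def crossover(a, b, rng):
    """One-point recombination: the child inherits a prefix of one parent
    and a suffix of the other, exploiting shared building blocks."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ind, rate, rng):
    return [1 - g if rng.random() < rate else g for g in ind]

rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(60):
    fits = [1 + sum(ind) for ind in pop]  # OneMax fitness; +1 keeps weights positive
    pop = [mutate(crossover(roulette_select(pop, fits, rng),
                            roulette_select(pop, fits, rng), rng), 0.01, rng)
           for _ in range(len(pop))]
best = max(sum(ind) for ind in pop)
print(best)  # near the optimum of 20 for this easy problem
```

Constraining the "search breadth" of recombination, as the abstract argues, would correspond here to restricting which parent pairs may recombine or where the cut point may fall.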

  4. Similarity-based pattern analysis and recognition

    CERN Document Server

    Pelillo, Marcello

    2013-01-01

    This accessible text/reference presents a coherent overview of the emerging field of non-Euclidean similarity learning. The book presents a broad range of perspectives on similarity-based pattern analysis and recognition methods, from purely theoretical challenges to practical, real-world applications. The coverage includes both supervised and unsupervised learning paradigms, as well as generative and discriminative models. Topics and features: explores the origination and causes of non-Euclidean (dis)similarity measures, and how they influence the performance of traditional classification algorithms

  5. Faster and More Accurate Sequence Alignment with SNAP

    CERN Document Server

    Zaharia, Matei; Curtis, Kristal; Fox, Armando; Patterson, David; Shenker, Scott; Stoica, Ion; Karp, Richard M; Sittler, Taylor

    2011-01-01

    We present the Scalable Nucleotide Alignment Program (SNAP), a new short and long read aligner that is both more accurate (i.e., aligns more reads with fewer errors) and 10-100x faster than state-of-the-art tools such as BWA. Unlike recent aligners based on the Burrows-Wheeler transform, SNAP uses a simple hash index of short seed sequences from the genome, similar to BLAST's. However, SNAP greatly reduces the number and cost of local alignment checks performed through several measures: it uses longer seeds to reduce the false positive locations considered, leverages larger memory capacities to speed index lookup, and excludes most candidate locations without fully computing their edit distance to the read. The result is an algorithm that scales well for reads from one hundred to thousands of bases long and provides a rich error model that can match classes of mutations (e.g., longer indels) that today's fast aligners ignore. We calculate that SNAP can align a dataset with 30x coverage of a human genome in le...
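The seed-index idea can be illustrated in a few lines: build a hash table from every k-mer in the reference to its positions, then let each seed hit in a read vote for a candidate alignment start. This toy sketch uses hypothetical sequences and omits the edit-distance verification step; it only mirrors the indexing strategy the abstract describes.

```python
def build_seed_index(genome, k):
    """Hash index mapping every k-mer seed to its start positions in the genome."""
    index = {}
    for i in range(len(genome) - k + 1):
        index.setdefault(genome[i:i + k], []).append(i)
    return index

def candidate_locations(read, index, k):
    """Look up non-overlapping seeds from the read; each hit proposes a
    candidate alignment start (hit position minus seed offset in the read)."""
    hits = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            hits.add(pos - offset)
    return sorted(hits)

genome = "ACGTACGTTAGCCGATACGGATCCA"  # hypothetical reference
idx = build_seed_index(genome, 4)
print(candidate_locations("TAGCCGAT", idx, 4))  # → [8]
```

A real aligner would next verify each candidate with a (banded) edit-distance computation; SNAP's speed comes from pruning most candidates before that expensive step.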

  6. Accurate measurement of streamwise vortices in low speed aerodynamic flows

    Science.gov (United States)

    Waldman, Rye M.; Kudo, Jun; Breuer, Kenneth S.

    2010-11-01

    Low Reynolds number experiments with flapping animals (such as bats and small birds) are of current interest in understanding biological flight mechanics, and due to their application to Micro Air Vehicles (MAVs) which operate in a similar parameter space. Previous PIV wake measurements have described the structures left by bats and birds, and provided insight to the time history of their aerodynamic force generation; however, these studies have faced difficulty drawing quantitative conclusions due to significant experimental challenges associated with the highly three-dimensional and unsteady nature of the flows, and the low wake velocities associated with lifting bodies that only weigh a few grams. This requires the high-speed resolution of small flow features in a large field of view using limited laser energy and finite camera resolution. Cross-stream measurements are further complicated by the high out-of-plane flow which requires thick laser sheets and short interframe times. To quantify and address these challenges we present data from a model study on the wake behind a fixed wing at conditions comparable to those found in biological flight. We present a detailed analysis of the PIV wake measurements, discuss the criteria necessary for accurate measurements, and present a new dual-plane PIV configuration to resolve these issues.

  7. Downhole temperature tool accurately measures well bore profile

    International Nuclear Information System (INIS)

    This paper reports that an inexpensive temperature tool provides accurate temperature measurements during drilling operations for better design of cement jobs, workovers, well stimulation, and well bore hydraulics. Valid temperature data during specific wellbore operations can improve initial job design, fluid testing, and slurry placement, ultimately enhancing well bore performance. This improvement applies to cement slurries, breaker activation for slurries, breaker activation for stimulation and profile control, and fluid rheological properties for all downhole operations. The temperature tool has been run standalone mounted inside drill pipe, on slick wire line and braided cable, and as a free-fall tool. It has also been run piggyback on both directional surveys (slick line and free-fall) and standard logging runs. This temperature measuring system has been used extensively in field well bores to depths of 20,000 ft. The temperature tool is completely reusable in the field, much like the standard directional survey tools used on many drilling rigs. The system includes a small, rugged, programmable temperature sensor, a standard body housing, various adapters for specific applications, and a personal computer (PC) interface

  8. Searching for similarities : transfer-oriented learning in health education at secondary schools

    NARCIS (Netherlands)

    Peters, L.W.H.

    2012-01-01

    Secondary schools are inundated with lesson packages on healthy behavior. These mostly target single behavioral domains (for example smoking, alcohol, nutrition, or safe sex). It would be more efficient if a single lesson package, with limited teaching time, produced effects on multiple behavioral domains at the same time

  9. SHOP: receptor-based scaffold hopping by GRID-based similarity searches

    DEFF Research Database (Denmark)

    Bergmann, Rikke; Liljefors, Tommy; Sørensen, Morten D;

    2009-01-01

    A new field-derived 3D method for receptor-based scaffold hopping, implemented in the software SHOP, is presented. Information from a protein-ligand complex is utilized to substitute a fragment of the ligand with another fragment from a database of synthetically accessible scaffolds. A GRID-based...

  10. Similarity landscapes: An improved method for scientific visualization of information from protein and DNA database searches

    Energy Technology Data Exchange (ETDEWEB)

    Dogget, N.; Myers, G. [Los Alamos National Lab., NM (United States); Wills, C.J. [Univ. of California, San Diego, CA (United States)

    1998-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to answer a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examining microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.

  11. GeoSearch: A lightweight broking middleware for geospatial resources discovery

    Science.gov (United States)

    Gui, Z.; Yang, C.; Liu, K.; Xia, J.

    2012-12-01

    With petabytes of geodata and thousands of geospatial web services available over the Internet, it is critical to support geoscience research and applications by finding the best-fit geospatial resources from the massive and heterogeneous resources. Past decades' developments witnessed the operation of many service components to facilitate geospatial resource management and discovery. However, efficient and accurate geospatial resource discovery is still a big challenge for the following reasons. 1) The entry barriers (also called "learning curves") hinder the usability of discovery services for end users. Different portals and catalogues adopt various access protocols, metadata formats, and GUI styles to organize, present, and publish metadata, and it is hard for end users to learn all these technical details and differences. 2) The cost of federating heterogeneous services is high. To provide sufficient resources and facilitate data discovery, many registries adopt a periodic harvesting mechanism to retrieve metadata from other federated catalogues. These time-consuming processes lead to network and storage burdens, data redundancy, and the overhead of maintaining data consistency. 3) Semantics in data discovery are heterogeneous. Since keyword matching is still the primary search method in many operational discovery services, search accuracy (precision and recall) is hard to guarantee. Semantic technologies (such as semantic reasoning and similarity evaluation) offer a solution, but integrating them with existing services is challenging due to the expandability limitations of the service frameworks and metadata templates. 4) The capabilities to help users make a final selection are inadequate. Most existing search portals lack intuitive and diverse information visualization methods and functions (sort, filter) to present, explore, and analyze search results. Furthermore, the presentation of the value

  12. Giant African pouched rats (Cricetomys gambianus) that work on tilled soil accurately detect land mines.

    Science.gov (United States)

    Edwards, Timothy L; Cox, Christophe; Weetjens, Bart; Tewelde, Tesfazghi; Poling, Alan

    2015-09-01

    Pouched rats were employed as mine-detection animals in a quality-control application where they searched for mines in areas previously processed by a mechanical tiller. The rats located 58 mines and fragments in this 28,050-m² area with a false indication rate of 0.4 responses per 100 m². Humans with metal detectors found no mines that were not located by the rats. These findings indicate that pouched rats can accurately detect land mines in disturbed soil and suggest that they can play multiple roles in humanitarian demining. PMID:25962550

  13. Semantic Search among Heterogeneous Biological Databases Based on Gene Ontology

    Institute of Scientific and Technical Information of China (English)

    Shun-Liang CAO; Lei QIN; Wei-Zhong HE; Yang ZHONG; Yang-Yong ZHU; Yi-Xue LI

    2004-01-01

    Semantic search is a key issue in the integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table, to correlate Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table, to record similarity scores derived from any pair of GO terms. Based on the two tables, multifarious ways for semantic search are provided, and the corresponding entries in heterogeneous biological databases can be expediently searched in semantic terms.
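A miniature version of the two-table scheme might look as follows. The entries, GO IDs, and scores here are hypothetical stand-ins for the DB2GO and semantic-similarity tables described above, not data from BioDW.

```python
# Hypothetical miniature tables (stand-ins for DB2GO and the similarity table).
DB2GO = {                        # database entry -> annotated GO terms
    "P12345": {"GO:0006915"},    # apoptotic process
    "Q67890": {"GO:0012501"},    # programmed cell death
}
SIM = {                          # symmetric semantic-similarity scores
    frozenset({"GO:0006915", "GO:0012501"}): 0.9,
}

def go_similarity(a, b):
    """Semantic similarity of two GO terms via the precomputed table."""
    return 1.0 if a == b else SIM.get(frozenset({a, b}), 0.0)

def semantic_search(query_term, threshold=0.8):
    """Return entries whose annotations are semantically close to the query,
    not just exact keyword matches."""
    return sorted(
        entry for entry, terms in DB2GO.items()
        if any(go_similarity(query_term, t) >= threshold for t in terms)
    )

print(semantic_search("GO:0006915"))  # → ['P12345', 'Q67890']
```

The point of the precomputed similarity table is exactly this: a query for "apoptotic process" also retrieves entries annotated with the semantically related "programmed cell death", which plain keyword matching would miss.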

  14. Learning content similarity for music recommendation

    CERN Document Server

    McFee, Brian; Lanckriet, Gert

    2011-01-01

    Many tasks in music information retrieval, such as recommendation and playlist generation for online radio, fall naturally into the query-by-example setting, wherein a user queries the system by providing a song, and the system responds with a list of relevant or similar song recommendations. Such applications ultimately depend on the notion of similarity between items to produce high-quality results. Current state-of-the-art systems employ collaborative filter methods to represent musical items, effectively comparing items in terms of their constituent users. While collaborative filter techniques perform well when historical data is available for each item, their reliance on historical data impedes performance on novel or unpopular items. To combat this problem, practitioners rely on content-based similarity, which naturally extends to novel items, but is typically out-performed by collaborative filter methods. In this article, we propose a method for optimizing content-based similarity by learning from a sa...

  15. Bilateral Trade Flows and Income Distribution Similarity

    Science.gov (United States)

    2016-01-01

    Current models of bilateral trade neglect the effects of income distribution. This paper addresses the issue by accounting for non-homothetic consumer preferences and hence investigating the role of income distribution in the context of the gravity model of trade. A theoretically justified gravity model is estimated for disaggregated trade data (Dollar volume is used as dependent variable) using a sample of 104 exporters and 108 importers for 1980–2003 to achieve two main goals. We define and calculate new measures of income distribution similarity and empirically confirm that greater similarity of income distribution between countries implies more trade. Using distribution-based measures as a proxy for demand similarities in gravity models, we find consistent and robust support for the hypothesis that countries with more similar income-distributions trade more with each other. The hypothesis is also confirmed at disaggregated level for differentiated product categories. PMID:27137462
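As an illustration of comparing income distributions (this is a generic overlap measure, not the paper's actual similarity index), one simple similarity between two countries' decile income shares is one minus their total-variation distance:

```python
def distribution_similarity(shares_a, shares_b):
    """Similarity of two income-share vectors (e.g. decile shares):
    1 minus the total-variation distance; 1.0 = identical, 0.0 = disjoint."""
    assert abs(sum(shares_a) - 1.0) < 1e-9 and abs(sum(shares_b) - 1.0) < 1e-9
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(shares_a, shares_b))

perfectly_equal = [0.10] * 10  # each decile earns 10% of income (hypothetical)
highly_skewed = [0.02] * 5 + [0.05, 0.07, 0.10, 0.16, 0.52]
print(round(distribution_similarity(perfectly_equal, perfectly_equal), 2))  # → 1.0
print(round(distribution_similarity(perfectly_equal, highly_skewed), 2))    # → 0.52
```

A pairwise score of this kind can then enter a gravity equation as a regressor, which is the spirit of the paper's test that more similar income distributions predict more bilateral trade.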

  16. Correlation between social proximity and mobility similarity

    CERN Document Server

    Fan, Chao; Huang, Junming; Rong, Zhihai; Zhou, Tao

    2016-01-01

    Human behaviors exhibit ubiquitous correlations in many aspects, such as individual and collective levels, temporal and spatial dimensions, content, social and geographical layers. With rich Internet data on online behaviors becoming available, it has attracted academic interest to explore human mobility similarity from the perspective of social network proximity. Existing analysis shows a strong correlation between online social proximity and offline mobility similarity, namely, mobile records between friends are significantly more similar than between strangers, and those between friends with common neighbors are even more similar. We argue the importance of the number and diversity of common friends, with a counterintuitive finding that the number of common friends has no positive impact on mobility similarity while the diversity plays a key role, disagreeing with previous studies. Our analysis provides a novel view for better understanding the coupling between human online and offline behaviors, and will...

  17. Bilateral Trade Flows and Income Distribution Similarity.

    Science.gov (United States)

    Martínez-Zarzoso, Inmaculada; Vollmer, Sebastian

    2016-01-01

    Current models of bilateral trade neglect the effects of income distribution. This paper addresses the issue by accounting for non-homothetic consumer preferences and hence investigating the role of income distribution in the context of the gravity model of trade. A theoretically justified gravity model is estimated for disaggregated trade data (Dollar volume is used as dependent variable) using a sample of 104 exporters and 108 importers for 1980-2003 to achieve two main goals. We define and calculate new measures of income distribution similarity and empirically confirm that greater similarity of income distribution between countries implies more trade. Using distribution-based measures as a proxy for demand similarities in gravity models, we find consistent and robust support for the hypothesis that countries with more similar income-distributions trade more with each other. The hypothesis is also confirmed at disaggregated level for differentiated product categories. PMID:27137462

  18. Distances and similarities in intuitionistic fuzzy sets

    CERN Document Server

    Szmidt, Eulalia

    2014-01-01

    This book presents the state-of-the-art in theory and practice regarding similarity and distance measures for intuitionistic fuzzy sets. Quantifying similarity and distances is crucial for many applications, e.g. data mining, machine learning, decision making, and control. The work provides readers with a comprehensive set of theoretical concepts and practical tools for both defining and determining similarity between intuitionistic fuzzy sets. It describes an automatic algorithm for deriving intuitionistic fuzzy sets from data, which can aid in the analysis of information in large databases. The book also discusses other important applications, e.g. the use of similarity measures to evaluate the extent of agreement between experts in the context of decision making.
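One widely used distance of the kind the book covers is the normalized Hamming distance between intuitionistic fuzzy sets, which also accounts for the hesitation margin pi = 1 - mu - nu (in the style of Szmidt and Kacprzyk). A minimal sketch, with hypothetical example sets:

```python
def ifs_distance(A, B):
    """Normalized Hamming distance between two intuitionistic fuzzy sets,
    each a list of (membership mu, non-membership nu) pairs over the same
    universe; the hesitation margin pi = 1 - mu - nu is included."""
    total = 0.0
    for (ma, na), (mb, nb) in zip(A, B):
        pa, pb = 1 - ma - na, 1 - mb - nb
        total += abs(ma - mb) + abs(na - nb) + abs(pa - pb)
    return total / (2 * len(A))

A = [(0.6, 0.2), (0.5, 0.4)]  # hypothetical IFS over a 2-element universe
B = [(0.5, 0.3), (0.5, 0.4)]
print(round(ifs_distance(A, B), 3))  # → 0.05
```

A similarity measure can then be taken as 1 minus this distance; the factor 1/(2n) guarantees the result stays in [0, 1].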

  19. Bilateral Trade Flows and Income Distribution Similarity.

    Science.gov (United States)

    Martínez-Zarzoso, Inmaculada; Vollmer, Sebastian

    2016-01-01

    Current models of bilateral trade neglect the effects of income distribution. This paper addresses the issue by accounting for non-homothetic consumer preferences and hence investigating the role of income distribution in the context of the gravity model of trade. A theoretically justified gravity model is estimated for disaggregated trade data (Dollar volume is used as dependent variable) using a sample of 104 exporters and 108 importers for 1980-2003 to achieve two main goals. We define and calculate new measures of income distribution similarity and empirically confirm that greater similarity of income distribution between countries implies more trade. Using distribution-based measures as a proxy for demand similarities in gravity models, we find consistent and robust support for the hypothesis that countries with more similar income-distributions trade more with each other. The hypothesis is also confirmed at disaggregated level for differentiated product categories.

  20. Interpersonal Congruency, Attitude Similarity, and Interpersonal Attraction

    Science.gov (United States)

    Touhey, John C.

    1975-01-01

    As no experimental study has examined the effects of congruency on attraction, the present investigation orthogonally varied attitude similarity and interpersonal congruency in order to compare the two independent variables as determinants of interpersonal attraction. (Author/RK)

  1. Discovering Music Structure via Similarity Fusion

    DEFF Research Database (Denmark)

    Arenas-García, Jerónimo; Parrado-Hernandez, Emilio; Meng, Anders;

    Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get a satisfactory performance from such systems, one should incorporate as much information about songs similarity as possible; however...... semantics”, in such a way that all observed similarities can be satisfactorily explained using the latent semantics. Therefore, one can think of these semantics as the real structure in music, in the sense that they can explain the observed similarities among songs. The suitability of the PLSA model...... for representing music structure is studied in a simplified scenario consisting of 4412 songs and two similarity measures among them. The results suggest that the PLSA model is a useful framework to combine different sources of information, and provides a reasonable space for song representation....

  2. Media segmentation using self-similarity decomposition

    Science.gov (United States)

    Foote, Jonathan T.; Cooper, Matthew L.

    2003-01-01

    We present a framework for analyzing the structure of digital media streams. Though our methods work for video, text, and audio, we concentrate on detecting the structure of digital music files. In the first step, spectral data is used to construct a similarity matrix calculated from inter-frame spectral similarity. The digital audio can be robustly segmented by correlating a kernel along the diagonal of the similarity matrix. Once segmented, spectral statistics of each segment are computed. In the second step, segments are clustered based on the self-similarity of their statistics. This reveals the structure of the digital music in a set of segment boundaries and labels. Finally, the music is summarized by selecting clusters with repeated segments throughout the piece. The summaries can be customized for various applications based on the structure of the original music.
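The first analysis step (a self-similarity matrix, then a checkerboard kernel correlated along its diagonal) can be sketched with toy feature frames. The frames and kernel width here are illustrative, not the spectral features the paper uses.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_matrix(frames):
    """Pairwise inter-frame similarity matrix."""
    return [[cosine(u, v) for v in frames] for u in frames]

def novelty(S, w):
    """Correlate a 2w-by-2w checkerboard kernel along the diagonal of S;
    peaks in the score mark segment boundaries."""
    N = len(S)
    scores = [0.0] * N
    for i in range(w, N - w):
        s = 0.0
        for di in range(-w, w):
            for dj in range(-w, w):
                # same-side blocks count positively, cross blocks negatively
                sign = 1.0 if (di < 0) == (dj < 0) else -1.0
                s += sign * S[i + di][i + dj]
        scores[i] = s
    return scores

# Two homogeneous blocks of frames with a boundary at index 6.
frames = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 6
scores = novelty(similarity_matrix(frames), w=2)
print(scores.index(max(scores)))  # → 6, the true segment boundary
```

The score is maximal where the frames before the current position are similar to each other, the frames after are similar to each other, and the two groups differ, which is exactly the signature of a segment boundary.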

  3. Interpersonal attraction and personality: what is attractive--self similarity, ideal similarity, complementarity or attachment security?

    Science.gov (United States)

    Klohnen, Eva C; Luo, Shanhong

    2003-10-01

    Little is known about whether personality characteristics influence initial attraction. Because adult attachment differences influence a broad range of relationship processes, the authors examined their role in 3 experimental attraction studies. The authors tested four major attraction hypotheses--self similarity, ideal-self similarity, complementarity, and attachment security--and examined both actual and perceptual factors. Replicated analyses across samples, designs, and manipulations showed that actual security and self similarity predicted attraction. With regard to perceptual factors, ideal similarity, self similarity, and security all were significant predictors. Whereas perceptual ideal and self similarity had incremental predictive power, perceptual security's effects were subsumed by perceptual ideal similarity. Perceptual self similarity fully mediated actual attachment similarity effects, whereas ideal similarity was only a partial mediator. PMID:14561124

  4. Searching Online for 'Hemorrhoids'?

    Science.gov (United States)

    ... he believes most people would rather search anonymously online for information about hemorrhoids than ask their doctors, ...

  5. On distributional assumptions and whitened cosine similarities

    DEFF Research Database (Denmark)

    Loog, Marco

    2008-01-01

    Recently, an interpretation of the whitened cosine similarity measure as a Bayes decision rule was proposed (C. Liu, "The Bayes Decision Rule Induced Similarity Measures," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1086-1090, June 2007). This communication makes the observation that some of the distributional assumptions made to derive this measure are very restrictive and, considered simultaneously, even inconsistent.

  6. Some more similarities between Peirce and Skinner

    OpenAIRE

    Moxley, Roy A

    2002-01-01

    C. S. Peirce is noted for pioneering a variety of views, and the case is made here for the similarities and parallels between his views and B. F. Skinner's radical behaviorism. In addition to parallels previously noted, these similarities include an advancement of experimental science, a behavioral psychology, a shift from nominalism to realism, an opposition to positivism, a selectionist account for strengthening behavior, the importance of a community of selves, a recursive approach to meth...

  7. SIMILARITY NETWORK FOR SEMANTIC WEB SERVICES SUBSTITUTION

    OpenAIRE

    Cherifi, Chantal

    2013-01-01

    Web services substitution is one of the most challenging tasks for automating the composition process of multiple Web services. It aims to improve performances and to deal efficiently with Web services failures. Many existing solutions have approached the problem through classification of substitutable Web services. To go a step further, we propose in this paper a network based approach where nodes are Web services operations and links join similar operations. Four similarity measures based o...

  8. Limit theorems for self-similar tilings

    CERN Document Server

    Bufetov, Alexander I

    2012-01-01

    We study deviation of ergodic averages for dynamical systems given by self-similar tilings on the plane and in higher dimensions. The main object of our paper is a special family of finitely-additive measures for our systems. An asymptotic formula is given for ergodic integrals in terms of these finitely-additive measures, and, as a corollary, limit theorems are obtained for dynamical systems given by self-similar tilings.

  9. Privacy-preserving matching of similar patients.

    Science.gov (United States)

    Vatsalan, Dinusha; Christen, Peter

    2016-02-01

    The identification of similar entities represented by records in different databases has drawn considerable attention in many application areas, including in the health domain. One important type of entity matching application that is vital for quality healthcare analytics is the identification of similar patients, known as similar patient matching. A key component of identifying similar records is the calculation of similarity of the values in attributes (fields) between these records. Due to increasing privacy and confidentiality concerns, using the actual attribute values of patient records to identify similar records across different organizations is becoming non-trivial because the attributes in such records often contain highly sensitive information such as personal and medical details of patients. Therefore, the matching needs to be based on masked (encoded) values while being effective and efficient to allow matching of large databases. Bloom filter encoding has widely been used as an efficient masking technique for privacy-preserving matching of string and categorical values. However, no work on Bloom filter-based masking of numerical data, such as integer (e.g. age), floating point (e.g. body mass index), and modulus (numbers wrap around upon reaching a certain value, e.g. date and time), which are commonly required in the health domain, has been presented in the literature. We propose a framework with novel methods for masking numerical data using Bloom filters, thereby facilitating the calculation of similarities between records. We conduct an empirical study on publicly available real-world datasets which shows that our framework provides efficient masking and achieves similar matching accuracy compared to the matching of actual unencoded patient records. PMID:26707453
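
The paper's novelty is Bloom-filter masking of numerical values; the established string-masking technique it builds on can be sketched as follows. This is a simplified illustration: Python sets stand in for fixed-length bit arrays, and seeded SHA-256 is an assumed hash family, not necessarily the authors' choice.

```python
import hashlib

def bloom_encode(value, num_bits=1024, num_hashes=2):
    """Map a string's character bigrams into Bloom-filter bit positions.
    (A set of positions stands in for a fixed-length bit array.)"""
    bits = set()
    for i in range(len(value) - 1):
        bigram = value[i:i + 2]
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{bigram}".encode()).digest()
            bits.add(int.from_bytes(digest[:4], "big") % num_bits)
    return bits

def dice_similarity(b1, b2):
    """Dice coefficient of two encodings approximates string similarity."""
    if not b1 and not b2:
        return 1.0
    return 2 * len(b1 & b2) / (len(b1) + len(b2))

same = dice_similarity(bloom_encode("jonathan"), bloom_encode("jonothan"))
diff = dice_similarity(bloom_encode("jonathan"), bloom_encode("elizabeth"))
```

Because two similar strings share most of their bigrams, their encodings share most bit positions, so the Dice coefficient of the masked values approximates the similarity of the unmasked strings without revealing them.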

  10. Cascade category-aware visual search.

    Science.gov (United States)

    Zhang, Shiliang; Tian, Qi; Huang, Qingming; Gao, Wen; Rui, Yong

    2014-06-01

    Incorporating image classification into an image retrieval system brings many attractive advantages. For instance, the search space can be narrowed down by rejecting images in irrelevant categories of the query. The retrieved images can be more consistent in semantics by indexing and returning images in the relevant categories together. However, due to their different goals on recognition accuracy and retrieval scalability, it is hard to efficiently incorporate most image classification works into large-scale image search. To study this problem, we propose cascade category-aware visual search, which utilizes weak category clue to achieve better retrieval accuracy, efficiency, and memory consumption. To capture the category and visual clues of an image, we first learn category-visual words, which are discriminative and repeatable local features labeled with categories. By identifying category-visual words in database images, we are able to discard noisy local features and extract image visual and category clues, which are hence recorded in a hierarchical index structure. Our retrieval system narrows down the search space by: 1) filtering the noisy local features in query; 2) rejecting irrelevant categories in database; and 3) performing discriminative visual search in relevant categories. The proposed algorithm is tested on object search, landmark search, and large-scale similar image search on the large-scale LSVRC10 data set. Although the category clue introduced is weak, our algorithm still shows substantial advantages in retrieval accuracy, efficiency, and memory consumption over the state-of-the-art.

  11. When Gravity Fails: Local Search Topology

    Science.gov (United States)

    Frank, Jeremy; Cheeseman, Peter; Stutz, John; Lau, Sonie (Technical Monitor)

    1997-01-01

    Local search algorithms for combinatorial search problems frequently encounter a sequence of states in which it is impossible to improve the value of the objective function; moves through these regions, called plateau moves, dominate the time spent in local search. We analyze and characterize plateaus for three different classes of randomly generated Boolean Satisfiability problems. We identify several interesting features of plateaus that impact the performance of local search algorithms. We show that local minima tend to be small but occasionally may be very large. We also show that local minima can be escaped without unsatisfying a large number of clauses, but that systematically searching for an escape route may be computationally expensive if the local minimum is large. We show that plateaus with exits, called benches, tend to be much larger than minima, and that some benches have very few exit states which local search can use to escape. We show that the solutions (i.e. global minima) of randomly generated problem instances form clusters, which behave similarly to local minima. We revisit several enhancements of local search algorithms and explain their performance in light of our results. Finally we discuss strategies for creating the next generation of local search algorithms.

  12. Mechanisms for similarity matching in disparity measurement

    Directory of Open Access Journals (Sweden)

    Ross eGoutcher

    2014-01-01

    Full Text Available Early neural mechanisms for the measurement of binocular disparity appear to operate in a manner consistent with cross-correlation-like processes. Consequently, cross-correlation, or cross-correlation-like procedures have been used in a range of models of disparity measurement. Using such procedures as the basis for disparity measurement creates a preference for correspondence solutions that maximise the similarity between local left and right eye image regions. Here, we examine how observers’ perception of depth in an ambiguous stereogram is affected by manipulations of luminance and orientation-based image similarity. Results show a strong effect of coarse-scale luminance similarity manipulations, but a relatively weak effect of finer-scale manipulations of orientation similarity. This is in contrast to the measurements of depth obtained from a standard cross-correlation model. This model shows strong effects of orientation similarity manipulations and weaker effects of luminance similarity. In order to account for these discrepancies, the standard cross-correlation approach may be modified to include an initial spatial frequency filtering stage. The performance of this adjusted model most closely matches human psychophysical data when spatial frequency filtering favours coarser scales. This is consistent with the operation of disparity measurement processes where spatial frequency and disparity tuning are correlated, or where disparity measurement operates in a coarse-to-fine manner.
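
As a rough illustration of the cross-correlation-like measurement the abstract discusses, here is windowed normalized cross-correlation on a pair of 1-D "scanlines"; the window size, disparity range, and synthetic stimulus are illustrative assumptions, not a model of the visual system.

```python
import numpy as np

def disparity_by_correlation(left, right, window=7, max_disp=8):
    """For each pixel on the left scanline, find the horizontal shift
    whose right-scanline window maximizes normalized cross-correlation."""
    half = window // 2
    n = len(left)
    disp = np.zeros(n, dtype=int)
    for x in range(half, n - half):
        patch = left[x - half:x + half + 1]
        best_score, best_d = -np.inf, 0
        for d in range(max_disp + 1):
            if x - d - half < 0:
                break
            cand = right[x - d - half:x - d + half + 1]
            # normalized correlation = cosine of the mean-removed windows
            a, b = patch - patch.mean(), cand - cand.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            score = (a @ b) / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_d = score, d
        disp[x] = best_d
    return disp

# A random texture shifted by 3 pixels between the two "eyes".
rng = np.random.default_rng(1)
left = rng.normal(size=64)
right = np.roll(left, -3)
d = disparity_by_correlation(left, right)
```

The correspondence solution picked at each pixel is simply the shift that maximizes local image similarity, which is the preference the psychophysical manipulations in the abstract probe.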

  13. Quantifying the Search Behaviour of Different Demographics Using Google Correlate.

    Science.gov (United States)

    Letchford, Adrian; Preis, Tobias; Moat, Helen Susannah

    2016-01-01

    Vast records of our everyday interests and concerns are being generated by our frequent interactions with the Internet. Here, we investigate how the searches of Google users vary across U.S. states with different birth rates and infant mortality rates. We find that users in states with higher birth rates search for more information about pregnancy, while those in states with lower birth rates search for more information about cats. Similarly, we find that users in states with higher infant mortality rates search for more information about credit, loans and diseases. Our results provide evidence that Internet search data could offer new insight into the concerns of different demographics. PMID:26910464

  14. Which fast nearest neighbour search algorithm to use?

    OpenAIRE

    Serrano Díaz-Carrasco, Aureo; Micó Andrés, Luisa; Oncina Carratalá, Jose

    2013-01-01

    Choosing which fast Nearest Neighbour search algorithm to use depends on the task we face. Usually the kd-tree search algorithm is selected when the similarity function is the Euclidean or the Manhattan distance. Generic fast search algorithms (algorithms that work with any distance function) are only used when there are no specific fast search algorithms for the involved distance function. In this work we show that in real data problems generic search algorithms (i.e. MDF-tree) can be faster t...
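
A minimal kd-tree of the kind the abstract refers to, with branch-and-bound nearest-neighbour search, can be sketched as follows; this is a generic illustration, not the MDF-tree the authors evaluate.

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Recursively split points on alternating axes (median split)."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Branch-and-bound descent; prunes a subtree when the splitting
    plane is farther away than the best distance found so far."""
    if node is None:
        return best
    point, axis, left, right = node
    dist = np.linalg.norm(point - query)
    if best is None or dist < best[1]:
        best = (point, dist)
    near, far = (left, right) if query[axis] <= point[axis] else (right, left)
    best = nearest(near, query, best)
    if abs(query[axis] - point[axis]) < best[1]:
        best = nearest(far, query, best)
    return best

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 2))
tree = build_kdtree(data)
q = np.array([0.1, -0.2])
p, dist = nearest(tree, q)
# Must agree with exhaustive search:
brute = data[np.argmin(np.linalg.norm(data - q, axis=1))]
```

The pruning test (skip the far subtree when the splitting plane is beyond the current best distance) is what makes kd-trees fast for Euclidean and Manhattan distances, and is why they are the default choice the abstract mentions.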

  15. Integrated vs. Federated Search

    DEFF Research Database (Denmark)

    Løvschall, Kasper

    2009-01-01

    Talk on the differences and similarities between integrated and federated search in a library context. Presented at the theme day "Integrated Search - samsøgning i alle kilder" at Danmarks Biblioteksskole on 22 January 2009.

  16. Searching Databases with Keywords

    Institute of Scientific and Technical Information of China (English)

    Shan Wang; Kun-Long Zhang

    2005-01-01

    Traditionally, the SQL query language is used to search the data in databases. However, it is inappropriate for end-users, since it is complex and hard to learn. End-users need to search databases with keywords, as they do in web search engines. This paper presents a survey of work on keyword search in databases. It also includes a brief introduction to the SEEKER system which has been developed.

  17. Search on Rugged Landscapes

    DEFF Research Database (Denmark)

    Billinger, Stephan; Stieglitz, Nils; Schumacher, Terry

    2014-01-01

    This paper presents findings from a laboratory experiment on human decision-making in a complex combinatorial task. We find strong evidence for a behavioral model of adaptive search. Success narrows down search to the neighborhood of the status quo, while failure promotes gradually more explorative...... for local improvements too early. We derive stylized decision rules that generate the search behavior observed in the experiment and discuss the implications of our findings for individual decision-making and organizational search....

  18. Updating collection representations for federated search

    OpenAIRE

    Shokouhi, M; Baillie, M; Azzopardi, L.

    2007-01-01

    To facilitate the search for relevant information across a set of online distributed collections, a federated information retrieval system typically represents each collection, centrally, by a set of vocabularies or sampled documents. Accurate retrieval is therefore related to how precise each representation reflects the underlying content stored in that collection. As collections evolve over time, collection representations should also be updated to reflect any change, however, a current sol...

  19. The Information Search

    Science.gov (United States)

    Doraiswamy, Uma

    2011-01-01

    This paper, in the form of a story, discusses a college student's information search process. In this story we see Kuhlthau's information search process: initiation, selection, exploration, formulation, collection, and presentation. Katie is a student who goes in search of information for her class research paper. Katie's class readings, her interest…

  20. Search and the city

    NARCIS (Netherlands)

    P.A. Gautier; C.N. Teulings

    2009-01-01

    We develop a model of an economy with several regions, which differ in scale. Within each region, workers have to search for a job-type that matches their skill. They face a trade-off between match quality and the cost of extended search. This trade-off differs between regions, because search is mor

  1. On the similarity of symbol frequency distributions with heavy tails

    CERN Document Server

    Gerlach, Martin; Altmann, Eduardo G

    2015-01-01

    Quantifying the similarity between symbolic sequences is a traditional problem in Information Theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA over music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf's law). The large number of low-frequency symbols in these distributions poses major difficulties to the estimation of the similarity between sequences, e.g., they hinder an accurate finite-size estimation of entropies. Here we show how the accuracy of estimations depends on the sample size~$N$, not only for the Shannon entropy $(\\alpha=1)$ and its corresponding similarity measures (e.g., the Jensen-Shannon divergence) but also for measures based on the generalized entropy of order $\\alpha$. For small $\\alpha$'s, including $\\alpha=1$, the bias and fluctuations in the estimations decay slower than the $1/N$ decay observed in short-tailed distributions. For $\\alpha$ larger ...

  2. Similarity of Symbol Frequency Distributions with Heavy Tails

    Science.gov (United States)

    Gerlach, Martin; Font-Clos, Francesc; Altmann, Eduardo G.

    2016-04-01

    Quantifying the similarity between symbolic sequences is a traditional problem in information theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA over music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf's law). The large number of low-frequency symbols in these distributions poses major difficulties to the estimation of the similarity between sequences; e.g., they hinder an accurate finite-size estimation of entropies. Here, we show analytically how the systematic (bias) and statistical (fluctuations) errors in these estimations depend on the sample size N and on the exponent γ of the heavy-tailed distribution. Our results are valid for the Shannon entropy (α = 1), its corresponding similarity measures (e.g., the Jensen-Shannon divergence), and also for measures based on the generalized entropy of order α. For small α's, including α = 1, the errors decay slower than the 1/N decay observed in short-tailed distributions. For α larger than a critical value α* = 1 + 1/γ ≤ 2, the 1/N decay is recovered. We show the practical significance of our results by quantifying the evolution of the English language over the last two centuries using a complete α spectrum of measures. We find that frequent words change more slowly than less frequent words and that α = 2 provides the most robust measure to quantify language change.
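
The finite-size bias both abstracts analyze can be reproduced numerically with a naive plug-in entropy estimator on Zipf-distributed samples; the exponent, vocabulary size, and sample sizes below are illustrative assumptions.

```python
import numpy as np

def plugin_entropy(samples):
    """Naive (maximum-likelihood) Shannon entropy estimate, in nats."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def zipf_probs(num_symbols, gamma=1.5):
    """Heavy-tailed symbol distribution, p_i proportional to i**-gamma."""
    w = np.arange(1, num_symbols + 1, dtype=float) ** -gamma
    return w / w.sum()

rng = np.random.default_rng(3)
p = zipf_probs(10_000)
true_entropy = -np.sum(p * np.log(p))
biases = {}
for N in (1_000, 100_000):
    estimates = [plugin_entropy(rng.choice(len(p), size=N, p=p))
                 for _ in range(5)]
    biases[N] = true_entropy - np.mean(estimates)
# The plug-in estimator underestimates the true entropy; with a heavy
# tail the bias shrinks only slowly as N grows.
```

Many low-frequency symbols remain unseen even at large N, so the plug-in estimate stays below the true entropy, which is the slow error decay the abstracts quantify.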

  3. Similar Words Identification Using Naive and TF-IDF Method

    Directory of Open Access Journals (Sweden)

    Divya K.S.

    2014-10-01

    Full Text Available Requirement satisfaction is one of the most important factors in the success of software. All the requirements specified by the customer should be satisfied in every phase of the development of the software. Satisfaction assessment is the determination of whether each component of the requirement has been addressed in the design document. The objective of this paper is to implement two methods to identify the satisfied requirements in the design document. To identify the satisfied requirements, similar words in both of the documents are determined. Two methods, Naive satisfaction assessment and TF-IDF satisfaction assessment, are used to determine the similar words that are present in the requirements document and the design document. The two methods are evaluated on the basis of precision and recall values. Porter's stemming algorithm is used for stemming. The satisfaction assessment methods determine the similarity between the requirement and design documents. The final result gives an accurate picture of requirement satisfaction, so that defects can be detected at an early stage of software development. Since defects are detected early, the cost of correcting them is low.
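
A minimal sketch of the TF-IDF variant of the assessment (compute a TF-IDF weight vector per document, then compare vectors by cosine similarity) is shown below; the weighting scheme and example documents are illustrative assumptions, and the stemming step the paper performs with Porter's algorithm is omitted.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """One TF-IDF weight vector per document (stdlib only)."""
    doc_terms = [Counter(doc.lower().split()) for doc in documents]
    n = len(documents)
    df = Counter(t for terms in doc_terms for t in terms)
    # +1 keeps terms shared by all documents from vanishing entirely
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in terms.items()} for terms in doc_terms]

def cosine(u, v):
    """Cosine similarity of two sparse term-weight dictionaries."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

requirement = "the system shall encrypt stored user passwords"
design_a = "stored passwords are hashed and encrypted by the auth module"
design_b = "the report screen displays monthly sales totals"
vecs = tf_idf_vectors([requirement, design_a, design_b])
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

The design fragment that shares content words with the requirement scores higher, which is the signal the satisfaction assessment uses to flag unaddressed requirements.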

  4. Multi Agent Architecture for Search Engine

    Directory of Open Access Journals (Sweden)

    Disha Verma

    2016-03-01

    Full Text Available The process of retrieving information is becoming more ambiguous day by day due to the huge collection of documents present on the web. A single keyword produces millions of results related to a given query, but these results do not live up to user expectations. The search results produced by traditional text search engines may be relevant or irrelevant. The underlying reason is that Web documents are HTML documents that do not contain semantic descriptors and annotations. This paper proposes a multi agent architecture to produce fewer but personalized results. The purpose of the research is to provide a platform for domain-specific personalized search. Personalized search delivers web pages in accordance with the user's interests and domain. The proposed architecture uses client-side as well as server-side personalization to provide the user with fewer but more accurate results. The multi agent search engine architecture uses the concept of semantic descriptors to acquire knowledge about a given domain, leading to personalized search results. Semantic descriptors are represented as a network graph that holds the relationships of a given domain in the form of a hierarchy. This hierarchical classification is termed a taxonomy.

  5. Personalized online information search and visualization

    Directory of Open Access Journals (Sweden)

    Orthner Helmuth F

    2005-03-01

    Full Text Available Abstract Background The rapid growth of online publications such as Medline and other sources raises the question of how to get relevant information efficiently. It is important, for a bench scientist, e.g., to monitor related publications constantly. It is also important, for a clinician, e.g., to access patient records anywhere and anytime. Although time-consuming, this kind of searching procedure is usually similar and simple. Typically, it involves a search engine and a visualization interface, and different words or combinations reflect different research topics. The objective of this study is to automate this tedious procedure by recording those words/terms in a database and online sources, and to use the information for automated search and retrieval. The retrieved information will be available anytime and anywhere through a secure web server. Results We developed such a database that stores search terms, journals, and so on, and implemented software for automatically searching medical subject heading-indexed sources such as Medline and other online sources. The returned information was stored locally, as is, on a server and made visible through a Web-based interface. The search was performed daily or as otherwise scheduled, and users log on to the website anytime without typing any words. The system has the potential to retrieve similarly from non-medical subject heading-indexed literature or a privileged information source such as a clinical information system. Issues such as security, presentation and visualization of the retrieved information were thus addressed. One of the presentation issues, wireless access, was also experimented with. A user survey showed that the personalized online searches saved time and increased relevancy. Handheld devices could also be used to access the stored information but were less satisfactory.
Conclusion The Web-searching software or similar system has potential to be an efficient

  6. Identifying mechanistic similarities in drug responses

    KAUST Repository

    Zhao, C.

    2012-05-15

    Motivation: In early drug development, it would be beneficial to be able to identify those dynamic patterns of gene response that indicate that drugs targeting a particular gene will be likely or not to elicit the desired response. One approach would be to quantitate the degree of similarity between the responses that cells show when exposed to drugs, so that consistencies in the regulation of cellular response processes that produce success or failure can be more readily identified. Results: We track drug response using fluorescent proteins as transcription activity reporters. Our basic assumption is that drugs inducing very similar alteration in transcriptional regulation will produce similar temporal trajectories on many of the reporter proteins and hence be identified as having similarities in their mechanisms of action (MOA). The main body of this work is devoted to characterizing similarity in temporal trajectories/signals. To do so, we must first identify the key points that determine mechanistic similarity between two drug responses. Directly comparing points on the two signals is unrealistic, as it cannot handle delays and speed variations on the time axis. Hence, to capture the similarities between reporter responses, we develop an alignment algorithm that is robust to noise, time delays and is able to find all the contiguous parts of signals centered about a core alignment (reflecting a core mechanism in drug response). Applying the proposed algorithm to a range of real drug experiments shows that the result agrees well with the prior drug MOA knowledge. © The Author 2012. Published by Oxford University Press. All rights reserved.
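
The paper develops its own alignment algorithm; as a generic illustration of alignment that tolerates delays and speed variations on the time axis, here is classic dynamic time warping (not the authors' method) applied to synthetic reporter trajectories.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping: small for trajectories that share shape
    but differ in delays and local speed."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 1, 50)
response = np.exp(-50 * (t - 0.4) ** 2)   # a reporter's pulse response
delayed = np.exp(-50 * (t - 0.55) ** 2)   # same shape, time-delayed
different = t                             # a monotone ramp
```

A delayed copy of the same pulse aligns almost perfectly under warping, while a trajectory with a different shape cannot, mirroring the intuition that drugs with similar mechanisms produce similar, possibly time-shifted, reporter responses.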

  7. How utilities can achieve more accurate decommissioning cost estimates

    International Nuclear Information System (INIS)

    The number of commercial nuclear power plants that are undergoing decommissioning coupled with the economic pressure of deregulation has increased the focus on adequate funding for decommissioning. The introduction of spent-fuel storage and disposal of low-level radioactive waste into the cost analysis places even greater concern as to the accuracy of the fund calculation basis. The size and adequacy of the decommissioning fund have also played a major part in the negotiations for transfer of plant ownership. For all of these reasons, it is important that the operating plant owner reduce the margin of error in the preparation of decommissioning cost estimates. To date, all of these estimates have been prepared via the building block method. That is, numerous individual calculations defining the planning, engineering, removal, and disposal of plant systems and structures are performed. These activity costs are supplemented by the period-dependent costs reflecting the administration, control, licensing, and permitting of the program. This method will continue to be used in the foreseeable future until adequate performance data are available. The accuracy of the activity cost calculation is directly related to the accuracy of the inventory of plant system components, piping and equipment, and plant structural composition. Typically, it is left up to the cost-estimating contractor to develop this plant inventory. The data are generated by searching and analyzing property asset records, plant databases, piping and instrumentation drawings, piping system isometric drawings, and component assembly drawings. However, experience has shown that these sources may not be up to date, discrepancies may exist, there may be missing data, and the level of detail may not be sufficient. Again, typically, the time constraints associated with the development of the cost estimate preclude perfect resolution of the inventory questions. Another problem area in achieving accurate cost

  8. Where and how to search? Search paths in open innovation

    OpenAIRE

    Lopez-Vega, Henry; Tell, Fredrik; VANHAVERBEKE, Wim

    2015-01-01

    Search for external knowledge is vital for firms' innovative activities. To understand search, we propose two knowledge search dimensions: search space (local or distant) and search heuristics (experiential or cognitive). Combining these two dimensions, we distinguish four search paths - situated paths, analogical paths, sophisticated paths, and scientific paths - which respond to recent calls to move beyond "where to search" and to investigate the connection with "how to search." Also, we hi...

  9. Similarity Metrics for Closed Loop Dynamic Systems

    Science.gov (United States)

    Whorton, Mark S.; Yang, Lee C.; Bedrossian, Naz; Hall, Robert A.

    2008-01-01

    To what extent and in what ways can two closed-loop dynamic systems be said to be "similar?" This question arises in a wide range of dynamic systems modeling and control system design applications. For example, bounds on error models are fundamental to the controller optimization with modern control design methods. Metrics such as the structured singular value are direct measures of the degree to which properties such as stability or performance are maintained in the presence of specified uncertainties or variations in the plant model. Similarly, controls-related areas such as system identification, model reduction, and experimental model validation employ measures of similarity between multiple realizations of a dynamic system. Each area has its tools and approaches, with each tool more or less suited for one application or the other. Similarity in the context of closed-loop model validation via flight test is subtly different from error measures in the typical controls oriented application. Whereas similarity in a robust control context relates to plant variation and the attendant effect on stability and performance, in this context similarity metrics are sought that assess the relevance of a dynamic system test for the purpose of validating the stability and performance of a "similar" dynamic system. Similarity in the context of system identification is much more relevant than are robust control analogies in that errors between one dynamic system (the test article) and another (the nominal "design" model) are sought for the purpose of bounding the validity of a model for control design and analysis. Yet system identification typically involves open-loop plant models which are independent of the control system (with the exception of limited developments in closed-loop system identification which is nonetheless focused on obtaining open-loop plant models from closed-loop data). 
Moreover the objectives of system identification are not the same as a flight test and

  10. Accurate Jones Matrix of the Practical Faraday Rotator

    Institute of Scientific and Technical Information of China (English)

    王林斗; 祝昇翔; 李玉峰; 邢文烈; 魏景芝

    2003-01-01

    The Jones matrix of practical Faraday rotators is often used in the engineering calculation of non-reciprocal optical field. Nevertheless, only the approximate Jones matrix of practical Faraday rotators has been presented by now. Based on the theory of polarized light, this paper presents the accurate Jones matrix of practical Faraday rotators. In addition, an experiment has been carried out to verify the validity of the accurate Jones matrix. This matrix accurately describes the optical characteristics of practical Faraday rotators, including rotation, loss and depolarization of the polarized light. The accurate Jones matrix can be used to obtain the accurate results for the practical Faraday rotator to transform the polarized light, which paves the way for the accurate analysis and calculation of practical Faraday rotators in relevant engineering applications.

  11. The theory of similarity in turning operations

    Directory of Open Access Journals (Sweden)

    R. Rząsiński

    2012-07-01

    Full Text Available Purpose: The article presents the development of series of types of technology issues. This is accomplished using the innovative technological similarity theory. The transformation presented in the theory relates to the turning machining processes. Design/methodology/approach: The data generation process is concerned with the creation of the conditions and number similarities. The turning condition of similarity results from the cutting power, cutting forces and cutting performance. Findings: The development of the theory of similarity allows the generation of machining parameters for the series of types of construction (blank, machining parameters, tools). Research limitations/implications: The analyzed methods develop the algorithmisation of the engineers' and technologists' environment and support integration with the process of preparation of the production. Practical implications: The described methods were developed on practical examples of creating series of types of the hydraulic cylinders used in mining. Originality/value: The method of technological similarity presented in the paper is the basis for the selection of technological features in the process of creating series of types and module systems of constructions and technology.

  12. Image fusion using bi-directional similarity

    Science.gov (United States)

    Bai, Chunshan; Luo, Xiaoyan

    2015-05-01

    Infrared images are widely used in practical applications to capture abundant information. However, it is still challenging to enhance an infrared image using a visual image. In this paper, we propose an effective fusion method using bidirectional similarity. In the proposed method, we aim to find an optimal solution from many feasible solutions without introducing an intermediate image. We employ some priori constraints to meet the requirements of image fusion, namely preserving both the good characteristics of the infrared image and the spatial information of the visual image. In the iterative step, we use the matrix of squared differences between images to integrate the image holding the most information. We call this matrix the bidirectional similarity distance. By the bidirectional similarity distance, we can get the transitive images. Then, we fuse the images according to the weights. Experimental results show that, compared to traditional image fusion algorithms, fused images from the bidirectional similarity fusion algorithm are greatly improved in subjective visual quality, entropy, and the structural similarity index measure. We believe that the proposed scheme can have wide applications.

  13. The prediction method of similar cycles

    Institute of Scientific and Technical Information of China (English)

    Zhan-Le Du; Hua-Ning Wang

    2011-01-01

    The concept of degree of similarity (η) is proposed to quantitatively describe the similarity of a parameter (e.g. the maximum amplitude Rmax) of a solar cycle relative to a referenced one, and the prediction method of similar cycles is further developed. For two parameters, the solar minimum (Rmin) and rising rate (βa), which can be directly measured a few months after the minimum, a synthesis degree of similarity (ηs) is defined as the weighted average of the η values around Rmin and βa, with the weights given by the coefficients of determination of Rmax with Rmin and βa, respectively. The monthly values of the whole referenced cycle can be predicted by averaging the corresponding values in the most similar cycles with the weights given by the ηs values. As an application, Cycle 24 is predicted to peak around January 2013 ± 8 months with a size of about Rmax = 84 ± 17 and to end around September 2019.
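
    The weighted-average scheme described above can be sketched in a few lines. Everything below is an assumption for illustration: the specific form of the degree-of-similarity function, the scales, and the weights `w_rmin`/`w_beta` (standing in for the coefficients of determination) are invented, not taken from the paper.

    ```python
    import numpy as np

    def degree_of_similarity(x, x_ref, scale):
        """One simple choice of eta: decays with the normalized distance."""
        return 1.0 / (1.0 + abs(x - x_ref) / scale)

    def predict_cycle(past_cycles, rmin, beta, rmin_ref, beta_ref, w_rmin, w_beta):
        """Predict monthly values as the eta_s-weighted average of past cycles.

        eta_s for each past cycle is the weighted average of its Rmin- and
        beta_a-similarities to the referenced cycle.
        """
        eta_s = np.array([
            (w_rmin * degree_of_similarity(r, rmin_ref, rmin_ref)
             + w_beta * degree_of_similarity(b, beta_ref, beta_ref))
            / (w_rmin + w_beta)
            for r, b in zip(rmin, beta)
        ])
        eta_s /= eta_s.sum()                   # normalize the weights
        return eta_s @ np.asarray(past_cycles)  # weighted average of monthly values
    ```

    Cycles whose minimum and rising rate are closer to the referenced values contribute more to the predicted monthly curve.
    
    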

  14. Searching for Displaced Higgs Decays

    CERN Document Server

    Csaki, Csaba; Lombardo, Salvator; Slone, Oren

    2015-01-01

    We study a simplified model of the SM Higgs boson decaying to a degenerate pair of scalars which travel a macroscopic distance before decaying to SM particles. This is the leading signal for many well-motivated solutions to the hierarchy problem that do not propose additional light colored particles. Bounds for displaced Higgs decays below $10$ cm are found by recasting existing tracker searches from Run I. New tracker search strategies, sensitive to the characteristics of these models and similar decays, are proposed with sensitivities projected for Run II at $\\sqrt{s} = 13 $ TeV. With 20 fb$^{-1}$ of data, we find that Higgs branching ratios down to $7 \\times 10^{-4}$ can be probed for centimeter decay lengths.

  15. Location-based Services using Image Search

    DEFF Research Database (Denmark)

    Vertongen, Pieter-Paulus; Hansen, Dan Witzner

    2008-01-01

    Recent developments in image search have made it sufficiently efficient to be used in real-time applications. GPS has become a popular navigation tool. While GPS information provides reasonably good accuracy, it is not always present in all hand held devices nor is it accurate in all...... situations, for example in urban environments. We propose a system to provide location-based services using image searches without requiring GPS. The goal of this system is to assist tourists in cities with additional information using their mobile phones and built-in cameras. Based upon the result...... of the image search engine and database image location knowledge, the location of the query image is determined and associated data can be presented to the user....

  16. Smart Images Search based on Visual Features Fusion

    International Nuclear Information System (INIS)

    Image search engines attempt to give fast and accurate access to the wide range of images available on the Internet. There have been a number of efforts to build search engines based on image content to enhance search results. Content-Based Image Retrieval (CBIR) systems have attracted great interest since multimedia files, such as images and videos, have dramatically entered our lives throughout the last decade. CBIR allows automatically extracting target images according to objective visual contents of the image itself, for example its shapes, colors and textures, to provide more accurate ranking of the results. The recent approaches of CBIR differ in terms of which image features are extracted to be used as image descriptors for the matching process. This thesis proposes improvements to the efficiency and accuracy of CBIR systems by integrating different types of image features. This framework addresses efficient retrieval of images in large image collections. A comparative study between recent CBIR techniques is provided. According to this study, image features need to be integrated to provide a more accurate description of image content and better image retrieval accuracy. In this context, this thesis presents new image retrieval approaches that provide more accurate retrieval accuracy than previous approaches. The first proposed image retrieval system uses color, texture and shape descriptors to form the global features vector. This approach integrates the YCbCr color histogram as a color descriptor, the modified Fourier descriptor as a shape descriptor and the modified Edge Histogram as a texture descriptor in order to enhance the retrieval results. The second proposed approach integrates the global features vector, which is used in the first approach, with the SURF salient point technique as a local feature. The nearest neighbor matching algorithm with a proposed similarity measure is applied to determine the final image rank. 
The second approach is

  17. University Students' Online Information Searching Strategies in Different Search Contexts

    Science.gov (United States)

    Tsai, Meng-Jung; Liang, Jyh-Chong; Hou, Huei-Tse; Tsai, Chin-Chung

    2012-01-01

    This study investigates the role of search context played in university students' online information searching strategies. A total of 304 university students in Taiwan were surveyed with questionnaires in which two search contexts were defined as searching for learning, and searching for daily life information. Students' online search strategies…

  18. [Advanced online search techniques and dedicated search engines for physicians].

    Science.gov (United States)

    Nahum, Yoav

    2008-02-01

    In recent years search engines have become an essential tool in the work of physicians. This article will review advanced search techniques from the world of information specialists, as well as some advanced search engine operators that may help physicians improve their online search capabilities, and maximize the yield of their searches. This article also reviews popular dedicated scientific and biomedical literature search engines.

  19. RECOGNITION OF STRUCTURE SIMILARITIES IN PROTEINS

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Protein fold structure is more conserved than its amino acid sequence and closely associated with biological function, so calculating the similarity of protein structures is a fundamental problem in structural biology and plays a key role in protein fold classification, fold function inference, and protein structure prediction. Large progress has been made in recent years in this field and many methods for considering structural similarity have been proposed, including methods for protein structure comparison, retrieval of protein structures from databases, and ligand binding site comparison. Most of those methods are available on the World Wide Web, but evaluation of all the methods is still a hard problem. This paper summarizes some popular methods and the latest methods for structure similarities, including structure alignment, protein structure retrieval, and ligand binding site alignment.

  20. Universal self-similarity of propagating populations

    Science.gov (United States)

    Eliazar, Iddo; Klafter, Joseph

    2010-07-01

    This paper explores the universal self-similarity of propagating populations. The following general propagation model is considered: particles are randomly emitted from the origin of a d -dimensional Euclidean space and propagate randomly and independently of each other in space; all particles share a statistically common—yet arbitrary—motion pattern; each particle has its own random propagation parameters—emission epoch, motion frequency, and motion amplitude. The universally self-similar statistics of the particles’ displacements and first passage times (FPTs) are analyzed: statistics which are invariant with respect to the details of the displacement and FPT measurements and with respect to the particles’ underlying motion pattern. Analysis concludes that the universally self-similar statistics are governed by Poisson processes with power-law intensities and by the Fréchet and Weibull extreme-value laws.

  1. Unveiling Music Structure Via PLSA Similarity Fusion

    DEFF Research Database (Denmark)

    Arenas-García, Jerónimo; Meng, Anders; Petersen, Kaare Brandt;

    2007-01-01

    Nowadays there is an increasing interest in developing methods for building music recommendation systems. In order to get a satisfactory performance from such a system, one needs to incorporate as much information about songs similarity as possible; however, how to do so is not obvious...... observed similarities can be satisfactorily explained using the latent semantics. Additionally, this approach significantly simplifies the song retrieval phase, leading to a more practical system implementation. The suitability of the PLSA model for representing music structure is studied in a simplified...

  2. Structural similarity and category-specificity

    DEFF Research Database (Denmark)

    Gerlach, Christian; Law, Ian; Paulson, Olaf B

    2004-01-01

    It has been suggested that category-specific recognition disorders for natural objects may reflect that natural objects are more structurally (visually) similar than artefacts and therefore more difficult to recognize following brain damage. On this account one might expect a positive relationship...... between blood flow and structural similarity in areas involved in visual object recognition. Contrary to this expectation we report a negative relationship in that identification of articles of clothing causes more extensive activation than identification of vegetables/fruit and animals even though items...... from the categories of animals and vegetables/fruit are rated as more structurally similar than items from the category of articles of clothing. Given that this pattern cannot be explained in terms of a tradeoff between activation and accuracy, we interpret these findings within a model where...

  3. Visual Similarity Based Document Layout Analysis

    Institute of Scientific and Technical Information of China (English)

    Di Wen; Xiao-Qing Ding

    2006-01-01

    In this paper, a visual similarity based document layout analysis (DLA) scheme is proposed, which by using a clustering strategy can adaptively deal with documents in different languages, with different layout structures and skew angles. Aiming at a robust and adaptive DLA approach, the authors first manage to find a set of representative filters and statistics to characterize typical texture patterns in document images, through a visual similarity testing process. Texture features are then extracted from these filters and passed into a dynamic clustering procedure, which is called visual similarity clustering. Finally, text contents are located from the clustered results. Benefiting from this scheme, the algorithm demonstrates strong robustness and adaptability on a wide variety of documents, which previous traditional DLA approaches do not possess.

  4. Asymptotic expansion based equation of state for hard-disk fluids offering accurate virial coefficients

    CERN Document Server

    Tian, Jianxiang; Mulero, A

    2016-01-01

    Despite the fact that more than 30 analytical expressions for the equation of state of hard-disk fluids have been proposed in the literature, none of them is capable of reproducing the currently accepted numeric or estimated values for the first eighteen virial coefficients. Using the asymptotic expansion method, extended to the first ten virial coefficients for hard-disk fluids, fifty-seven new expressions for the equation of state have been studied. Of these, a new equation of state is selected which reproduces accurately all the first eighteen virial coefficients. Comparisons for the compressibility factor with computer simulations show that this new equation is as accurate as other similar expressions with the same number of parameters. Finally, the location of the poles of the 57 new equations shows that there are some particular configurations which could give both the accurate virial coefficients and the correct closest packing fraction in the future when higher virial coefficients than the t...
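
    The virial expansion the abstract refers to can be illustrated with a quick numeric check. The sketch below is not from the paper: the reduced virial coefficients (in powers of the packing fraction η) are approximate literature values and should be treated as placeholders, and the truncated series is compared against the classic Henderson hard-disk equation of state.

    ```python
    # Approximate reduced virial coefficients b2..b7 for hard disks (placeholders).
    B = [2.0, 3.12802, 4.25785, 5.33690, 6.36296, 7.35186]

    def z_virial(eta: float) -> float:
        """Truncated virial expansion Z = 1 + sum_k b_k * eta**(k-1)."""
        return 1.0 + sum(b * eta ** (k + 1) for k, b in enumerate(B))

    def z_henderson(eta: float) -> float:
        """Henderson equation of state for hard disks."""
        return (1.0 + eta ** 2 / 8.0) / (1.0 - eta) ** 2

    # At moderate packing fraction the truncated series and the closed-form
    # equation of state agree closely; they diverge near close packing.
    eta = 0.3
    print(round(z_virial(eta), 3), round(z_henderson(eta), 3))
    ```

    Reproducing many virial coefficients exactly, as the selected equation of state in the abstract does, matters precisely because truncated series like this one fail at high density.
    
    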

  5. PHOG analysis of self-similarity in aesthetic images

    Science.gov (United States)

    Amirshahi, Seyed Ali; Koch, Michael; Denzler, Joachim; Redies, Christoph

    2012-03-01

    In recent years, there have been efforts in defining the statistical properties of aesthetic photographs and artworks using computer vision techniques. However, it is still an open question how to distinguish aesthetic from non-aesthetic images with a high recognition rate. This is possibly because aesthetic perception is influenced also by a large number of cultural variables. Nevertheless, the search for statistical properties of aesthetic images has not been futile. For example, we have shown that the radially averaged power spectrum of monochrome artworks of Western and Eastern provenance falls off according to a power law with increasing spatial frequency (1/f2 characteristics). This finding implies that this particular subset of artworks possesses a Fourier power spectrum that is self-similar across different scales of spatial resolution. Other types of aesthetic images, such as cartoons, comics and mangas also display this type of self-similarity, as do photographs of complex natural scenes. Since the human visual system is adapted to encode images of natural scenes in a particular efficient way, we have argued that artists imitate these statistics in their artworks. In support of this notion, we presented results that artists portrait human faces with the self-similar Fourier statistics of complex natural scenes although real-world photographs of faces are not self-similar. In view of these previous findings, we investigated other statistical measures of self-similarity to characterize aesthetic and non-aesthetic images. In the present work, we propose a novel measure of self-similarity that is based on the Pyramid Histogram of Oriented Gradients (PHOG). For every image, we first calculate PHOG up to pyramid level 3. The similarity between the histograms of each section at a particular level is then calculated to the parent section at the previous level (or to the histogram at the ground level). 
The proposed approach is tested on datasets of aesthetic and
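
    The parent-child histogram comparison described above can be sketched as follows. This is a hypothetical simplification of the PHOG-based measure, with invented details: gradient-orientation histograms serve as crude HOG cells, and each section at a pyramid level is compared to its parent section via histogram intersection.

    ```python
    import numpy as np

    def orientation_histogram(patch, bins=8):
        """Magnitude-weighted histogram of gradient orientations (a crude HOG cell)."""
        gy, gx = np.gradient(patch.astype(float))
        ang = np.arctan2(gy, gx) % np.pi
        mag = np.hypot(gx, gy)
        hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
        s = hist.sum()
        return hist / s if s > 0 else hist

    def phog_self_similarity(img, levels=3, bins=8):
        """Mean histogram intersection between each section and its parent section."""
        img = np.asarray(img, dtype=float)
        sims = []
        for l in range(1, levels + 1):
            n_child, n_parent = 2 ** l, 2 ** (l - 1)
            h, w = img.shape[0] // n_child, img.shape[1] // n_child
            ph, pw = img.shape[0] // n_parent, img.shape[1] // n_parent
            for i in range(n_child):
                for j in range(n_child):
                    child = img[i * h:(i + 1) * h, j * w:(j + 1) * w]
                    parent = img[(i // 2) * ph:(i // 2 + 1) * ph,
                                 (j // 2) * pw:(j // 2 + 1) * pw]
                    hc = orientation_histogram(child, bins)
                    hp = orientation_histogram(parent, bins)
                    sims.append(np.minimum(hc, hp).sum())  # histogram intersection
        return float(np.mean(sims))
    ```

    The score lies in [0, 1]: high when the orientation statistics of subsections resemble those of their parents across scales, which is the intuition behind using self-similarity as an aesthetic image statistic.
    
    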

  6. Biomimetic Approach for Accurate, Real-Time Aerodynamic Coefficients Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Aerodynamic and structural reliability and efficiency depends critically on the ability to accurately assess the aerodynamic loads and moments for each lifting...

  7. Large margin classification with indefinite similarities

    KAUST Repository

    Alabdulmohsin, Ibrahim

    2016-01-07

    Classification with indefinite similarities has attracted attention in the machine learning community. This is partly due to the fact that many similarity functions that arise in practice are not symmetric positive semidefinite, i.e. the Mercer condition is not satisfied, or the Mercer condition is difficult to verify. Examples of such indefinite similarities in machine learning applications are ample including, for instance, the BLAST similarity score between protein sequences, human-judged similarities between concepts and words, and the tangent distance or the shape matching distance in computer vision. Nevertheless, previous works on classification with indefinite similarities are not fully satisfactory. They have either introduced sources of inconsistency in handling past and future examples using kernel approximation, settled for local-minimum solutions using non-convex optimization, or produced non-sparse solutions by learning in Krein spaces. Despite the large volume of research devoted to this subject lately, we demonstrate in this paper how an old idea, namely the 1-norm support vector machine (SVM) proposed more than 15 years ago, has several advantages over more recent work. In particular, the 1-norm SVM method is conceptually simpler, which makes it easier to implement and maintain. It is competitive, if not superior to, all other methods in terms of predictive accuracy. Moreover, it produces solutions that are often sparser than more recent methods by several orders of magnitude. In addition, we provide various theoretical justifications by relating 1-norm SVM to well-established learning algorithms such as neural networks, SVM, and nearest neighbor classifiers. Finally, we conduct a thorough experimental evaluation, which reveals that the evidence in favor of 1-norm SVM is statistically significant.
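
    The 1-norm SVM objective discussed above can be written directly in terms of an (indefinite) similarity matrix. The sketch below is illustrative only: the original formulation is a linear program, but here plain subgradient descent is used to keep the example dependency-free, and the toy data and hyperparameters are invented.

    ```python
    import numpy as np

    def one_norm_svm(S, y, C=1.0, lr=0.01, steps=2000):
        """Subgradient-descent sketch of the 1-norm SVM objective
            ||alpha||_1 + C * sum_i max(0, 1 - y_i * (S @ alpha + b)).
        S need not be positive semidefinite, which is the point of the method.
        """
        m, n = S.shape
        alpha = np.zeros(n)
        b = 0.0
        for _ in range(steps):
            margins = y * (S @ alpha + b)
            active = margins < 1.0                         # violated or on margin
            g_alpha = np.sign(alpha) - C * (S[active].T @ y[active])
            g_b = -C * y[active].sum()
            alpha -= lr * g_alpha
            b -= lr * g_b
        return alpha, b
    ```

    The 1-norm penalty on alpha is what drives the sparse solutions the abstract highlights; a production implementation would solve the equivalent LP exactly.
    
    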

  8. Large-Scale Multi-Resolution Representations for Accurate Interactive Image and Volume Operations

    KAUST Repository

    Sicat, Ronell B.

    2015-11-25

    and voxel footprints in input images and volumes. We show that the continuous pdfs encoded in the sparse pdf map representation enable accurate multi-resolution non-linear image operations on gigapixel images. Similarly, we show that sparse pdf volumes enable more consistent multi-resolution volume rendering compared to standard approaches, on both artificial and real world large-scale volumes. The supplementary videos demonstrate our results. In the standard approach, users heavily rely on panning and zooming interactions to navigate the data within the limits of their display devices. However, panning across the whole spatial domain and zooming across all resolution levels of large-scale images to search for interesting regions is not practical. Assisted exploration techniques allow users to quickly narrow down millions to billions of possible regions to a more manageable number for further inspection. However, existing approaches are not fully user-driven because they typically already prescribe what being of interest means. To address this, we introduce the patch sets representation for large-scale images. Patches inside a patch set are grouped and encoded according to similarity via a permutohedral lattice (p-lattice) in a user-defined feature space. Fast set operations on p-lattices facilitate patch set queries that enable users to describe what is interesting. In addition, we introduce an exploration framework—GigaPatchExplorer—for patch set-based image exploration. We show that patch sets in our framework are useful for a variety of user-driven exploration tasks in gigapixel images and whole collections thereof.

  9. Some more similarities between Peirce and Skinner.

    Science.gov (United States)

    Moxley, Roy A

    2002-01-01

    C. S. Peirce is noted for pioneering a variety of views, and the case is made here for the similarities and parallels between his views and B. F. Skinner's radical behaviorism. In addition to parallels previously noted, these similarities include an advancement of experimental science, a behavioral psychology, a shift from nominalism to realism, an opposition to positivism, a selectionist account for strengthening behavior, the importance of a community of selves, a recursive approach to method, and the probabilistic nature of truth. Questions are raised as to the extent to which Skinner's radical behaviorism, as distinguished from his S-R positivism, may be seen as an extension of Peirce's pragmatism. PMID:22478387

  10. Similarity joins in relational database systems

    CERN Document Server

    Augsten, Nikolaus

    2013-01-01

    State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify the edit distance as the de facto standard for comparing complex objects. Since the edit distance is computationally expensive, token-based distances have been introduced to speed up edit distance comput
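
    The edit distance named above as the de facto standard is the classic dynamic-programming recurrence; a minimal sketch:

    ```python
    def edit_distance(a: str, b: str) -> int:
        """Levenshtein edit distance via dynamic programming."""
        m, n = len(a), len(b)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i                      # delete all of a[:i]
        for j in range(n + 1):
            d[0][j] = j                      # insert all of b[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution / match
        return d[m][n]

    print(edit_distance("kitten", "sitting"))  # → 3
    ```

    Its O(mn) cost per pair is exactly why the token-based approximations mentioned in the abstract are needed for similarity joins at database scale.
    
    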

  11. Self-Similarity Limits of Genomic Signatures

    CERN Document Server

    Wu, Z B

    2002-01-01

    It is shown that metric representation of DNA sequences is one-to-one. By using the metric representation method, suppression of nucleotide strings in the DNA sequences is determined. For a DNA sequence, an optimal string length to display the genomic signature in chaos game representation is obtained by eliminating effects of the finite sequence. The optimal string length is further shown to be a self-similarity limit in computing the information dimension. By using the method, self-similarity limits of complete bacterial genomic signatures are further determined.

  12. SIMILARITY SOLUTIONS FOR VISCOUS CAVITATING VORTEX CORES

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Theoretical analysis of the dynamic behavior of cavitating vortices requires knowledge of the velocity distribution in the vortex core. For that reason an existing similarity model for slender axisymmetric non-cavitating vortex cores has been extended with the viscous boundary conditions for a cavitating vortex. Cavitating similarity solutions exist for a vortex of which both core diameter and circulation grow with the square root of the axial coordinate. Results for this model are presented for varying cavitation numbers and Reynolds numbers. It is shown that viscosity may have a significant influence on the velocity distribution around a cavitating vortex.

  13. Outer boundaries of self-similar tiles

    OpenAIRE

    Drenning, Shawn; Palagallo, Judith; Price, Thomas; Strichartz, Robert S.

    2005-01-01

    There are many examples of self-similar tiles that are connected, but whose interior is disconnected. For such tiles we show that the boundary of a component of the interior may be decomposed into a finite union of pieces, each similar to a subset of the outer boundary of the tile. This is significant because the outer boundary typically has lower dimension than the full boundary. We describe a method to realize the outer boundary as the invariant set of a graph-directed iterated function sys...

  14. The collagenous gastroenteritides: similarities and differences.

    Science.gov (United States)

    Gopal, Purva; McKenna, Barbara J

    2010-10-01

    Collagenous gastritis, collagenous sprue, and collagenous colitis share striking histologic similarities and occur together in some patients. They also share some drug and disease associations. Pediatric cases of collagenous gastritis, however, lack most of these associations. The etiologies of the collagenous gastroenteritides are not known, so it is not clear whether they are similar because they share pathogeneses, or because they indicate a common histologic response to varying injuries. The features, disease and drug associations, and the inquiries into the pathogenesis of these disorders are reviewed. PMID:20923305

  15. Collaborative Search Trails for Video Search

    CERN Document Server

    Hopfgartner, Frank; Halvey, Martin; Jose, Joemon

    2009-01-01

    In this paper we present an approach for supporting users in the difficult task of searching for video. We use collaborative feedback mined from the interactions of earlier users of a video search system to help users in their current search tasks. Our objective is to improve the quality of the results that users find, and in doing so also to assist users in exploring a large and complex information space, in the hope that this will lead them to consider search options that they may not have considered otherwise. We performed a user-centred evaluation. The results indicate that we achieved our goals: the performance of users in finding relevant video clips was enhanced with our system, users were able to explore the collection of video clips more, and users demonstrated a preference for our system, which provided recommendations.

  16. Differences and similarities in breast cancer risk assessment models in clinical practice : which model to choose?

    NARCIS (Netherlands)

    Jacobi, Catharina E.; de Bock, Geertruida H.; Siegerink, Bob; van Asperen, Christi J.

    2009-01-01

    To show differences and similarities between risk estimation models for breast cancer in healthy women from BRCA1/2-negative or untested families. After a systematic literature search seven models were selected: Gail-2, Claus Model, Claus Tables, BOADICEA, Jonker Model, Claus-Extended Formula, and T

  17. Search Engine Optimization and Search Engine Marketing

    OpenAIRE

    Stárek, Pavel

    2012-01-01

    This thesis addresses website optimization for search engines and their promotion through PPC ads. The theoretical part describes all of the important aspects and principles necessary for a correct optimization and combines them with principles related to search engine marketing. The practical part describes the whole process of optimization from the analysis of competition and link-building to the set up and evaluation of PPC ads. It is trying to find any important differences between the tw...

  18. A new similarity computing method based on concept similarity in Chinese text processing

    Institute of Scientific and Technical Information of China (English)

    PENG Jing; YANG DongQing; TANG ShiWei; WANG TengJiao; GAO Jun

    2008-01-01

    The paper proposes a new text similarity computing method based on concept similarity in Chinese text processing. The new method converts text to a word vector space model at first, and then splits words into a set of concepts. Through computing the inner products between concepts, it obtains the similarity between words. The new method computes the similarity of text based on the similarity of words at last. The contributions of the paper include: 1) propose a new computing formula between words; 2) propose a new text similarity computing method based on word similarity; 3) successfully use the method in the application of similarity computing of WEB news; and 4) prove the validity of the method through extensive experiments.
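
    The words-to-concepts-to-similarity pipeline described above can be sketched as follows. Everything concrete here is an assumption for illustration: the tiny concept inventory is invented, word similarity is taken as the cosine of concept-indicator vectors (an inner product, as in the abstract), and text similarity averages each word's best match in the other text.

    ```python
    import math

    # Hypothetical word-to-concept mapping (invented for illustration).
    CONCEPTS = {
        "car": {"vehicle", "machine"},
        "automobile": {"vehicle", "machine"},
        "truck": {"vehicle", "machine", "cargo"},
        "apple": {"fruit", "food"},
    }

    def word_sim(w1, w2):
        """Cosine of the concept-indicator vectors of two words."""
        c1 = CONCEPTS.get(w1, {w1})
        c2 = CONCEPTS.get(w2, {w2})
        inner = len(c1 & c2)                 # inner product of indicator vectors
        return inner / math.sqrt(len(c1) * len(c2))

    def text_sim(t1, t2):
        """Symmetric average of best word matches between two token lists."""
        if not t1 or not t2:
            return 0.0
        best = lambda w, t: max(word_sim(w, v) for v in t)
        return 0.5 * (sum(best(w, t2) for w in t1) / len(t1)
                      + sum(best(w, t1) for w in t2) / len(t2))
    ```

    Because similarity flows through shared concepts, synonyms such as "car" and "automobile" score 1.0 even though the surface strings differ, which is the advantage over plain term-overlap measures.
    
    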

  19. Image Tracking for the High Similarity Drug Tablets Based on Light Intensity Reflective Energy and Artificial Neural Network

    Science.gov (United States)

    Liang, Zhongwei; Zhou, Liang; Liu, Xiaochu; Wang, Xiaogang

    2014-01-01

    Tablet image tracking exerts a notable influence on the efficiency and reliability of high-speed mass production of drugs, yet it has also emerged in recent years as a difficult problem and a targeted focus of production monitoring, owing to the highly similar shapes and random position distribution of the objects to be searched for. For the purpose of tracking tablets accurately in random distribution, a calibrated surface of light intensity reflective energy can be established through a surface fitting approach and transitional vector determination, describing the shape topology and topography details of the objective tablet. On this basis, the mathematical properties of these established surfaces are derived, and an artificial neural network (ANN) is then employed to classify the moving target tablets by recognizing their different surface properties; the instantaneous coordinate positions of the drug tablets in one image frame can then be determined. By repeating the identical pattern recognition on the next image frame, the real-time movements of the objective tablet templates are tracked in sequence. This paper provides reliable references and new research ideas for real-time object tracking in drug production practice. PMID:25143781
