WorldWideScience

Sample records for accurate similarity search

  1. Protein structural similarity search by Ramachandran codes

    Directory of Open Access Journals (Sweden)

    Chang Chih-Hung

    2007-08-01

    Background: Protein structural data have increased exponentially, such that fast and accurate tools are necessary for structure similarity searches. To improve search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, accuracy is usually sacrificed and the speed is still unable to match that of sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Classical sequence similarity search methods can then be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE) and it works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that runs on many different platforms. Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated, high-throughput functional annotation or prediction for the ever-increasing number of published protein structures in this post-genomic era.
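
    A minimal sketch of the linear-encoding idea described above: each residue's (phi, psi) torsion pair is mapped to the nearest letter of a Ramachandran codebook, yielding a text string that ordinary sequence tools can align. The cluster centers and letters below are invented for illustration; SARST's actual codebook comes from nearest-neighbor clustering of a Ramachandran map.

```python
import math

# Hypothetical Ramachandran codebook: a few (phi, psi) cluster centers,
# each labeled with a letter. Illustrative values only, not SARST's codebook.
CODEBOOK = {
    'A': (-60.0, -45.0),   # right-handed alpha-helix region
    'B': (-120.0, 130.0),  # beta-sheet region
    'L': (60.0, 45.0),     # left-handed helix region
    'P': (-75.0, 150.0),   # polyproline-II-like region
}

def angle_diff(a, b):
    """Smallest separation of two angles in degrees, handling wrap-around."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def encode_residue(phi, psi):
    """Assign a residue's (phi, psi) pair to the nearest codebook letter."""
    return min(CODEBOOK, key=lambda c: math.hypot(angle_diff(phi, CODEBOOK[c][0]),
                                                  angle_diff(psi, CODEBOOK[c][1])))

def encode_structure(torsions):
    """Reduce a list of (phi, psi) torsion pairs to a 1D text string."""
    return ''.join(encode_residue(phi, psi) for phi, psi in torsions)

# A helical stretch followed by an extended stretch:
print(encode_structure([(-57, -47), (-63, -41), (-119, 127), (-125, 135)]))
# -> 'AABB'; such strings can be compared with standard sequence alignment
#    tools instead of 3D superposition.
```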

  2. Hashing for Similarity Search: A Survey

    OpenAIRE

    Wang, Jingdong; Shen, Heng Tao; Song, Jingkuan; Ji, Jianqiu

    2014-01-01

    Similarity search (nearest neighbor search) is the problem of retrieving, from a large database, the data items whose distances to a query item are smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality-sensitive hashing. We divide the hashing algorithms into two main categorie...
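
    For intuition, a minimal sign-random-projection hash, one classical LSH family for cosine similarity; the dimensions and data below are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# One random hyperplane per bit: vectors separated by a small angle agree
# on most bits, so similar items collide in the same or nearby buckets.
dim, n_bits = 64, 16
planes = rng.normal(size=(n_bits, dim))

def hash_bits(x):
    return (planes @ x) > 0

base = rng.normal(size=dim)
near = base + 0.1 * rng.normal(size=dim)   # slightly perturbed copy
far = rng.normal(size=dim)                 # unrelated vector

print(np.sum(hash_bits(base) != hash_bits(near)))  # few differing bits
print(np.sum(hash_bits(base) != hash_bits(far)))   # ~8 of 16 on average
```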

  3. Similarity search processing. Parallelization and indexing technologies.

    Directory of Open Access Journals (Sweden)

    Eder Dos Santos

    2015-08-01

    This scientific-technical report addresses similarity search and the implementation of metric structures in parallel environments. It also presents the state of the art in similarity search over metric structures and in parallelism technologies. Comparative analyses are proposed, seeking to characterize the behavior of a set of metric spaces and metric structures on multicore- and GPU-based processing platforms.

  4. Biosequence Similarity Search on the Mercury System

    Science.gov (United States)

    Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph

    2007-01-01

    Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described. PMID:18846267

  5. Fast similarity search for learned metrics.

    Science.gov (United States)

    Kulis, Brian; Jain, Prateek; Grauman, Kristen

    2009-12-01

    We introduce a method that enables scalable similarity search for learned metrics. Given pairwise similarity and dissimilarity constraints between some examples, we learn a Mahalanobis distance function that captures the examples' underlying relationships well. To allow sublinear time similarity search under the learned metric, we show how to encode the learned metric parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for vector spaces whose high dimensionality makes it infeasible to learn an explicit transformation over the feature dimensions. We demonstrate the approach applied to a variety of image data sets, as well as a systems data set. The learned metrics improve accuracy relative to commonly used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases.
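
    A sketch of the parameterization trick this abstract describes, under the assumption that metric learning has produced a Mahalanobis matrix M = G^T G: hashing G x with ordinary random hyperplanes is then locality-sensitive for the learned metric. The matrix M below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

d = 8
A = rng.normal(size=(d, d))
M = A.T @ A + np.eye(d)         # synthetic "learned" Mahalanobis matrix (PSD)
G = np.linalg.cholesky(M).T     # factor with M = G.T @ G

# Distances under M equal Euclidean distances after mapping x -> G @ x,
# so standard sign-random-projection LSH in the mapped space respects M.
def learned_metric_hash(n_bits):
    R = rng.normal(size=(n_bits, d))
    return lambda x: tuple((R @ (G @ x)) > 0)

h = learned_metric_hash(12)
x, y = rng.normal(size=d), rng.normal(size=d)
d_M = np.sqrt((x - y) @ M @ (x - y))       # learned-metric distance
d_euc = np.linalg.norm(G @ x - G @ y)      # identical after the mapping
print(np.isclose(d_M, d_euc), h(x))
```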

  6. New similarity search based glioma grading

    Energy Technology Data Exchange (ETDEWEB)

    Haegler, Katrin; Brueckmann, Hartmut; Linn, Jennifer [Ludwig-Maximilians-University of Munich, Department of Neuroradiology, Munich (Germany); Wiesmann, Martin; Freiherr, Jessica [RWTH Aachen University, Department of Neuroradiology, Aachen (Germany); Boehm, Christian [Ludwig-Maximilians-University of Munich, Department of Computer Science, Munich (Germany); Schnell, Oliver; Tonn, Joerg-Christian [Ludwig-Maximilians-University of Munich, Department of Neurosurgery, Munich (Germany)

    2012-08-15

    MR-based differentiation between low- and high-grade gliomas is predominantly based on contrast-enhanced T1-weighted images (CE-T1w). However, functional MR sequences such as perfusion- and diffusion-weighted sequences can provide additional information on tumor grade. Here, we tested the potential of a recently developed similarity search based method that integrates information from CE-T1w images and perfusion maps for non-invasive MR-based glioma grading. We prospectively included 37 untreated glioma patients (23 grade I/II, 14 grade III gliomas), in whom 3T MRI with FLAIR, pre- and post-contrast T1-weighted, and perfusion sequences was performed. Cerebral blood volume, cerebral blood flow, and mean transit time maps as well as CE-T1w images were used as input for the similarity search. Data sets were preprocessed and converted to four-dimensional Gaussian mixture models that considered correlations between the different MR sequences. For each patient, a so-called tumor feature vector (a probability-based classifier) was defined and used for grading. Biopsy was used as the gold standard, and similarity-based grading was compared to grading based solely on CE-T1w. Accuracy, sensitivity, and specificity of pure CE-T1w-based glioma grading were 64.9%, 78.6%, and 56.5%, respectively. Similarity search based tumor grading allowed differentiation between low-grade (I or II) and high-grade (III) gliomas with an accuracy, sensitivity, and specificity of 83.8%, 78.6%, and 87.0%. Our findings indicate that integrating perfusion parameters and CE-T1w information in a semi-automatic similarity search based analysis improves MR-based glioma grading compared to CE-T1w data alone.

  7. Ultra-accurate collaborative information filtering via directed user similarity

    Science.gov (United States)

    Guo, Q.; Song, W.-J.; Liu, J.-G.

    2014-07-01

    A key challenge of collaborative filtering (CF) is how to obtain reliable and accurate results with the help of peers' recommendations. Since the similarities from small-degree users to large-degree users are larger than the ones in the opposite direction, the selections of large-degree users are recommended extensively by traditional second-order CF algorithms. By considering the direction of user similarity and second-order correlations to suppress the influence of mainstream preferences, we present the directed second-order CF (HDCF) algorithm to specifically address the accuracy and diversity challenges of CF. Numerical results for two benchmark data sets, MovieLens and Netflix, show that the new algorithm outperforms state-of-the-art CF algorithms in accuracy. Compared with the CF algorithm based on random walks proposed by Liu et al. (Int. J. Mod. Phys. C, 20 (2009) 285), the average ranking score reaches 0.0767 and 0.0402, an improvement of 27.3% and 19.1% for MovieLens and Netflix, respectively. In addition, diversity, precision and recall are also greatly enhanced. Without relying on any context-specific information, tuning the similarity direction of CF algorithms yields accurate and diverse recommendations. This work suggests that the direction of user similarity is an important factor in improving personalized recommendation performance.
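
    A toy illustration of direction-dependent user similarity via degree normalization, the simplest possible variant; the HDCF algorithm itself additionally tunes this direction and uses second-order correlations.

```python
import numpy as np

# Toy user-item matrix (rows: users, cols: items; 1 = collected).
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 1],
              [0, 1, 1, 0]], dtype=float)

k = A.sum(axis=1)        # user degrees
overlap = A @ A.T        # number of common items per user pair

# Normalizing by the *source* user's degree makes similarity asymmetric:
# from the small-degree user 0 to the large-degree user 1, s = 2/2 = 1.0,
# but in the opposite direction s = 2/4 = 0.5.
s = overlap / k[:, None]
print(np.round(s, 2))
```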

  8. Outsourced similarity search on metric data assets

    KAUST Repository

    Yiu, Man Lung

    2012-02-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the data objects most similar to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider, enabling similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.

  9. Similarity Search for Continuous Seismic Data

    Science.gov (United States)

    O'Reilly, O. J.; Yoon, C. E.; Beroza, G. C.

    2013-12-01

    Cross-correlation of seismic data streams with a priori known waveform templates has emerged as an effective tool for identifying the occurrence of similar seismic signals; yet this approach is difficult if the form of the templates is unknown. This challenge has been partially met by constructing waveform templates from reoccurring seismic signals that share similar waveforms. This waveform similarity arises because the Earth's structure is essentially time invariant at the temporal scales considered in seismology. The problem of finding similar waveforms without known templates has been approached previously by segmenting incoming seismic data streams into multiple overlapping windows, each with a fixed length and lag, followed by matched filtering using each window as a template. An immediate shortcoming of this strategy is that it is computationally expensive; it scales quadratically with the number of lags needed, which limits the analysis to time series of short duration. The principal concept behind our approach, which enables scalable similarity search, is to use a hierarchical strategy to investigate only a small, near-constant-sized subset of all possible waveform pairs for each query. In computer science and related fields, there are efficient techniques to solve this problem, which appears in numerous applications. Here, we bring these techniques into detection seismology. As a first step, we present a prototype database application that relies on a fingerprinting scheme producing high-dimensional sparse binary representations of the windowed data streams (each fingerprint is significantly compressed compared to the actual window). These fingerprints encode key features of the actual window, enabling comparison among fingerprints rather than cross-correlation of windows. Further dimensionality reduction is then applied to each fingerprint, and similar fingerprints are grouped together using locality-sensitive hashing. Developing...

  10. Earthquake detection through computationally efficient similarity search.

    Science.gov (United States)

    Yoon, Clara E; O'Reilly, Ossian; Bergen, Karianne J; Beroza, Gregory C

    2015-12-01

    Seismology is experiencing rapid growth in the quantity of data, which has outpaced the development of processing algorithms. Earthquake detection, the identification of seismic events in continuous data, is a fundamental operation for observational seismology. We developed an efficient method to detect earthquakes using waveform similarity that overcomes the disadvantages of existing detection methods. Our method, called Fingerprint And Similarity Thresholding (FAST), can analyze a week of continuous seismic waveform data in less than 2 hours, or 140 times faster than autocorrelation. FAST adapts a data mining algorithm originally designed to identify similar audio clips within large databases; it first creates compact "fingerprints" of waveforms by extracting key discriminative features, then groups similar fingerprints together within a database to facilitate fast, scalable search for similar fingerprint pairs, and finally generates a list of earthquake detections. FAST detected most (21 of 24) cataloged earthquakes and 68 uncataloged earthquakes in 1 week of continuous data from a station located near the Calaveras Fault in central California, achieving detection performance comparable to that of autocorrelation, with some additional false detections. FAST is expected to realize its full potential when applied to extremely long duration data sets over a distributed network of seismic stations. The widespread application of FAST has the potential to aid in the discovery of unexpected seismic signals, improve seismic monitoring, and promote a greater understanding of a variety of earthquake processes.
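
    A minimal sketch of the fingerprint-and-group step. FAST itself uses min-hash-based locality-sensitive hashing; the band-style bucketing and the toy fingerprints below are simplified stand-ins for that pipeline.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)

# Toy sparse binary fingerprints of waveform windows (one row per window).
fp = (rng.random((6, 128)) < 0.05).astype(np.uint8)
fp[3] = fp[0]                          # window 3 ~ near-duplicate of window 0
fp[3, rng.integers(0, 128, 2)] ^= 1    # with two flipped bits

# Band-style LSH: hash each fingerprint band by band; windows sharing any
# identical non-empty band become candidate similar pairs.
n_bands, w = 16, 8
buckets = defaultdict(list)
for i, f in enumerate(fp):
    for b in range(n_bands):
        band = f[b * w:(b + 1) * w]
        if band.any():                 # skip empty bands of sparse fingerprints
            buckets[(b, band.tobytes())].append(i)

pairs = {(i, j) for ids in buckets.values()
         for i in ids for j in ids if i < j}
print(pairs)  # expected to contain (0, 3); unrelated windows rarely collide
```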

  11. Mining patents using molecular similarity search.

    Science.gov (United States)

    Rhodes, James; Boyer, Stephen; Kreulen, Jeffrey; Chen, Ying; Ordonez, Patricia

    2007-01-01

    Text analytics is becoming an increasingly important tool in biomedical research. While advances continue to be made in the core algorithms for entity identification and relation extraction, a need for practical applications of these technologies has arisen. We developed a system that allows users to explore the US patent corpus using molecular information. The core of our system contains three main technologies: a high-performing chemical annotator that identifies chemical terms and converts them to structures, a similarity search engine based on the emerging IUPAC International Chemical Identifier (InChI) standard, and a set of on-demand data mining tools. By leveraging this technology we were able to rapidly identify and index 3,623,248 unique chemical structures from 4,375,036 US patents and patent applications. Using this system, a user may go to a web page, draw a molecule, search for related intellectual property (IP), and analyze the results. Our results show that this is a far more effective way of identifying IP than traditional keyword-based approaches.

  12. Optimal neighborhood indexing for protein similarity search.

    Science.gov (United States)

    Peterlongo, Pierre; Noé, Laurent; Lavenier, Dominique; Nguyen, Van Hoa; Kucherov, Gregory; Giraud, Mathieu

    2008-12-16

    Similarity inference, one of the main bioinformatics tasks, has to face an exponential growth of biological data. A classical approach used to cope with this data flow involves heuristics with large seed indexes. To speed up this technique, the index can be enhanced by storing additional information to limit the number of random memory accesses. However, this improvement leads to a larger index that may become a bottleneck. In the case of protein similarity search, we propose to decrease the index size by reducing the amino acid alphabet. The paper presents two main contributions. First, we show that an optimal neighborhood indexing combining an alphabet reduction and a longer neighborhood leads to a 35% reduction in the memory involved in the process, without sacrificing the quality of results or the computational time. Second, our approach led us to develop a new kind of substitution score matrix and associated e-value parameters. In contrast to usual matrices, these matrices are rectangular since they compare amino acid groups from different alphabets. We describe the method used to compute these matrices and provide some typical examples that can be used in such comparisons. Supplementary data can be found on the website http://bioinfo.lifl.fr/reblosum. We propose a practical index size reduction of the neighborhood data that does not negatively affect the performance of large-scale searches in protein sequences. Such an index can be used in any study involving large protein data. Moreover, rectangular substitution score matrices and their associated statistical parameters can have applications in any study involving an alphabet reduction.

  13. Outsourced similarity search on metric data assets

    DEFF Research Database (Denmark)

    Yiu, Man Lung; Assent, Ira; Jensen, Christian Søndergaard

    2012-01-01

    This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example...

  14. A Similarity Search Using Molecular Topological Graphs

    Directory of Open Access Journals (Sweden)

    Yoshifumi Fukunishi

    2009-01-01

    A molecular similarity measure has been developed using molecular topological graphs and atomic partial charges. Two kinds of topological graphs were used. One is the ordinary adjacency matrix and the other is a matrix representing the minimum path length between two atoms of the molecule. The ordinary adjacency matrix is suitable for comparing the local structures of molecules, such as functional groups, while the other matrix is suitable for comparing their global structures. The combination of these two matrices gives a similarity measure. This method was applied to in silico drug screening, and the results showed that it is effective as a similarity measure.
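
    A toy version of the two matrix views (adjacency for local structure, minimum path lengths for global structure). Atomic partial charges, which the method also uses, are omitted, and the cosine comparison is a simplified stand-in for the paper's measure.

```python
import numpy as np

def path_length_matrix(adj):
    """All-pairs minimum path lengths via Floyd-Warshall on an adjacency matrix."""
    n = adj.shape[0]
    dist = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

def matrix_similarity(m1, m2):
    """Cosine-style similarity between two equally sized matrices."""
    a, b = m1.ravel(), m2.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two toy 4-atom molecules: a chain and a ring.
chain = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], float)
ring  = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], float)

print(matrix_similarity(chain, ring))              # local (adjacency) view
print(matrix_similarity(path_length_matrix(chain),
                        path_length_matrix(ring))) # global (path-length) view
```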

  15. Adaptive and Dynamic Pivot Selection for Similarity Search

    OpenAIRE

    Salvetti, Mariano; Deco, Claudia; Reyes, Nora; Bender, Cristina

    2011-01-01

    In this paper, a new indexing and similarity search method based on dynamic pivot selection is presented. It uses Sparse Spatial Selection (SSS) for the initial selection of pivots. Two new pivot selection policies are added so that the index adapts itself to the searches performed as well as to the metric space. The proposed structure automatically adjusts to the region where most searches are made, reducing the number of distance computations during searches. The adjustm...

  16. Modeling Tanimoto Similarity Value Distributions and Predicting Search Results.

    Science.gov (United States)

    Vogt, Martin; Bajorath, Jürgen

    2017-07-01

    Similarity searching using molecular fingerprints has a long history in chemoinformatics and continues to be a popular approach for virtual screening. Typically, known active reference molecules are used to search databases for new active compounds. However, this search has black-box character because similarity value distributions depend on the fingerprints and compound classes involved. Consequently, no generally applicable similarity threshold values are available as reliable indicators of activity relationships between reference and database compounds. Therefore, it is generally uncertain where new active compounds might appear in database rankings, if at all. In this contribution, methods are discussed for modeling the similarity value distributions of fingerprint search calculations using Tanimoto coefficients and for estimating the rank positions of active compounds. To our knowledge, these are the first approaches for predicting the results of fingerprint-based similarity searching.
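
    For reference, the Tanimoto coefficient on binary fingerprints, together with an empirical similarity-value distribution against a random database; the random fingerprints are stand-ins for the fingerprint- and compound-class-dependent distributions the paper models analytically.

```python
import numpy as np

rng = np.random.default_rng(3)

def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints: |a & b| / |a | b|."""
    both = np.sum(a & b)
    return both / (np.sum(a) + np.sum(b) - both)

# Empirical distribution of similarity values for one reference fingerprint
# screened against a random "database" of 10,000 fingerprints.
ref = rng.random(1024) < 0.1
db = rng.random((10000, 1024)) < 0.1
scores = np.array([tanimoto(ref, row) for row in db])
print(round(scores.mean(), 3), round(scores.std(), 3))
```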

  17. Fast and accurate protein substructure searching with simulated annealing and GPUs

    Directory of Open Access Journals (Sweden)

    Stivala Alex D

    2010-09-01

    Background: Searching a database of protein structures for matches to a query structure, or for occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. Results: We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy with, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). Conclusions: The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.

  18. Exact Score Distribution Computation for Similarity Searches in Ontologies

    Science.gov (United States)

    Schulz, Marcel H.; Köhler, Sebastian; Bauer, Sebastian; Vingron, Martin; Robinson, Peter N.

    Semantic similarity searches in ontologies are an important component of many bioinformatic algorithms, e.g., protein function prediction with the Gene Ontology. In this paper we consider the exact computation of score distributions for similarity searches in ontologies, and introduce a simple null hypothesis which can be used to compute a P-value for the statistical significance of similarity scores. We concentrate on measures based on Resnik’s definition of ontological similarity. A new algorithm is proposed that collapses subgraphs of the ontology graph and thereby allows fast score distribution computation. The new algorithm is several orders of magnitude faster than the naive approach, as we demonstrate by computing score distributions for similarity searches in the Human Phenotype Ontology.
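
    A minimal sketch of Resnik similarity on a toy ontology; the DAG and term probabilities are invented for illustration, and the paper's actual contribution, the exact distribution of such scores under a null hypothesis, is not shown.

```python
import math

# Tiny ontology: term -> parents (a DAG), plus term probabilities p(t)
# as estimated from annotation frequencies. Illustrative values only.
PARENTS = {'t4': {'t2', 't3'}, 't3': {'t1'}, 't2': {'t1'}, 't1': set()}
P = {'t1': 1.0, 't2': 0.4, 't3': 0.5, 't4': 0.1}

def ancestors(t):
    out = {t}
    for p in PARENTS[t]:
        out |= ancestors(p)
    return out

def resnik(t1, t2):
    """Resnik similarity: information content IC(t) = -log p(t) of the
    most informative common ancestor of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(P[t]) for t in common)

print(resnik('t2', 't3'))  # only common ancestor is the root -> 0.0
print(resnik('t4', 't2'))  # t2 is itself a common ancestor -> -log 0.4
```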

  19. Margin-Based Pivot Selection for Similarity Search Indexes

    Science.gov (United States)

    Kurasawa, Hisashi; Fukagawa, Daiji; Takasu, Atsuhiro; Adachi, Jun

    When developing an index for a similarity search in metric spaces, how to divide the space for effective search pruning is a fundamental issue. We present Maximal Metric Margin Partitioning (MMMP), a partitioning scheme for similarity search indexes. MMMP divides the data based on its distribution pattern, especially for the boundaries of clusters. A partitioning boundary created by MMMP is likely to be located in a sparse area between clusters. Moreover, the partitioning boundary is at maximum distances from the two cluster edges. We also present an indexing scheme, named the MMMP-Index, which uses MMMP and pivot filtering. The MMMP-Index can prune many objects that are not relevant to a query, and it reduces the query execution cost. Our experimental results show that MMMP effectively indexes clustered data and reduces the search cost. For clustered data in a vector space, the MMMP-Index reduces the computational cost to less than two thirds that of comparable schemes.

  20. The breakfast effect: dogs (Canis familiaris) search more accurately when they are less hungry.

    Science.gov (United States)

    Miller, Holly C; Bender, Charlotte

    2012-11-01

    We investigated whether the consumption of a morning meal (breakfast) by dogs (Canis familiaris) would affect search accuracy on a working memory task following the exertion of self-control. Dogs were tested either 30 or 90 min after consuming half of their daily resting energy requirements (RER). During testing dogs were initially required to sit still for 10 min before searching for hidden food in a visible displacement task. We found that 30 min following the consumption of breakfast, and 10 min after the behavioral inhibition task, dogs searched more accurately than they did in a fasted state. Similar differences were not observed when dogs were tested 90 min after meal consumption. This pattern of behavior suggests that breakfast enhanced search accuracy following a behavioral inhibition task by providing energy for cognitive processes, and that search accuracy decreased as a function of energy depletion.

  1. Image similarity search using a negative selection algorithm

    NARCIS (Netherlands)

    Keijzers, S.; Maandag, P.; Marchiori, E.; Sprinkhuizen-Kuyper, I.G.

    2013-01-01

    The Negative Selection Algorithm is an immune inspired algorithm that can be used for different purposes such as fault detection, data integrity protection and virus detection. In this paper we show how the Negative Selection Algorithm can be adapted to tackle the similar image search problem: given

  2. How Google Web Search copes with very similar documents

    NARCIS (Netherlands)

    W. Mettrop (Wouter); P. Nieuwenhuysen; H. Smulders

    2006-01-01

    A significant portion of the computer files that carry documents, multimedia, programs etc. on the Web are identical or very similar to other files on the Web. How do search engines cope with this? Do they perform some kind of “deduplication”? How should users take into account that

  3. RAPSearch: a fast protein similarity search tool for short reads

    Directory of Open Access Journals (Sweden)

    Choi Jeong-Hyeon

    2011-05-01

    Background: Next Generation Sequencing (NGS) is producing enormous corpuses of short DNA reads, affecting emerging fields like metagenomics. Protein similarity search, a key step in annotating protein-coding genes in these short reads and identifying their biological functions, faces daunting challenges because of the sheer size of the short-read datasets. Results: We developed a fast protein similarity search tool, RAPSearch, that utilizes a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. For the short reads (translated in 6 frames) we tested, RAPSearch achieved a ~20-90 times speedup compared to BLASTX. RAPSearch missed only a small fraction (~1.3-3.2%) of BLASTX similarity hits, but it also discovered additional homologous proteins (~0.3-2.1%) that BLASTX missed. By contrast, BLAT, a tool that is even slightly faster than RAPSearch, had a significant loss of sensitivity compared to RAPSearch and BLAST. Conclusions: RAPSearch is implemented as open-source software and is accessible at http://omics.informatics.indiana.edu/mg/RAPSearch. It enables faster protein similarity search. The application of RAPSearch in metagenomics has also been demonstrated.

  4. Robust hashing with local models for approximate similarity search.

    Science.gov (United States)

    Song, Jingkuan; Yang, Yi; Li, Xuelong; Huang, Zi; Yang, Yang

    2014-07-01

    Similarity search plays an important role in many applications involving high-dimensional data. Due to the well-known curse of dimensionality, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods make use of randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, namely, robust hashing with local models (RHLM), which learns a set of robust hash functions to map the high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, for each individual data point in the training dataset, a local hashing model is learned and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method by employing ℓ2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code by the hash functions and then explores the buckets, which have similar hash codes to the query hash code. Extensive experimental results conducted on real-life datasets show that the proposed RHLM outperforms the state-of-the-art methods in terms of search quality and efficiency.

  5. Semantic similarity measure in biomedical domain leverage web search engine.

    Science.gov (United States)

    Chen, Chi-Huang; Hsieh, Sheau-Ling; Weng, Yung-Ching; Chang, Wen-Yung; Lai, Feipei

    2010-01-01

    Semantic similarity measures play an essential role in information retrieval and natural language processing. In this paper, we propose a page-count-based semantic similarity measure and apply it in biomedical domains. Previous research in semantic-web-related applications has deployed various semantic similarity measures. Despite the usefulness of these measurements in those applications, measuring the semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by a Web search engine. We define various similarity scores for two given terms P and Q, using the page counts for the queries P, Q, and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated using support vector machines, to leverage the robustness of semantic similarity measures. Experimental results on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen with physician scores, and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
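
    A sketch of one page-count-based score of the kind described: a Jaccard-style coefficient over page counts. The counts, the noise floor, and the function name are illustrative; the paper combines several such scores, plus pattern-based features, with support vector machines.

```python
def web_jaccard(count_p, count_q, count_pq, floor=5):
    """Page-count similarity of terms P and Q: the count for the query
    "P AND Q" normalized by the counts for P and Q individually."""
    if count_pq < floor:     # suppress noise from very rare co-occurrence
        return 0.0
    return count_pq / (count_p + count_q - count_pq)

# Hypothetical page counts returned by a search engine for two terms:
print(web_jaccard(count_p=2_400_000, count_q=1_100_000, count_pq=310_000))
```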

  6. Target-nontarget similarity decreases search efficiency and increases stimulus-driven control in visual search.

    Science.gov (United States)

    Barras, Caroline; Kerzel, Dirk

    2017-10-01

    Some points of criticism against the idea that attentional selection is controlled by bottom-up processing were dispelled by the attentional window account. The attentional window account claims that saliency computations during visual search are only performed for stimuli inside the attentional window. Therefore, a small attentional window may avoid attentional capture by salient distractors because it is likely that the salient distractor is located outside the window. In contrast, a large attentional window increases the chances of attentional capture by a salient distractor. Large and small attentional windows have been associated with efficient (parallel) and inefficient (serial) search, respectively. We compared the effect of a salient color singleton on visual search for a shape singleton during efficient and inefficient search. To vary search efficiency, the nontarget shapes were either similar or dissimilar with respect to the shape singleton. We found that interference from the color singleton was larger with inefficient than efficient search, which contradicts the attentional window account. While inconsistent with the attentional window account, our results are predicted by computational models of visual search. Because of target-nontarget similarity, the target was less salient with inefficient than efficient search. Consequently, the relative saliency of the color distractor was higher with inefficient than with efficient search. Accordingly, stronger attentional capture resulted. Overall, the present results show that bottom-up control by stimulus saliency is stronger when search is difficult, which is inconsistent with the attentional window account.

  7. Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision

    Directory of Open Access Journals (Sweden)

    Holliday John D

    2011-08-01

    Background: Data fusion methods are widely used in virtual screening and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. Results: Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that large numbers of unique molecules are retrieved by just a single search, but that the number of unique molecules decreases very rapidly as more searches are considered. This rapid decrease is accompanied by a rapid increase in the fraction of retrieved molecules that are active. There is an approximately log-log relationship between the number of different molecules retrieved and the number of searches carried out, and a rationale for this power-law behaviour is provided. Conclusions: Using multiple searches provides a simple way of increasing the precision of a similarity search, and thus provides a justification for the use of data fusion methods in virtual screening.
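
    A minimal illustration of group fusion's counting step, using invented hit lists: molecules retrieved by many searches are predicted to be active far more reliably than the numerous molecules retrieved only once.

```python
from collections import Counter

# Hypothetical top hits from 3 similarity searches that use different
# reference structures but the same similarity measure (group fusion).
searches = [
    ['m1', 'm2', 'm3', 'm7'],
    ['m2', 'm3', 'm8', 'm1'],
    ['m9', 'm2', 'm1', 'm4'],
]
freq = Counter(m for hits in searches for m in hits)
print(freq.most_common())
# m1 and m2 are retrieved by all three searches; the singletons (m4, m7,
# m8, m9) dominate in number but contain a much lower fraction of actives.
```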

  8. Fast and accurate database searches with MS-GF+Percolator

    Energy Technology Data Exchange (ETDEWEB)

    Granholm, Viktor; Kim, Sangtae; Navarro, José C.; Sjölund, Erik; Smith, Richard D.; Käll, Lukas

    2014-02-28

    To identify peptides and proteins from the large number of fragmentation spectra in mass-spectrometry-based proteomics, researchers commonly employ so-called database search engines. Additionally, post-processors like Percolator have been used on the results from such search engines to assess confidence, infer peptides and generally increase the number of identifications. A recent search engine, MS-GF+, has previously been shown to outperform these classical search engines in terms of the number of identified spectra. However, MS-GF+ generates only limited statistical estimates of the results, hampering the biological interpretation. Here, we enabled Percolator processing of MS-GF+ output and observed an increased number of identified peptides for a wide variety of datasets. In addition, Percolator directly reports false discovery rate estimates, such as q values and posterior error probabilities, as well as p values, for peptide-spectrum matches, peptides and proteins, all useful for the whole proteomics community.

  9. Query-dependent banding (QDB) for faster RNA similarity searches.

    Directory of Open Access Journals (Sweden)

    Eric P Nawrocki

    2007-03-01

    When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well suited to RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average-case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.

  10. Omokage search: shape similarity search service for biomolecular structures in both the PDB and EMDB.

    Science.gov (United States)

    Suzuki, Hirofumi; Kawabata, Takeshi; Nakamura, Haruki

    2016-02-15

    Omokage search is a service to search the global shape similarity of biological macromolecules and their assemblies in both the Protein Data Bank (PDB) and the Electron Microscopy Data Bank (EMDB). The server compares the global shapes of assemblies independent of sequence order and number of subunits. As a search query, the user inputs a structure ID (PDB ID or EMDB ID) or uploads an atomic model or 3D density map to the server. The search is usually performed within 1 min, using one-dimensional profiles (incremental distance rank profiles) to characterize the shapes. Using the gmfit (Gaussian mixture model fitting) program, the structures found are fitted onto the query structure and their superimposed structures are displayed in the Web browser. Our service provides new structural perspectives to life science researchers. Omokage search is freely accessible at http://pdbj.org/omokage/.

  11. Efficient Similarity Search Using the Earth Mover's Distance for Large Multimedia Databases

    DEFF Research Database (Denmark)

    Assent, Ira; Wichterich, Marc; Meisen, Tobias

    2008-01-01

    Multimedia similarity search in large databases requires efficient query processing. The Earth mover's distance, introduced in computer vision, is successfully used as a similarity model in a number of small-scale applications. Its computational complexity hindered its adoption in large multimedia databases. We enable directly indexing the Earth mover's distance in structures such as the R-tree and the VA-file by providing the accurate 'MinDist' function to any bounding rectangle in the index. We exploit the computational structure of the new MinDist to derive a new lower bound for the EMD Min...

  12. Accurate image search using the contextual dissimilarity measure.

    Science.gov (United States)

    Jegou, Hervé; Schmid, Cordelia; Harzallah, Hedi; Verbeek, Jakob

    2010-01-01

    This paper introduces the contextual dissimilarity measure, which significantly improves the accuracy of bag-of-features-based image search. Our measure takes into account the local distribution of the vectors and iteratively estimates distance update terms in the spirit of Sinkhorn's scaling algorithm, thereby modifying the neighborhood structure. Experimental results show that our approach gives significantly better results than a standard distance and outperforms the state of the art in terms of accuracy on the Nistér-Stewénius and Lola data sets. This paper also evaluates the impact of a large number of parameters, including the number of descriptors, the clustering method, the visual vocabulary size, and the distance measure. The optimal parameter choice is shown to be quite context-dependent. In particular, using a large number of descriptors is interesting only when using our dissimilarity measure. We have also evaluated two novel variants: multiple assignment and rank aggregation. They are shown to further improve accuracy at the cost of higher memory usage and lower efficiency.

  13. Efficient and accurate Greedy Search Methods for mining functional modules in protein interaction networks.

    Science.gov (United States)

    He, Jieyue; Li, Chaojun; Ye, Baoliu; Zhong, Wei

    2012-06-25

    Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain core/attachment structures. In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves prediction accuracy compared to similar module detection approaches; however, it is computationally expensive. Many module detection approaches are based on traditional hierarchical methods, which are also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. To speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge-weight-based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into a suitable set of modules. The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate than other competing algorithms. Based on the new edge weight definition, the proposed algorithm takes advantage of the greedy search procedure to separate the network into a suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the

  14. Density-based similarity measures for content based search

    Energy Technology Data Exchange (ETDEWEB)

    Hush, Don R [Los Alamos National Laboratory; Porter, Reid B [Los Alamos National Laboratory; Ruggiero, Christy E [Los Alamos National Laboratory

    2009-01-01

    We consider the query-by-multiple-example problem, where the goal is to identify database samples whose content is similar to a collection of query samples. To assess the similarity we use a relative content density, which quantifies the relative concentration of the query distribution with respect to the database distribution. If the database distribution is a mixture of the query distribution and a background distribution, then it can be shown that database samples whose relative content density is greater than a particular threshold ρ are more likely to have been generated by the query distribution than by the background distribution. We describe an algorithm for predicting samples with relative content density greater than ρ that is computationally efficient and possesses strong performance guarantees. We also show empirical results for applications in computer network monitoring and image segmentation.
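
    A sketch of the relative-content-density idea on a simple 1D mixture, using a kernel density estimate as a stand-in for the paper's estimator; the distributions and the threshold ρ = 1 are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)

# Database = mixture of a query-like component and a background component.
query = rng.normal(0.0, 0.5, 200)
background = rng.normal(3.0, 1.0, 800)
database = np.concatenate([query, background])

# Relative content density: query density over database density, evaluated
# at each database sample.
f_query = gaussian_kde(query)
f_db = gaussian_kde(database)
ratio = f_query(database) / f_db(database)

rho = 1.0  # samples above the threshold are more likely query-generated
print(np.mean(ratio[:200] > rho), np.mean(ratio[200:] > rho))
# high fraction for the query block, near zero for the background block
```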

  15. SHOP: scaffold hopping by GRID-based similarity searches

    DEFF Research Database (Denmark)

    Bergmann, Rikke; Linusson, Anna; Zamora, Ismael

    2007-01-01

    SHOP was used to recover scaffolds of known ligands of three different protein targets relevant for drug discovery, using a rational approach based on statistical experimental design. Five out of eight and seven out of eight thrombin scaffolds and all seven HIV protease scaffolds were recovered within the top 10, and 31 out of 31 neuraminidase scaffolds were in the 31 top-ranked scaffolds. SHOP also identified new scaffolds with substantially different chemotypes from the queries. Docking analysis indicated that the new scaffolds would have similar binding modes to those of the respective query scaffolds observed in X-ray structures.

  16. Accurate estimation of influenza epidemics using Google search data via ARGO.

    Science.gov (United States)

    Yang, Shihao; Santillana, Mauricio; Kou, S C

    2015-11-24

    Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data as input from publicly available Google Trends and Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in people's online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.
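
    A simplified sketch of ARGO's core regression: autoregressive lags of flu activity plus contemporaneous search-term frequencies, fit with an L1-penalized linear model. Everything below is simulated; ARGO's real inputs are CDC %ILI and roughly a hundred Google search terms, refit dynamically.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

# Synthetic stand-ins: y = weekly flu activity; X_search = query frequencies
# noisily correlated with y, mimicking Google-Trends-style inputs.
T, n_terms, n_lags = 200, 10, 3
y = np.sin(np.arange(T) * 2 * np.pi / 52) + 0.1 * rng.normal(size=T)
X_search = y[:, None] * rng.uniform(0.5, 1.5, n_terms) \
    + 0.2 * rng.normal(size=(T, n_terms))

# Design matrix: n_lags autoregressive lags of y plus the search terms.
X = np.array([np.concatenate([y[t - n_lags:t], X_search[t]])
              for t in range(n_lags, T)])
target = y[n_lags:]

model = Lasso(alpha=0.01).fit(X[:-20], target[:-20])  # hold out 20 weeks
pred = model.predict(X[-20:])
print(np.corrcoef(pred, target[-20:])[0, 1])          # holdout tracking
```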

  17. Content-Based Search on a Database of Geometric Models: Identifying Objects of Similar Shape

    Energy Technology Data Exchange (ETDEWEB)

    XAVIER, PATRICK G.; HENRY, TYSON R.; LAFARGE, ROBERT A.; MEIRANS, LILITA; RAY, LAWRENCE P.

    2001-11-01

    The Geometric Search Engine is a software system for storing and searching a database of geometric models. The database may be searched for modeled objects similar in shape to a target model supplied by the user. The database models generally come from CAD models, while the target model may be either a CAD model or a model generated from range data collected from a physical object. This document describes key generation, database layout, and search of the database.

  18. Perceptual Grouping in Haptic Search: The Influence of Proximity, Similarity, and Good Continuation

    Science.gov (United States)

    Overvliet, Krista E.; Krampe, Ralf Th.; Wagemans, Johan

    2012-01-01

    We conducted a haptic search experiment to investigate the influence of the Gestalt principles of proximity, similarity, and good continuation. We expected faster search when the distractors could be grouped. We chose edges at different orientations as stimuli because they are processed similarly in the haptic and visual modality. We therefore…

  19. OS2: Oblivious similarity based searching for encrypted data outsourced to an untrusted domain

    Science.gov (United States)

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Ramzan, Naeem

    2017-01-01

    Public cloud storage services are becoming prevalent and myriad data sharing, archiving and collaborative services have emerged which harness the pay-as-you-go business model of public cloud. To ensure privacy and confidentiality, encrypted data is often outsourced to such services, which further complicates the process of accessing relevant data by using search queries. Search over encrypted data schemes solve this problem by exploiting cryptographic primitives and secure indexing to identify outsourced data that satisfy the search criteria. Almost all of these schemes rely on exact matching between the encrypted data and search criteria. A few schemes which extend the notion of exact matching to similarity based search lack realism, as those schemes rely on trusted third parties or increase storage and computational complexity. In this paper we propose Oblivious Similarity based Search (OS2) for encrypted data. It enables authorized users to model their own encrypted search queries which are resilient to typographical errors. Unlike conventional methodologies, OS2 ranks the search results by using a similarity measure, offering a better search experience than exact matching. It utilizes an encrypted bloom filter and probabilistic homomorphic encryption to enable authorized users to access relevant data without revealing results of the search query evaluation process to the untrusted cloud service provider. Encrypted bloom filter based search enables OS2 to reduce the search space to potentially relevant encrypted data, avoiding unnecessary computation on public cloud. The efficacy of OS2 is evaluated on Google App Engine for various bloom filter lengths on different cloud configurations. PMID:28692697
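
    A plain (unencrypted) bloom filter illustrating the pruning step described above; OS2's actual filters are encrypted and combined with probabilistic homomorphic encryption, which this sketch omits.

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter over an m-bit integer with k hash probes."""
    def __init__(self, m=256, k=4):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], 'big') % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

# Index a document's keywords; a query probes the filter first, so documents
# that cannot match are pruned before any expensive evaluation.
doc = BloomFilter()
for word in ("similarity", "search", "encryption"):
    doc.add(word)
print(doc.might_contain("similarity"), doc.might_contain("integrity"))
# -> True False (false positives are possible; false negatives are not)
```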

  20. Fast association-rule-based similarity search in 3D models

    Science.gov (United States)

    Dua, Sumeet; Jain, Vineet

    2004-11-01

    Advances in automated data collection tools in design and manufacturing have far exceeded our capacity to analyze these data for novel information. Techniques for data mining and knowledge discovery in large databases promise computationally efficient and accurate means of analyzing such data for patterns and similar structures. In this paper, we present a unique data mining approach for finding similarities in classes of 3D models using the discovery of association rules. PCA is first performed on the 3D model to align it along its first principal axis. The transformed 3D model is then sliced and segmented along multiple principal axes, such that each slice can be interpreted as a transaction in a transaction database. Association-rule discovery is performed on this transaction space for multiple models, and association rules common to those transactions are stored as representative of a class of models. We have evaluated the performance of association rules for efficient representation of classes of shape models. The method is time and space efficient, besides presenting a novel paradigm for searching content dependencies in a database of 3D models.

  1. Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing.

    Science.gov (United States)

    Cao, Yiqun; Jiang, Tao; Girke, Thomas

    2010-04-01

    Similarity searching and clustering of chemical compounds by structural similarities are important computational approaches for identifying drug-like small molecules. Most algorithms available for these tasks are limited by their speed and scalability, and cannot handle today's large compound databases with several million entries. In this article, we introduce a new algorithm for accelerated similarity searching and clustering of very large compound sets using embedding and indexing (EI) techniques. First, we present EI-Search as a general purpose similarity search method for finding objects with similar features in large databases and apply it here to searching and clustering of large compound sets. The method embeds the compounds in a high-dimensional Euclidean space and searches this space using an efficient index-aware nearest neighbor search method based on locality sensitive hashing (LSH). Second, to cluster large compound sets, we introduce the EI-Clustering algorithm that combines the EI-Search method with Jarvis-Patrick clustering. Both methods were tested on three large datasets with sizes ranging from about 260 000 to over 19 million compounds. In comparison to sequential search methods, the EI-Search method was 40-200 times faster, while maintaining comparable recall rates. The EI-Clustering method allowed us to significantly reduce the CPU time required to cluster these large compound libraries from several months to only a few days. Software implementations and online services have been developed based on the methods introduced in this study. The online services provide access to the generated clustering results and ultra-fast similarity searching of the PubChem Compound database with subsecond response time.

  2. Searching the protein structure database for ligand-binding site similarities using CPASS v.2

    Directory of Open Access Journals (Sweden)

    Caprez Adam

    2011-01-01

    Background: A recent analysis of protein sequences deposited in the NCBI RefSeq database indicates that ~8.5 million protein sequences are encoded in prokaryotic and eukaryotic genomes, of which ~30% are explicitly annotated as "hypothetical" or "uncharacterized" proteins. Our Comparison of Protein Active-Site Structures (CPASS) v.2 database and software compares the sequence and structural characteristics of experimentally determined ligand binding sites to infer a functional relationship in the absence of global sequence or structure similarity. CPASS is an important component of our Functional Annotation Screening Technology by NMR (FAST-NMR) protocol and has been successfully applied to aid the annotation of a number of proteins of unknown function. Findings: We report a major upgrade to our CPASS software and database that significantly improves its broad utility. CPASS v.2 is designed with a layered architecture to increase flexibility and portability, and enables job distribution over the Open Science Grid (OSG) to increase speed. Similarly, the CPASS interface was enhanced to give the user more flexibility in submitting a CPASS query. CPASS v.2 now allows for both automatic and manual definition of ligand-binding sites and permits pair-wise, one-versus-all, one-versus-list, or list-versus-list comparisons. Solvent accessible surface area, ligand root-mean-square difference, and Cβ distances have been incorporated into the CPASS similarity function to improve the quality of the results. The CPASS database has also been updated. Conclusions: CPASS v.2 is more than an order of magnitude faster than the original implementation and allows for multiple simultaneous job submissions. Similarly, the CPASS database of ligand-defined binding sites has increased in size by ~38%, dramatically increasing the likelihood of a positive search result. The modification to the CPASS similarity function is effective in reducing CPASS similarity scores

  3. MUSE: An Efficient and Accurate Verifiable Privacy-Preserving Multikeyword Text Search over Encrypted Cloud Data

    Directory of Open Access Journals (Sweden)

    Zhu Xiangyang

    2017-01-01

    With the development of cloud computing, service outsourcing in clouds has become a popular business model. However, because data storage and computing are completely outsourced to the cloud service provider, sensitive data of data owners are exposed, which could lead to serious privacy disclosure. In addition, unexpected events, such as software bugs and hardware failures, could cause incomplete or incorrect results to be returned from clouds. In this paper, we propose MUSE, an efficient and accurate verifiable privacy-preserving multikeyword text search over encrypted cloud data based on hierarchical agglomerative clustering. To improve the efficiency of text searching, we propose a novel index structure, the HAC-tree, which is based on a hierarchical agglomerative clustering method and tends to gather high-relevance documents into clusters. Based on the HAC-tree, a noncandidate-pruning depth-first search algorithm is proposed, which can filter unqualified subtrees and thus accelerate the search process. The secure inner product algorithm is used to encrypt the HAC-tree index and the query vector. Meanwhile, a completeness verification algorithm is given to verify the search results. Experimental results demonstrate that the proposed method outperforms the existing works DMRS and MRSE-HCI in efficiency and accuracy, respectively.
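
    A sketch of the clustering-then-pruning idea behind the HAC-tree, using SciPy's agglomerative clustering on toy document vectors; MUSE's actual index, encryption, and relevance scoring are far more involved.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)

# Toy document vectors (e.g. TF-IDF): two well-separated topical groups.
docs = np.vstack([rng.normal(0, 0.3, (5, 8)), rng.normal(4, 0.3, (5, 8))])

Z = linkage(docs, method='average')            # agglomerative dendrogram
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)                                  # two tight clusters

# Pruning idea: score the query against cluster centroids first and descend
# only into the best-matching cluster, skipping the other subtree entirely.
query = rng.normal(0, 0.3, 8)
centroids = np.array([docs[labels == c].mean(axis=0) for c in (1, 2)])
best = int(np.argmin(np.linalg.norm(centroids - query, axis=1))) + 1
print("descend into cluster", best)
```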

  4. Content Based Retrieval Database Management System with Support for Similarity Searching and Query Refinement

    National Research Council Canada - National Science Library

    Ortega-Binderberger, Michael

    2002-01-01

    ... as a critical area of research. This thesis explores how to enhance database systems with content-based search over arbitrary abstract data types in a similarity-based framework with query refinement...

  5. OS2: Oblivious similarity based searching for encrypted data outsourced to an untrusted domain.

    Directory of Open Access Journals (Sweden)

    Zeeshan Pervez

    Full Text Available Public cloud storage services are becoming prevalent, and myriad data sharing, archiving, and collaborative services have emerged which harness the pay-as-you-go business model of public cloud. To ensure privacy and confidentiality, data is often outsourced to such services in encrypted form, which further complicates the process of accessing relevant data with search queries. Search-over-encrypted-data schemes solve this problem by exploiting cryptographic primitives and secure indexing to identify outsourced data that satisfy the search criteria. Almost all of these schemes rely on exact matching between the encrypted data and the search criteria. The few schemes which extend the notion of exact matching to similarity-based search lack realism, as they rely on trusted third parties or incur increased storage and computational complexity. In this paper we propose Oblivious Similarity based Search ([Formula: see text]) for encrypted data. It enables authorized users to model their own encrypted search queries, which are resilient to typographical errors. Unlike conventional methodologies, [Formula: see text] ranks the search results by using a similarity measure, offering a better search experience than exact matching. It utilizes an encrypted Bloom filter and probabilistic homomorphic encryption to enable authorized users to access relevant data without revealing the results of the search query evaluation process to the untrusted cloud service provider. Encrypted Bloom filter based search enables [Formula: see text] to reduce the search space to potentially relevant encrypted data, avoiding unnecessary computation on the public cloud. The efficacy of [Formula: see text] is evaluated on Google App Engine for various Bloom filter lengths on different cloud configurations.
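
    The typo resilience rests on encoding keywords so that near-misses still overlap. A minimal plaintext sketch of one common way to do this (character bigrams hashed into a Bloom filter, compared with the Dice coefficient) is shown below; the scheme's encryption and homomorphic machinery are omitted, and the parameters M and K are illustrative rather than the paper's.

```python
import hashlib

M = 256          # Bloom filter length in bits (illustrative)
K = 4            # hash functions per bigram (illustrative)

def bigrams(word):
    w = f"_{word.lower()}_"           # pad so first/last characters matter
    return {w[i:i + 2] for i in range(len(w) - 1)}

def bloom(word):
    """Encode a keyword's bigram set into an M-bit Bloom filter."""
    bits = 0
    for g in bigrams(word):
        for k in range(K):
            h = hashlib.sha256(f"{k}:{g}".encode()).digest()
            bits |= 1 << (int.from_bytes(h[:4], "big") % M)
    return bits

def dice(a, b):
    """Dice coefficient of two bit vectors: 2|A&B| / (|A|+|B|)."""
    inter = bin(a & b).count("1")
    return 2 * inter / (bin(a).count("1") + bin(b).count("1"))

# A misspelled query still scores high against the intended keyword:
print(dice(bloom("encryption"), bloom("encrypton")))   # high, bigrams shared
print(dice(bloom("encryption"), bloom("database")))    # much lower
```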

  6. [Formula: see text]: Oblivious similarity based searching for encrypted data outsourced to an untrusted domain.

    Science.gov (United States)

    Pervez, Zeeshan; Ahmad, Mahmood; Khattak, Asad Masood; Ramzan, Naeem; Khan, Wajahat Ali

    2017-01-01

    Public cloud storage services are becoming prevalent, and myriad data sharing, archiving, and collaborative services have emerged which harness the pay-as-you-go business model of public cloud. To ensure privacy and confidentiality, data is often outsourced to such services in encrypted form, which further complicates the process of accessing relevant data with search queries. Search-over-encrypted-data schemes solve this problem by exploiting cryptographic primitives and secure indexing to identify outsourced data that satisfy the search criteria. Almost all of these schemes rely on exact matching between the encrypted data and the search criteria. The few schemes which extend the notion of exact matching to similarity-based search lack realism, as they rely on trusted third parties or incur increased storage and computational complexity. In this paper we propose Oblivious Similarity based Search ([Formula: see text]) for encrypted data. It enables authorized users to model their own encrypted search queries, which are resilient to typographical errors. Unlike conventional methodologies, [Formula: see text] ranks the search results by using a similarity measure, offering a better search experience than exact matching. It utilizes an encrypted Bloom filter and probabilistic homomorphic encryption to enable authorized users to access relevant data without revealing the results of the search query evaluation process to the untrusted cloud service provider. Encrypted Bloom filter based search enables [Formula: see text] to reduce the search space to potentially relevant encrypted data, avoiding unnecessary computation on the public cloud. The efficacy of [Formula: see text] is evaluated on Google App Engine for various Bloom filter lengths on different cloud configurations.

  7. Using electronic health records and Internet search information for accurate influenza forecasting.

    Science.gov (United States)

    Yang, Shihao; Santillana, Mauricio; Brownstein, John S; Gray, Josh; Richardson, Stewart; Kou, S C

    2017-05-08

    Accurate influenza activity forecasting helps public health officials prepare and allocate resources for unusual influenza activity. Traditional flu surveillance systems, such as the Centers for Disease Control and Prevention's (CDC) influenza-like illness reports, lag behind real time by one to two weeks, whereas information contained in cloud-based electronic health records (EHR) and in Internet users' search activity is typically available in near real time. We present a method that combines the information from these two data sources with historical flu activity to produce national flu forecasts for the United States up to four weeks ahead of the publication of CDC's flu reports. We extend a method originally designed to track flu using Google searches, named ARGO, to combine information from EHR and Internet searches with historical flu activity. Our regularized multivariate regression model dynamically selects the most appropriate variables for flu prediction every week. The model is assessed for the flu seasons within the period 2013-2016 using multiple metrics, including root mean squared error (RMSE). Our method reduces the RMSE of the publicly available alternative (Healthmap flutrends) by 33%, 20%, 17%, and 21% for the four time horizons: real time and one, two, and three weeks ahead, respectively. These accuracy improvements are statistically significant at the 5% level. Our real-time estimates correctly identified the peak timing and magnitude of the studied flu seasons. Our method significantly reduces the prediction error when compared to historical publicly available Internet-based prediction systems, demonstrating that: (1) the method to combine data sources is as important as data quality; (2) effectively extracting information from cloud-based EHR and Internet search activity leads to accurate forecasts of flu activity.
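
    To make the modelling idea concrete, here is a toy sketch (not the authors' ARGO code) of an L1-regularized regression refit on a sliding window each week, so the penalty effectively re-selects predictors among autoregressive lags and exogenous signals. The data are synthetic and `argo_style_forecast` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
T, n_signals = 200, 10
flu = np.cumsum(rng.normal(size=T))               # synthetic %ILI-like series
signals = flu[:, None] + rng.normal(scale=2.0, size=(T, n_signals))

def argo_style_forecast(flu, signals, t, n_lags=52, window=104):
    """Fit an L1-penalized regression on the most recent `window` weeks,
    using autoregressive lags plus exogenous signals, then predict week t.
    Refitting weekly lets the penalty re-select variables dynamically."""
    rows, ys = [], []
    for s in range(max(n_lags, t - window), t):
        rows.append(np.concatenate([flu[s - n_lags:s], signals[s]]))
        ys.append(flu[s])
    model = LassoCV(cv=5).fit(np.array(rows), np.array(ys))
    x_now = np.concatenate([flu[t - n_lags:t], signals[t]])
    return model.predict(x_now[None, :])[0]

print(argo_style_forecast(flu, signals, t=150))   # one-step-ahead estimate
```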

  8. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space

    KAUST Repository

    Tao, Yufei

    2010-07-01

    Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query costs, or abandon its theoretical guarantee on the quality of query results. Motivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms an LSB-forest that has strong quality guarantees, but dramatically improves the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space, supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (i) iDistance (a famous technique for exact NN search) by two orders of magnitude, and (ii) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results. As a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, which most of the existing solutions fail to do. We show that, using an LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, yet still ensuring very good quality. In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial
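
    For readers unfamiliar with the underlying hash family, the sketch below implements a single table of p-stable (E2LSH-style) hashes, h(v) = floor((a·v + b)/w), under which nearby points collide with elevated probability; the LSB-tree's B-tree layout and quality guarantees are beyond this toy example.

```python
import numpy as np

class E2LSH:
    """One hash table of m concatenated p-stable LSH functions
    h(v) = floor((a.v + b) / w)."""
    def __init__(self, dim, m=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(m, dim))       # random Gaussian directions
        self.b = rng.uniform(0, w, size=m)       # random offsets
        self.w = w                               # bucket width
        self.table = {}

    def key(self, v):
        return tuple(np.floor((self.a @ v + self.b) / self.w).astype(int))

    def insert(self, idx, v):
        self.table.setdefault(self.key(v), []).append(idx)

    def query(self, v):
        return self.table.get(self.key(v), [])

# nearby vectors tend to share a bucket; distant ones rarely do
rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 32))
lsh = E2LSH(dim=32)
for i, v in enumerate(data):
    lsh.insert(i, v)
q = data[0] + rng.normal(scale=0.01, size=32)    # slightly perturbed query
print(lsh.query(q))   # candidate list; with high probability includes index 0
```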

  9. Managing biomedical image metadata for search and retrieval of similar images.

    Science.gov (United States)

    Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris

    2011-08-01

    Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM), to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standards-based metadata files via a Web service and parses and stores the metadata in a relational database, allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics such as disease prevalences. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.

  10. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2004-10-01

    Full Text Available Abstract Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high-throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high-performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between the nodes of the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into
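
    The load-balancing step can be illustrated with a simple greedy splitter: assign queries, longest first, to whichever node currently has the smallest total residue count, since search time grows with query length. This is a plausible sketch of query segmentation, not SS-Wrapper's actual code; `split_queries` is a hypothetical name.

```python
def split_queries(fasta_records, n_nodes):
    """Greedy load balancing: assign each query (name, seq) to the
    node with the smallest total residue count so far."""
    chunks = [[] for _ in range(n_nodes)]
    loads = [0] * n_nodes
    # longest queries first makes the greedy assignment much tighter
    for name, seq in sorted(fasta_records, key=lambda r: -len(r[1])):
        i = loads.index(min(loads))
        chunks[i].append((name, seq))
        loads[i] += len(seq)
    return chunks

records = [("q1", "M" * 900), ("q2", "M" * 500),
           ("q3", "M" * 450), ("q4", "M" * 100)]
for node, chunk in enumerate(split_queries(records, 2)):
    print(node, [n for n, _ in chunk])   # node 0: q1, q4; node 1: q2, q3
```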

  11. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters.

    Science.gov (United States)

    Wang, Chunlin; Lefkowitz, Elliot J

    2004-10-28

    Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. Used together

  12. MEASURING THE PERFORMANCE OF SIMILARITY PROPAGATION IN A SEMANTIC SEARCH ENGINE

    Directory of Open Access Journals (Sweden)

    S. K. Jayanthi

    2013-10-01

    Full Text Available In the current scenario, web page result personalization plays a vital role. Nearly 80% of users expect the best results on the first page itself, without the persistence to browse further through the result list. This research work focuses on two main themes: semantic web search online and domain-based search offline. The first part is to find an effective method which allows grouping similar results together using the BookShelf data structure and organizing the various clusters. The second is focused on academic domain-based search offline. This paper focuses on finding documents which are similar, and on how a vector space model can be used to solve it. Therefore, more weight is given to the principles and working methodology of similarity propagation. The cosine similarity measure is used for finding the relevancy among the documents.
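
    The relevancy computation reduces to cosine similarity between document vectors, for example over TF-IDF weights; a generic sketch (not the paper's system) is:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "semantic web search and result clustering",
    "clustering similar web search results",
    "gradient descent for neural networks",
]
X = TfidfVectorizer().fit_transform(docs)    # rows are TF-IDF document vectors
print(cosine_similarity(X[0], X[1])[0, 0])   # high: shared vocabulary
print(cosine_similarity(X[0], X[2])[0, 0])   # near zero: disjoint topics
```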

  13. RxnFinder: biochemical reaction search engines using molecular structures, molecular fragments and reaction similarity.

    Science.gov (United States)

    Hu, Qian-Nan; Deng, Zhe; Hu, Huanan; Cao, Dong-Sheng; Liang, Yi-Zeng

    2011-09-01

    Biochemical reactions play a key role in sustaining life and allowing cells to grow. RxnFinder was developed to search biochemical reactions in the KEGG reaction database using three search criteria: molecular structures, molecular fragments, and reaction similarity. RxnFinder is helpful for obtaining reference reactions for biosynthesis and xenobiotics metabolism. RxnFinder is freely available via: http://sdd.whu.edu.cn/rxnfinder. qnhu@whu.edu.cn.

  14. Searching for Unknown Earthquakes in the Guy-Greenbrier, Arkansas, Earthquake Sequence using Efficient Waveform Similarity Search

    Science.gov (United States)

    Yoon, C. E.; OReilly, O. J.; Bergen, K.; Huang, Y.; Beroza, G. C.

    2015-12-01

    Recent seismicity rate increases in the central United States have been attributed to injection of wastewater from oil and gas production. One example is the Guy-Greenbrier, Arkansas, earthquake sequence, which occurred from July 2010 through October 2011 and was potentially induced by fluid injection into nearby disposal wells (Horton, 2012). Although the Arkansas seismic network is sparse, a single 3-component station, WHAR, recorded continuous data before, during, and after this earthquake sequence at distances ranging from 2-9 km. Huang and Beroza (2015) used template matching to detect over 100 times the number of cataloged earthquakes by cross-correlating the continuous data with waveform templates based on known earthquakes to search for additional low-magnitude events. Because known waveform templates do not necessarily fully represent all seismic signals in the continuous data, small earthquakes from unknown sources could have escaped detection. We use a method called Fingerprint And Similarity Thresholding (FAST) to detect additional low-magnitude earthquakes that were missed by template matching. FAST enables fast, scalable search for earthquakes with similar waveforms without making prior assumptions about the seismic signal of interest. FAST, based on a data mining technique, first creates compact "fingerprints" of waveforms by extracting discriminative features, then uses locality-sensitive hashing to organize and efficiently search for similar fingerprints (and therefore similar earthquake waveforms) in a probabilistic manner. With FAST, each search query is processed in near-constant time, independent of the dataset size; this computational efficiency is gained at the expense of an approximate, rather than exact, search. During one week of continuous data at station WHAR, from 2010-07-01 to 2010-07-08, FAST detected over 200 uncataloged earthquakes that were not found through template matching. These newly detected earthquakes have the potential to
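
    The fingerprint-then-hash idea can be sketched as follows: reduce each waveform window to a binary spectral fingerprint, then bucket fingerprints by random subsets of bit positions so that windows agreeing on most bits collide in at least one table. Every detail here (median-threshold fingerprints, table counts, synthetic data) is illustrative; FAST's actual feature extraction and locality-sensitive hashing are described in the paper.

```python
import numpy as np

def fingerprint(window):
    """Binary fingerprint: which spectral magnitudes exceed the median."""
    mag = np.abs(np.fft.rfft(window))
    return (mag > np.median(mag)).astype(np.uint8)

def lsh_tables(fps, n_tables=16, bits_per_key=12, seed=0):
    """Bucket fingerprints by random subsets of bit positions: similar
    fingerprints agree on most bits, so they collide in some table."""
    rng = np.random.default_rng(seed)
    dim = len(fps[0])
    tables = []
    for _ in range(n_tables):
        pos = rng.choice(dim, bits_per_key, replace=False)
        buckets = {}
        for i, fp in enumerate(fps):
            buckets.setdefault(tuple(fp[pos]), []).append(i)
        tables.append((pos, buckets))
    return tables

def query(tables, fp):
    hits = set()
    for pos, buckets in tables:
        hits.update(buckets.get(tuple(fp[pos]), []))
    return hits                # candidates, to be verified by correlation

rng = np.random.default_rng(1)
windows = rng.normal(size=(100, 256))              # continuous data, cut up
event = np.sin(np.linspace(0, 40, 256))
windows[7] = event + 0.05 * rng.normal(size=256)   # two noisy repeats
windows[42] = event + 0.05 * rng.normal(size=256)  # of the same signal
fps = [fingerprint(w) for w in windows]
print(sorted(query(lsh_tables(fps), fps[7])))      # typically 7 and 42
```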

  15. Wikipedia Chemical Structure Explorer: substructure and similarity searching of molecules from Wikipedia.

    Science.gov (United States)

    Ertl, Peter; Patiny, Luc; Sander, Thomas; Rufener, Christian; Zasso, Michaël

    2015-01-01

    Wikipedia, the world's largest and most popular encyclopedia, is an indispensable source of chemistry information. Among other content, it contains entries for over 15,000 chemicals including metabolites, drugs, agrochemicals, and industrial chemicals. To provide easy access to this wealth of information, we decided to develop a substructure and similarity search tool for chemical structures referenced in Wikipedia. We extracted chemical structures from entries in Wikipedia and implemented a web system allowing structure and similarity searching on these data. The whole search as well as visualization system is written in JavaScript and can therefore run locally within a web page, without requiring a central server. The Wikipedia Chemical Structure Explorer is accessible on-line at www.cheminfo.org/wikipedia and is also available as an open source project from GitHub for local installation. The web-based Wikipedia Chemical Structure Explorer provides a useful resource for research as well as for chemical education, enabling both researchers and students easy and user-friendly chemistry searching and identification of relevant information in Wikipedia. The tool can also help to improve the quality of chemical entries in Wikipedia by providing potential contributors with a regularly updated list of entries with problematic structures. And last but not least, this search system is a nice example of how modern web technology can be applied in the field of cheminformatics. Graphical abstract: Wikipedia Chemical Structure Explorer allows substructure and similarity searches on molecules referenced in Wikipedia.

  16. Determinants of dwell time in visual search: similarity or perceptual difficulty?

    Science.gov (United States)

    Becker, Stefanie I

    2011-03-08

    The present study examined the factors that determine dwell times in a visual search task, that is, the duration the gaze remains fixated on an object. It has been suggested that an item's similarity to the search target should be an important determinant of dwell times, because dwell times are taken to reflect the time needed to reject the item as a distractor, and such discriminations are supposed to be harder the more similar an item is to the search target. In line with this similarity view, a previous study showed that, in search for a target ring of thin line-width, dwell times on thin line-width Landolt C distractors were longer than dwell times on Landolt Cs with thick or medium line-width. However, dwell times may have been longer on thin Landolt Cs because the thin line-width made it harder to detect whether the stimuli had a gap or not. Thus, it is an open question whether dwell times on thin line-width distractors were longer because they were similar to the target or because the perceptual decision was more difficult. The present study decoupled similarity from perceptual difficulty by measuring dwell times on thin, medium, and thick line-width distractors when the target had thin, medium, or thick line-width. The results showed that dwell times were longer on target-similar than target-dissimilar stimuli across all target conditions and regardless of the line-width. It is concluded that prior findings of longer dwell times on thin line-width distractors can clearly be attributed to target similarity. As discussed towards the end, the finding of similarity effects on dwell times has important implications for current theories of visual search and eye movement control.

  17. Ligand scaffold hopping combining 3D maximal substructure search and molecular similarity

    Directory of Open Access Journals (Sweden)

    Petitjean Michel

    2009-08-01

    Full Text Available Abstract Background Virtual screening methods are now well established as effective means to identify hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural, and energetic properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, 3D-similarity search methods can be very valuable to many research projects, as integration of "3D knowledge" can facilitate the identification not only of related molecules but also of chemicals possessing distant scaffolds compared to the query, and can therefore be more amenable to scaffold hopping. To date, very few methods performing this task are easily available to the scientific community. Results We introduce a new approach (LigCSRre) to the 3D ligand similarity search of drug candidates. It combines a 3D maximum common substructure search algorithm independent of atom order with a tunable description of atomic compatibilities to prune the search and increase its physico-chemical relevance. We show, on 47 experimentally validated active compounds across five protein targets having different specificities, that for single-compound search the approach is able to recover on average 52% of the co-actives in the top 1% of the ranked list, which is better than the gold standards of the field. Moreover, the combination of several runs on a single protein target using different query active compounds shows a remarkable improvement in enrichment. Such results demonstrate LigCSRre as a valuable tool for ligand-based screening. Conclusion LigCSRre constitutes a new efficient and generic approach to the 3D similarity screening of small compounds, whose flexible design opens the door to many enhancements. The program is freely available to academics for non-profit research at: http://bioserv.rpbs.univ-paris-diderot.fr/LigCSRre.html.

  18. Similarity-based search of model organism, disease and drug effect phenotypes

    KAUST Repository

    Hoehndorf, Robert

    2015-02-19

    Background: Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity. Results: We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet. Conclusions: Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.

  19. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    Directory of Open Access Journals (Sweden)

    Chen Ke

    2008-05-01

    Full Text Available Abstract Background Protein structure prediction methods provide accurate results when a homologous template is available, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as its structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences is in the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considered over 2300 index-, composition-, and physicochemical-properties-based features, along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. Conclusion SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is

  20. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation

    Directory of Open Access Journals (Sweden)

    Li Gong-Hua

    2010-08-01

    Full Text Available Abstract Background The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in the Protein Data Bank (PDB); thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need further improvement in accuracy, sensitivity, and computational speed. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict the unknown functions of proteins based on local protein structural similarity. The algorithm has been evaluated by building a test set including 164 enzyme families, and has also been compared to other methods. Results The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough to be used in large-scale functional annotation. Compared to both sequence-based and global structure-based methods, CMASA not only can find remote homologous proteins but also can detect active-site convergence. Compared to other local structure comparison-based methods, CMASA achieves better performance than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is also more sensitive than PINTS and more accurate than JESS (both local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites have been suggested; these sites cannot be observed in the Catalytic Site Atlas (CSA). Conclusions CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. CMASA can be used in large-scale enzyme active-site annotation. The CMASA can be available by the

  1. RScan: fast searching structural similarities for structured RNAs in large databases

    Directory of Open Access Journals (Sweden)

    Liu Guo-Ping

    2007-07-01

    Full Text Available Abstract Background Many RNAs have evolutionarily conserved secondary structures rather than conserved primary sequences. Recently, an increasing number of methods have been developed that focus on structural alignment for finding conserved secondary structures, as well as common structural motifs, in pairwise or multiple sequences. A challenging task is to quickly search for similar structures for structured RNA sequences in large genomic databases, since existing methods are too slow to be used on large databases. Results An implementation of a fast structural alignment algorithm, RScan, is proposed to fulfill this task. RScan is developed by leveraging the advantages of both hashing algorithms and local alignment algorithms. In our experiment, on average, the times for searching a tRNA and an rRNA in the randomized A. pernix genome are only 256 seconds and 832 seconds respectively using RScan, but 3,178 seconds and 8,951 seconds respectively using the existing method RSEARCH. Remarkably, RScan can handle large database queries, taking less than 4 minutes to search for similar structures for a microRNA precursor in human chromosome 21. Conclusion These results indicate that RScan is a preferable choice for the real-life application of searching for structural similarities of structured RNAs in large databases. The RScan software is freely available at http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm.

  2. Software Suite for Gene and Protein Annotation Prediction and Similarity Search.

    Science.gov (United States)

    Chicco, Davide; Masseroli, Marco

    2015-01-01

    In the computational biology community, machine learning algorithms are key instruments for many applications, including the prediction of gene functions based upon the available biomolecular annotations. Additionally, they may also be employed to compute similarity between genes or proteins. Here, we describe and discuss a software suite we developed to implement and make publicly available some such prediction methods and a computational technique based upon Latent Semantic Indexing (LSI), which leverages both inferred and available annotations to search for semantically similar genes. The suite consists of three components. BioAnnotationPredictor is a computational software module to predict new gene functions based upon Singular Value Decomposition of available annotations. SimilBio is a Web module that leverages annotations available or predicted by BioAnnotationPredictor to discover similarities between genes via LSI. The suite also includes SemSim, a new Web service built upon these modules to allow accessing them programmatically. We integrated SemSim in the Bio Search Computing framework (http://www.bioinformatics.deib.polimi.it/bio-seco/seco/), where users can exploit the Search Computing technology to run multi-topic complex queries on multiple integrated Web services. Accordingly, researchers may obtain ranked answers involving the computation of the functional similarity between genes in support of biomedical knowledge discovery.
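
    The LSI step can be illustrated in a few lines: take a truncated SVD of a binary gene-by-annotation matrix and compare genes by cosine similarity in the reduced space, where genes can score as similar even without directly shared annotations. The matrix below is a toy stand-in, not data from the suite.

```python
import numpy as np

# toy binary gene-by-annotation matrix (rows: genes, cols: GO-like terms)
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                   # latent dimensionality
G = U[:, :k] * s[:k]                    # genes embedded in latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(G[0], G[2]))   # high: related through shared latent structure
print(cos(G[0], G[3]))   # near zero: unrelated annotation profile
```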

  3. BSSF: a fingerprint based ultrafast binding site similarity search and function analysis server

    Directory of Open Access Journals (Sweden)

    Jiang Hualiang

    2010-01-01

    Full Text Available Abstract Background Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of the study of sequence-structure-function relationship of genes and their products. Although many sequence/structure-based methods have been devised with the aim of deciphering this delicate relationship, there still remain large gaps in this fundamental problem, which continuously drives researchers to develop novel methods to extract relevant information from sequences and structures and to infer the functions of newly identified genes by genomics technology. Results Here we present an ultrafast method, named BSSF(Binding Site Similarity & Function, which enables researchers to conduct similarity searches in a comprehensive three-dimensional binding site database extracted from PDB structures. This method utilizes a fingerprint representation of the binding site and a validated statistical Z-score function scheme to judge the similarity between the query and database items, even if their similarities are only constrained in a sub-pocket. This fingerprint based similarity measurement was also validated on a known binding site dataset by comparing with geometric hashing, which is a standard 3D similarity method. The comparison clearly demonstrated the utility of this ultrafast method. After conducting the database searching, the hit list is further analyzed to provide basic statistical information about the occurrences of Gene Ontology terms and Enzyme Commission numbers, which may benefit researchers by helping them to design further experiments to study the query proteins. Conclusions This ultrafast web-based system will not only help researchers interested in drug design and structural genomics to identify similar binding sites, but also assist them by providing further analysis of hit list from database searching.

  4. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    Directory of Open Access Journals (Sweden)

    Matija Korpar

    Full Text Available In recent years we have witnessed growth in sequencing yield, in the number of samples sequenced, and, as a result, in the size of publicly maintained sequence databases. This ubiquitous increase of data has placed high demands on protein similarity search algorithms, with two ever-opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, aligning a query against the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics before the exact local alignment to reduce the number of candidate sequences in the database. However, there is still a need to align a query sequence against a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times as a standalone tool are comparable to those of BLAST, it is primarily intended to be used for the exact local alignment phase, in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and, at the time of writing, was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++, and more than 20 times faster than SSW, using multiple queries on the Swiss-Prot and UniRef90 databases.
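
    For reference, the exact algorithm being accelerated is the Smith-Waterman recurrence; a minimal scoring-only version with a linear gap penalty is sketched below (SW#db, like most production tools, uses affine gaps and substitution matrices such as BLOSUM).

```python
import numpy as np

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Exact local alignment score: H[i][j] is the best score of any
    alignment ending at a[i-1], b[j-1], floored at 0 so alignments
    can start anywhere."""
    H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(0,
                          H[i - 1, j - 1] + sub,   # substitute/match
                          H[i - 1, j] + gap,       # gap in b
                          H[i, j - 1] + gap)       # gap in a
    return int(H.max())                            # best local score anywhere

print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
```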

  5. Applying Statistical Models and Parametric Distance Measures for Music Similarity Search

    Science.gov (United States)

    Lukashevich, Hanna; Dittmar, Christian; Bastuck, Christoph

    Automatically deriving similarity relations between music pieces is an inherent field of music information retrieval research. Due to the nearly unrestricted amount of musical data, real-world similarity search algorithms have to be highly efficient and scalable. One possible solution is to represent each music excerpt with a statistical model (e.g., a Gaussian mixture model) and thus reduce the computational costs by applying parametric distance measures between the models. In this paper we discuss combinations of different parametric modelling techniques and distance measures, and weigh the benefits of each against the others.
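
    As a concrete instance of a parametric distance, the closed-form Kullback-Leibler divergence between two Gaussians, symmetrized, can serve as a track-to-track distance once a Gaussian is fitted to each excerpt's feature frames. The sketch below uses single Gaussians and random stand-in features; the paper's GMM-based variants require approximations of the KL divergence.

```python
import numpy as np

def gauss_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0,cov0) || N(mu1,cov1) ), closed form:
    0.5 * (tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0) - d + ln det S1/det S0)."""
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def track_distance(frames_a, frames_b):
    """Symmetrized KL between single Gaussians fitted to two tracks'
    feature frames (rows would be e.g. MFCC vectors)."""
    ma, ca = frames_a.mean(0), np.cov(frames_a.T)
    mb, cb = frames_b.mean(0), np.cov(frames_b.T)
    return gauss_kl(ma, ca, mb, cb) + gauss_kl(mb, cb, ma, ca)

rng = np.random.default_rng(0)
a = rng.normal(0, 1, size=(500, 5))
b = rng.normal(0.2, 1, size=(500, 5))     # similar feature distribution
c = rng.normal(3, 2, size=(500, 5))       # very different
print(track_distance(a, b), track_distance(a, c))   # small vs. large
```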

  6. Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA.

    Science.gov (United States)

    Mrozek, Dariusz; Brożek, Miłosz; Małysiak-Mrozek, Bożena

    2014-02-01

    Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT ("GPU-CASSERT") parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.

  7. Prospective and retrospective ECG-gating for CT coronary angiography perform similarly accurate at low heart rates

    Energy Technology Data Exchange (ETDEWEB)

    Stolzmann, Paul, E-mail: paul.stolzmann@usz.ch [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland); Goetti, Robert; Baumueller, Stephan [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland); Plass, Andre; Falk, Volkmar [Clinic for Cardiovascular Surgery, University Hospital Zurich (Switzerland); Scheffel, Hans; Feuchtner, Gudrun; Marincek, Borut [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland); Alkadhi, Hatem [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland); Cardiac MR PET CT Program, Massachusetts General Hospital and Harvard Medical School, Boston, MA (United States); Leschka, Sebastian [Institute of Diagnostic Radiology, University Hospital Zurich, Raemistrasse 100, 8091 Zurich (Switzerland)

    2011-07-15

    Objective: To compare, in patients with suspicion of coronary artery disease (CAD) and low heart rates, image quality, diagnostic performance, and radiation dose values of prospectively and retrospectively electrocardiography (ECG)-gated dual-source computed tomography coronary angiography (CTCA) for the diagnosis of significant coronary stenoses. Materials and methods: Two hundred consecutive patients with heart rates ≤70 bpm were retrospectively enrolled: 100 patients undergoing prospectively ECG-gated CTCA (group 1) and 100 patients undergoing retrospectively ECG-gated CTCA (group 2). Coronary artery segments were assessed for image quality and significant luminal diameter narrowing. Sensitivity, specificity, positive predictive values (PPV), negative predictive values (NPV), and accuracy of both CTCA groups were determined using conventional catheter angiography (CCA) as the reference standard. Radiation dose values were calculated. Results: Both groups were comparable regarding gender, body weight, cardiovascular risk profile, severity of CAD, mean heart rate, heart rate variability, and Agatston score (all p > 0.05). There was no significant difference in the rate of non-assessable coronary segments between group 1 (1.6%, 24/1404) and group 2 (1.4%, 19/1385; p = 0.77); non-diagnostic image quality was significantly (p < 0.001) more often attributed to stair-step artifacts in group 1. Segment-based sensitivity, specificity, PPV, NPV, and accuracy were 98%, 98%, 88%, 100%, and 100% in group 1 and 96%, 99%, 90%, 100%, and 98% in group 2, respectively. Parameters of diagnostic performance were similar (all p > 0.05). The mean effective radiation dose of prospectively ECG-gated CTCA (2.2 ± 0.4 mSv) was significantly (p < 0.0001) smaller than that of retrospectively ECG-gated CTCA (8.1 ± 0.6 mSv). Conclusion: Prospectively ECG-gated CTCA yields similar image quality and performs as accurately as retrospectively ECG-gated CTCA in patients having heart rates ≤70 bpm

  8. Query-seeded iterative sequence similarity searching improves selectivity 5-20-fold.

    Science.gov (United States)

    Pearson, William R; Li, Weizhong; Lopez, Rodrigo

    2017-04-20

    Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position-specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity.
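
    The query-seeding step itself is tiny; here is a sketch under the assumption that the subject row is already laid out in query coordinates (one character per query column, '-' for gaps). The function name is illustrative, not from psisearch.

```python
def seed_with_query(query, subject_row):
    """Query-seeding: in the query-anchored alignment row for a subject,
    fill gap characters with the corresponding query residues, so the
    PSSM columns stay anchored to the original query."""
    assert len(query) == len(subject_row)
    return "".join(q if s == "-" else s
                   for q, s in zip(query, subject_row))

query       = "MKVLITGAGSG"
subject_row = "MKIL--GAGTG"                  # aligned to query coordinates
print(seed_with_query(query, subject_row))   # MKILITGAGTG
```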

  9. Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

    Science.gov (United States)

    Li, Weizhong; Lopez, Rodrigo

    2017-01-01

    Abstract Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position-specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999

  10. Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

    Directory of Open Access Journals (Sweden)

    Kim Sunghwan

    2012-11-01

    Full Text Available Abstract Background To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the efficient use of PubChem3D resources requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules. Results The present study investigated effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically-tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). When the "best-conformer-pair" approach, in which a 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from a compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive–non-inactive (NN) pairs for a given assay, by comparable amounts to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores, compared to the average increase for the random compound pairs. Conclusion These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non
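
    The best-conformer-pair score is simply a maximum over the cross product of conformer sets; in the sketch below, `shape_similarity` is a stand-in for PubChem3D's shape-Tanimoto scoring, not its actual implementation.

```python
from itertools import product

def best_conformer_pair_score(conformers_a, conformers_b, shape_similarity):
    """Compound-pair 3-D similarity: the maximum score over all
    conformer pairs (up to 10 x 10 with ten conformers per compound)."""
    return max(shape_similarity(ca, cb)
               for ca, cb in product(conformers_a, conformers_b))

# toy stand-in scorer on 1-D "conformer" descriptors
sim = lambda x, y: 1.0 / (1.0 + abs(x - y))
print(best_conformer_pair_score([0.1, 0.5, 0.9], [0.45, 2.0], sim))
```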

  11. Semantic similarity measures in the biomedical domain by leveraging a web search engine.

    Science.gov (United States)

    Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching

    2013-07-01

    Various studies on web-based semantic similarity measures have been carried out. However, measuring semantic similarity between two terms remains a challenging task. Traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s). Unfortunately, in practice, this assumption is not always applicable. On the other hand, if the corpus is sufficiently adequate, corpus-based methodologies can overcome this limitation, and the web is an enormous and continuously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, and P AND Q, and as the web search hit counts of defined lexico-syntactic patterns. The similarity scores of different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of the semantic similarity measures. Experimental results validated against two datasets (dataset 1 provided by A. Hliaoutakis; dataset 2 provided by T. Pedersen) are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficients (SNOMED-CT: 0.705; MeSH: 0.723) against physician scores compared with other methods. However, the correlation coefficients against coder scores (SNOMED-CT: 0.496; MeSH: 0.539) showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.
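
    One standard page-count similarity of this kind is the Normalized Google Distance, shown below with hard-coded hit counts standing in for live search-API queries; the paper's own features combine several such co-occurrence patterns rather than this single formula.

```python
import math

def ngd(fx, fy, fxy, N):
    """Normalized Google Distance from page counts:
    near 0 when two terms almost always co-occur, large when unrelated."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(N) - min(lx, ly))

# stand-in hit counts (a real system would query a search API)
N = 5e10                             # assumed size of the indexed web
print(ngd(3.1e6, 2.6e6, 1.2e6, N))   # related term pair: small distance
print(ngd(3.1e6, 8.0e7, 9.0e3, N))   # unrelated term pair: large distance
```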

  12. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    Science.gov (United States)

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. These large datasets and complex computations typically require cost-prohibitive High Performance Computing (HPC) resources. As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data", the name attributed to datasets that are extremely large, complex, and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both datasets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload, while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.

  13. Efficient Retrieval of Images for Search Engine by Visual Similarity and Re Ranking

    OpenAIRE

    Viswa S S

    2013-01-01

    Nowadays, web-scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in the hope of finding a certain type of image. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of the returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft make little use of visual information when searching for images. Using visual information...

  14. PHOG-BLAST – a new generation tool for fast similarity search of protein families

    Directory of Open Access Journals (Sweden)

    Mironov Andrey A

    2006-06-01

    Full Text Available Abstract Background The need to compare protein profiles frequently arises in various protein research areas: comparison of protein families, domain searches, and resolution of orthology and paralogy. Existing fast algorithms can only compare a protein sequence with a protein sequence, or a profile with a sequence. Algorithms that compare profiles use dynamic programming and complex scoring functions. Results We developed a new algorithm called PHOG-BLAST for fast similarity search of profiles. This algorithm uses profile discretization to convert a profile to a finite alphabet and utilizes hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best if its parameters are chosen so that 20 profile clusters are obtained, which can be interpreted as ancestral amino acid residues. With these clusters, fewer than 2% of columns in multiple alignments fall outside the clusters. We tested the performance of PHOG-BLAST vs. PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance; on PFAM and BALIBASE, PHOG-BLAST was much superior to PSI-BLAST. PHOG-BLAST required 10-20 times less computer memory and computation time than PSI-BLAST. Conclusion Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing protein complements of different species and orthologous groups of different taxons on a personal computer in reasonable time. When applied to detect weak similarity between protein families, PHOG-BLAST is less
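
    Profile discretization can be sketched directly: map each 20-dimensional profile column to the letter of its nearest cluster centroid, yielding an ordinary string that k-mer hashing can index. The centroids below are random stand-ins, whereas PHOG-BLAST derives its 20 clusters from reliable multiple alignments.

```python
import string
import numpy as np

rng = np.random.default_rng(0)
CENTROIDS = rng.dirichlet(np.ones(20), size=20)   # stand-in cluster centers
ALPHABET = string.ascii_uppercase[:20]            # one letter per cluster

def discretize(profile):
    """Map each profile column (20-dim frequency vector) to the letter
    of its nearest centroid, turning a profile into a plain string."""
    letters = []
    for col in profile:
        d = np.linalg.norm(CENTROIDS - col, axis=1)
        letters.append(ALPHABET[int(np.argmin(d))])
    return "".join(letters)

def kmer_index(s, k=4):
    """Hash the discretized string's k-mers for fast lookup."""
    idx = {}
    for i in range(len(s) - k + 1):
        idx.setdefault(s[i:i + k], []).append(i)
    return idx

profile = rng.dirichlet(np.ones(20), size=60)     # a 60-column toy profile
s = discretize(profile)
print(s[:20], len(kmer_index(s)))
```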

  15. PSimScan: algorithm and utility for fast protein similarity search.

    Directory of Open Access Journals (Sweden)

    Anna Kaznadzey

    Full Text Available In the era of metagenomics and diagnostic sequencing, the importance of protein comparison methods of boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that can be further used by the community to improve the speed and lower the hardware requirements of bioinformatics software. The optimization starts at lookup table construction; then the initial lookup-table-based hits are passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects 'similarity zones' aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan performs 5 to 100 times faster than the standard NCBI BLASTP, depending on the chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to NCBI BLASTP's and decrease as speed increases, yet stay at levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences. Comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with NCBI BLASTP takes over 66 hours. We describe the innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.

  16. Similar compounds searching system by using the gene expression microarray database.

    Science.gov (United States)

    Toyoshiba, Hiroyoshi; Sawada, Hiroshi; Naeshiro, Ichiro; Horinouchi, Akira

    2009-04-10

    Large numbers of microarrays have been examined and several public and commercial databases have been developed. However, it is not easy to compare in-house microarray data with those in a database because of insufficient reproducibility due to differences in the experimental conditions. As one approach to using these databases, we developed the similar compounds searching system (SCSS) on a toxicogenomics database. The datasets of 55 compounds administered to rats in the Toxicogenomics Project (TGP) database in Japan were used in this study. Using the fold-change ranking method developed by Lamb et al. [Lamb, J., Crawford, E.D., Peck, D., Modell, J.W., Blat, I.C., Wrobel, M.J., Lerner, J., Brunet, J.P., Subramanian, A., Ross, K.N., Reich, M., Hieronymus, H., Wei, G., Armstrong, S.A., Haggarty, S.J., Clemons, P.A., Wei, R., Carr, S.A., Lander, E.S., Golub, T.R., 2006. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929-1935] and criteria called the hit ratio, the system lets us compare in-house microarray data with those in the database. In-house generated data for clofibrate, phenobarbital, and a proprietary compound were tested to evaluate the performance of the SCSS method. Phenobarbital and clofibrate, which were included in the TGP database, scored highest by the SCSS method. Other high-scoring compounds had effects similar to either phenobarbital (a cytochrome P450s inducer) or clofibrate (a peroxisome proliferator). Some of the high-scoring compounds identified using the proprietary compound-administered rats are known to cause similar toxicological changes in different species. Our results suggest that the SCSS method could be used in drug discovery and development. Moreover, this method may be a powerful tool to understand the mechanisms by which biological systems respond to various chemical compounds and may also predict adverse effects of new compounds.
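
    The fold-change ranking idea can be sketched compactly. The version below is a deliberate simplification: Lamb et al. score signatures with a Kolmogorov-Smirnov-style statistic, whereas this sketch contrasts mean ranks, and the gene names and fold changes are fabricated toy data.

```python
def connectivity_score(profile: dict, up: set, down: set) -> float:
    """Simplified rank-based connectivity score in the spirit of Lamb et al.
    profile: gene -> log fold change for one database compound
    up/down: query signature from the in-house experiment (assumed non-empty
    and measured in the profile)
    Returns a value in [-1, 1]; +1 means the database compound reproduces
    the query signature, -1 means it inverts it."""
    ranked = sorted(profile, key=profile.get, reverse=True)   # most induced first
    pos = {gene: i for i, gene in enumerate(ranked)}
    n = len(ranked) - 1
    def mean_rank(genes):
        genes = [g for g in genes if g in pos]
        return sum(pos[g] for g in genes) / len(genes) / n    # 0 = top, 1 = bottom
    # up genes near the top and down genes near the bottom give a high score
    return mean_rank(down) - mean_rank(up)

profile = {"cyp2b1": 3.1, "cyp3a1": 2.2, "ugt1a1": 0.4, "acox1": -0.1, "scd1": -2.5}
print(connectivity_score(profile, up={"cyp2b1", "cyp3a1"}, down={"scd1"}))  # 0.875
```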

  17. Searching for an Accurate Marker-Based Prediction of an Individual Quantitative Trait in Molecular Plant Breeding

    Directory of Open Access Journals (Sweden)

    Yong-Bi Fu

    2017-07-01

    Full Text Available Molecular plant breeding with the aid of molecular markers has played an important role in modern plant breeding over the last two decades. Many marker-based predictions for quantitative traits have been made to enhance parental selection, but the trait prediction accuracy remains generally low, even with the aid of dense, genome-wide SNP markers. To search for more accurate trait-specific prediction with informative SNP markers, we conducted a literature review on the prediction issues in molecular plant breeding and on the applicability of an RNA-Seq technique for developing function-associated specific trait (FAST) SNP markers. To understand whether and how FAST SNP markers could enhance trait prediction, we also performed theoretical reasoning on the effectiveness of these markers in a trait-specific prediction, and verified the reasoning through computer simulation. In the end, the search yielded an alternative to regular genomic selection with FAST SNP markers that could be explored to achieve more accurate trait-specific prediction. Continued search for better alternatives is encouraged to enhance marker-based predictions for an individual quantitative trait in molecular plant breeding.

  18. Application of 3D Zernike descriptors to shape-based ligand similarity searching

    Directory of Open Access Journals (Sweden)

    Venkatraman Vishwesh

    2009-12-01

    Full Text Available Abstract Background The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. Results In this study, we have examined the efficacy of the alignment-independent three-dimensional Zernike descriptor (3DZD) for fast shape-based similarity searching. Performance of this approach was compared with several other methods including the statistical-moments-based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance in a setting that simulates actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparably to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape-based methods for specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability. Conclusion The 3DZD has a unique ability for fast comparison of the three-dimensional shapes of compounds. The examples analyzed illustrate the advantages of, and the room for improvement in, the 3DZD.
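
    Of the compared methods, USR is simple enough to sketch in full. Assuming the published recipe (twelve moments of the atomic distance distributions measured from four reference points), a minimal numpy version looks like this; a real 3DZD implementation needs spherical-harmonics machinery well beyond a short example.

```python
import numpy as np

def usr_descriptor(coords: np.ndarray) -> np.ndarray:
    """12-number USR shape descriptor (Ballester & Richards, 2007): mean,
    standard deviation and cube-rooted skewness of the atomic distance
    distributions from four reference points."""
    ctd = coords.mean(axis=0)                         # molecular centroid
    d_ctd = np.linalg.norm(coords - ctd, axis=1)
    cst = coords[d_ctd.argmin()]                      # closest atom to centroid
    fct = coords[d_ctd.argmax()]                      # farthest atom from centroid
    d_fct = np.linalg.norm(coords - fct, axis=1)
    ftf = coords[d_fct.argmax()]                      # farthest atom from fct
    moments = []
    for ref in (ctd, cst, fct, ftf):
        d = np.linalg.norm(coords - ref, axis=1)
        mu, sigma = d.mean(), d.std()
        skew = ((d - mu) ** 3).mean()
        moments += [mu, sigma, np.cbrt(skew)]
    return np.array(moments)

def usr_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Scaled inverse Manhattan distance; 1.0 means identical shape."""
    return 1.0 / (1.0 + np.abs(a - b).mean())

mol = np.random.default_rng(0).normal(size=(30, 3))   # toy atomic coordinates
print(usr_similarity(usr_descriptor(mol), usr_descriptor(mol * 1.1)))
```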

  19. In Search of an Accurate Evaluation of Intrahepatic Cholestasis of Pregnancy

    Directory of Open Access Journals (Sweden)

    Manuela Martinefski

    2012-01-01

    Full Text Available Until now, the biochemical parameter most used for diagnosis of intrahepatic cholestasis of pregnancy (ICP) has been a rise of total serum bile acids (TSBA) above the upper normal limit of 11 μM. However, differential diagnosis is very difficult, since overlapping values calculated from bile acid determinations are observed in different conditions of pregnancy, including the benign condition of pruritus gravidarum. The aim of this work was to determine better markers of ICP for a precise diagnosis, together with parameters associated with severity of symptoms and treatment evaluation. Serum bile acid profiles were evaluated using capillary electrophoresis in 38 healthy pregnant women and 32 ICP patients, and the sensitivity, specificity, accuracy, predictive values, and ratios of certain individual bile acids were calculated in order to replace TSBA determinations. The evaluation of the results shows that LCA and the UDCA/LCA ratio provided information for a more complete and accurate diagnosis and evaluation of ICP than calculation of TSBA levels alone in pregnant women.

  20. SIFTER search: a web server for accurate phylogeny-based protein function prediction.

    Science.gov (United States)

    Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E

    2015-07-01

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. OS2: Oblivious similarity based searching for encrypted data outsourced to an untrusted domain

    OpenAIRE

    Pervez, Zeeshan; AHMAD, Mahmood; Khattak, Asad Masood; Ramzan, Naeem; Khan, Wajahat Ali

    2017-01-01

    Public cloud storage services are becoming prevalent, and myriad data sharing, archiving and collaborative services have emerged which harness the pay-as-you-go business model of the public cloud. To ensure privacy and confidentiality, data is often encrypted before it is outsourced to such services, which further complicates the process of accessing relevant data through search queries. Search-over-encrypted-data schemes solve this problem by exploiting cryptographic primitives and secure indexing to identify...

  2. Similarity Digest Search: A Survey and Comparative Analysis of Strategies to Perform Known File Filtering Using Approximate Matching

    Directory of Open Access Journals (Sweden)

    Vitor Hugo Galhardo Moia

    2017-01-01

    Full Text Available Digital forensics is a branch of Computer Science aiming at investigating and analyzing electronic devices in the search for crime evidence. There are several ways to perform this search. Known File Filter (KFF) is one of them, where a list of objects of interest is used to reduce/separate data for analysis. Holding a database of hashes of such objects, the examiner performs lookups for matches against the target device. However, due to a limitation of hash functions (their inability to detect similar objects), new methods have been designed, called approximate matching. This sort of function has interesting characteristics for KFF investigations but suffers mainly from high costs when dealing with huge data sets, as the search is usually done by brute force. To mitigate this problem, strategies have been developed to better perform lookups. In this paper, we present the state of the art of similarity digest search strategies, along with a detailed comparison involving several aspects, such as time complexity, memory requirements, and search precision. Our results show that none of the approaches addresses all of these main aspects. Finally, we discuss future directions and present requirements for a new strategy aiming to overcome current limitations.
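
    The brute-force cost that these strategies attack is easy to see in code. In the sketch below, difflib's ratio stands in for a real approximate-matching comparison (e.g., ssdeep or sdhash digest comparison), and the digest strings are made up for illustration.

```python
from difflib import SequenceMatcher   # stand-in for a real similarity-digest
                                      # comparison such as ssdeep or sdhash

def kff_bruteforce(known: dict, artifacts: dict, threshold: float = 0.8):
    """Brute-force known-file filtering: every artifact digest is compared
    against every digest in the reference database -- O(|known| * |artifacts|)
    comparisons, which is exactly the cost the surveyed lookup strategies
    try to avoid."""
    matches = []
    for a_name, a_digest in artifacts.items():
        for k_name, k_digest in known.items():
            score = SequenceMatcher(None, a_digest, k_digest).ratio()
            if score >= threshold:
                matches.append((a_name, k_name, round(score, 2)))
    return matches

# hypothetical ssdeep-style digests, invented for the example
known = {"contraband.jpg": "3:AXGBicFlgVNhBGcL6wCrFQEv:AXGHsNhxLsr2C"}
artifacts = {"IMG_0042.jpg": "3:AXGBicFlgVNhBGcL6wCrFQEv:AXGHsNhxLsr2D"}
print(kff_bruteforce(known, artifacts))
```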

  3. SHOP: receptor-based scaffold hopping by GRID-based similarity searches

    DEFF Research Database (Denmark)

    Bergmann, Rikke; Liljefors, Tommy; Sørensen, Morten D

    2009-01-01

    A new field-derived 3D method for receptor-based scaffold hopping, implemented in the software SHOP, is presented. Information from a protein-ligand complex is utilized to substitute a fragment of the ligand with another fragment from a database of synthetically accessible scaffolds. A GRID-based interaction profile of the receptor and geometrical descriptions of a ligand scaffold are used to obtain new scaffolds that have different structural features and are able to replace the original scaffold in the protein-ligand complex. An enrichment study was successfully performed, verifying the ability of SHOP to find known active CDK2 scaffolds in a database. Additionally, SHOP was used for suggesting new inhibitors of p38 MAP kinase. Four p38 complexes were used to perform six scaffold searches. Several new scaffolds were suggested, and the resulting compounds were successfully docked into the query proteins.

  4. Proposal for a Similar Question Search System on a Q&A Site

    Directory of Open Access Journals (Sweden)

    Katsutoshi Kanamori

    2014-06-01

    Full Text Available There is a service to help Internet users obtain answers to specific questions when they visit a Q&A site. A Q&A site is very useful for the Internet user, but posted questions are often not answered immediately. This delay in answering occurs because in most cases another site user is answering the question manually. In this study, we propose a system that can present a question that is similar to a question posted by a user. An advantage of this system is that a user can refer to an answer to a similar question. This research measures the similarity of a candidate question based on word and dependency parsing. In an experiment, we examined the effectiveness of the proposed system for questions actually posted on the Q&A site. The result indicates that the system can show the questioner the answer to a similar question. However, the system still has a number of aspects that should be improved.

  5. Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies

    OpenAIRE

    Köhler, Sebastian; Schulz, Marcel H.; Krawitz, Peter; Bauer, Sebastian; Dölken, Sandra; Ott, Claus E.; Mundlos, Christine; Horn, Denise; Mundlos, Stefan; Robinson, Peter N

    2009-01-01

    The differential diagnostic process attempts to identify candidate diseases that best explain a set of clinical features. This process can be complicated by the fact that the features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. Depending on the experience of the physician and the availability of laboratory tests, clinical abnormalities may be described in greater or lesser detail. We have adapted semantic similarity metrics ...

  6. MetrIntSimil—An Accurate and Robust Metric for Comparison of Similarity in Intelligence of Any Number of Cooperative Multiagent Systems

    Directory of Open Access Journals (Sweden)

    Laszlo Barna Iantovics

    2018-02-01

    Full Text Available Intelligent cooperative multiagent systems are applied for solving a large range of real-life problems, including in domains like biology and healthcare. There are very few metrics able to make an effective measurement of the machine intelligence quotient. The most important drawbacks of the metrics presented in the scientific literature are their limited universality, accuracy, and robustness. In this paper, we propose a novel universal metric called MetrIntSimil capable of making an accurate and robust symmetric comparison of the similarity in intelligence of any number of cooperative multiagent systems specialized in difficult problem solving. Universality is an important necessary property, given the large variety of designed intelligent systems. MetrIntSimil makes a comparison by taking into consideration the variability in intelligence in the problem solving of the compared cooperative multiagent systems. It allows a classification of the cooperative multiagent systems based on their similarity in intelligence. A cooperative multiagent system has variability in its problem solving intelligence, and it can manifest lower or higher intelligence in different problem solving tasks. Multiple cooperative multiagent systems with similar intelligence can be included in the same class. For the evaluation of the proposed metric, we conducted a case study on several intelligent cooperative multiagent systems composed of simple computing agents applied to solving the Symmetric Travelling Salesman Problem (STSP), a class of NP-hard problems. STSP is the problem of finding the shortest Hamiltonian cycle/tour in a weighted undirected graph that does not have loops or multiple edges; the distance between two cities is the same in each opposite direction. Two classes of similar intelligence denoted IntClassA and IntClassB were identified. The experimental results show that the agent belonging to the IntClassA intelligence class is less

  7. SymDex: increasing the efficiency of chemical fingerprint similarity searches for comparing large chemical libraries by using query set indexing.

    Science.gov (United States)

    Tai, David; Fang, Jianwen

    2012-08-27

    The large sizes of today's chemical databases require efficient algorithms to perform similarity searches, and it can be very time consuming to compare two large chemical databases. This paper seeks to build upon existing research efforts by describing a novel strategy for accelerating existing search algorithms for comparing large chemical collections. The quest for efficiency has focused on developing better indexing algorithms, creating heuristics for searching an individual chemical against a chemical library by detecting and eliminating needless similarity calculations. For comparing two chemical collections, these algorithms simply execute searches for each chemical in the query set sequentially. The strategy presented in this paper achieves a speedup over these algorithms by indexing the set of all query chemicals, so that redundant calculations which arise in the case of sequential searches are eliminated. We implement this novel algorithm in a similarity search program called Symmetric inDexing, or SymDex. SymDex shows a maximum speedup of over 232% compared to the state-of-the-art single-query search algorithm on real data for various fingerprint lengths. Considerable speedup is seen even for batch searches where query set sizes are relatively small compared to typical database sizes. To the best of our knowledge, SymDex is the first search algorithm designed specifically for comparing chemical libraries. It can be adapted to most, if not all, existing indexing algorithms and shows potential for accelerating future similarity search algorithms for comparing chemical databases.
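
    SymDex's own code is not reproduced here, but the payoff of indexing the query set can be sketched. Assuming bit-vector fingerprints stored as Python ints (int.bit_count requires Python 3.10+), a popcount index over the queries lets every database fingerprint skip all queries that the Swamidass-Baldi bound already rules out, sharing that pruning across the whole batch.

```python
from collections import defaultdict

def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient on bit-vector fingerprints stored as ints."""
    union = (a | b).bit_count()
    return (a & b).bit_count() / union if union else 0.0

def batch_search(queries: list, database: list, t: float = 0.7):
    """Index the query set by popcount so each database fingerprint is only
    compared against queries that can still reach threshold t; the bound
    tanimoto(a, b) <= min(|a|, |b|) / max(|a|, |b|) prunes the rest."""
    by_count = defaultdict(list)
    for qi, q in enumerate(queries):
        by_count[q.bit_count()].append(qi)
    hits = []
    for di, d in enumerate(database):
        n = d.bit_count()
        lo, hi = int(n * t), int(n / t)       # admissible query popcounts
        for c in range(lo, hi + 1):
            for qi in by_count.get(c, ()):
                s = tanimoto(queries[qi], d)
                if s >= t:
                    hits.append((qi, di, round(s, 3)))
    return hits

queries = [0b1011011101, 0b1111000011]
database = [0b1011011100, 0b0000111111, 0b1011011101]
print(batch_search(queries, database))        # [(0, 0, 0.857), (0, 2, 1.0)]
```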

  8. Devices used by automated milking systems are similarly accurate in estimating milk yield and in collecting a representative milk sample compared with devices used by farms with conventional milk recording

    NARCIS (Netherlands)

    Kamphuis, Claudia; Dela Rue, B.; Turner, S.A.; Petch, S.

    2015-01-01

    Information on accuracy of milk-sampling devices used on farms with automated milking systems (AMS) is essential for development of milk recording protocols. The hypotheses of this study were (1) devices used by AMS units are similarly accurate in estimating milk yield and in collecting

  9. eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape

    Science.gov (United States)

    Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki

    2007-01-01

    We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek). PMID:17567616

  10. SysFinder: A customized platform for search, comparison and assisted design of appropriate animal models based on systematic similarity.

    Science.gov (United States)

    Yang, Shuang; Zhang, Guoqing; Liu, Wan; Wang, Zhen; Zhang, Jifeng; Yang, Dongshan; Chen, Y Eugene; Sun, Hong; Li, Yixue

    2017-05-20

    Animal models are increasingly gaining value through cross-comparisons of response or resistance to clinical agents used in patients. However, many disease mechanisms and drug effects found in animal models are not transferable to humans. To address these issues, we developed SysFinder (http://lifecenter.sgst.cn/SysFinder), a platform for scientists to find appropriate animal models for translational research. SysFinder offers a "topic-centered" approach for systematic comparisons of human genes, whose functions are involved in a specific scientific topic, to the corresponding homologous genes of animal models. A scientific topic can be a certain disease, drug, gene function or biological pathway. SysFinder calculates multi-level similarity indexes to evaluate the similarities between humans and animal models in specified scientific topics. Meanwhile, SysFinder offers species-specific information to investigate the differences in molecular mechanisms between humans and animal models. Furthermore, SysFinder provides a user-friendly platform for the determination of short guide RNAs (sgRNAs) and homology arms for the design of a new animal model. Case studies illustrate the ability of SysFinder to help experimental scientists. SysFinder is a useful platform for experimental scientists to carry out their research into human molecular mechanisms. Copyright © 2017 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.

  11. A systematic in silico search for target similarity identifies several approved drugs with potential activity against the Plasmodium falciparum apicoplast.

    Directory of Open Access Journals (Sweden)

    Nadlla Alves Bispo

    Full Text Available Most of the drugs in use against Plasmodium falciparum share similar modes of action and, consequently, there is a need to identify alternative potential drug targets. Here, we focus on the apicoplast, a malarial plastid-like organelle of algal origin which evolved through secondary endosymbiosis. We undertake a systematic in silico target-based identification approach for detecting drugs already approved for clinical use in humans that may be able to interfere with the P. falciparum apicoplast. The P. falciparum genome database GeneDB was used to compile a list of ≈600 proteins containing apicoplast signal peptides. Each of these proteins was treated as a potential drug target and its predicted sequence was used to interrogate three different freely available databases (Therapeutic Target Database, DrugBank and STITCH3.1) that provide synoptic data on drugs and their primary or putative drug targets. We were able to identify several drugs that are expected to interact with forty-seven (47) peptides predicted to be involved in the biology of the P. falciparum apicoplast. Fifteen (15) of these putative targets are predicted to have affinity to drugs that are already approved for clinical use but have never been evaluated against malaria parasites. We suggest that some of these drugs should be experimentally tested and/or serve as leads for engineering new antimalarials.

  12. Design of a bioactive small molecule that targets the myotonic dystrophy type 1 RNA via an RNA motif-ligand database and chemical similarity searching.

    Science.gov (United States)

    Parkesh, Raman; Childs-Disney, Jessica L; Nakamori, Masayuki; Kumar, Amit; Wang, Eric; Wang, Thomas; Hoskins, Jason; Tran, Tuan; Housman, David; Thornton, Charles A; Disney, Matthew D

    2012-03-14

    Myotonic dystrophy type 1 (DM1) is a triplet repeating disorder caused by expanded CTG repeats in the 3'-untranslated region of the dystrophia myotonica protein kinase (DMPK) gene. The transcribed repeats fold into an RNA hairpin with multiple copies of a 5'CUG/3'GUC motif that binds the RNA splicing regulator muscleblind-like 1 protein (MBNL1). Sequestration of MBNL1 by expanded r(CUG) repeats causes splicing defects in a subset of pre-mRNAs including the insulin receptor, the muscle-specific chloride ion channel, sarco(endo)plasmic reticulum Ca(2+) ATPase 1, and cardiac troponin T. Based on these observations, the development of small-molecule ligands that specifically target expanded DM1 repeats could be of use as therapeutics. In the present study, chemical similarity searching was employed to improve the efficacy of pentamidine and Hoechst 33258, ligands that have been shown previously to target the DM1 triplet repeat. A series of in vitro inhibitors of the RNA-protein complex were identified with low micromolar IC(50) values, which are >20-fold more potent than the query compounds. Importantly, a bis-benzimidazole identified from the Hoechst query improves DM1-associated pre-mRNA splicing defects in cell and mouse models of DM1 (when dosed at 1 mM and 100 mg/kg, respectively). Since Hoechst 33258 was identified as a DM1 binder through analysis of an RNA motif-ligand database, these studies suggest that lead ligands targeting RNA with improved biological activity can be identified by using a synergistic approach that combines analysis of known RNA-ligand interactions with chemical similarity searching.

  13. Discovery and electrophysiological characterization of SKF-32802: A novel hERG agonist found through a large-scale structural similarity search.

    Science.gov (United States)

    Donovan, Brian T; Bandyopadhyay, Deepak; Duraiswami, Chaya; Nixon, Christopher J; Townsend, Claire Y; Martens, Stan F

    2017-10-16

    Despite the importance of the hERG channel in drug discovery and the sizable number of antagonist molecules discovered, only a few hERG agonists have been found. Here we report a novel hERG agonist, SKF-32802, and a structural analog of the agonist NS3623, SB-335573. These were discovered through a similarity search of published hERG agonists. SKF-32802 incorporates an amide linker rather than NS3623's urea, resulting in a compound with a different mechanism of action. We find that both compounds decrease the time constant of open channel kinetics, increase the amplitude in the envelope-of-tails assay, mildly increase the amplitude of the IV curve, bind the hERG channel in either open or closed states, increase the plateau of the voltage dependence of activation, and modulate the effects of the hERG antagonist quinidine. Neither compound affects inactivation or deactivation kinetics, a property unique among hERG agonists. Additionally, SKF-32802 induces a leftward shift in the voltage dependence of activation. Our structural models show that both compounds make strong bridging interactions with multiple channel subunits and are stabilized by internal hydrogen bonding, similar to NS3623, PD-307243 and RPR26024. While SB-335573 binds in a nearly identical fashion as NS3623, SKF-32802 makes an additional hydrogen bond with neighboring threonine 623. In summary, SB-335573 is a type 4 agonist which increases open channel probability, while SKF-32802 is a type 3 agonist which induces a leftward shift in the voltage dependence of activation. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. A spin transfer torque magnetoresistance random access memory-based high-density and ultralow-power associative memory for fully data-adaptive nearest neighbor search with current-mode similarity evaluation and time-domain minimum searching

    Science.gov (United States)

    Ma, Yitao; Miura, Sadahiko; Honjo, Hiroaki; Ikeda, Shoji; Hanyu, Takahiro; Ohno, Hideo; Endoh, Tetsuo

    2017-04-01

    A high-density nonvolatile associative memory (NV-AM) based on spin transfer torque magnetoresistive random access memory (STT-MRAM), which achieves highly concurrent and ultralow-power nearest neighbor search with full adaptivity of the template data format, has been proposed and fabricated using the 90 nm CMOS/70 nm perpendicular-magnetic-tunnel-junction hybrid process. Truly compact current-mode circuitry is developed to realize flexibly controllable and highly parallel similarity evaluation, which makes the NV-AM adaptable to any dimensionality and component bit-width of template data. A compact dual-stage time-domain minimum searching circuit is also developed, which can freely extend the system for more template data by connecting multiple NV-AM cores without additional circuits for integrated processing. Both the embedded STT-MRAM module and the computing circuit modules in this NV-AM chip are synchronously power-gated to completely eliminate standby power and maximally reduce operation power by activating only the currently accessed circuit blocks. The operation of a prototype chip at 40 MHz is demonstrated by measurement. The average operation power is only 130 µW, and the circuit density is less than 11 µm²/bit. Compared with the latest conventional works in both volatile and nonvolatile approaches, a circuit area reduction of more than 31.3% and a power improvement of more than 99.2% are achieved, respectively. Further power-performance analyses are discussed, which verify the particular suitability of the proposed NV-AM for low-power, large-memory VLSIs.

  15. Combination of 2D/3D Ligand-Based Similarity Search in Rapid Virtual Screening from Multimillion Compound Repositories. Selection and Biological Evaluation of Potential PDE4 and PDE5 Inhibitors

    Directory of Open Access Journals (Sweden)

    Krisztina Dobi

    2014-05-01

    Full Text Available Rapid in silico selection of target-focused libraries from commercial repositories is an attractive and cost-effective approach. If structures of active compounds are available, a rapid 2D similarity search can be performed on multimillion-compound databases, but the generated library requires further focusing by various 2D/3D chemoinformatics tools. We report here a combination of the 2D approach with a ligand-based 3D method (Screen3D), which applies flexible matching to align reference and target compounds in a dynamic manner and thus to assess their structural and conformational similarity. In the first case study we compared the 2D and 3D similarity scores on an existing dataset derived from the biological evaluation of a PDE5-focused library. Based on the obtained similarity metrics, a fusion score was proposed. The fusion score was applied to refine the 2D similarity search in a second case study, where we aimed at selecting and evaluating a PDE4B-focused library. The application of this fused 2D/3D similarity measure led to an increase of the hit rate from 8.5% (1st round, 47% inhibition at 10 µM) to 28.5% (2nd round, 50% inhibition at 10 µM), and the best two hits had 53 nM inhibitory activities.
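
    The abstract does not spell out the fusion formula, so the sketch below makes an explicit assumption: min-max normalize each metric over the candidate library and average the two. Any monotone combination could be substituted.

```python
def fusion_scores(scores_2d: list, scores_3d: list) -> list:
    """Combine per-compound 2D (e.g. fingerprint Tanimoto) and 3D (e.g.
    Screen3D shape) similarities into one ranking score. Assumed recipe:
    min-max normalize each metric over the candidate set, then average."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    return [(a + b) / 2 for a, b in zip(norm(scores_2d), norm(scores_3d))]

# toy scores for three candidate compounds
print(fusion_scores([0.91, 0.62, 0.55], [0.40, 0.88, 0.40]))
```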

  16. Charting the cellular and extracellular proteome analysis of Brevibacterium linens DSM 20158 with unsequenced genome by mass spectrometry-driven sequence similarity searches.

    Science.gov (United States)

    Shabbiri, Khadija; Botting, Catherine H; Adnan, Ahmad; Fuszard, Matthew

    2013-05-27

    Brevibacterium linens DSM 20158 is an industrially important actinobacterium which is well-known for the production of amino acids and enzymes. However, as this strain has an unsequenced genome, there is no detailed information regarding its proteome although another strain of this microbe, BL2, has a shotgun genome sequence. However, this still does not cover the entire scope of its proteome. The present study is carried out by first identifying proteins by homology matches using the Mascot search algorithm followed by an advanced approach using de novo sequencing and MS BLAST to expand the B. linens proteome. The proteins identified in the secretome and cellular portion appear to be involved in various metabolic and physiological processes of this unsequenced organism. This study will help to enhance the usability of this strain of B. linens in different areas of research in the future rather than mainly in the food industries. The present study describes the construction of the first detailed proteomic reference map of B. linens DSM 20158 with unsequenced genome by comparative proteome research analysis. This opens new horizons in proteomics to understand the role of proteins involved in the metabolism and physiology of other organisms with unsequenced genomes. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. Personalized Search

    CERN Document Server

    AUTHOR|(SzGeCERN)749939

    2015-01-01

    As the volume of electronically available information grows, relevant items become harder to find. This work presents an approach to personalizing search results in scientific publication databases. It focuses on re-ranking search results from existing search engines like Solr or ElasticSearch, and includes the development of Obelix, a new recommendation system used to re-rank search results. The project was proposed and performed at CERN, using the scientific publications available on the CERN Document Server (CDS). This work experiments with re-ranking using offline and online evaluation of users and documents in CDS. The experiments conclude that the personalized search results outperform both latest-first and word-similarity orderings in terms of click position in the search results for global search in CDS.

  18. ESG: extended similarity group method for automated protein function prediction.

    Science.gov (United States)

    Chitale, Meghana; Hawkins, Troy; Park, Changsoon; Kihara, Daisuke

    2009-07-15

    The importance of accurate automatic protein function prediction is ever increasing in the face of the large number of newly sequenced genomes and the proteomics data awaiting biological interpretation. Conventional methods have focused on annotation transfer based on high sequence similarity, which relies on the concept of homology. However, many cases have been reported in which simple transfer of function from the top hits of a homology search causes erroneous annotation. New methods are required that handle sequence similarity in a more robust way, combining signals from strongly and weakly similar proteins to effectively predict function for unknown proteins with high reliability. We present the extended similarity group (ESG) method, which performs iterative sequence database searches and annotates a query sequence with Gene Ontology terms. Each annotation is assigned a probability based on its relative similarity score with the multiple-level neighbors in the protein similarity graph. We depict how the statistical framework of ESG improves prediction accuracy by iteratively taking into account the neighborhood of the query protein in the sequence similarity space. ESG outperforms conventional PSI-BLAST and the protein function prediction (PFP) algorithm. It is found that the iterative search is effective in capturing multiple domains in a query protein, enabling accurate prediction of several functions that originate from different domains. The ESG web server is available for automated protein function prediction at http://dragon.bio.purdue.edu/ESG/.
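
    A toy version of the two-level idea can be written down directly; note that ESG's actual statistical framework is more involved than this weighted vote, and the identifiers and scores below are invented.

```python
from collections import defaultdict

def esg_like_scores(level1: list, level2: dict) -> dict:
    """Toy two-level similarity-group scoring in the spirit of ESG: each
    first-level neighbor contributes its GO terms weighted by its own score,
    plus the terms of its own neighbors, and everything is normalized into
    probabilities.

    level1: [(neighbor_id, score, [go_terms])]
    level2: neighbor_id -> [(score, [go_terms])] from the second search round
    """
    raw = defaultdict(float)
    for nid, s, terms in level1:
        for t in terms:
            raw[t] += s
        for s2, terms2 in level2.get(nid, ()):
            for t in terms2:
                raw[t] += s * s2          # damp second-level evidence
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()}

level1 = [("P1", 0.9, ["GO:0016301"]), ("P2", 0.5, ["GO:0005524"])]
level2 = {"P1": [(0.8, ["GO:0016301", "GO:0004672"])]}
print(esg_like_scores(level1, level2))
```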

  19. Proteomic analysis of cellular soluble proteins from human bronchial smooth muscle cells by combining nondenaturing micro 2DE and quantitative LC-MS/MS. 2. Similarity search between protein maps for the analysis of protein complexes.

    Science.gov (United States)

    Jin, Ya; Yuan, Qi; Zhang, Jun; Manabe, Takashi; Tan, Wen

    2015-09-01

    Human bronchial smooth muscle cell soluble proteins were analyzed by a combined method of nondenaturing micro 2DE, grid gel-cutting, and quantitative LC-MS/MS and a native protein map was prepared for each of the identified 4323 proteins [1]. A method to evaluate the degree of similarity between the protein maps was developed since we expected the proteins comprising a protein complex would be separated together under nondenaturing conditions. The following procedure was employed using Excel macros: (i) maps that have three or more squares with protein quantity data were selected (2328 maps), (ii) within each map, the quantity values of the squares were normalized setting the highest value to be 1.0, (iii) in comparing a map with another map, the smaller normalized quantity in two corresponding squares was taken and summed throughout the map to give an "overlap score," (iv) each map was compared against all the 2328 maps and the largest overlap score, obtained when a map was compared with itself, was set to be 1.0 thus providing 2328 "overlap factors," (v) step (iv) was repeated for all maps providing 2328 × 2328 matrix of overlap factors. From the matrix, protein pairs that showed overlap factors above 0.65 from both protein sides were selected (431 protein pairs). Each protein pair was searched in a database (UniProtKB) on complex formation and 301 protein pairs, which comprise 35 protein complexes, were found to be documented. These results demonstrated that native protein maps and their similarity search would enable simultaneous analysis of multiple protein complexes in cells. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
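
    The overlap-factor computation translates almost line for line into numpy. The sketch below assumes each map has already been flattened into a vector of per-square quantities (the numbers are toy data); step (i), selecting maps with at least three filled squares, is taken as done.

```python
import numpy as np

def overlap_factors(maps: np.ndarray) -> np.ndarray:
    """Steps (ii)-(v) of the procedure described above.
    maps: (n_proteins, n_squares) array of per-square protein quantities,
    one row per native protein map. Returns the (n, n) overlap-factor matrix."""
    norm = maps / maps.max(axis=1, keepdims=True)        # (ii) max -> 1.0
    n = norm.shape[0]
    scores = np.empty((n, n))
    for i in range(n):                                   # (iii)-(iv) overlap scores
        scores[i] = np.minimum(norm[i], norm).sum(axis=1)
    return scores / np.diag(scores)[:, None]             # self-overlap -> 1.0

maps = np.array([[0.0, 4.0, 8.0, 2.0],
                 [0.0, 3.0, 9.0, 3.0],
                 [6.0, 0.5, 0.0, 0.0]])
F = overlap_factors(maps)
print(np.round(F, 2))   # pairs with F > 0.65 in both directions are candidates
```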

  1. Applying ligands profiling using multiple extended electron distribution based field templates and feature trees similarity searching in the discovery of new generation of urea-based antineoplastic kinase inhibitors.

    Directory of Open Access Journals (Sweden)

    Eman M Dokla

    Full Text Available This study provides a comprehensive computational procedure for the discovery of novel urea-based antineoplastic kinase inhibitors while focusing on diversification of both chemotype and selectivity pattern. It presents a systematic structural analysis of the different binding motifs of urea-based kinase inhibitors and the corresponding configurations of the kinase enzymes. The computational model depends on simultaneous application of two protocols. The first protocol applies multiple consecutive validated virtual screening filters including SMARTS, a support vector-machine model (ROC = 0.98), a Bayesian model (ROC = 0.86) and structure-based pharmacophore filters based on urea-based kinase inhibitor complexes retrieved from the literature. This is followed by hit profiling against different extended electron distribution (XED)-based field templates representing different kinase targets. The second protocol enables cancericidal activity verification by using the feature trees (Ftrees) similarity searching algorithm against the NCI database. Being a proof-of-concept study, this combined procedure was experimentally validated by its utilization in developing a novel series of urea-based derivatives of strong anticancer activity. This new series is based on a 3-benzylbenzo[d]thiazol-2(3H)-one scaffold, which has interesting chemical feasibility and wide diversification capability. Antineoplastic activity of this series was assayed in vitro against the NCI 60 tumor-cell lines, showing very strong inhibition with GI(50) values as low as 0.9 µM. Additionally, its mechanism was unveiled using the KINEX™ protein kinase microarray-based small molecule inhibitor profiling platform and cell cycle analysis, showing a peculiar selectivity pattern against Zap70, c-src, Mink1, csk and MeKK2 kinases. Interestingly, it showed activity on syk kinase, confirming the finding of recent studies of the high activity of diphenyl urea containing compounds against this kinase. Overall, the new series

  2. Efficient protein alignment algorithm for protein search.

    Science.gov (United States)

    Lu, Zaixin; Zhao, Zhiyu; Fu, Bin

    2010-01-18

    Proteins show a great variety of 3D conformations, which can be used to infer their evolutionary relationships and to classify them into more general groups; therefore protein structure alignment algorithms are very helpful for protein biologists. However, an accurate alignment algorithm by itself may be insufficient for effective discovery of structural relationships among tens of thousands of proteins. Due to the exponentially increasing amount of protein structural data, a fast and accurate structure alignment tool is necessary to support protein classification and protein similarity search; however, the complexity of current alignment algorithms is usually too high to make a fully alignment-based classification and search practical. We have developed an efficient protein pairwise alignment algorithm and applied it to our protein search tool, which aligns a query protein structure in a pairwise manner with all protein structures in the Protein Data Bank (PDB) to output similar protein structures. The algorithm can align hundreds of pairs of protein structures in one second. Given a protein structure, the tool efficiently discovers similar structures from the tens of thousands of structures stored in the PDB, within 2 minutes on a single machine and 20 seconds on our cluster of 6 machines. The algorithm has been fully implemented and is accessible online at our webserver, which is supported by a cluster of computers. Because our algorithm can work out hundreds of pairs of protein alignments in one second, it is very suitable for protein search. Our experimental results show that it is more accurate than other well-known protein search systems in finding proteins which are structurally similar at the SCOP family and superfamily levels, and its speed is also competitive with those systems. In terms of pairwise alignment performance, it is as good as some well-known alignment algorithms.

  3. Grading More Accurately

    Science.gov (United States)

    Rom, Mark Carl

    2011-01-01

    Grades matter. College grading systems, however, are often ad hoc and prone to mistakes. This essay focuses on one factor that contributes to high-quality grading systems: grading accuracy (or "efficiency"). I proceed in several steps. First, I discuss the elements of "efficient" (i.e., accurate) grading. Next, I present analytical results…

  4. Custom Search Engines: Tools & Tips

    Science.gov (United States)

    Notess, Greg R.

    2008-01-01

    Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…

  5. SETI The Search for Extraterrestrial Intelligence.

    Science.gov (United States)

    Jones, Barrie W.

    1991-01-01

    Discussed is the search for life on other planets similar to Earth based on the Drake equation. Described are search strategies and microwave searches. The reasons why people are searching are also discussed. (KR)
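
    Since the record invokes the Drake equation without stating it, its standard form is given below (conventional notation, not quoted from the article):

```latex
% N: number of detectable civilizations; R_*: rate of star formation;
% f_p, n_e, f_l, f_i, f_c: successive fractions (stars with planets,
% habitable planets per system, life, intelligence, communication);
% L: lifetime of a communicative civilization.
N = R_{*} \cdot f_{p} \cdot n_{e} \cdot f_{l} \cdot f_{i} \cdot f_{c} \cdot L
```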

  6. Domain similarity based orthology detection.

    Science.gov (United States)

    Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich

    2015-05-13

    Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows quadratically. A current challenge of bioinformatic research, especially given the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures, a cosine score and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The quality of the cosine and maximal weight matching similarity measures is compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of the search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is released in python and C++ and is available under the GNU GPL licence 3 at http://www.bornberglab.org/pages/porthoda .
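
    The domain-content cosine measure is straightforward to sketch. The snippet below treats a protein as a multiset of domain identifiers (the Pfam-style accessions are hypothetical examples); porthoDom's actual implementation and weighting may differ.

```python
from collections import Counter
from math import sqrt

def domain_cosine(a: list, b: list) -> float:
    """Cosine similarity between two proteins represented as strings of
    domain identifiers (a multiset of accessions rather than the raw
    amino-acid sequence)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[d] * cb[d] for d in ca.keys() & cb.keys())
    return dot / (sqrt(sum(v * v for v in ca.values())) *
                  sqrt(sum(v * v for v in cb.values())))

protein_a = ["PF00069", "PF07714", "PF00069"]   # hypothetical domain strings
protein_b = ["PF00069", "PF07714"]
print(round(domain_cosine(protein_a, protein_b), 3))  # cheap pre-filter before
                                                      # the full orthology search
```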

  7. Kolmogorov's Lagrangian similarity law revisited

    Science.gov (United States)

    Barjona, Manuel; da Silva, Carlos B.

    2017-10-01

    Kolmogorov's similarity theory of turbulence in a Lagrangian frame is assessed with new direct numerical simulations of isotropic turbulence with and without hyperviscosity, which attain higher Reynolds numbers than previously available. It is demonstrated that hyperviscous simulations can be used to accurately predict the second-order Lagrangian velocity structure function (LVSF-2) in the inertial range, by using an original procedure. The results strongly support Kolmogorov's Lagrangian similarity assumption and allow the universal constant of LVSF-2 to be computed with a new level of confidence, giving C_0 = 7.4 ± 0.2.
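
    For context, the quantity being fitted is, in standard notation (the textbook inertial-range form, not an equation quoted from the paper):

```latex
% Second-order Lagrangian velocity structure function and its Kolmogorov
% inertial-range scaling, valid for lags between the Kolmogorov time
% scale \tau_\eta and the integral time scale T_L:
S_2^{L}(\tau) \equiv \big\langle [u_i(t+\tau) - u_i(t)]^2 \big\rangle
             = C_0 \, \langle \varepsilon \rangle \, \tau ,
\qquad \tau_\eta \ll \tau \ll T_L .
```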

  8. Efficient protein structure search using indexing methods.

    Science.gov (United States)

    Kim, Sungchul; Sael, Lee; Yu, Hwanjo

    2013-01-01

    Understanding the functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, so finding structurally similar proteins accurately and efficiently in a large set of proteins is crucial. A protein structure can be represented as a vector by the 3D-Zernike Descriptor (3DZD), which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the search process. However, computing the similarity of two protein structures is still computationally expensive, so it is hard to efficiently process many simultaneous requests for structurally similar protein searches. This paper proposes indexing techniques which substantially reduce the search time for finding structurally similar proteins. In particular, we first apply two indexing techniques, iDistance and iKernel, to the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of the 3DZDs of protein structures. To retrieve the top-k similar structures, the top-10 × k similar structures are first found using the reduced index, and the top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points whose distance to the query point is less than θ. The results show that both iDistance and iKernel significantly enhance the search speed. In top-k nearest neighbor search, the search time is reduced by 69.6%, 77%, 77.4% and 87.9%, respectively, using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor search, the search time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
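
    The two-stage use of the reduced index can be sketched as follows. Euclidean distance over the first m coordinates never exceeds the full distance, which is why a generous candidate set (10k, as in the paper) rarely loses true neighbors. The dimensions and data below are synthetic, and a real iDistance or iKernel index would replace the linear scan of stage 1.

```python
import numpy as np

def reduced_index_search(db: np.ndarray, query: np.ndarray, k: int, m: int = 8):
    """Two-stage top-k search: fetch 10*k candidates using only the first m
    attributes of each descriptor, then re-rank them with the full vectors."""
    d_reduced = np.linalg.norm(db[:, :m] - query[:m], axis=1)
    cand = np.argsort(d_reduced)[:10 * k]                 # stage 1: reduced index
    d_full = np.linalg.norm(db[cand] - query, axis=1)     # stage 2: full vectors
    return cand[np.argsort(d_full)[:k]]

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 121))     # toy 121-dimensional 3DZD-like vectors
print(reduced_index_search(db, db[42], k=3))   # index 42 comes back first
```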

  9. BIOACCESSIBILITY TESTS ACCURATELY ESTIMATE ...

    Science.gov (United States)

    Hazards of soil-borne Pb to wild birds may be more accurately quantified if the bioavailability of that Pb is known. To better understand the bioavailability of Pb to birds, we measured blood Pb concentrations in Japanese quail (Coturnix japonica) fed diets containing Pb-contaminated soils. Relative bioavailabilities were expressed by comparison with blood Pb concentrations in quail fed a Pb acetate reference diet. Diets containing soil from five Pb-contaminated Superfund sites had relative bioavailabilities from 33%-63%, with a mean of about 50%. Treatment of two of the soils with P significantly reduced the bioavailability of Pb. The bioaccessibility of the Pb in the test soils was then measured in six in vitro tests and regressed on bioavailability. They were: the “Relative Bioavailability Leaching Procedure” (RBALP) at pH 1.5, the same test conducted at pH 2.5, the “Ohio State University In vitro Gastrointestinal” method (OSU IVG), the “Urban Soil Bioaccessible Lead Test”, the modified “Physiologically Based Extraction Test” and the “Waterfowl Physiologically Based Extraction Test.” All regressions had positive slopes. Based on criteria of slope and coefficient of determination, the RBALP pH 2.5 and OSU IVG tests performed very well. Speciation by X-ray absorption spectroscopy demonstrated that, on average, most of the Pb in the sampled soils was sorbed to minerals (30%), bound to organic matter (24%), or present as Pb sulfate (18%). Ad

  10. In search of late time evolution self-similar scaling laws of Rayleigh-Taylor and Richtmyer-Meshkov hydrodynamic instabilities - recent theoretical advance and NIF Discovery-Science experiments

    Science.gov (United States)

    Shvarts, Dov

    2017-10-01

    Hydrodynamic instabilities, and the mixing that they cause, are of crucial importance in describing many phenomena, from very large scales such as stellar explosions (supernovae) to very small scales, such as inertial confinement fusion (ICF) implosions. Such mixing causes the ejection of stellar core material in supernovae, and impedes attempts at ICF ignition. The Rayleigh-Taylor instability (RTI) occurs at an accelerated interface between two fluids, with the lower-density fluid accelerating the higher-density fluid. The Richtmyer-Meshkov (RM) instability occurs when a shock wave passes an interface between two fluids of different density. In the RTI, buoyancy causes ``bubbles'' of the light fluid to rise through (penetrate) the denser fluid, while ``spikes'' of the heavy fluid sink through (penetrate) the lighter fluid. With realistic multi-mode initial conditions, in the deep nonlinear regime, the mixing zone width, h, and its internal structure progress through an inverse cascade of spatial scales, reaching an asymptotic self-similar evolution: h_RT = α_RT A g t² for RT and h_RM = α_RM t^θ for RM. While this characteristic behavior has been known for years, the self-similar parameters α_RT and θ_RM and their dependence on dimensionality and density ratio have continued to be intensively studied, and a relatively wide distribution of their values has emerged. This talk will describe recent theoretical advances in the description of this turbulent mixing evolution that shed light on the spread in α_RT and θ_RM. Results of new and specially designed experiments, performed recently by scientists from several laboratories on NIF, the only facility powerful enough to reach the self-similar regime, to quantitatively test this theoretical advance, will be presented.

  11. Cube search, revisited

    Science.gov (United States)

    Zhang, Xuetao; Huang, Jie; Yigit-Elliott, Serap; Rosenholtz, Ruth

    2015-01-01

    Observers can quickly search among shaded cubes for one lit from a unique direction. However, replace the cubes with similar 2-D patterns that do not appear to have a 3-D shape, and search difficulty increases. These results have challenged models of visual search and attention. We demonstrate that cube search displays differ from those with “equivalent” 2-D search items in terms of the informativeness of fairly low-level image statistics. This informativeness predicts peripheral discriminability of target-present from target-absent patches, which in turn predicts visual search performance, across a wide range of conditions. Comparing model performance on a number of classic search tasks, cube search does not appear unexpectedly easy. Easy cube search, per se, does not provide evidence for preattentive computation of 3-D scene properties. However, search asymmetries derived from rotating and/or flipping the cube search displays cannot be explained by the information in our current set of image statistics. This may merely suggest a need to modify the model's set of 2-D image statistics. Alternatively, it may be difficult cube search that provides evidence for preattentive computation of 3-D scene properties. By attributing 2-D luminance variations to a shaded 3-D shape, 3-D scene understanding may slow search for 2-D features of the target. PMID:25780063

  12. Groundwater recharge: Accurately representing evapotranspiration

    CSIR Research Space (South Africa)

    Bugan, Richard DH

    2011-09-01

    Full Text Available Groundwater recharge is the basis for accurate estimation of groundwater resources, for determining the modes of water allocation and groundwater resource susceptibility to climate change. Accurate estimations of groundwater recharge with models...

  13. The effect of proximity, Tall Man lettering, and time pressure on accurate visual perception of drug names.

    Science.gov (United States)

    Irwin, Amy; Mearns, Kathryn; Watson, Margaret; Urquhart, Jim

    2013-04-01

    The aim of this study is to assess the effect of proximity and time pressure on accurate and effective visual search during medication selection from a computer screen. The presence of multiple similar objects in proximity to a target object increases the difficulty of a visual search. Visual similarity between drug names can also lead to selection error. The proximity of several similarly named drugs within a visual field could, therefore, adversely affect visual search. In Study 1, 60 nonpharmacy participants selected a target drug name from an array of mock drug packets shown on a computer screen, where one or four similarly named nontargets might be present. Of the participants, 30 completed the task with a time constraint, and the remainder did not. In Study 2, the same experiment was repeated with 28 pharmacy staff. In Study 1, the proximity of multiple similarly named nontargets within the specified visual field reduced selection accuracy and increased reaction times in the nonpharmacists. Time constraint also had an adverse effect. In Study 2, the pharmacy participants showed increased reaction times when multiple nontargets were present, but the time constraint had no effect. There was no effect of Tall Man lettering. The presence of multiple similarly named medications in close proximity to a target medication increases the difficulty of the visual search for the target. Tall Man lettering has no impact on this adverse effect. The widespread use of the alphabetical system in medication storage increases the risk of proximity-based errors in drug selection.

  14. FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately.

    Science.gov (United States)

    Budowski-Tal, Inbal; Nov, Yuval; Kolodny, Rachel

    2010-02-23

    Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: an ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag-of-fragments", a vector that counts the number of occurrences of each fragment, and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state-of-the-art structural aligners. Our best FragBag library finds more accurate candidate sets than three other filter methods: the SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.
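
    A minimal bag-of-fragments pipeline can be sketched as follows. The fragment matching here uses naive centering plus squared distance, where FragBag proper matches against purpose-built fragment libraries with proper superposition (e.g., RMSD after alignment); the library and coordinates are random stand-ins.

```python
import numpy as np
from collections import defaultdict

def bag_of_fragments(backbone: np.ndarray, library: np.ndarray) -> np.ndarray:
    """Count, for every overlapping backbone segment, the closest library
    fragment, giving the 'bag-of-fragments' vector described above.
    backbone: (n_residues, 3) CA coordinates; library: (n_frags, L, 3)."""
    L = library.shape[1]
    flat = (library - library.mean(axis=1, keepdims=True)).reshape(len(library), -1)
    counts = np.zeros(len(library), dtype=int)
    for i in range(len(backbone) - L + 1):
        seg = backbone[i:i + L]
        seg = (seg - seg.mean(axis=0)).ravel()   # crude centering only
        counts[np.argmin(((flat - seg) ** 2).sum(axis=1))] += 1
    return counts

def inverted_index(bags: dict) -> dict:
    """fragment id -> proteins containing it; candidate retrieval then only
    touches proteins sharing at least one fragment with the query."""
    idx = defaultdict(set)
    for pid, bag in bags.items():
        for f in np.nonzero(bag)[0]:
            idx[f].add(pid)
    return idx

rng = np.random.default_rng(0)
library = rng.normal(size=(40, 6, 3))   # toy stand-in for a real fragment library
protein = rng.normal(size=(80, 3))
bag = bag_of_fragments(protein, library)
print(bag.sum())                        # one count per segment: 80 - 6 + 1 = 75
```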

  15. Search Patterns

    CERN Document Server

    Morville, Peter

    2010-01-01

    What people are saying about Search Patterns "Search Patterns is a delight to read -- very thoughtful and thought provoking. It's the most comprehensive survey of designing effective search experiences I've seen." --Irene Au, Director of User Experience, Google "I love this book! Thanks to Peter and Jeffery, I now know that search (yes, boring old yucky who cares search) is one of the coolest ways around of looking at the world." --Dan Roam, author, The Back of the Napkin (Portfolio Hardcover) "Search Patterns is a playful guide to the practical concerns of search interface design. It cont

  16. Automatic Content Creation for Games to Train Students Distinguishing Similar Chinese Characters

    Science.gov (United States)

    Lai, Kwong-Hung; Leung, Howard; Tang, Jeff K. T.

    In learning Chinese, many students often have the problem of mixing up similar characters. This can cause misunderstanding and miscommunication in daily life. It is thus important for students learning the Chinese language to be able to distinguish similar characters and understand their proper usage. In this paper, we propose a game style framework in which the game content in identifying similar Chinese characters in idioms and words is created automatically. Our prior work on analyzing students’ Chinese handwriting can be applied in the similarity measure of Chinese characters. We extend this work by adding the component of radical extraction to speed up the search process. Experimental results show that the proposed method is more accurate and faster in finding more similar Chinese characters compared with the baseline method without considering the radical information.

  17. Similarity transformations of MAPs

    Directory of Open Access Journals (Sweden)

    Andersen Allan T.

    1999-01-01

    Full Text Available We introduce the notion of similar Markovian Arrival Processes (MAPs) and show that the event-stationary point processes related to two similar MAPs are stochastically equivalent. This holds true for the time-stationary point processes too. We show that several well-known stochastic equivalences, e.g. that between the H2 renewal process and the Interrupted Poisson Process (IPP), can be expressed by the similarity transformations of MAPs. In the appendix the valid region of similarity transformations for two-state MAPs is characterized.
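
    A plausible formalization of the notion, assuming the standard matrix-similarity relation is applied simultaneously to the two generator matrices of a MAP (the paper's exact definition may impose further conditions, e.g. that the transformed pair remains a valid MAP):

    ```latex
    % Assumed formalization: two MAPs with generator pairs (C, D) and
    % (C', D') are similar if a single invertible matrix T conjugates both.
    \[
    C' = T^{-1} C T, \qquad D' = T^{-1} D T .
    \]
    ```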

  18. Similarity after Goodman

    NARCIS (Netherlands)

    Decock, L.B.; Douven, I.

    2011-01-01

    In a famous critique, Goodman dismissed similarity as a slippery and both philosophically and scientifically useless notion. We revisit his critique in the light of important recent work on similarity in psychology and cognitive science. Specifically, we use Tversky's influential set-theoretic

  19. Judgments of brand similarity

    NARCIS (Netherlands)

    Bijmolt, THA; Wedel, M; Pieters, RGM; DeSarbo, WS

    This paper provides empirical insight into the way consumers make pairwise similarity judgments between brands, and how familiarity with the brands, serial position of the pair in a sequence, and the presentation format affect these judgments. Within the similarity judgment process both the

  20. Similarity Measure of Graphs

    Directory of Open Access Journals (Sweden)

    Amine Labriji

    2017-07-01

    Full Text Available Identifying the similarity of graphs is a highly active research field in the semantic Web, artificial intelligence, shape recognition, and information retrieval. One of the fundamental problems of graph databases is finding the graphs that are similar to a query graph. Existing approaches to this problem are usually based on the nodes and arcs of the two graphs, regardless of parental semantic links. For instance, a common connection is not identified as being part of the similarity of two graphs in cases like two graphs without common concepts, a similarity measure based on the union of two graphs, one based on the notion of maximum common sub-graph (SCM), or the graph edit distance. This leads to an inadequate situation in the context of information retrieval. To overcome this problem, we suggest a new measure of similarity between graphs, based on the similarity measure of Wu and Palmer. We show that this new measure satisfies the properties of a similarity measure, and we apply it to examples. The results show that our measure runs faster than existing approaches. In addition, we compared the relevance of the similarity values obtained; this new graph measure is advantageous and offers a contribution to solving the problem mentioned above.
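
    The Wu and Palmer measure that the proposed graph measure builds on scores two concepts by the depth of their deepest common ancestor in a taxonomy. A minimal sketch, with an invented toy taxonomy given as a child-to-parent map:

    ```python
    # Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b)),
    # where lcs is the deepest common ancestor and depth counts nodes
    # from the root (root has depth 1). Taxonomy and names are toy data.
    def ancestors(node, parent):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path  # node, ..., root

    def wu_palmer(a, b, parent):
        path_a = ancestors(a, parent)
        set_b = set(ancestors(b, parent))
        lcs = next(n for n in path_a if n in set_b)  # deepest common ancestor
        depth = lambda n: len(ancestors(n, parent))
        return 2.0 * depth(lcs) / (depth(a) + depth(b))

    taxonomy = {"cat": "mammal", "dog": "mammal", "mammal": "animal", "snake": "animal"}
    print(wu_palmer("cat", "dog", taxonomy))    # 2*2 / (3+3) = 0.667
    print(wu_palmer("cat", "snake", taxonomy))  # 2*1 / (3+2) = 0.4
    ```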

  1. New Similarity Functions

    DEFF Research Database (Denmark)

    Yazdani, Hossein; Ortiz-Arroyo, Daniel; Kwasnicka, Halina

    2016-01-01

    In data science, there are important parameters that affect the accuracy of the algorithms used. Some of these parameters are: the type of data objects, the membership assignments, and distance or similarity functions. This paper discusses similarity functions as fundamental elements in membership assignments. The paper introduces Weighted Feature Distance (WFD) and Prioritized Weighted Feature Distance (PWFD), two new distance functions that take into account the diversity in feature spaces. WFD functions perform better in supervised and unsupervised methods by comparing data objects on their feature spaces, in addition to their similarity in the vector space. Prioritized Weighted Feature Distance (PWFD) works similarly to WFD, but provides the ability to give priorities to desirable features. The accuracy of the proposed functions is compared with other similarity functions on several data sets...
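
    The abstract does not reproduce the exact WFD/PWFD definitions, so the following is only a generic weighted-feature distance in that spirit, with prioritization modelled as larger per-feature weights (an assumption):

    ```python
    # Generic weighted feature distance: a per-feature weight vector w
    # scales each coordinate's contribution; prioritizing a feature
    # simply means assigning it a larger weight.
    import numpy as np

    def weighted_feature_distance(x, y, w):
        """Euclidean distance with per-feature weights w (w_i >= 0)."""
        x, y, w = map(np.asarray, (x, y, w))
        return float(np.sqrt(np.sum(w * (x - y) ** 2)))

    print(weighted_feature_distance([1.0, 2.0], [2.0, 4.0], [1.0, 1.0]))  # unweighted
    print(weighted_feature_distance([1.0, 2.0], [2.0, 4.0], [4.0, 1.0]))  # feature 0 prioritized
    ```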

  2. Optimization of interactive visual-similarity-based search

    NARCIS (Netherlands)

    Nguyen, G.P.; Worring, M.

    2008-01-01

    At one end of the spectrum, research in interactive content-based retrieval concentrates on machine learning methods for effective use of relevance feedback. On the other end, the information visualization community focuses on effective methods for conveying information to the user. What is lacking

  3. The semantic similarity ensemble

    Directory of Open Access Journals (Sweden)

    Andrea Ballatore

    2013-12-01

    Full Text Available Computational measures of semantic similarity between geographic terms provide valuable support across geographic information retrieval, data mining, and information integration. To date, a wide variety of approaches to geo-semantic similarity have been devised. A judgment of similarity is not intrinsically right or wrong, but obtains a certain degree of cognitive plausibility, depending on how closely it mimics human behavior. Thus, selecting the most appropriate measure for a specific task is a significant challenge. To address this issue, we make an analogy between computational similarity measures and soliciting domain expert opinions, which incorporate a subjective set of beliefs, perceptions, hypotheses, and epistemic biases. Following this analogy, we define the semantic similarity ensemble (SSE) as a composition of different similarity measures, acting as a panel of experts having to reach a decision on the semantic similarity of a set of geographic terms. The approach is evaluated in comparison to human judgments, and results indicate that an SSE performs better than the average of its parts. Although the best member tends to outperform the ensemble, all ensembles outperform the average performance of their members. Hence, in contexts where the best measure is unknown, the ensemble provides a more cognitively plausible approach.
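
    A minimal sketch of the ensemble idea: each measure acts as one expert scoring a term pair in [0, 1], and the panel's outputs are aggregated. Plain averaging is used here, and the toy "experts" are invented stand-ins for real geo-semantic similarity measures:

    ```python
    # Ensemble of similarity measures: aggregate the scores of several
    # measures (the "panel of experts") for one pair of terms.
    def ensemble_similarity(term_a, term_b, measures):
        scores = [m(term_a, term_b) for m in measures]
        return sum(scores) / len(scores)

    # Toy experts, invented for illustration only.
    measures = [
        lambda a, b: 1.0 if a == b else 0.5,                       # a lenient expert
        lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b)),  # character Jaccard
    ]
    print(ensemble_similarity("lake", "loch", measures))
    ```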

  4. Additive Similarity Trees

    Science.gov (United States)

    Sattath, Shmuel; Tversky, Amos

    1977-01-01

    Tree representations of similarity data are investigated. Hierarchical clustering is critically examined, and a more general procedure, called the additive tree, is presented. The additive tree representation is then compared to multidimensional scaling. (Author/JKS)

  5. Gender similarities and differences.

    Science.gov (United States)

    Hyde, Janet Shibley

    2014-01-01

    Whether men and women are fundamentally different or similar has been debated for more than a century. This review summarizes major theories designed to explain gender differences: evolutionary theories, cognitive social learning theory, sociocultural theory, and expectancy-value theory. The gender similarities hypothesis raises the possibility of theorizing gender similarities. Statistical methods for the analysis of gender differences and similarities are reviewed, including effect sizes, meta-analysis, taxometric analysis, and equivalence testing. Then, relying mainly on evidence from meta-analyses, gender differences are reviewed in cognitive performance (e.g., math performance), personality and social behaviors (e.g., temperament, emotions, aggression, and leadership), and psychological well-being. The evidence on gender differences in variance is summarized. The final sections explore applications of intersectionality and directions for future research.

  6. Visual search

    NARCIS (Netherlands)

    Toet, A.; Bijl, P.

    2003-01-01

    Visual search, with or without the aid of optical or electro-optical instruments, plays a significant role in various types of military and civilian operations (e.g., reconnaissance, surveillance, and search and rescue). Advance knowledge of human visual search and target acquisition performance is

  7. NNLOPS accurate associated HW production

    CERN Document Server

    Astill, William; Re, Emanuele; Zanderighi, Giulia

    2016-01-01

    We present a next-to-next-to-leading order accurate description of associated HW production consistently matched to a parton shower. The method is based on reweighting events obtained with the HW plus one jet NLO accurate calculation implemented in POWHEG, extended with the MiNLO procedure, to reproduce NNLO accurate Born distributions. Since the Born kinematics is more complex than the cases treated before, we use a parametrization of the Collins-Soper angles to reduce the number of variables required for the reweighting. We present phenomenological results at 13 TeV, with cuts suggested by the Higgs Cross Section Working Group.

  8. Trajectory similarity join in spatial networks

    KAUST Repository

    Shang, Shuo

    2017-09-07

    The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus, given two sets of trajectories and a threshold θ, the TS-Join returns all pairs of trajectories from the two sets with similarity above θ. This join targets applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm's per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost. An empirical study with real data offers insight into the performance of the algorithm and demonstrates that it is capable of outperforming a well-designed baseline algorithm by an order of magnitude.
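
    A minimal sketch of the join's outer structure under stated assumptions: the point-overlap similarity and thread-based parallelism below are stand-ins for the paper's spatiotemporal measure, upper-bound pruning, and heuristic scheduling:

    ```python
    # TS-Join skeleton: independent per-trajectory searches run in
    # parallel, then their partial results are merged.
    from concurrent.futures import ThreadPoolExecutor

    def similarity(t1, t2):
        # stand-in measure: fraction of shared (vertex, time) samples
        s1, s2 = set(t1), set(t2)
        return len(s1 & s2) / len(s1 | s2)

    def search_one(query, others, theta):
        # a real implementation would first discard candidates whose
        # similarity upper bound is already below theta
        return [(query, t) for t in others if similarity(query, t) >= theta]

    def ts_join(set_p, set_q, theta, workers=4):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partial = pool.map(lambda p: search_one(p, set_q, theta), set_p)
        return [pair for result in partial for pair in result]  # cheap merge

    # trajectories as tuples of (road-vertex, time) samples (toy data)
    P = [(("a", 1), ("b", 2)), (("c", 3),)]
    Q = [(("a", 1), ("b", 2), ("d", 4))]
    print(ts_join(P, Q, theta=0.5))
    ```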

  9. Similarity or difference?

    DEFF Research Database (Denmark)

    Villadsen, Anders Ryom

    2013-01-01

    While the organizational structures and strategies of public organizations have attracted substantial research attention among public management scholars, little research has explored how these organizational core dimensions are interconnected and influenced by pressures for similarity. In this paper I address this topic by exploring the relation between expenditure strategy isomorphism and structure isomorphism in Danish municipalities. Different literatures suggest that organizations exist in concurrent pressures for being similar to and different from other organizations in their field of action. It is theorized that to meet this challenge organizations may substitute increased similarity on one core dimension for increased idiosyncrasy on another, but only after a certain level of isomorphism is reached. Results of quantitative analyses support this theory and show that an inverse U...

  10. Measuring Personalization of Web Search

    DEFF Research Database (Denmark)

    Hannak, Aniko; Sapiezynski, Piotr; Kakhki, Arash Molavi

    2013-01-01

    Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing personalization is leading to concerns about Filter Bubble effects, where certain users are simply unable to access information that the search engines' algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions. First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users...

  11. Incremental Similarity and Turbulence

    DEFF Research Database (Denmark)

    Barndorff-Nielsen, Ole E.; Hedevang, Emil; Schmiegel, Jürgen

    This paper discusses the mathematical representation of an empirically observed phenomenon, referred to as Incremental Similarity. We discuss this feature from the viewpoint of stochastic processes and present a variety of non-trivial examples, including those that are of relevance for turbulence...

  12. Information Extraction Using Distant Supervision and Semantic Similarities

    Directory of Open Access Journals (Sweden)

    PARK, Y.

    2016-02-01

    Full Text Available Information extraction is one of the main research tasks in natural language processing and text mining that extracts useful information from unstructured sentences. Information extraction techniques include named entity recognition, relation extraction, and co-reference resolution. Among them, relation extraction refers to a task that extracts semantic relations between entities, such as personal and geographic names, in documents. This is an important research area, which is used in knowledge base construction and question answering systems. This study presents relation extraction using distant supervision, a semi-supervised learning technique that has attracted attention in recent years as a way to reduce the manual work and costs required for supervised learning. Specifically, this study proposes a method that improves distant supervision by applying a clustering method to create a learning corpus and by applying semantic analysis to relations that are difficult to identify with existing distant supervision. Through comparison experiments with various semantic similarity measures, the similarity calculation methods that are useful for relation extraction using distant supervision are identified, and a large number of accurate relation triples can be extracted using the proposed structural advantages and semantic similarity comparison.

  13. Mobile Visual Search Based on Histogram Matching and Zone Weight Learning

    Science.gov (United States)

    Zhu, Chuang; Tao, Li; Yang, Fan; Lu, Tao; Jia, Huizhu; Xie, Xiaodong

    2018-01-01

    In this paper, we propose a novel image retrieval algorithm for mobile visual search. First, a short visual codebook is generated based on the descriptor database to represent the statistical information of the dataset. Then, an accurate local descriptor similarity score is computed by merging the tf-idf weighted histogram matching and the weighting strategy in compact descriptors for visual search (CDVS). Finally, both the global descriptor matching score and the local descriptor similarity score are summed up to rerank the retrieval results according to the learned zone weights. The results show that the proposed approach outperforms the state-of-the-art image retrieval method in CDVS.
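
    A minimal sketch of the final fusion and re-ranking step; the zone weights and the candidate scores are invented here, since in the paper the scores come from the CDVS pipeline and the weights are learned offline:

    ```python
    # Final step: sum the global matching score and the local similarity
    # score with (learned) weights, then rerank candidates by fused score.
    def fused_score(global_score, local_score, w_global, w_local):
        return w_global * global_score + w_local * local_score

    candidates = {"img_a": (0.6, 0.9), "img_b": (0.8, 0.4)}  # (global, local), toy values
    w_g, w_l = 0.5, 0.5  # hypothetical learned zone weights

    reranked = sorted(candidates,
                      key=lambda k: fused_score(*candidates[k], w_g, w_l),
                      reverse=True)
    print(reranked)  # ['img_a', 'img_b'] under these toy numbers
    ```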

  14. Efficient Similarity Retrieval in Music Databases

    DEFF Research Database (Denmark)

    Ruxanda, Maria Magdalena; Jensen, Christian Søndergaard

    2006-01-01

    Audio music is increasingly becoming available in digital form, and the digital music collections of individuals continue to grow. Addressing the need for effective means of retrieving music from such collections, this paper proposes new techniques for content-based similarity search. Each music...

  15. Music Retrieval based on Melodic Similarity

    NARCIS (Netherlands)

    Typke, R.

    2007-01-01

    This thesis introduces a method for measuring melodic similarity for notated music such as MIDI files. This music search algorithm views music as sets of notes that are represented as weighted points in the two-dimensional space of time and pitch. Two point sets can be compared by calculating how

  16. MOST: most-similar ligand based approach to target prediction.

    Science.gov (United States)

    Huang, Tao; Mi, Hong; Lin, Cheng-Yuan; Zhao, Ling; Zhong, Linda L D; Liu, Feng-Bin; Zhang, Ge; Lu, Ai-Ping; Bian, Zhao-Xiang

    2017-03-11

    Many computational approaches have been used for target prediction, including machine learning, reverse docking, bioactivity spectra analysis, and chemical similarity searching. Recent studies have suggested that chemical similarity searching may be driven by the most-similar ligand. However, the extent of bioactivity of most-similar ligands has been oversimplified or even neglected in these studies, and this has impaired the prediction power. Here we propose the MOst-Similar ligand-based Target inference approach, namely MOST, which uses fingerprint similarity and the explicit bioactivity of the most-similar ligands to predict targets of the query compound. Performance of MOST was evaluated by using combinations of different fingerprint schemes, machine learning methods, and bioactivity representations. In sevenfold cross-validation with a benchmark Ki dataset from CHEMBL release 19, containing 61,937 bioactivity data points for 173 human targets, MOST achieved high average prediction accuracy (0.95 for pKi ≥ 5, and 0.87 for pKi ≥ 6). The Morgan fingerprint was shown to be slightly better than FP2. Logistic Regression and Random Forest methods performed better than Naïve Bayes. In a temporal validation, the Ki dataset from CHEMBL19 was used to train models and predict the bioactivity of ligands newly deposited in CHEMBL20. MOST also performed well, with high accuracy (0.90 for pKi ≥ 5, and 0.76 for pKi ≥ 6), when Logistic Regression and the Morgan fingerprint were employed. Furthermore, the p values associated with explicit bioactivity were found to be a robust index for removing false positive predictions; implicit bioactivity did not offer this capability. Finally, p values generated with Logistic Regression, the Morgan fingerprint and explicit activity were integrated with a false discovery rate (FDR) control procedure to reduce false positives in the multiple-target prediction scenario, and the success of this strategy was demonstrated with a case of fluanisone
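
    A minimal sketch of the most-similar-ligand step, using Tanimoto similarity on bit fingerprints represented as sets of on-bit positions (toy data; in practice the fingerprints would be Morgan or FP2, and the prediction would additionally weight the most-similar ligand's explicit bioactivity):

    ```python
    # Tanimoto similarity on bit fingerprints, and selection of the
    # most-similar ligand whose annotated targets are then inferred
    # for the query compound.
    def tanimoto(fp1, fp2):
        """|A & B| / |A | B| for two sets of on-bit positions."""
        return len(fp1 & fp2) / len(fp1 | fp2)

    query = {1, 5, 9, 42, 77}                       # toy query fingerprint
    ligands = {"lig_a": {1, 5, 9, 42, 80},          # toy reference ligands
               "lig_b": {2, 5, 33}}

    most_similar = max(ligands, key=lambda k: tanimoto(query, ligands[k]))
    print(most_similar, tanimoto(query, ligands[most_similar]))  # lig_a 0.667
    ```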

  17. An efficient and accurate 3D displacements tracking strategy for digital volume correlation

    KAUST Repository

    Pan, Bing

    2014-07-01

    Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost. © 2014 Elsevier Ltd.
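
    A minimal sketch of the reliability-guided scan order described above: calculation points are processed in decreasing order of correlation coefficient, and each converged result seeds its not-yet-processed neighbours. The correlate() stub stands in for the 3D IC-GN sub-voxel registration:

    ```python
    # Reliability-guided displacement tracking with a max-heap on the
    # correlation coefficient; neighbours inherit converged results as
    # initial guesses, avoiding integer-voxel displacement searching.
    import heapq

    def correlate(point, init_guess):
        # stand-in: would run IC-GN from init_guess and return the
        # converged displacement and its correlation coefficient
        return init_guess, 0.9

    def guided_tracking(neighbors, seed, seed_guess=(0.0, 0.0, 0.0)):
        displacements, done, heap = {}, set(), []
        disp, coeff = correlate(seed, seed_guess)
        heapq.heappush(heap, (-coeff, seed, disp))
        while heap:
            _, point, disp = heapq.heappop(heap)
            if point in done:
                continue
            done.add(point)
            displacements[point] = disp
            for q in neighbors.get(point, []):
                if q not in done:
                    dq, cq = correlate(q, disp)  # neighbour inherits the result
                    heapq.heappush(heap, (-cq, q, dq))
        return displacements

    neighbors = {0: [1], 1: [0, 2], 2: [1]}  # toy calculation-point graph
    print(guided_tracking(neighbors, seed=0))
    ```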

  18. Faceted Search

    CERN Document Server

    Tunkelang, Daniel

    2009-01-01

    We live in an information age that requires us, more than ever, to represent, access, and use information. Over the last several decades, we have developed a modern science and technology for information retrieval, relentlessly pursuing the vision of a "memex" that Vannevar Bush proposed in his seminal article, "As We May Think." Faceted search plays a key role in this program. Faceted search addresses weaknesses of conventional search approaches and has emerged as a foundation for interactive information retrieval. User studies demonstrate that faceted search provides more

  19. More Similar Than Different

    DEFF Research Database (Denmark)

    Pedersen, Mogens Jin

    2015-01-01

    What role do employee features play in the success of different personnel management practices for achieving high performance? Using data from a randomized survey experiment among 5,982 individuals of all ages, this article examines how gender conditions the compliance effects of different incentive treatments, each relating to the basic content of distinct types of personnel management practices. The findings show that males and females are more similar than different in terms of the incentive treatments' effects: significant average effects are found for three out of five incentive...

  20. SiRen: Leveraging Similar Regions for Efficient and Accurate Variant Calling

    Science.gov (United States)

    2015-05-30

    Cloudera, EMC2, Ericsson, Facebook, Guavus, HP, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Schlumberger, Splunk, Virdata and VMware

  1. Accurate measurements in volume data

    NARCIS (Netherlands)

    Oliván Bescós, J.; Bosma, Marco; Smit, Jaap; Mun, S.K.

    2001-01-01

    An algorithm for very accurate visualization of an iso-surface in a 3D medical dataset has been developed in the past few years. This technique is extended in this paper to several kinds of measurements in which exact geometric information of a selected iso-surface is used to derive volume, length,

  2. Impaired serial visual search in children with developmental dyslexia.

    Science.gov (United States)

    Sireteanu, Ruxandra; Goebel, Claudia; Goertz, Ralf; Werner, Ingeborg; Nalewajko, Magdalena; Thiel, Aylin

    2008-12-01

    In order to test the hypothesis of attentional deficits in dyslexia, we investigated the performance of children with developmental dyslexia on a number of visual search tasks. When tested with conjunction tasks for orientation and form using complex, letter-like material, dyslexic children showed an increased number of errors accompanied by faster reaction times in comparison to control children matched to the dyslexics on age, gender, and intelligence. On conjunction tasks for orientation and color, dyslexic children were also less accurate, but showed slower reaction times than the age-matched control children. These differences between the two groups decreased with increasing age. In contrast to these differences, the performance of dyslexic children in feature search tasks was similar to that of control children. These results suggest that children with developmental dyslexia present selective deficits in complex serial visual search tasks, implying impairment in goal-directed, sustained visual attention.

  7. The Application of Similar Image Retrieval in Electronic Commerce

    Science.gov (United States)

    Hu, YuPing; Yin, Hua; Han, Dezhi; Yu, Fei

    2014-01-01

    Traditional online shopping platforms (OSPs), which search product information by keywords, face three problems: an indirect search mode, a large search space, and inaccuracy in search results. To solve these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving online customers' experience and providing merchants with accurate advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: an image search display subsystem, an image search subsystem, and a product information collecting subsystem. This system can provide a seamless connection between the information platform and the OSP, on which consumers can automatically and directly search for similar images according to pictures from the information platform. At the same time, it can be used to provide accurate internet marketing for enterprises. The experiment shows the efficiency of constructing the system. PMID:24883411

  8. The Application of Similar Image Retrieval in Electronic Commerce

    Directory of Open Access Journals (Sweden)

    YuPing Hu

    2014-01-01

    Full Text Available Traditional online shopping platforms (OSPs), which search product information by keywords, face three problems: an indirect search mode, a large search space, and inaccuracy in search results. To solve these problems, we discuss and research the application of similar image retrieval in electronic commerce. Aiming at improving online customers' experience and providing merchants with accurate advertising, we design a reasonable and extensive electronic commerce application system, which includes three subsystems: an image search display subsystem, an image search subsystem, and a product information collecting subsystem. This system can provide a seamless connection between the information platform and the OSP, on which consumers can automatically and directly search for similar images according to pictures from the information platform. At the same time, it can be used to provide accurate internet marketing for enterprises. The experiment shows the efficiency of constructing the system.

  9. Similar or different?

    DEFF Research Database (Denmark)

    Cornér, Solveig; Pyhältö, Kirsi; Peltonen, Jouni

    2018-01-01

    Previous research has identified researcher community and supervisory support as key determinants of the doctoral journey contributing to students' persistence and robustness. However, we still know little about cross-cultural variation in the researcher community and supervisory support experienced by PhD students within the same discipline. This study explores the support experiences of 381 PhD students within the humanities and social sciences from three research-intensive universities in Denmark (n=145) and Finland (n=236). The mixed methods design was utilized. The data were collected ... counter partners, whereas the Finnish students perceived lower levels of instrumental support than the Danish students. The findings imply that seemingly similar contexts hold valid differences in experienced social support and educational strategies at the PhD level.

  10. Information content of visual scenes influences systematic search of desert ants.

    Science.gov (United States)

    Schultheiss, Patrick; Wystrach, Antoine; Legge, Eric L G; Cheng, Ken

    2013-02-15

    Many animals - including insects - navigate visually through their environment. Solitary foraging desert ants are known to acquire visual information from the surrounding panorama and use it to navigate along habitual routes or to pinpoint a goal such as the nest. Returning foragers that fail to find the nest entrance engage in searching behaviour, during which they continue to use vision. The characteristics of searching behaviour have typically been investigated in unfamiliar environments. Here we investigated in detail the nest-searching behaviour of Melophorus bagoti foragers within the familiar visual environment of their nest. First, by relating search behaviour to the information content of panoramic (360 deg) images, we found that searches were more accurate in visually cluttered environments. Second, as observed in unfamiliar visual surrounds, searches were dynamic and gradually expanded with time, showing that nest pinpointing is not rigidly controlled by vision. Third, contrary to searches displayed in unfamiliar environments, searches observed here could be modelled as a single exponential search strategy, which is similar to a Brownian walk, and there was no evidence of a Lévy walk. Overall, our results revealed that searching behaviour is remarkably flexible and varies according to the relevance of information provided by the surrounding visual scenery.

  11. Visual search: a retrospective.

    Science.gov (United States)

    Eckstein, Miguel P

    2011-12-30

    Visual search, a vital task for humans and animals, has also become a common and important tool for studying many topics central to active vision and cognition ranging from spatial vision, attention, and oculomotor control to memory, decision making, and rewards. While visual search often seems effortless to humans, trying to recreate human visual search abilities in machines has represented an incredible challenge for computer scientists and engineers. What are the brain computations that ensure successful search? This review article draws on efforts from various subfields and discusses the mechanisms and strategies the brain uses to optimize visual search: the psychophysical evidence, their neural correlates, and if unknown, possible loci of the neural computations. Mechanisms and strategies include use of knowledge about the target, distractor, background statistical properties, location probabilities, contextual cues, scene context, rewards, target prevalence, and also the role of saliency, center-surround organization of search templates, and eye movement plans. I provide overviews of classic and contemporary theories of covert attention and eye movements during search explaining their differences and similarities. To allow the reader to anchor some of the laboratory findings to real-world tasks, the article includes interviews with three expert searchers: a radiologist, a fisherman, and a satellite image analyst.

  12. When Is Network Lasso Accurate?

    Directory of Open Access Journals (Sweden)

    Alexander Jung

    2018-01-01

    Full Text Available The "least absolute shrinkage and selection operator" (Lasso) method has been adapted recently for network-structured datasets. In particular, this network Lasso method allows graph signals to be learned from a small number of noisy signal samples by using the total variation of a graph signal for regularization. While efficient and scalable implementations of the network Lasso are available, little is known about the conditions on the underlying network structure which ensure that the network Lasso is accurate. By leveraging concepts of compressed sensing, we address this gap and derive precise conditions on the underlying network topology and sampling set which guarantee the network Lasso for a particular loss function to deliver an accurate estimate of the entire underlying graph signal. We also quantify the error incurred by network Lasso in terms of two constants which reflect the connectivity of the sampled nodes.
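
    For reference, one common form of the network Lasso objective from the broader literature (the paper's particular loss function may differ): given noisy samples y_i on a sampling set M, edge set E with weights w_ij, and regularization parameter λ,

    ```latex
    \[
    \hat{x} \;=\; \arg\min_{x}\;
    \sum_{i \in \mathcal{M}} \left( x_i - y_i \right)^2
    \;+\; \lambda \sum_{(i,j) \in \mathcal{E}} w_{ij}\, \left\lVert x_i - x_j \right\rVert_2 ,
    \]
    ```

    where the second term is the weighted total variation of the graph signal x, the regularizer that makes recovery from few samples possible on well-connected topologies.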

  13. Accurate Accident Reconstruction in VANET

    OpenAIRE

    Kopylova, Yuliya; Farkas, Csilla; Xu, Wenyuan

    2011-01-01

    We propose a forensic VANET application to aid an accurate accident reconstruction. Our application provides a new source of objective real-time data impossible to collect using existing methods. By leveraging inter-vehicle communications, we compile digital evidence describing events before, during, and after an accident in its entirety. In addition to sensor data and major components' status, we provide relative positions of all vehicles involv...

  14. Rapid and accurate identification of microorganisms contaminating cosmetic products based on DNA sequence homology.

    Science.gov (United States)

    Fujita, Y; Shibayama, H; Suzuki, Y; Karita, S; Takamatsu, S

    2005-12-01

    The aim of this study was to develop rapid and accurate procedures to identify microorganisms contaminating cosmetic products, based on the identity of the nucleotide sequences of the internal transcribed spacer (ITS) region of the ribosomal RNA coding DNA (rDNA). Five types of microorganisms were isolated from the inner portion of lotion bottle caps, skin care lotions, and cleansing gels. The rDNA ITS region of microorganisms was amplified through the use of colony-direct PCR or ordinal PCR using DNA extracts as templates. The nucleotide sequences of the amplified DNA were determined and subjected to homology search of a publicly available DNA database. Thereby, we obtained DNA sequences possessing high similarity with the query sequences from the databases of all the five organisms analyzed. The traditional identification procedure requires expert skills, and a time period of approximately 1 month to identify the microorganisms. On the contrary, 3-7 days were sufficient to complete all the procedures employed in the current method, including isolation and cultivation of organisms, DNA sequencing, and the database homology search. Moreover, it was possible to develop the skills necessary to perform the molecular techniques required for the identification procedures within 1 week. Consequently, the current method is useful for rapid and accurate identification of microorganisms, contaminating cosmetics.

  15. Autonomous search

    CERN Document Server

    Hamadi, Youssef; Saubion, Frédéric

    2012-01-01

    Autonomous combinatorial search (AS) represents a new field in combinatorial problem solving. Its defining standpoint and originality is that problem solvers must be capable of self-improvement operations. This is the first book dedicated to AS.

  16. Predicting user click behaviour in search engine advertisements

    Science.gov (United States)

    Daryaie Zanjani, Mohammad; Khadivi, Shahram

    2015-10-01

    According to the specific requirements and interests of users, search engines select and display advertisements that match user needs and have a higher probability of attracting users' attention based on their previous search history. New objects such as a user, an advertisement, or a query cause a deterioration of precision in targeted advertising due to their lack of history. This article addresses this challenge. For a new object, we first extract observed objects that are similar to the new object and then use their history as the history of the new object. Similarity between objects is measured based on correlation, which is a relation between a user and an advertisement arising when the advertisement is displayed to the user. This method is used for all objects, so it has helped us to accurately select relevant advertisements for users' queries. In our proposed model, we assume that similar users behave in a similar manner. We find that users with few queries are similar to new users. We will show that the correlation between users and advertisements' keywords is high. Thus, users who pay attention to advertisements' keywords click similar advertisements. In addition, users who pay attention to specific brand names might have similar behaviours too.
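
    A minimal sketch of the cold-start strategy under stated assumptions: users are represented by invented interaction vectors over advertisement keywords, and the observed user most similar to the new user lends their history:

    ```python
    # Cold-start by proxy: find the observed user whose interaction
    # vector is closest (cosine similarity) to the new user's sparse one.
    import numpy as np

    def cosine(u, v):
        n = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v / n) if n else 0.0

    # rows: users, columns: advertisement keywords (click counts, toy data)
    observed = {"u1": np.array([3.0, 0.0, 1.0]),
                "u2": np.array([0.0, 2.0, 2.0])}
    new_user = np.array([1.0, 0.0, 0.0])   # only one short query so far

    proxy = max(observed, key=lambda k: cosine(observed[k], new_user))
    print(proxy)  # u1: its history now stands in for the new user's history
    ```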

  17. Accurate determination of antenna directivity

    DEFF Research Database (Denmark)

    Dich, Mikael

    1997-01-01

    The derivation of a formula for accurate estimation of the total radiated power from a transmitting antenna for which the radiated power density is known in a finite number of points on the far-field sphere is presented. The main application of the formula is determination of directivity from power......-pattern measurements. The derivation is based on the theory of spherical wave expansion of electromagnetic fields, which also establishes a simple criterion for the required number of samples of the power density. An array antenna consisting of Hertzian dipoles is used to test the accuracy and rate of convergence...
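
    The underlying relation is standard antenna theory: directivity follows from the peak radiation intensity and the total radiated power, the latter estimated from the N sampled power densities using quadrature weights w_k (deriving suitable weights and the required number of samples is the subject of the paper):

    ```latex
    \[
    D = \frac{4\pi\, U_{\max}}{P_{\mathrm{rad}}},
    \qquad
    P_{\mathrm{rad}} \approx \sum_{k=1}^{N} w_k\, U(\theta_k, \phi_k).
    \]
    ```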

  18. Measuring Source Code Similarity Using Reference Vectors

    Science.gov (United States)

    Ohno, Asako; Murao, Hajime

    In this paper, we propose a novel method to measure similarity between program source codes. Unlike others, our method does not compare two source codes directly but compares two reference vectors, where a reference vector is calculated from one source code and a set of reference source codes. This means that our method requires no original source code when considering an application open to the public, such as a search engine for program source code on the internet. We have built a simple search system and have evaluated it with Java source codes written in a university course on basic programming. Results show that the system can achieve a quite high average precision rate in a very short time, which means the proposed method can measure correct similarity very fast.
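
    A minimal sketch of the reference-vector construction; difflib's SequenceMatcher is only a stand-in for whatever pairwise code similarity the system actually uses, and the reference snippets are invented:

    ```python
    # Each source file is represented by its similarities to a fixed
    # reference set; two files are then compared through these vectors,
    # never directly, so originals need not be retained.
    from difflib import SequenceMatcher  # stand-in pairwise similarity

    def pairwise(a, b):
        return SequenceMatcher(None, a, b).ratio()

    def reference_vector(code, references):
        return [pairwise(code, r) for r in references]

    refs = ["for i in range(10): print(i)", "x = input(); print(x)"]
    v1 = reference_vector("for j in range(9): print(j)", refs)
    v2 = reference_vector("for k in range(9): print(k)", refs)

    # compare the vectors (e.g. Euclidean distance) instead of the sources
    dist = sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5
    print(dist)  # near 0: the two files relate to the references alike
    ```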

  19. Effects of categorical labels on similarity judgments: A critical analysis of similarity-based approaches

    OpenAIRE

    Noles, Nicholaus S.; Gelman, Susan A.

    2011-01-01

    The goal of the present study was to evaluate the claim that category labels affect children’s judgments of visual similarity. We presented preschool children with discriminable and identical sets of animal pictures and asked them to make perceptual judgments in the presence or absence of labels. Our findings indicate that children who are asked to make perceptual judgments about identical items judge discriminable items less accurately when making subsequent similarity judgments. Thus, label...

  20. $A$ searches

    CERN Document Server

    Beacham, James

    The Standard Model of particle physics encompasses three of the four known fundamental forces of nature, and has remarkably withstood all of the experimental tests of its predictions, a fact solidified by the discovery, in 2012, of the Higgs boson. However, it cannot be the complete picture. Many measurements have been made that hint at physics beyond the Standard Model, and the main task of the high-energy experimental physics community is to conduct searches for new physics in as many different places and regimes as possible. I present three searches for new phenomena in three different high-energy collider experiments, namely, a search for events with at least three photons in the final state, which is sensitive to an exotic decay of a Higgs boson into four photons via intermediate pseudoscalar particles, a, with ATLAS, at the Large Hadron Collider; a search for a dark photon, also known as an A′, with APEX, at Thomas Jefferson National Accelerator Facility; and a search for a Higgs decaying into four...

  1. Face Search at Scale.

    Science.gov (United States)

    Wang, Dayong; Otto, Charles; Jain, Anil K

    2017-06-01

    Given the prevalence of social media websites, one challenge facing computer vision researchers is to devise methods to search for persons of interest among the billions of shared photos on these websites. Despite significant progress in face recognition, searching a large collection of unconstrained face images remains a difficult problem. To address this challenge, we propose a face search system which combines a fast search procedure, coupled with a state-of-the-art commercial off-the-shelf (COTS) matcher, in a cascaded framework. Given a probe face, we first filter the large gallery of photos to find the top-k most similar faces using features learned by a convolutional neural network. The k retrieved candidates are re-ranked by combining similarities based on deep features and those output by the COTS matcher. We evaluate the proposed face search system on a gallery containing 80 million web-downloaded face images. Experimental results demonstrate that while the deep features perform worse than the COTS matcher on a mugshot dataset (93.7 percent versus 98.6 percent TAR@FAR of 0.01 percent), fusing the deep features with the COTS matcher improves the overall performance (99.5 percent TAR@FAR of 0.01 percent). This shows that the learned deep features provide complementary information over representations used in state-of-the-art face matchers. On the unconstrained face image benchmarks, the performance of the learned deep features is competitive with reported accuracies. LFW database: 98.20 percent accuracy under the standard protocol and 88.03 percent TAR@FAR of 0.1 percent under the BLUFR protocol; IJB-A benchmark: 51.0 percent TAR@FAR of 0.1 percent (verification), rank 1 retrieval of 82.2 percent (closed-set search), 61.5 percent FNIR@FAR of 1 percent (open-set search). The proposed face search system offers an excellent trade-off between accuracy and scalability on galleries with millions of images. Additionally, in a face search experiment involving
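
    A minimal sketch of the two-stage cascade under stated assumptions: deep_score() and cots_score() are stand-ins for the CNN features and the commercial matcher, and the fusion is a simple weighted sum of the two similarities:

    ```python
    # Cascaded face search: a fast deep-feature filter selects the top-k
    # gallery faces; only those k are re-scored by the slower matcher.
    import numpy as np

    def deep_score(probe, gallery):        # fast, approximate (dot product)
        return gallery @ probe

    def cots_score(probe, face):           # slow, accurate (stand-in)
        return -np.linalg.norm(face - probe)

    def cascade_search(probe, gallery, k=100, alpha=0.5):
        top_k = np.argsort(-deep_score(probe, gallery))[:k]
        fused = [(alpha * float(gallery[i] @ probe)
                  + (1 - alpha) * cots_score(probe, gallery[i]), i)
                 for i in top_k]
        return [i for _, i in sorted(fused, reverse=True)]

    gallery = np.random.randn(1000, 128).astype(np.float32)  # toy gallery
    probe = gallery[42] + 0.01 * np.random.randn(128)        # noisy copy of #42
    print(cascade_search(probe, gallery, k=10)[:3])          # 42 should rank first
    ```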

  2. Accurate Modeling of Advanced Reflectarrays

    DEFF Research Database (Denmark)

    Zhou, Min

    ... of the incident field, the choice of basis functions, and the technique to calculate the far-field. Based on accurate reference measurements of two offset reflectarrays carried out at the DTU-ESA Spherical Near-Field Antenna Test Facility, it was concluded that the three latter factors are particularly important ... to the conventional phase-only optimization technique (POT), the geometrical parameters of the array elements are directly optimized to fulfill the far-field requirements, thus maintaining a direct relation between optimization goals and optimization variables. As a result, better designs can be obtained compared ... using the GDOT to demonstrate its capabilities. To verify the accuracy of the GDOT, two offset contoured beam reflectarrays that radiate a high-gain beam on a European coverage have been designed and manufactured, and subsequently measured at the DTU-ESA Spherical Near-Field Antenna Test Facility...

  3. The Accurate Particle Tracer Code

    CERN Document Server

    Wang, Yulei; Qin, Hong; Yu, Zhi

    2016-01-01

    The Accurate Particle Tracer (APT) code is designed for large-scale particle simulations on dynamical systems. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and non-linear problems. Under the well-designed integrated and modularized framework, APT serves as a universal platform for researchers from different fields, such as plasma physics, accelerator physics, space science, fusion energy research, computational mathematics, software engineering, and high-performance computation. The APT code consists of seven main modules, including the I/O module, the initialization module, the particle pusher module, the parallelization module, the field configuration module, the external force-field module, and the extendible module. The I/O module, supported by Lua and Hdf5 projects, provides a user-friendly interface for both numerical simulation and data analysis. A series of new geometric numerical methods...

  4. Learning deep similarity in fundus photography

    Science.gov (United States)

    Chudzik, Piotr; Al-Diri, Bashir; Caliva, Francesco; Ometto, Giovanni; Hunter, Andrew

    2017-02-01

    Similarity learning is one of the most fundamental tasks in image analysis. The ability to extract similar images in the medical domain as part of content-based image retrieval (CBIR) systems has been researched for many years. The vast majority of methods used in CBIR systems are based on hand-crafted feature descriptors. The approximation of a similarity mapping for medical images is difficult due to the big variety of pixel-level structures of interest. In fundus photography (FP) analysis, a subtle difference in e.g. lesion and vessel shape and size can result in a different diagnosis. In this work, we demonstrated how to learn a similarity function for image patches derived directly from FP image data without the need of manually designed feature descriptors. We used a convolutional neural network (CNN) with a novel architecture adapted for similarity learning to accomplish this task. Furthermore, we explored and studied multiple CNN architectures. We show that our method can approximate the similarity between FP patches more efficiently and accurately than the state-of-the-art feature descriptors, including SIFT and SURF, using a publicly available dataset. Finally, we observe that our approach, which is purely data-driven, learns that features such as vessel calibre and orientation are important discriminative factors, which resembles the way humans reason about similarity. To the best of the authors' knowledge, this is the first attempt to approximate a visual similarity mapping in FP.

  5. Fast and accurate marker-based projective registration method for uncalibrated transmission electron microscope tilt series

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Ho; Xing Lei [Department of Radiation Oncology, Stanford University, Stanford, CA 94305-5847 (United States); Lee, Jeongjin [Department of Digital Media, Catholic University of Korea, Gyeonggi-do, 420-743 (Korea, Republic of); Shin, Yeong Gil [School of Computer Science and Engineering, Seoul National University, Seoul, 151-742 (Korea, Republic of); Lee, Rena, E-mail: leeho@stanford.ed [Department of Radiation Oncology, Ewha Womans University School of Medicine, Seoul, 158-710 (Korea, Republic of)

    2010-06-21

    This paper presents a fast and accurate marker-based automatic registration technique for aligning uncalibrated projections taken from a transmission electron microscope (TEM) with different tilt angles and orientations. Most of the existing TEM image alignment methods estimate the similarity between images using the projection model with least-squares metric and guess alignment parameters by computationally expensive nonlinear optimization schemes. Approaches based on the least-squares metric which is sensitive to outliers may cause misalignment since automatic tracking methods, though reliable, can produce a few incorrect trajectories due to a large number of marker points. To decrease the influence of outliers, we propose a robust similarity measure using the projection model with a Gaussian weighting function. This function is very effective in suppressing outliers that are far from correct trajectories and thus provides a more robust metric. In addition, we suggest a fast search strategy based on the non-gradient Powell's multidimensional optimization scheme to speed up optimization as only meaningful parameters are considered during iterative projection model estimation. Experimental results show that our method brings more accurate alignment with less computational cost compared to conventional automatic alignment methods.
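
    A plausible form of the Gaussian-weighted similarity measure described above (an assumption; the paper's exact weighting may differ): with reprojection residuals r_i(p) for marker trajectory i under projection-model parameters p, the least-squares sum of r_i(p)^2 is replaced by a score in which distant outliers contribute almost nothing, with σ controlling the tolerance:

    ```latex
    \[
    E(p) \;=\; \sum_{i} \exp\!\left( -\frac{r_i(p)^2}{2\sigma^2} \right)
    \;\longrightarrow\; \max_{p} ,
    \]
    ```

    which is maximized over p (e.g. with Powell's method, as in the paper) instead of minimizing the outlier-sensitive least-squares sum.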

  6. New generation of the multimedia search engines

    Science.gov (United States)

    Mijes Cruz, Mario Humberto; Soto Aldaco, Andrea; Maldonado Cano, Luis Alejandro; López Rodríguez, Mario; Rodríguez Vázqueza, Manuel Antonio; Amaya Reyes, Laura Mariel; Cano Martínez, Elizabeth; Pérez Rosas, Osvaldo Gerardo; Rodríguez Espejo, Luis; Flores Secundino, Jesús Abimelek; Rivera Martínez, José Luis; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Sánchez Valenzuela, Juan Carlos; Montoya Obeso, Abraham; Ramírez Acosta, Alejandro Álvaro

    2016-09-01

    Current search engines are based upon search methods that involve the combination of words (text-based search), which has been efficient until now. However, the Internet's growing demand shows more diversity with each passing day. Text-based searches are becoming limited, as most of the information on the Internet consists of different types of content denominated multimedia content (images, audio files, video files). Indeed, what needs to be improved in current search engines is search content and precision, as well as an accurate display of the search results expected by the user. Any search can be made more precise by using more text parameters, but that does not improve the content or speed of the search itself. One solution is to improve search through the characterization of the content of multimedia files. In this article, an analysis of new-generation multimedia search engines is presented, focusing on the needs arising from new technologies. Multimedia content has become a central part of the flow of information in our daily life. This reflects the necessity of having multimedia search engines, as well as of knowing the real tasks they must fulfil. Through this analysis, it is shown that there are not many search engines that can perform content searches. The research area of new-generation multimedia search engines is a multidisciplinary area in constant growth, generating tools that satisfy the different needs of new-generation systems.

  7. Accurate metacognition for visual sensory memory representations.

    Science.gov (United States)

    Vandenbroucke, Annelinde R E; Sligte, Ilja G; Barrett, Adam B; Seth, Anil K; Fahrenfort, Johannes J; Lamme, Victor A F

    2014-04-01

    The capacity to attend to multiple objects in the visual field is limited. However, introspectively, people feel that they see the whole visual world at once. Some scholars suggest that this introspective feeling is based on short-lived sensory memory representations, whereas others argue that the feeling of seeing more than can be attended to is illusory. Here, we investigated this phenomenon by combining objective memory performance with subjective confidence ratings during a change-detection task. This allowed us to compute a measure of metacognition--the degree of knowledge that subjects have about the correctness of their decisions--for different stages of memory. We show that subjects store more objects in sensory memory than they can attend to but, at the same time, have similar metacognition for sensory memory and working memory representations. This suggests that these subjective impressions are not an illusion but accurate reflections of the richness of visual perception.

  8. Concept Search Tool for Multilingual Hadith Corpus

    OpenAIRE

    Hassan, SMO; Atwell, ES

    2016-01-01

    The search of the Hadith and the understanding of its concepts and meanings is an important science. Moreover, searching using concepts is a technique used with large data that aims to obtain more relevant and accurate results. Accordingly, we propose the idea of building a concept-based search tool for the Hadith, to facilitate searches for users on the Web and to make the Hadith available in several ways. In this paper, we design a concept search tool for the Hadith. We plan to work in three phases; the first ...

  9. Notions of similarity for computational biology models

    KAUST Repository

    Waltemath, Dagmar

    2016-03-21

    Computational models used in biology are rapidly increasing in complexity, size, and numbers. To build such large models, researchers need to rely on software tools for model retrieval, model combination, and version control. These tools need to be able to quantify the differences and similarities between computational models. However, depending on the specific application, the notion of similarity may greatly vary. A general notion of model similarity, applicable to various types of models, is still missing. Here, we introduce a general notion of quantitative model similarities, survey the use of existing model comparison methods in model building and management, and discuss potential applications of model comparison. To frame model comparison as a general problem, we describe a theoretical approach to defining and computing similarities based on different model aspects. Potentially relevant aspects of a model comprise its references to biological entities, network structure, mathematical equations and parameters, and dynamic behaviour. Future similarity measures could combine these model aspects in flexible, problem-specific ways in order to mimic users' intuition about model similarity, and to support complex model searches in databases.

  10. The accurate particle tracer code

    Science.gov (United States)

    Wang, Yulei; Liu, Jian; Qin, Hong; Yu, Zhi; Yao, Yicun

    2017-11-01

    The Accurate Particle Tracer (APT) code is designed for systematic large-scale applications of geometric algorithms for particle dynamical simulations. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and nonlinear problems. To provide a flexible and convenient I/O interface, the libraries of Lua and Hdf5 are used. Following a three-step procedure, users can efficiently extend the libraries of electromagnetic configurations, external non-electromagnetic forces, particle pushers, and initialization approaches by use of the extendible module. APT has been used in simulations of key physical problems, such as runaway electrons in tokamaks and energetic particles in the Van Allen belt. As an important realization, the APT-SW version has been successfully deployed on the world's fastest computer, the Sunway TaihuLight supercomputer, by supporting the master-slave architecture of the Sunway many-core processors. Based on large-scale simulations of a runaway beam under parameters of the ITER tokamak, it is revealed that the magnetic ripple field can disperse the pitch-angle distribution significantly and at the same time improve the confinement of the energetic runaway beam.

  11. Internet Search Engines

    OpenAIRE

    Fatmaa El Zahraa Mohamed Abdou

    2004-01-01

    A general study of Internet search engines. The study addresses seven main points: the difference between search engines and search directories, the components of search engines, the percentage of sites covered by search engines, the cataloguing of sites, the time needed for sites to appear in search engines, search capabilities, and types of search engines.

  12. Textual and chemical information processing: different domains but similar algorithms

    Directory of Open Access Journals (Sweden)

    Peter Willett

    2000-01-01

    Full Text Available This paper discusses the extent to which algorithms developed for the processing of textual databases are also applicable to the processing of chemical structure databases, and vice versa. Applications discussed include: an algorithm for distribution sorting that has been applied to the design of screening systems for rapid chemical substructure searching; the use of measures of inter-molecular structural similarity for the analysis of hypertext graphs; a genetic algorithm for calculating term weights, developed for relevance feedback searching and applied to determining whether a molecule is likely to exhibit biological activity; and the use of data fusion to combine the results of different chemical similarity searches.
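
    The closing mention of data fusion can be made concrete with a simple rank-based scheme. The Python sketch below is ours, not the paper's: two invented similarity rankings over the same molecules are fused by summing ranks, a common baseline fusion rule.

      # Rank-based data fusion: combine the rankings produced by two different
      # chemical similarity searches (the scores here are invented).

      def rank_positions(scores):
          """Map each item to its rank (1 = most similar) under one search."""
          ordered = sorted(scores, key=scores.get, reverse=True)
          return {item: pos + 1 for pos, item in enumerate(ordered)}

      search_a = {"mol1": 0.91, "mol2": 0.85, "mol3": 0.60}  # e.g. 2D fingerprints
      search_b = {"mol1": 0.40, "mol2": 0.88, "mol3": 0.75}  # e.g. 3D shape

      ranks_a, ranks_b = rank_positions(search_a), rank_positions(search_b)
      fused = sorted(search_a, key=lambda m: ranks_a[m] + ranks_b[m])
      print(fused)    # molecules ordered by fused rank (best first)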

  13. Effective and Accurate Colormap Selection

    Science.gov (United States)

    Thyng, K. M.; Greene, C. A.; Hetland, R. D.; Zimmerle, H.; DiMarco, S. F.

    2016-12-01

    Science is often communicated through plots, and design choices can elucidate or obscure the presented data. The colormap used can honestly and clearly display data in a visually-appealing way, or can falsely exaggerate data gradients and confuse viewers. Fortunately, there is a large resource of literature in color science on how color is perceived which we can use to inform our own choices. Following this literature, colormaps can be designed to be perceptually uniform; that is, so an equally-sized jump in the colormap at any location is perceived by the viewer as the same size. This ensures that gradients in the data are accurately perceived. The same colormap is often used to represent many different fields in the same paper or presentation. However, this can cause difficulty in quick interpretation of multiple plots. For example, in one plot the viewer may have trained their eye to recognize that red represents high salinity, and therefore higher density, while in the subsequent temperature plot they need to adjust their interpretation so that red represents high temperature and therefore lower density. In the same way that a single Greek letter is typically chosen to represent a field for a paper, we propose to choose a single colormap to represent a field in a paper, and use multiple colormaps for multiple fields. We have created a set of colormaps that are perceptually uniform, and follow several other design guidelines. There are 18 colormaps to give options to choose from for intuitive representation. For example, a colormap of greens may be used to represent chlorophyll concentration, or browns for turbidity. With careful consideration of human perception and design principles, colormaps may be chosen which faithfully represent the data while also engaging viewers.
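
    As a small illustration of the one-colormap-per-field recommendation, the Python sketch below plots two invented fields with two distinct sequential matplotlib colormaps. The authors' own colormap set is distributed separately (as the cmocean package); only stock matplotlib is assumed here.

      import numpy as np
      import matplotlib.pyplot as plt

      x, y = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
      chlorophyll = np.exp(-((x - 0.3)**2 + (y - 0.6)**2) / 0.05)  # fake field
      turbidity = x * y                                            # fake field

      fig, axes = plt.subplots(1, 2, figsize=(8, 3))
      # One colormap per field: greens suggest chlorophyll, while a second,
      # clearly different sequential map is reserved for turbidity.
      axes[0].pcolormesh(x, y, chlorophyll, cmap="Greens")
      axes[0].set_title("chlorophyll")
      axes[1].pcolormesh(x, y, turbidity, cmap="cividis")
      axes[1].set_title("turbidity")
      plt.show()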

  14. Practical fulltext search in medical records

    Directory of Open Access Journals (Sweden)

    Vít Volšička

    2015-09-01

    Full Text Available Performing a search through previously existing documents, including medical reports, is an integral part of acquiring new information and of educational processes. Unfortunately, finding relevant information is not always easy, since many documents are saved in free-text formats, making them difficult to search. A full-text search is a viable solution for searching through documents. It makes it possible to search large numbers of documents efficiently and to find those that contain specific search phrases in a short time. All leading database systems currently offer full-text search, but some do not support the complex morphology of the Czech language. Apache Solr, together with the full-text libraries it builds on, provides complete support options: good support for the Czech language in the basic installation, and a wide range of settings and options for deployment on any platform. The library was satisfactorily tested using real data from hospitals. Solr provided useful, fast, and accurate searches. However, adjustments are still needed to obtain effective search results, particularly correcting typographical errors made not only in the text but also when entering words in the search box, and creating a list of frequently used abbreviations and synonyms for more accurate results.
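
    A minimal sketch of the kind of query the article describes, assuming a local Solr instance with a core named medrecords and a text field (core name, field names, and URL are our illustrative assumptions; the /select endpoint and the fuzzy ~ operator are standard Solr/Lucene syntax):

      import requests

      # A fuzzy term (~1) tolerates one edit, matching the article's point
      # about typographical errors entered in the search box.
      resp = requests.get(
          "http://localhost:8983/solr/medrecords/select",
          params={"q": "text:pneumonie~1", "rows": 10, "wt": "json"},
      )
      for doc in resp.json()["response"]["docs"]:
          print(doc.get("id"), doc.get("title"))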

  15. 38 CFR 4.46 - Accurate measurement.

    Science.gov (United States)

    2010-07-01

    Title 38 (Pensions, Bonuses, and Veterans' Relief), Rating Disabilities, The Musculoskeletal System, § 4.46 Accurate measurement: accurate measurement of the length of stumps, excursion of joints, dimensions and location of scars with respect to...

  16. Geometrical and frequential watermarking scheme using similarities

    Science.gov (United States)

    Bas, Patrick; Chassery, Jean-Marc; Davoine, Franck

    1999-04-01

    Watermarking schemes are more and more robust to classical degradations. The NEC system developed by Cox, using both the original and the marked images, can detect the mark after JPEG compression with a ratio of 30. Nevertheless, a very simple geometric attack performed by the program Stirmark can remove the watermark. Most present watermarking schemes only map a mark onto the image without a geometric reference and therefore are not robust to geometric transformations. We present a scheme based on the modification of a collage map (issued from a fractal code used in fractal compression). We add a mark by introducing similarities into the image. The mark is embedded by selecting points of interest supporting blocks on which the similarities are hidden. This selection is done by the Stephens-Harris detector. The similarity is embedded locally to be robust to cropping. Contrary to many schemes, the reference mark used for detection comes from the marked image and thus undergoes the geometric distortions. The mark is detected by searching for interest blocks and their similarities. Detection does not use the original image, and robustness is guaranteed by a key. Our first results show that similarity-based watermarking is quite robust to geometric transformations such as translations, rotations and cropping.

  17. Pentaquark searches with ALICE

    CERN Document Server

    Bobulska, Dana

    2016-01-01

    In this report we present the results of the data analysis searching for possible invariant-mass signals from pentaquarks in the ALICE data. The analysis was based on filtered data from real p-Pb events at √sNN = 5.02 TeV collected in 2013. The motivation for this project was the recent discovery of pentaquark states by the LHCb collaboration (the c̄cuud resonance Pc+) [1]. The search for similar, not yet observed pentaquarks is an interesting research topic [2]. In this analysis we searched for an s̄suud pentaquark resonance Ps+ and its possible decay channel to a φ meson and a proton. The ALICE detector is well suited for the search of certain candidates thanks to its low material budget and strong PID capabilities. Additionally, we might expect the production of such particles in ALICE, as in heavy-ion and proton-ion collisions the thermal models describe the particle yields and ratios well [3]. Therefore it is reasonable to expect other species of hadrons, including possible pentaquarks, to be produced w...

  18. Efficient heuristics for maximum common substructure search.

    Science.gov (United States)

    Englert, Péter; Kovács, Péter

    2015-05-26

    Maximum common substructure search is a computationally hard optimization problem with diverse applications in the field of cheminformatics, including similarity search, lead optimization, molecule alignment, and clustering. Most of these applications have strict constraints on running time, so heuristic methods are often preferred. However, the development of an algorithm that is both fast enough and accurate enough for most practical purposes is still a challenge. Moreover, in some applications, the quality of a common substructure depends not only on its size but also on various topological features of the one-to-one atom correspondence it defines. Two state-of-the-art heuristic algorithms for finding maximum common substructures have been implemented at ChemAxon Ltd., and effective heuristics have been developed to improve both their efficiency and the relevance of the atom mappings they provide. The implementations have been thoroughly evaluated and compared with existing solutions (KCOMBU and Indigo). The heuristics have been found to greatly improve the performance and applicability of the algorithms. The purpose of this paper is to introduce the applied methods and present the experimental results.
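
    The ChemAxon implementations themselves are not public; as an illustration of the task the paper addresses, here is a maximum common substructure search using the open-source RDKit, whose rdFMCS module likewise runs a time-bounded heuristic search.

      from rdkit import Chem
      from rdkit.Chem import rdFMCS

      # Two arbitrary example molecules sharing a phenethyl scaffold.
      mols = [Chem.MolFromSmiles(s) for s in ("c1ccccc1CCO", "c1ccccc1CCN")]
      result = rdFMCS.FindMCS(mols, timeout=10)  # heuristic, time-bounded
      print(result.numAtoms, result.numBonds, result.smartsString)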

  19. Earthquake Fingerprints: Representing Earthquake Waveforms for Similarity-Based Detection

    Science.gov (United States)

    Bergen, K.; Beroza, G. C.

    2016-12-01

    New earthquake detection methods, such as Fingerprint and Similarity Thresholding (FAST), use fast approximate similarity search to identify similar waveforms in long-duration data without templates (Yoon et al. 2015). These methods have two key components: fingerprint extraction and an efficient search algorithm. Fingerprint extraction converts waveforms into fingerprints, compact signatures that represent short-duration waveforms for identification and search. Earthquakes are detected using an efficient indexing and search scheme, such as locality-sensitive hashing, that identifies similar waveforms in a fingerprint database. The quality of the search results, and thus the earthquake detection results, is strongly dependent on the fingerprinting scheme. Fingerprint extraction should map similar earthquake waveforms to similar waveform fingerprints to ensure a high detection rate, even under additive noise and small distortions. Additionally, fingerprints corresponding to noise intervals should have mutually dissimilar fingerprints to minimize false detections. In this work, we compare the performance of multiple fingerprint extraction approaches for the earthquake waveform similarity search problem. We apply existing audio fingerprinting (used in content-based audio identification systems) and time series indexing techniques and present modified versions that are specifically adapted for seismic data. We also explore data-driven fingerprinting approaches that can take advantage of labeled or unlabeled waveform data. For each fingerprinting approach we measure its ability to identify similar waveforms in a low signal-to-noise setting, and quantify the trade-off between true and false detection rates in the presence of persistent noise sources. We compare the performance using known event waveforms from eight independent stations in the Northern California Seismic Network.
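
    A toy sketch (ours, not the FAST implementation) of the two ingredients described above: binary fingerprints for short waveform windows, and locality-sensitive hashing with several independent hash tables so that similar windows collide without an all-pairs comparison. All sizes and parameters are invented.

      import numpy as np
      from collections import defaultdict

      rng = np.random.default_rng(0)

      def signature(window, planes):
          """Binary LSH signature: signs of random projections."""
          return tuple((planes @ window > 0).astype(int))

      n_windows, dim = 1000, 64
      windows = rng.standard_normal((n_windows, dim))
      windows[123] = windows[42] + 0.05 * rng.standard_normal(dim)  # near-repeat

      tables = []                                  # several independent tables
      for _ in range(8):
          planes = rng.standard_normal((12, dim))  # 12-bit signature per table
          buckets = defaultdict(list)
          for i, w in enumerate(windows):
              buckets[signature(w, planes)].append(i)
          tables.append((planes, buckets))

      candidates = set()                   # windows colliding with window 42
      for planes, buckets in tables:
          candidates.update(buckets[signature(windows[42], planes)])
      print(sorted(candidates))            # typically 42, 123 and few others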

  20. Is visual search really like foraging?

    Science.gov (United States)

    Gilchrist, I D; North, A; Hood, B

    2001-01-01

    The visual-search paradigm provides a controlled and easy to implement experimental situation in which to study the search process. However, little work has been carried out in humans to investigate the extent to which traditional visual-search tasks are similar to more general search or foraging. Here we report results from a task in which search involves walking around a room and leaning down to inspect individual locations. Consistent with more traditional search tasks, search time increases linearly with display size, and the target-present to target-absent search slope is 1:2. However, although rechecking of locations did occur, compared to more traditional search it was relatively rare, suggesting an increased role for memory.

  1. A COMPARISON OF SEMANTIC SIMILARITY MODELS IN EVALUATING CONCEPT SIMILARITY

    Directory of Open Access Journals (Sweden)

    Q. X. Xu

    2012-08-01

    Full Text Available Semantic similarities are important in concept definition, recognition, categorization, interpretation, and integration. Many semantic similarity models have been established to evaluate the semantic similarities of objects and/or concepts. To find out the suitability and performance of different models in evaluating concept similarities, we compare four main types of models in this paper: the geometric model, the feature model, the network model, and the transformational model. Fundamental principles and main characteristics of these models are first introduced and compared. Land use and land cover concepts of NLCD92 are employed as examples in the case study. The results demonstrate that the correlations between these models are very high, possibly because all these models are designed to simulate the similarity judgement of the human mind.

  2. Renewing the Respect for Similarity

    Directory of Open Access Journals (Sweden)

    Shimon eEdelman

    2012-07-01

    Full Text Available In psychology, the concept of similarity has traditionally evoked a mixture of respect, stemming from its ubiquity and intuitive appeal, and concern, due to its dependence on the framing of the problem at hand and on its context. We argue for a renewed focus on similarity as an explanatory concept, by surveying established results and new developments in the theory and methods of similarity-preserving associative lookup and dimensionality reduction — critical components of many cognitive functions, as well as of intelligent data management in computer vision. We focus in particular on the growing family of algorithms that support associative memory by performing hashing that respects local similarity, and on the uses of similarity in representing structured objects and scenes. Insofar as these similarity-based ideas and methods are useful in cognitive modeling and in AI applications, they should be included in the core conceptual toolkit of computational neuroscience.

  3. Design of transonic airfoil sections using a similarity theory

    Science.gov (United States)

    Nixon, D.

    1978-01-01

    A study of the available methods for transonic airfoil and wing design indicates that the most powerful technique is the numerical optimization procedure. However, the computer time for this method is relatively large because of the amount of computation required in the searches during optimization. The optimization method requires that base and calibration solutions be computed to determine a minimum drag direction. The design space is then computationally searched in this direction; it is these searches that dominate the computation time. A recent similarity theory allows certain transonic flows to be calculated rapidly from the base and calibration solutions. In this paper the application of the similarity theory to design problems is examined with the object of at least partially eliminating the costly searches of the design optimization method. An example of an airfoil design is presented.

  4. Meta Search Engines.

    Science.gov (United States)

    Garman, Nancy

    1999-01-01

    Describes common options and features to consider in evaluating which meta search engine will best meet a searcher's needs. Discusses number and names of engines searched; other sources and specialty engines; search queries; other search options; and results options. (AEF)

  5. Similarity-Based Prediction of Travel Times for Vehicles Traveling on Known Routes

    DEFF Research Database (Denmark)

    Tiesyte, Dalia; Jensen, Christian Søndergaard

    2008-01-01

    The use of centralized, real-time position tracking is proliferating in the areas of logistics and public transportation. Real-time positions can be used to provide up-to-date information to a variety of users, and they can also be accumulated for uses in subsequent data analyses. In particular, historical data in combination with real-time data may be used to predict the future travel times of vehicles more accurately, thus improving the experience of the users who rely on such information. We propose a Nearest-Neighbor Trajectory (NNT) technique that identifies the historical trajectory ... of vehicles that travel along known routes. In empirical studies with real data from buses, we evaluate how well the proposed distance functions are capable of predicting future vehicle movements. Second, we propose a main-memory index structure that enables incremental similarity search and that is capable ...
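
    A much-simplified sketch of the nearest-neighbour idea (not the paper's NNT algorithm or its index structure): match the observed prefix of the current run against historical runs of the same route and read the remaining travel times off the best match. The data are synthetic.

      import numpy as np

      rng = np.random.default_rng(1)
      # 50 historical runs: travel time (s) on each of 10 consecutive segments.
      history = rng.normal(loc=120, scale=15, size=(50, 10))

      # Current run, four segments in (a noisy copy of historical run 7).
      observed = history[7, :4] + rng.normal(0, 3, size=4)

      dists = np.linalg.norm(history[:, :4] - observed, axis=1)  # prefix distance
      nearest = int(np.argmin(dists))
      predicted_remaining = history[nearest, 4:].sum()
      print(nearest, round(predicted_remaining))  # remaining travel time (s)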

  6. Content-Based Visual Landmark Search via Multimodal Hypergraph Learning.

    Science.gov (United States)

    Zhu, Lei; Shen, Jialie; Jin, Hai; Zheng, Ran; Xie, Liang

    2015-12-01

    While content-based landmark image search has recently received a lot of attention and become a very active domain, it still remains a challenging problem. Among the various reasons, highly diverse visual content is the most significant one. It is common that for the same landmark, images with a wide range of visual appearances can be found from different sources, and different landmarks may share very similar sets of images. As a consequence, it is very hard to accurately estimate the similarities between landmarks purely based on a single type of visual feature. Moreover, the relationships between landmark images can be very complex, and how to develop an effective modeling scheme to characterize the associations still remains an open question. Motivated by these concerns, we propose the multimodal hypergraph (MMHG) to characterize the complex associations between landmark images. In MMHG, images are modeled as independent vertices and hyperedges contain several vertices corresponding to particular views. Multiple hypergraphs are first constructed independently based on different visual modalities to describe the hidden high-order relations from different aspects. Then, they are integrated together to involve discriminative information from heterogeneous sources. We also propose a novel content-based visual landmark search system based on MMHG to facilitate effective search. Distinguished from existing approaches, we design a unified computational module to support query-specific combination weight learning. An extensive experimental study on a large-scale test collection demonstrates the effectiveness of our scheme over state-of-the-art approaches.

  7. Quantum-Accurate Molecular Dynamics Potential for Tungsten

    Energy Technology Data Exchange (ETDEWEB)

    Wood, Mitchell; Thompson, Aidan P.

    2017-03-01

    The purpose of this short contribution is to report on the development of a Spectral Neighbor Analysis Potential (SNAP) for tungsten. We have focused on the characterization of elastic and defect properties of the pure material in order to support molecular dynamics simulations of plasma-facing materials in fusion reactors. A parallel genetic algorithm approach was used to efficiently search for fitting parameters optimized against a large number of objective functions. In addition, we have shown that this many-body tungsten potential can be used in conjunction with a simple helium pair potential [1] to produce accurate defect formation energies for the W-He binary system.

  8. Dynamic similarity in erosional processes

    Science.gov (United States)

    Scheidegger, A.E.

    1963-01-01

    A study is made of the dynamic similarity conditions obtaining in a variety of erosional processes. The pertinent equations for each type of process are written in dimensionless form; the similarity conditions can then easily be deduced. The processes treated are: raindrop action, slope evolution and river erosion. © 1963 Istituto Geofisico Italiano.

  9. Renewing the respect for similarity.

    Science.gov (United States)

    Edelman, Shimon; Shahbazi, Reza

    2012-01-01

    In psychology, the concept of similarity has traditionally evoked a mixture of respect, stemming from its ubiquity and intuitive appeal, and concern, due to its dependence on the framing of the problem at hand and on its context. We argue for a renewed focus on similarity as an explanatory concept, by surveying established results and new developments in the theory and methods of similarity-preserving associative lookup and dimensionality reduction - critical components of many cognitive functions, as well as of intelligent data management in computer vision. We focus in particular on the growing family of algorithms that support associative memory by performing hashing that respects local similarity, and on the uses of similarity in representing structured objects and scenes. Insofar as these similarity-based ideas and methods are useful in cognitive modeling and in AI applications, they should be included in the core conceptual toolkit of computational neuroscience. In support of this stance, the present paper (1) offers a discussion of conceptual, mathematical, computational, and empirical aspects of similarity, as applied to the problems of visual object and scene representation, recognition, and interpretation, (2) mentions some key computational problems arising in attempts to put similarity to use, along with their possible solutions, (3) briefly states a previously developed similarity-based framework for visual object representation, the Chorus of Prototypes, along with the empirical support it enjoys, (4) presents new mathematical insights into the effectiveness of this framework, derived from its relationship to locality-sensitive hashing (LSH) and to concomitant statistics, (5) introduces a new model, the Chorus of Relational Descriptors (ChoRD), that extends this framework to scene representation and interpretation, (6) describes its implementation and testing, and finally (7) suggests possible directions in which the present research program can be

  10. Self-similar aftershock rates

    CERN Document Server

    Davidsen, Jörn

    2016-01-01

    In many important systems exhibiting crackling noise --- intermittent avalanche-like relaxation response with power-law and, thus, self-similar distributed event sizes --- the "laws" for the rate of activity after large events are not consistent with the overall self-similar behavior expected on theoretical grounds. This is in particular true for the case of seismicity and a satisfying solution to this paradox has remained outstanding. Here, we propose a generalized description of the aftershock rates which is both self-similar and consistent with all other known self-similar features. Comparing our theoretical predictions with high resolution earthquake data from Southern California we find excellent agreement, providing in particular clear evidence for a unified description of aftershocks and foreshocks. This may offer an improved way of time-dependent seismic hazard assessment and earthquake forecasting.

  11. Method of similarity for cavitation

    Energy Technology Data Exchange (ETDEWEB)

    Espanet, L.; Tekatlian, A.; Barbier, D. [CEA/Cadarache, Dept. d' Etudes des Combustibles (DEC), 13 - Saint-Paul-lez-Durance (France); Gouin, H. [Aix-Marseille-3 Univ., 13 - Marseille (France). Laboratoire de Modelisation en Mecanique et Thermodynamique

    1998-07-01

    Knowledge of possible cavitation in the subassembly nozzles of the fast reactor core requires a fluid-dynamic model test. We propose a method of similarity based on the non-dimensionalization of the equation of motion for a viscous capillary fluid, derived from the Cahn and Hilliard model. Taking the effect of dissolved gas into account, a compatibility condition is determined. This condition must be respected by the fluid used in the experiment, along with the scaling between the two similar flows. (author)

  12. Self-Similar Isentropic Implosions

    Energy Technology Data Exchange (ETDEWEB)

    Rodriguez, M.; Amable, L.

    1980-07-01

    The self-similar compression of an isentropic spherical gas pellet is analyzed for large values of the ratio of the final to initial densities. An asymptotic analysis provides the solution corresponding to a prescribed value of the final density when it is high. In addition, an approximate solution is given when the specific heat ratio is not constant. The time evolution of the pressure on the outer surface leading to the self-similar solutions is calculated for large density ratios. (Author)

  13. Accurate Biomass Estimation via Bayesian Adaptive Sampling

    Science.gov (United States)

    Wheeler, K.; Knuth, K.; Castle, P.

    2005-12-01

    Typical estimates of standing wood derived from remote sensing sources take advantage of aggregate measurements of canopy heights (e.g. LIDAR) and canopy diameters (segmentation of IKONOS imagery) to obtain a wood volume estimate by assuming homogeneous species and a fixed function that returns volume. The validation of such techniques uses manually measured diameter-at-breast-height (DBH) records. Our goal is to improve the accuracy and applicability of biomass estimation methods for heterogeneous forests and transitional areas. We are developing estimates with quantifiable uncertainty using a new form of estimation function, active sampling, and volumetric reconstruction image rendering for species-specific mass truth. Initially we are developing a Bayesian adaptive sampling method for the BRDF associated with the MISR Rahman model with respect to categorical biomes. This involves characterizing the probability distributions of the 3 free parameters of the Rahman model for the 6 categories of biomes used by MISR. Subsequently, these distributions can be used to determine the optimal sampling methodology to distinguish biomes during acquisition. We have a remotely controlled semi-autonomous helicopter that has stereo imaging, lidar, differential GPS, and spectrometers covering wavelengths from visible to NIR. We intend to automatically vary the waypoints of the flight path via the Bayesian adaptive sampling method. The second critical part of this work is automating the validation of biomass estimates using machine vision techniques. This involves taking 2-D pictures of trees of known species, and then, via Bayesian techniques, reconstructing 3-D models of the trees to estimate the distribution moments associated with wood volume. Similar techniques have been developed by the medical imaging community. The final part of this work is in relating the BRDF actively sampled measurements to species

  14. Spectral clustering based on learning similarity matrix.

    Science.gov (United States)

    Park, Seyoung; Zhao, Hongyu; Birol, Inanc

    2018-02-08

    Single-cell RNA-sequencing (scRNA-seq) technology can generate genome-wide expression data at the single-cell levels. One important objective in scRNA-seq analysis is to cluster cells where each cluster consists of cells belonging to the same cell type based on gene expression patterns. We introduce a novel spectral clustering framework that imposes sparse structures on a target matrix. Specifically, we utilize multiple doubly stochastic similarity matrices to learn a similarity matrix, motivated by the observation that each similarity matrix can be a different informative representation of the data. We impose a sparse structure on the target matrix followed by shrinking pairwise differences of the rows in the target matrix, motivated by the fact that the target matrix should have these structures in the ideal case. We solve the proposed non-convex problem iteratively using the ADMM algorithm and show the convergence of the algorithm. We evaluate the performance of the proposed clustering method on various simulated as well as real scRNA-seq data, and show that it can identify clusters accurately and robustly. The algorithm is implemented in MATLAB. The source code can be downloaded at https://github.com/ishspsy/project/tree/master/MPSSC. seyoung.park@yale.edu. Supplementary data are available at Bioinformatics online.
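
    The authors' MPSSC code is MATLAB; for orientation, here is a generic spectral-clustering sketch in Python using a single Gaussian similarity matrix rather than the paper's learned multi-matrix target: build the similarity matrix, form the normalized graph Laplacian, embed the points with its bottom eigenvectors, and cluster the embedding with k-means.

      import numpy as np
      from scipy.linalg import eigh
      from sklearn.cluster import KMeans
      from sklearn.metrics import pairwise_distances

      rng = np.random.default_rng(0)
      X = np.vstack([rng.normal(0, 0.3, (30, 5)),   # two synthetic "cell types"
                     rng.normal(2, 0.3, (30, 5))])

      D2 = pairwise_distances(X, metric="sqeuclidean")
      S = np.exp(-D2 / D2.mean())                   # Gaussian similarity matrix
      deg = S.sum(axis=1)
      L = np.eye(len(S)) - S / np.sqrt(np.outer(deg, deg))  # normalized Laplacian

      k = 2
      _, vecs = eigh(L, subset_by_index=[0, k - 1])  # k smallest eigenvectors
      labels = KMeans(n_clusters=k, n_init=10).fit_predict(vecs)
      print(labels)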

  15. Conversation level syntax similarity metric.

    Science.gov (United States)

    Boghrati, Reihane; Hoover, Joe; Johnson, Kate M; Garten, Justin; Dehghani, Morteza

    2017-07-11

    The syntax and semantics of human language can illuminate many individual psychological differences and important dimensions of social interaction. Accordingly, psychological and psycholinguistic research has begun incorporating sophisticated representations of semantic content to better understand the connection between word choice and psychological processes. In this work we introduce ConversAtion level Syntax SImilarity Metric (CASSIM), a novel method for calculating conversation-level syntax similarity. CASSIM estimates the syntax similarity between conversations by automatically generating syntactical representations of the sentences in conversation, estimating the structural differences between them, and calculating an optimized estimate of the conversation-level syntax similarity. After introducing and explaining this method, we report results from two method validation experiments (Study 1) and conduct a series of analyses with CASSIM to investigate syntax accommodation in social media discourse (Study 2). We run the same experiments using two well-known existing syntactic metrics, LSM and Coh-Metrix, and compare their results to CASSIM. Overall, our results indicate that CASSIM is able to reliably measure syntax similarity and to provide robust evidence of syntax accommodation within social media discourse.

  16. Visualizing multiple word similarity measures.

    Science.gov (United States)

    Kievit-Kylar, Brent; Jones, Michael N

    2012-09-01

    Although many recent advances have taken place in corpus-based tools, the techniques used to guide exploration and evaluation of these systems have advanced little. Typically, the plausibility of a semantic space is explored by sampling the nearest neighbors to a target word and evaluating the neighborhood on the basis of the modeler's intuition. Tools for visualization of these large-scale similarity spaces are nearly nonexistent. We present a new open-source tool to plot and visualize semantic spaces, thereby allowing researchers to rapidly explore patterns in visual data that describe the statistical relations between words. Words are visualized as nodes, and word similarities are shown as directed edges of varying strengths. The "Word-2-Word" visualization environment allows for easy manipulation of graph data to test word similarity measures on their own or in comparisons between multiple similarity metrics. The system contains a large library of statistical relationship models, along with an interface to teach them from various language sources. The modularity of the visualization environment allows for quick insertion of new similarity measures so as to compare new corpus-based metrics against the current state of the art. The software is available at www.indiana.edu/~semantic/word2word/.
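
    A minimal stand-in for this kind of plot (not the Word-2-Word tool itself), using networkx: words become nodes, and invented similarity values become weighted edges whose drawn width scales with similarity. The tool's directed edges are simplified to undirected ones here.

      import networkx as nx
      import matplotlib.pyplot as plt

      sims = {("cat", "dog"): 0.8, ("cat", "kitten"): 0.9,
              ("dog", "puppy"): 0.9, ("kitten", "puppy"): 0.5}

      G = nx.Graph()
      for (u, v), w in sims.items():
          G.add_edge(u, v, weight=w)

      pos = nx.spring_layout(G, seed=3)
      widths = [5 * G[u][v]["weight"] for u, v in G.edges]  # width ~ similarity
      nx.draw(G, pos, with_labels=True, width=widths, node_color="lightgray")
      plt.show()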

  17. Similarity measures for face recognition

    CERN Document Server

    Vezzetti, Enrico

    2015-01-01

    Face recognition has several applications, including security (authentication and identification of device users and criminal suspects) and medicine (corrective surgery and diagnosis). Facial recognition programs rely on algorithms that can compare and compute the similarity between two sets of images. This eBook explains some of the similarity measures used in facial recognition systems in a single volume. Readers will learn about various measures including Minkowski distances, Mahalanobis distances, Hausdorff distances, and cosine-based distances, among other methods. The book also summarizes errors that may occur in face recognition methods. Computer scientists "facing face" and looking to select and test different methods of computing similarities will benefit from this book. The book is also a useful tool for students undertaking computer vision courses.

  18. A Survey of Binary Similarity and Distance Measures

    Directory of Open Access Journals (Sweden)

    Seung-Seok Choi

    2010-02-01

    Full Text Available The binary feature vector is one of the most common representations of patterns, and measures of similarity and distance play a critical role in many problems such as clustering and classification. Ever since Jaccard proposed a similarity measure to classify ecological species in 1901, numerous binary similarity and distance measures have been proposed in various fields. Applying appropriate measures results in more accurate data analysis. Notwithstanding, few comprehensive surveys on binary measures have been conducted. Hence we collected 76 binary similarity and distance measures used over the last century and reveal their correlations through the hierarchical clustering technique.
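
    Three of the measures covered by such surveys, written out directly over the four cell counts of the 2x2 presence/absence table; the example vectors are invented.

      import numpy as np

      def binary_counts(a, b):
          """Cell counts (n11, n10, n01, n00) of the 2x2 presence table."""
          a, b = np.asarray(a, bool), np.asarray(b, bool)
          return ((a & b).sum(), (a & ~b).sum(),
                  (~a & b).sum(), (~a & ~b).sum())

      def jaccard(a, b):
          n11, n10, n01, _ = binary_counts(a, b)
          return n11 / (n11 + n10 + n01)

      def dice(a, b):
          n11, n10, n01, _ = binary_counts(a, b)
          return 2 * n11 / (2 * n11 + n10 + n01)

      def sokal_michener(a, b):      # "simple matching": counts joint absences
          n11, n10, n01, n00 = binary_counts(a, b)
          return (n11 + n00) / (n11 + n10 + n01 + n00)

      a = [1, 1, 0, 1, 0, 0]
      b = [1, 0, 0, 1, 0, 1]
      print(jaccard(a, b), dice(a, b), sokal_michener(a, b))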

  19. Revisiting Inter-Genre Similarity

    DEFF Research Database (Denmark)

    Sturm, Bob L.; Gouyon, Fabien

    2013-01-01

    We revisit the idea of ``inter-genre similarity'' (IGS) for machine learning in general, and music genre recognition in particular. We show analytically that the probability of error for IGS is higher than naive Bayes classification with zero-one loss (NB). We show empirically that IGS does...

  20. A study of Consistency in the Selection of Search Terms and Search Concepts: A Case Study in National Taiwan University

    Directory of Open Access Journals (Sweden)

    Mu-hsuan Huang

    2001-12-01

    Full Text Available This article analyzes the consistency in the selection of search terms and search concepts among college and graduate students at National Taiwan University when using the PsycLIT CD-ROM database. 31 students conducted pre-assigned searches, performing 59 searches that generated 609 search terms. The study finds that the consistency in the selection of search terms is 22.14% at the first level and 35% at the second level. These results are similar to those of other studies. Regarding the consistency of search concepts, both the overlap of retrieved articles and the overlap of articles judged relevant are lower than in other studies. [Article content in Chinese

  1. Searching for Dark Sectors

    Science.gov (United States)

    Zhong, Yi-Ming

    The existence of Dark Matter suggests the presence of a dark sector, consisting of particles neutral under all Standard Model forces. Various "portals" can connect the dark sector to the Standard Model sector. Two popular examples are the vector portal, which gives rise to a "dark photon" (A'), and the Higgs portal, which gives rise to a "dark Higgs". Such dark forces appear in many well-motivated extensions of the Standard Model. In some cases, they may resolve discrepancies between experimental data and theoretical predictions, such as the muon anomalous magnetic moment. We show that dark sectors and forces can be constrained by several novel probes in current and future experiments, including mono-photon searches at low-energy positron-electron colliders, rare muon decays in Mu3e, and exotic Higgs decays at the Large Hadron Collider (LHC). We first investigate the power of low-energy, high-luminosity electron-positron colliders to probe dark sectors with a mass below 10 GeV, which couple to Standard Model particles through a low-mass dark mediator. Dark matter candidates in this mass range are well motivated and can give rise to distinctive mono-photon signals at B-factories and similar experiments. We use data from an existing mono-photon search by BABAR to place new constraints on this class of models, and give projections for the sensitivity of a similar search at a future B-factory such as Belle II. We find that such searches are more powerful than searches at other collider or fixed-target facilities for dark-sector mediators and particles with masses between a few hundred MeV and 10 GeV. We compare our results to existing and future direct detection experiments and show that low-energy colliders provide an indispensable and complementary avenue in the search for light dark matter. We also find that dark photons with masses of 10 MeV - 80 MeV can be probed in the rare muon decay process μ+ → e+ νe ν̄μ A', A' → e+ e-, in the upcoming Mu3e experiment.

  2. A Semantics-Based Measure of Emoji Similarity

    OpenAIRE

    Wijeratne, Sanjaya; Balasuriya, Lakshika; Sheth, Amit; Doran, Derek

    2017-01-01

    Emoji have grown to become one of the most important forms of communication on the web. With its widespread use, measuring the similarity of emoji has become an important problem for contemporary text processing since it lies at the heart of sentiment analysis, search, and interface design tasks. This paper presents a comprehensive analysis of the semantic similarity of emoji through embedding models that are learned over machine-readable emoji meanings in the EmojiNet knowledge base. Using e...

  3. Accurate test limits under prescribed consumer risk

    NARCIS (Netherlands)

    Albers, Willem/Wim; Arts, G.R.J.; Kallenberg, W.C.M.

    1997-01-01

    Measurement errors occurring during inspection of manufactured parts force producers to replace specification limits by slightly more strict test limits. Here accurate test limits are presented which maximize the yield while limiting the fraction of defectives reaching the consumer.

  4. How to use medical search engines?

    Directory of Open Access Journals (Sweden)

    Saurav Khatiwada

    2015-01-01

    Full Text Available In this era of Google search, it is easy for beginners to imagine a literature review limited to this popular search engine. Unfortunately, this misses a vast index of articles that exist within our reach. Specialized search portals, leading to their corresponding databases, cover the tremendous medical literature that has been generated over decades. This article deals with the "what?" and "how to?" of these available databases of medical articles. This will make your literature review efficient and the collected evidence accurate.

  5. Efficient k-NN search on vertically decomposed data

    NARCIS (Netherlands)

    A.P. de Vries (Arjen); N. Mamoulis; N.J. Nes (Niels); M.L. Kersten (Martin)

    2002-01-01

    Applications like multimedia retrieval require efficient support for similarity search on large data collections. Yet, nearest neighbor search is a difficult problem in high dimensional spaces, rendering efficient applications hard to realize: index structures degrade rapidly with
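
    A simplified sketch of k-NN over vertically decomposed (column-wise) data in the spirit of this setting: partial squared distances are accumulated one dimension at a time. Real systems add pruning and quantization, which are omitted here.

      import numpy as np

      rng = np.random.default_rng(0)
      n, dim, k = 10_000, 16, 5
      columns = [rng.standard_normal(n) for _ in range(dim)]  # one array per dim
      query = rng.standard_normal(dim)

      acc = np.zeros(n)                  # running squared distance per object
      for d, col in enumerate(columns):  # scan the data column by column
          acc += (col - query[d]) ** 2

      knn = np.argsort(acc)[:k]
      print(knn, np.sqrt(acc[knn]))      # ids and distances of the k nearest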

  6. Scaling, similarity, and the fourth paradigm for hydrology

    NARCIS (Netherlands)

    Peters-Lidard, Christa D.; Clark, Martyn; Samaniego, Luis; Verhoest, Niko E.C.; Emmerik, Van Tim; Uijlenhoet, Remko; Achieng, Kevin; Franz, Trenton E.; Woods, Ross A.

    2017-01-01

    In this synthesis paper addressing hydrologic scaling and similarity, we posit that the search for universal laws of hydrology is hindered by our focus on computational simulation (the third paradigm), and assert that it is time for hydrology to embrace a fourth paradigm of

  7. Similarity measures for protein ensembles

    DEFF Research Database (Denmark)

    Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper

    2009-01-01

    Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare ... a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single ...

  8. Google Ajax Search API

    CERN Document Server

    Fitzgerald, Michael

    2007-01-01

    Use the Google Ajax Search API to integrate web search, image search, local search, and other types of search into your web site by embedding a simple, dynamic search box to display search results in your own web pages using a few lines of JavaScript. For those who do not want to write code, the search wizards and solutions built with the Google Ajax Search API generate code to accomplish common tasks like adding local search results to a Google Maps API mashup, adding video search thumbnails to your web site, or adding a news reel with the latest up-to-date stories to your blog. More advanced users can

  9. Vaccine-related internet search activity predicts H1N1 and HPV vaccine coverage: implications for vaccine acceptance.

    Science.gov (United States)

    Kalichman, Seth C; Kegler, Christopher

    2015-01-01

    The Internet is a primary source for health-related information, and Internet search activity is associated with infectious disease outbreaks. The authors hypothesized that Internet search activity for vaccine-related information would predict vaccination coverage. They examined Internet search activity for H1N1 and human papilloma virus (HPV) disease and vaccine information in relation to H1N1 and HPV vaccine uptake. Google Insights for Search was used to assess the volume of Internet search queries for H1N1- and vaccine-related terms in the United States in 2009, the year of the H1N1 pandemic. Vaccine coverage data were also obtained from the Centers for Disease Control and Prevention at the state level for H1N1 vaccinations in 2009. These same measures were collected at the state level for HPV- and vaccine-related search terms in 2010 as well as HPV vaccine uptake in that year. Analyses showed that the search terms H1N1 and vaccine were correlated with H1N1 vaccine uptake; ordinal regression found the H1N1 search term was independently associated with H1N1 vaccine coverage. Similarly, the correlation between vaccine search volume and HPV coverage was significant; ordinal regression showed the search term vaccine independently predicted HPV vaccination coverage. This is among the first studies to show that Internet search activity is associated with vaccination coverage. The Internet should be exploited as an opportunity to dispel vaccine misinformation by providing accurate information to support vaccine decision making.

  10. Reading and visual search: a developmental study in normal children.

    Directory of Open Access Journals (Sweden)

    Magali Seassau

    Full Text Available Studies dealing with developmental aspects of binocular eye movement behaviour during reading are scarce. In this study we have explored binocular strategies during reading and during visual search tasks in a large population of normal young readers. Binocular eye movements were recorded using an infrared video-oculography system in sixty-nine children (aged 6 to 15) and in a group of 10 adults (aged 24 to 39). The main findings are (i) in both tasks the number of progressive saccades (to the right) and regressive saccades (to the left) decreases with age; (ii) the amplitude of progressive saccades increases with age in the reading task only; (iii) in both tasks, the duration of fixations as well as the total duration of the task decreases with age; (iv) in both tasks, the amplitude of disconjugacy recorded during and after the saccades decreases with age; (v) children are significantly more accurate in reading than in visual search after 10 years of age. Data reported here confirms and expands previous studies on children's reading. The new finding is that younger children show poorer coordination than adults, both while reading and while performing a visual search task. Both reading skills and binocular saccades coordination improve with age and children reach a similar level to adults after the age of 10. This finding is most likely related to the fact that learning mechanisms responsible for saccade yoking develop during childhood until adolescence.

  11. Reading and visual search: a developmental study in normal children.

    Science.gov (United States)

    Seassau, Magali; Bucci, Maria-Pia

    2013-01-01

    Studies dealing with developmental aspects of binocular eye movement behaviour during reading are scarce. In this study we have explored binocular strategies during reading and during visual search tasks in a large population of normal young readers. Binocular eye movements were recorded using an infrared video-oculography system in sixty-nine children (aged 6 to 15) and in a group of 10 adults (aged 24 to 39). The main findings are (i) in both tasks the number of progressive saccades (to the right) and regressive saccades (to the left) decreases with age; (ii) the amplitude of progressive saccades increases with age in the reading task only; (iii) in both tasks, the duration of fixations as well as the total duration of the task decreases with age; (iv) in both tasks, the amplitude of disconjugacy recorded during and after the saccades decreases with age; (v) children are significantly more accurate in reading than in visual search after 10 years of age. Data reported here confirms and expands previous studies on children's reading. The new finding is that younger children show poorer coordination than adults, both while reading and while performing a visual search task. Both reading skills and binocular saccades coordination improve with age and children reach a similar level to adults after the age of 10. This finding is most likely related to the fact that learning mechanisms responsible for saccade yoking develop during childhood until adolescence.

  12. Haptic Influence on Visual Search

    Directory of Open Access Journals (Sweden)

    Marcia Grabowecky

    2011-10-01

    Full Text Available Information from different sensory modalities influences perception and attention in tasks such as visual search. We have previously reported identity-based crossmodal influences of audition on visual search (Iordanescu, Guzman-Martinez, Grabowecky, & Suzuki, 2008; Iordanescu, Grabowecky, Franconeri, Theeuwes, and Suzuki, 2010; Iordanescu, Grabowecky and Suzuki, 2011). Here, we extend those results and demonstrate a novel crossmodal interaction between haptic shape information and visual attention. Manually-explored, but unseen, shapes facilitated visual search for similarly-shaped objects. This effect manifests as a reduction in both overall search times and initial saccade latencies when the haptic shape (e.g., a sphere) is consistent with a visual target (e.g., an orange) compared to when it is inconsistent with a visual target (e.g., a hockey puck). This haptic-visual interaction occurred even though the manually held shapes were not predictive of the visual target's shape or location, suggesting that the interaction occurs automatically. Furthermore, when the haptic shape was consistent with a distracter in the visual search array (instead of with the target), initial saccades toward the target were disrupted. Together, these results demonstrate a robust shape-specific haptic influence on visual search.

  13. Search Engine For Ebook Portal

    Directory of Open Access Journals (Sweden)

    Prashant Kanade

    2017-05-01

    Full Text Available The purpose of this paper is to establish the textual analytics involved in developing a search engine for an ebook portal. We have extracted our dataset from Project Gutenberg using a robot harvester. Textual analytics is used for efficient search retrieval. The entire dataset is represented using the Vector Space Model, where each document is a vector in the vector space. For computational purposes, we further represent our dataset as a Term Frequency-Inverse Document Frequency (tf-idf) matrix. The first step is obtaining the most coherent sequence of words from the search query entered. The entered query is processed using front-end algorithms: a spell checker, text segmentation, and language modeling. Back-end processing includes similarity modeling, clustering, indexing, and retrieval. The relationship between documents and words is established using the cosine similarity measured between documents and words in the vector space. Clustering is used to suggest books that are similar to the search query entered by the user. Lastly, the Lucene-based Elasticsearch engine is used for indexing the documents, which allows faster retrieval of data. Elasticsearch returns a dictionary and creates a tf-idf matrix. The processed query is compared with the dictionary obtained, and the tf-idf matrix is used to calculate the score for each match, giving the most relevant results.
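
    The core of the back end described above reduces to tf-idf vectors plus cosine similarity; a minimal scikit-learn sketch (our illustration, not the portal's code) follows.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      books = [
          "a tale of adventure on the high seas",
          "cooking recipes for the busy family",
          "sea monsters and maritime adventure stories",
      ]
      vectorizer = TfidfVectorizer()
      doc_matrix = vectorizer.fit_transform(books)   # documents as tf-idf rows

      query_vec = vectorizer.transform(["adventure at sea"])
      scores = cosine_similarity(query_vec, doc_matrix).ravel()
      for idx in scores.argsort()[::-1]:             # best match first
          print(f"{scores[idx]:.3f}  {books[idx]}")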

  14. How doctors search

    DEFF Research Database (Denmark)

    Lykke, Marianne; Price, Susan; Delcambre, Lois

    2012-01-01

    Professional, workplace searching is different from general searching, because it is typically limited to specific facets and targeted to a single answer. We have developed the semantic component (SC) model, which is a search feature that allows searchers to structure and specify the search...... to context-specific aspects of the main topic of the documents. We have tested the model in an interactive searching study with family doctors with the purpose to explore doctors’ querying behaviour, how they applied the means for specifying a search, and how these features contributed to the search outcome....... In general, the doctors were capable of exploiting system features and search tactics during the searching. Most searchers produced well-structured queries that contained appropriate search facets. When searches failed it was not due to query structure or query length. Failures were mostly caused by the well...

  15. An intelligent method for geographic Web search

    Science.gov (United States)

    Mei, Kun; Yuan, Ying

    2008-10-01

    While the electronically available information on the World-Wide Web grows explosively, the difficulty of finding relevant information also increases for search engine users. In this paper we discuss how to constrain web queries geographically. A number of search queries are associated with geographical locations, either explicitly or implicitly. Accurately and effectively detecting the locations that search queries are truly about has huge potential impact on increasing search relevance, bringing better targeted search results, and improving search user satisfaction. Our approach is novel, as far as we can tell, both in the way geographic information is extracted from the web and in the way it is integrated into query processing. This paper gives an overview of a spatially aware search engine for semantic querying of web documents. It also illustrates algorithms for extracting locations from web documents and query requests using location ontologies to encode and reason about the formal semantics of geographic web search. Based on a real-world scenario of tourism guide search, the application of our approach shows that geographic information retrieval can be efficiently supported.

  16. Landscape similarity, retrieval, and machine mapping of physiographic units

    Science.gov (United States)

    Jasiewicz, Jaroslaw; Netzel, Pawel; Stepinski, Tomasz F.

    2014-09-01

    We introduce landscape similarity - a numerical measure that assesses affinity between two landscapes on the basis of similarity between the patterns of their constituent landform elements. Such a similarity function provides core technology for a landscape search engine - an algorithm that parses the topography of a study area and finds all places with landscapes broadly similar to a landscape template. A landscape search can yield answers to a query in real time, enabling a highly effective means to explore large topographic datasets. In turn, a landscape search facilitates auto-mapping of physiographic units within a study area. The country of Poland serves as a test bed for these novel concepts. The topography of Poland is given by a 30 m resolution DEM. The geomorphons method is applied to this DEM to classify the topography into ten common types of landform elements. A local landscape is represented by a square tile cut out of a map of landform elements. A histogram of cell-pair features is used to succinctly encode the composition and texture of a pattern within a local landscape. The affinity between two local landscapes is assessed using the Wave-Hedges similarity function applied to the two corresponding histograms. For a landscape search the study area is organized into a lattice of local landscapes. During the search the algorithm calculates the similarity between each local landscape and a given query. Our landscape search for Poland is implemented as a GeoWeb application called TerraEx-Pl and is available at http://sil.uc.edu/. Given a sample, or a number of samples, from a target physiographic unit the landscape search delineates this unit using the principles of supervised machine learning. Repeating this procedure for all units yields a complete physiographic map. The application of this methodology to topographic data of Poland results in the delineation of nine physiographic units. The resultant map bears a close resemblance to a conventional
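
    One common form of the Wave-Hedges similarity between two histograms, as used here to compare local landscapes; the bin values below are invented, and treating empty-versus-empty bins (0/0) as a perfect match is our choice.

      import numpy as np

      def wave_hedges_similarity(h1, h2):
          """Mean of per-bin min/max ratios; 1.0 for identical histograms."""
          h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
          hi, lo = np.maximum(h1, h2), np.minimum(h1, h2)
          ratio = np.divide(lo, hi, out=np.ones_like(hi), where=hi > 0)
          return ratio.mean()

      tile = [0.30, 0.10, 0.25, 0.05, 0.30]   # landform-element histogram A
      query = [0.28, 0.12, 0.20, 0.10, 0.30]  # landform-element histogram B
      print(round(wave_hedges_similarity(tile, query), 3))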

  17. Code Tagging and Similarity-based Retrieval with myCBR

    Science.gov (United States)

    Roth-Berghofer, Thomas R.; Bahls, Daniel

    This paper describes the code tagging plug-in coTag, which allows annotating code snippets in the integrated development environment eclipse. coTag offers an easy-to-use interface for tagging and searching. Using the similarity-based search engine of the open-source tool myCBR, the user can search not only for exactly the same tags as offered by other code tagging extensions, but also for similar tags and, thus, for similar code snippets. coTag provides means for context-based adding of new as well as changing of existing similarity links between tags, supported by myCBR's explanation component.

  18. Dark matter searches

    Science.gov (United States)

    Bettini, Alessandro

    These lectures begin with a brief survey of the astrophysical and cosmological evidence for dark matter. We then consider the three principal theoretically motivated types of dark matter, sterile neutrinos, axions and SUSY WIMPs. In chapter 4 we discuss the motivations for the so-called neutrino minimal standard model, nuMSM, an extension of the SM with three sterile neutrinos with masses similar to the charged fermions. In chapter 5 we briefly recall the strong CP problem of the SM and the solution proposed by Peccei and Quinn leading to the prediction of axions and of their characteristics. We then discuss the experimental status and perspectives. In chapter 6 we assume that the reader to be acquainted with the theoretical motivations for SUSY and move directly to the direct search for dark matter and the description of the principal detector techniques: scintillators, noble fluids and bolometers. We conclude with an outlook on the future perspectives.

  19. OrthoReD: a rapid and accurate orthology prediction tool with low computational requirement.

    Science.gov (United States)

    Battenberg, Kai; Lee, Ernest K; Chiu, Joanna C; Berry, Alison M; Potter, Daniel

    2017-06-21

    Identifying orthologous genes is an initial step required for phylogenetics, and it is also a common strategy employed in functional genetics to find candidates for functionally equivalent genes across multiple species. At the same time, in silico orthology prediction tools often require large computational resources only available on computing clusters. Here we present OrthoReD, an open-source orthology prediction tool with accuracy comparable to published tools that requires only a desktop computer. The low computational resource requirement of OrthoReD is achieved by repeating orthology searches on one gene of interest at a time, thereby generating a reduced dataset to limit the scope of the orthology search for each gene of interest. The output of OrthoReD was highly similar to the outputs of two other published orthology prediction tools, OrthologID and/or OrthoDB, for the three datasets tested, which represented three phyla with different ranges of species diversity and different numbers of genomes included. Median CPU time for ortholog prediction per gene by OrthoReD executed on a desktop computer was <15 min even for the largest dataset tested, which included all coding sequences of 100 bacterial species. With high-throughput sequencing, unprecedented numbers of genes from non-model organisms are available, with an increasing need for clear information about their orthologies and/or functional equivalents in model organisms. OrthoReD is not only fast and accurate as an orthology prediction tool, but also gives researchers flexibility in the number of genes analyzed at a time, without requiring a high-performance computing cluster.
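
    The reduced-dataset strategy the abstract describes can be illustrated in a few lines. The sketch below is a simplification, not OrthoReD itself: a toy shared-k-mer score stands in for a real BLAST search, and a plain reciprocal-best-hit test stands in for the tool's actual ortholog delineation; only the per-gene restriction of the search space mirrors the idea above.

        def kmer_score(a, b, k=3):
            # Toy similarity score: number of shared k-mers (a stand-in for a
            # BLAST bit score).
            kmers = lambda s: {s[i:i + k] for i in range(len(s) - k + 1)}
            return len(kmers(a) & kmers(b))

        def best_hits(seq, proteome, n):
            # Top-n genes of one proteome ranked by the toy score.
            return sorted(proteome, key=lambda g: kmer_score(seq, proteome[g]), reverse=True)[:n]

        def reduced_orthology_search(query_id, query_proteome, other_proteomes, top_n=3):
            # One gene of interest at a time: gather a small candidate set per
            # species, then test reciprocity inside that reduced set only.
            qseq = query_proteome[query_id]
            orthologs = []
            for species, proteome in other_proteomes.items():
                for cand in best_hits(qseq, proteome, top_n):
                    back = best_hits(proteome[cand], query_proteome, 1)
                    if back and back[0] == query_id:
                        orthologs.append((species, cand))
                        break
            return orthologs

        # Toy proteomes: gene id -> sequence.
        A = {"a1": "MKTAYIAKQR", "a2": "MLSDEDFKAV"}
        B = {"b1": "MKTAYIAKQK", "b2": "MLGDEDYKAV"}
        C = {"c1": "MKSAYIAKQR", "c2": "QQQPPPWWWX"}
        print(reduced_orthology_search("a1", A, {"B": B, "C": C}))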

  20. submitter Searches at LEP

    CERN Document Server

    Kawagoe, Kiyotomo

    2001-01-01

    Searches for new particles and new physics were extensively performed at LEP. Although no evidence for new particles or new physics was discovered, the null results set very stringent limits on theories beyond the standard model. In this paper, searches at LEP and anomalies observed in the searches are presented. Future prospects for searches at the new energy-frontier machines are also discussed.

  1. Web Search Engines

    OpenAIRE

    Rajashekar, TB

    1998-01-01

    The World Wide Web is emerging as an all-in-one information source. Tools for searching Web-based information include search engines, subject directories and meta search tools. We take a look at key features of these tools and suggest practical hints for effective Web searching.

  2. Sound Search Engine Concept

    DEFF Research Database (Denmark)

    2006-01-01

    Sound search is provided by the major search engines; however, indexing is text-based, not sound-based. We will establish a dedicated sound search service based on sound feature indexing. The current demo shows the concept of the sound search engine. The first engine will be released in June...

  3. MultiMSOAR 2.0: an accurate tool to identify ortholog groups among multiple genomes.

    Directory of Open Access Journals (Sweden)

    Guanqun Shi

    Full Text Available The identification of orthologous genes shared by multiple genomes plays an important role in evolutionary studies and gene functional analyses. Building on MSOAR 2.0, a recently developed, accurate tool for ortholog assignment between a pair of closely related genomes based on genome rearrangement, we present in this paper a new system, MultiMSOAR 2.0, to identify ortholog groups among multiple genomes. In the system, we construct gene families for all the genomes using sequence similarity search and clustering, run MSOAR 2.0 for all pairs of genomes to obtain the pairwise orthology relationship, and partition each gene family into a set of disjoint sets of orthologous genes (called super ortholog groups, or SOGs) such that each SOG contains at most one gene from each genome. For each such SOG, we label the leaves of the species tree using 1 or 0 to indicate if the SOG contains a gene from the corresponding species or not. The resulting tree is called a tree of ortholog groups (or TOG). We then label the internal nodes of each TOG based on the parsimony principle and some biological constraints. Ortholog groups are finally identified from each fully labeled TOG. In comparison with the popular tool MultiParanoid on simulated data, MultiMSOAR 2.0 shows significantly higher prediction accuracy. It also outperforms MultiParanoid, the Roundup multi-ortholog repository and the Ensembl ortholog database in real data experiments using gene symbols as a validation tool. In addition to ortholog group identification, MultiMSOAR 2.0 also provides information about gene births, duplications and losses in evolution, which may be of independent biological interest. Our experiments on simulated data demonstrate that MultiMSOAR 2.0 is able to infer these evolutionary events much more accurately than the well-known software tool Notung. The software MultiMSOAR 2.0 is available to the public for free.
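
    The leaf-labelling step lends itself to a small worked example. The sketch below applies textbook Fitch small-parsimony to a rooted binary species tree whose leaves carry 1 (the SOG contains a gene from that species) or 0; MultiMSOAR 2.0's actual labelling adds biological constraints beyond this.

        def fitch_label(tree, leaf_states):
            # tree: nested 2-tuples, leaves are species names.
            # leaf_states: leaf name -> {0} or {1}. Returns node -> state set.
            states = {}

            def up(node):  # bottom-up: intersect child sets, else unite them
                if isinstance(node, str):
                    states[node] = set(leaf_states[node])
                else:
                    left, right = (up(child) for child in node)
                    states[node] = (left & right) or (left | right)
                return states[node]

            def down(node, parent_state):  # top-down: resolve ambiguity
                if parent_state in states[node]:
                    states[node] = {parent_state}
                else:
                    states[node] = {min(states[node])}  # arbitrary tie-break
                if not isinstance(node, str):
                    for child in node:
                        down(child, next(iter(states[node])))

            up(tree)
            down(tree, next(iter(states[tree])))
            return states

        tree = (("human", "mouse"), ("fly", "worm"))
        labels = fitch_label(tree, {"human": {1}, "mouse": {1}, "fly": {0}, "worm": {1}})
        print(labels[tree])  # inferred presence/absence of the SOG at the root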

  4. Similarity recognition of online data curves based on dynamic spatial time warping for the estimation of lithium-ion battery capacity

    Science.gov (United States)

    Tao, Laifa; Lu, Chen; Noktehdan, Azadeh

    2015-10-01

    Battery capacity estimation is a significant recent challenge given the complex physical and chemical processes that occur within batteries and the restrictions on the accessibility of capacity degradation data. In this study, we describe an approach called dynamic spatial time warping, which is used to determine the similarities of two arbitrary curves. Unlike classical dynamic time warping methods, this approach can maintain the invariance of curve similarity to the rotations and translations of curves, which is vital in curve similarity search. Moreover, it utilizes the online charging or discharging data that are easily collected and do not require special assumptions. The accuracy of this approach is verified using NASA battery datasets. Results suggest that the proposed approach provides a highly accurate means of estimating battery capacity, at a lower time cost than traditional dynamic time warping methods, for different individuals and under various operating conditions.
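
    For readers unfamiliar with the baseline, the classical dynamic time warping distance is easy to state in code. The sketch below is that baseline only; the rotation- and translation-invariant dynamic spatial time warping the study proposes is not reproduced here.

        import numpy as np

        def dtw_distance(x, y):
            # Classic O(n*m) dynamic program over all monotone alignments.
            x, y = np.asarray(x, float), np.asarray(y, float)
            n, m = len(x), len(y)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(x[i - 1] - y[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        # Two discharge-like voltage curves sampled at different rates.
        t1, t2 = np.linspace(0, 1, 50), np.linspace(0, 1, 80)
        print(dtw_distance(4.2 - 0.8 * t1**2, 4.2 - 0.8 * t2**2))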

  5. VaST: A variability search toolkit

    Science.gov (United States)

    Sokolovsky, K. V.; Lebedev, A. A.

    2018-01-01

    Variability Search Toolkit (VaST) is a software package designed to find variable objects in a series of sky images. It can be run from a script or interactively using its graphical interface. VaST relies on source list matching as opposed to image subtraction. SExtractor is used to generate source lists and perform aperture or PSF-fitting photometry (with PSFEx). Variability indices that characterize scatter and smoothness of a lightcurve are computed for all objects. Candidate variables are identified as objects having high variability index values compared to other objects of similar brightness. The two distinguishing features of VaST are its ability to perform accurate aperture photometry of images obtained with non-linear detectors and handle complex image distortions. The software has been successfully applied to images obtained with telescopes ranging from 0.08 to 2.5 m in diameter equipped with a variety of detectors including CCD, CMOS, MIC and photographic plates. About 1800 variable stars have been discovered with VaST. It is used as a transient detection engine in the New Milky Way (NMW) nova patrol. The code is written in C and can be easily compiled on the majority of UNIX-like systems. VaST is free software available at http://scan.sai.msu.ru/vast/.
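
    The selection rule described above - flag objects whose scatter is high for their brightness - can be sketched directly. VaST computes a family of much more refined indices; the plain standard deviation below is only illustrative, and the bin count and threshold are invented.

        import numpy as np

        def candidate_variables(lightcurves, n_bins=10, threshold=3.0):
            # lightcurves: object id -> 1-D array of magnitudes. Returns ids
            # whose scatter exceeds `threshold` times the median scatter of
            # objects in the same mean-brightness bin.
            ids = list(lightcurves)
            mean_mag = np.array([np.mean(lightcurves[i]) for i in ids])
            scatter = np.array([np.std(lightcurves[i]) for i in ids])
            edges = np.linspace(mean_mag.min(), mean_mag.max(), n_bins)
            bins = np.digitize(mean_mag, edges)
            flagged = []
            for b in np.unique(bins):
                in_bin = bins == b
                typical = np.median(scatter[in_bin])
                for i in np.nonzero(in_bin & (scatter > threshold * typical))[0]:
                    flagged.append(ids[i])
            return flagged

        rng = np.random.default_rng(1)
        curves = {f"star{k}": 12 + 0.01 * rng.standard_normal(100) for k in range(200)}
        sig = np.sin(np.linspace(0, 20, 100))
        curves["var1"] = 12 + 0.5 * (sig - sig.mean())  # a genuine variable
        print(candidate_variables(curves))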

  6. Search for Pentaquarks at BABAR

    CERN Document Server

    Goetzen, K

    2007-01-01

    The results of searches for the strange pentaquark states, Theta(1540)+, Xi(1860)-- and Xi(1860)0 in data recorded by the BABAR experiment are presented. We search for these three states inclusively in 123.4 fb^-1 of e+ e- annihilation data produced at the PEP-II asymmetric storage rings; we find no evidence for their production in any physics process, and set limits on their production rates that are well below the measured rates for conventional baryons. We also search for Theta(1540)+ produced in interactions of electrons or hadrons in the material of the inner part of the detector. No evidence for this state is found in a sample with much higher statistics than similar electroproduction experiments that claim a signal.

  7. Fast and accurate methods for phylogenomic analyses

    Directory of Open Access Journals (Sweden)

    Warnow Tandy

    2011-10-01

    Full Text Available Background: Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets. However, true gene trees can differ from the true species tree (and hence from one another) due to biological processes such as horizontal gene transfer, incomplete lineage sorting, and gene duplication and loss, so that no single gene tree is a reliable estimate of the species tree. Several methods have been developed to estimate species trees from estimated gene trees, differing according to the specific algorithmic technique used and the biological model used to explain differences between species and gene trees. Relatively little is known about the relative performance of these methods. Results: We report on a study evaluating several different methods for estimating species trees from sequence datasets, simulating sequence evolution under a complex model including indels (insertions and deletions), substitutions, and incomplete lineage sorting. The most important finding of our study is that some fast and simple methods are nearly as accurate as the most accurate methods, which employ sophisticated statistical methods and are computationally quite intensive. We also observe that methods that explicitly consider errors in the estimated gene trees produce more accurate trees than methods that assume the estimated gene trees are correct. Conclusions: Our study shows that highly accurate estimations of species trees are achievable, even when gene trees differ from each other and from the species tree, and that these estimations can be obtained using fairly simple and computationally tractable methods.

  8. Accurate overlaying for mobile augmented reality

    NARCIS (Netherlands)

    Pasman, W; van der Schaaf, A; Lagendijk, RL; Jansen, F.W.

    1999-01-01

    Mobile augmented reality requires accurate alignment of virtual information with objects visible in the real world. We describe a system for mobile communications to be developed to meet these strict alignment criteria using a combination of computer vision, inertial tracking and low-latency

  9. Accurate automatic profile monitoring. Genaue automatische Profilkontrolle

    Energy Technology Data Exchange (ETDEWEB)

    Sacher, F. (Amberg Messtechnik AG (Germany))

    1994-06-09

    It is almost inconceivable that present-day tunnelling methods would not employ modern surveying and monitoring technologies. Accurate, automatic profile monitoring is an aid to the optimization of construction work in technical, financial and scheduling respects. These aspects are explained in more detail on the basis of a description of use, various practical examples and a cost analysis. (orig.)

  10. Partial Recurrent Laryngeal Nerve Paralysis or Paresis? In Search for the Accurate Diagnosis

    Directory of Open Access Journals (Sweden)

    Alexander Delides

    2015-01-01

    Full Text Available “Partial paralysis” of the larynx is a term often used to describe a hypomobile vocal fold, as is the term “paresis.” We present a case of a dysphonic patient with a mobility disorder of the vocal fold, for whom idiopathic “partial paralysis” was the diagnosis made after laryngeal electromyography, and discuss a proposition for a different implementation of the term.

  11. Accurate scanning of the BssHII endonuclease in search for its DNA cleavage site

    NARCIS (Netherlands)

    Berkhout, B.; van Wamel, J.

    1996-01-01

    A facilitated diffusion mechanism has been proposed to account for the kinetic efficiency with which restriction endonucleases are able to locate DNA recognition sites. Such a mechanism involves the initial formation of a nonspecific complex upon collision of the protein with the DNA, with the

  12. Alaska, Gulf spills share similarities

    Energy Technology Data Exchange (ETDEWEB)

    Usher, D. (Marine Pollution Control Co., Detroit, MI (United States))

    The accidental Exxon Valdez oil spill in Alaska and the deliberate dumping of crude oil into the Persian Gulf as a tactic of war present both glaring differences and surprising similarities. Public reaction and public response were much greater to the Exxon Valdez spill in pristine Prince William Sound than to the war-related tragedy in the Persian Gulf. More than 12,000 workers helped in the Alaskan cleanup; only 350 have been involved in Kuwait. But in both instances, environmental damages appear to be less than anticipated. Nature's highly effective self-cleansing action is primarily responsible for minimizing the damages. One positive action growing out of the two incidents is increased international cooperation and participation in oil-spill clean-up efforts. In 1990, in the aftermath of the Exxon Valdez spill, 94 nations signed an international accord on cooperation in future spills. The spills can be historic environmental landmarks leading to the creation of more sophisticated response systems worldwide.

  13. Kalign – an accurate and fast multiple sequence alignment algorithm

    Directory of Open Access Journals (Sweden)

    Sonnhammer Erik LL

    2005-12-01

    Full Text Available Background: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics. Results: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods. Conclusion: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.

  14. To Search or Not to Search!

    Science.gov (United States)

    Taylor, J. Rodney

    Constitutional guarantees, as provided by the Bill of Rights, are enjoyed by all citizens. This principle applies no less to students with respect to their college or university domicile. Case law on this subject suggests that three questions must be answered to determine the reasonableness of residence searching: (1) by whom the search is…

  15. Accurate source location from waves scattered by surface topography

    Science.gov (United States)

    Wang, Nian; Shen, Yang; Flinders, Ashton; Zhang, Wei

    2016-06-01

    Accurate source locations of earthquakes and other seismic events are fundamental in seismology. The location accuracy is limited by several factors, including velocity models, which are often poorly known. In contrast, surface topography, the largest velocity contrast in the Earth, is often precisely mapped at the seismic wavelength (>100 m). In this study, we explore the use of P coda waves generated by scattering at surface topography to obtain high-resolution locations of near-surface seismic events. The Pacific Northwest region is chosen as an example to provide realistic topography. A grid search algorithm is combined with the 3-D strain Green's tensor database to improve search efficiency as well as the quality of hypocenter solutions. The strain Green's tensor is calculated using a 3-D collocated-grid finite difference method on curvilinear grids. Solutions in the search volume are obtained based on the least squares misfit between the "observed" and predicted P and P coda waves. The 95% confidence interval of the solution is provided as an a posteriori error estimation. For shallow events tested in the study, scattering is mainly due to topography in comparison with stochastic lateral velocity heterogeneity. The incorporation of P coda significantly improves solution accuracy and reduces solution uncertainty. The solution remains robust with wide ranges of random noises in data, unmodeled random velocity heterogeneities, and uncertainties in moment tensors. The method can be extended to locate pairs of sources in close proximity by differential waveforms using source-receiver reciprocity, further reducing errors caused by unmodeled velocity structures.
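
    The search step itself is conceptually simple, and a toy version fits in a page. The sketch below scans a 2-D grid of candidate locations and keeps the one minimizing the least-squares misfit to noisy "observed" waveforms; the constant-velocity Gaussian-pulse forward model is invented for illustration, whereas the study predicts waveforms from 3-D strain Green's tensor databases.

        import numpy as np

        STATIONS = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])  # km
        V = 5.0                                   # assumed P velocity, km/s
        T = np.linspace(0, 6, 600)                # time axis, s

        def predict(source):
            # Gaussian pulse arriving at each station after a travel time.
            dist = np.linalg.norm(STATIONS - source, axis=1)
            return np.exp(-((T[None, :] - (dist / V)[:, None]) ** 2) / 0.01)

        true_source = np.array([3.2, 7.1])
        rng = np.random.default_rng(2)
        observed = predict(true_source) + 0.05 * rng.standard_normal((4, 600))

        xs = ys = np.arange(0, 10.01, 0.1)        # 0.1 km search grid
        misfit = np.array([[np.sum((observed - predict(np.array([x, y]))) ** 2)
                            for y in ys] for x in xs])
        i, j = np.unravel_index(misfit.argmin(), misfit.shape)
        print("best location:", round(xs[i], 1), round(ys[j], 1))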

  16. Accurate source location from P waves scattered by surface topography

    Science.gov (United States)

    Wang, N.; Shen, Y.

    2015-12-01

    Accurate source locations of earthquakes and other seismic events are fundamental in seismology. The location accuracy is limited by several factors, including velocity models, which are often poorly known. In contrast, surface topography, the largest velocity contrast in the Earth, is often precisely mapped at the seismic wavelength (> 100 m). In this study, we explore the use of P-coda waves generated by scattering at surface topography to obtain high-resolution locations of near-surface seismic events. The Pacific Northwest region is chosen as an example. The grid search method is combined with a 3D strain Green's tensor database to improve the search efficiency as well as the quality of the hypocenter solution. The strain Green's tensor is calculated by the 3D collocated-grid finite difference method on curvilinear grids. Solutions in the search volume are then obtained based on the least-square misfit between the 'observed' and predicted P and P-coda waves. A 95% confidence interval of the solution is also provided as an a posteriori error estimate. We find that the scattered waves are mainly due to topography in comparison with random velocity heterogeneity characterized by the von Kármán-type power spectral density function. When only P wave data is used, the 'best' solution is offset from the real source location mostly in the vertical direction. The incorporation of P coda significantly improves solution accuracy and reduces its uncertainty. The solution remains robust with a range of random noises in data, un-modeled random velocity heterogeneities, and uncertainties in moment tensors that we tested.

  17. Feedback about More Accurate versus Less Accurate Trials: Differential Effects on Self-Confidence and Activation

    Science.gov (United States)

    Badami, Rokhsareh; VaezMousavi, Mohammad; Wulf, Gabriele; Namazizadeh, Mahdi

    2012-01-01

    One purpose of the present study was to examine whether self-confidence or anxiety would be differentially affected by feedback from more accurate rather than less accurate trials. The second purpose was to determine whether arousal variations (activation) would predict performance. On Day 1, participants performed a golf putting task under one of…

  18. Accurate estimation of indoor travel times

    DEFF Research Database (Denmark)

    Prentow, Thor Siiger; Blunck, Henrik; Stisen, Allan

    2014-01-01

    The ability to accurately estimate indoor travel times is crucial for enabling improvements within application areas such as indoor navigation, logistics for mobile workers, and facility management. In this paper, we study the challenges inherent in indoor travel time estimation, and we propose the InTraTime method for accurately estimating indoor travel times via mining of historical and real-time indoor position traces. The method learns during operation both travel routes, travel times and their respective likelihood---both for routes traveled as well as for sub-routes thereof. InTraTime allows users to specify temporal and other query parameters, such as time-of-day, day-of-week or the identity of the traveling individual. As input the method is designed to take generic position traces and is thus interoperable with a variety of indoor positioning systems. The method's advantages include...

  19. Stereotypes of Age Differences in Personality Traits: Universal and Accurate?

    Science.gov (United States)

    Chan, Wayne; McCrae, Robert R.; De Fruyt, Filip; Jussim, Lee; Löckenhoff, Corinna E.; De Bolle, Marleen; Costa, Paul T.; Sutin, Angelina R.; Realo, Anu; Allik, Jüri; Nakazato, Katsuharu; Shimonaka, Yoshiko; Hřebíčková, Martina; Kourilova, Sylvie; Yik, Michelle; Ficková, Emília; Brunner-Sciarra, Marina; de Figueora, Nora Leibovich; Schmidt, Vanina; Ahn, Chang-kyu; Ahn, Hyun-nie; Aguilar-Vafaie, Maria E.; Siuta, Jerzy; Szmigielska, Barbara; Cain, Thomas R.; Crawford, Jarret T.; Mastor, Khairul Anwar; Rolland, Jean-Pierre; Nansubuga, Florence; Miramontez, Daniel R.; Benet-Martínez, Veronica; Rossier, Jérôme; Bratko, Denis; Halberstadt, Jamin; Yamaguchi, Mami; Knežević, Goran; Martin, Thomas A.; Gheorghiu, Mirona; Smith, Peter B.; Barbaranelli, Claduio; Wang, Lei; Shakespeare-Finch, Jane; Lima, Margarida P.; Klinkosz, Waldemar; Sekowski, Andrzej; Alcalay, Lidia; Simonetti, Franco; Avdeyeva, Tatyana V.; Pramila, V. S.; Terracciano, Antonio

    2012-01-01

    Age trajectories for personality traits are known to be similar across cultures. To address whether stereotypes of age groups reflect these age-related changes in personality, we asked participants in 26 countries (N = 3,323) to rate typical adolescents, adults, and old persons in their own country. Raters across nations tended to share similar beliefs about different age groups; adolescents were seen as impulsive, rebellious, undisciplined, preferring excitement and novelty, whereas old people were consistently considered lower on impulsivity, activity, antagonism, and Openness. These consensual age group stereotypes correlated strongly with published age differences on the five major dimensions of personality and most of 30 specific traits, using as criteria of accuracy both self-reports and observer ratings, different survey methodologies, and data from up to 50 nations. However, personal stereotypes were considerably less accurate, and consensual stereotypes tended to exaggerate differences across age groups. PMID:23088227

  20. Highly Accurate Prediction of Jobs Runtime Classes

    OpenAIRE

    Anat Reiner-Benaim; Anna Grabarnick; Edi Shmueli

    2016-01-01

    Separating the short jobs from the long is a known technique to improve scheduling performance. In this paper we describe a method we developed for accurately predicting the runtime classes of the jobs to enable this separation. Our method uses the fact that the runtimes can be represented as a mixture of overlapping Gaussian distributions, in order to train a CART classifier to provide the prediction. The threshold that separates the short jobs from the long jobs is determined during the ev...
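
    The pipeline the abstract outlines maps naturally onto scikit-learn. In the hedged sketch below, a two-component Gaussian mixture over log-runtimes yields a short/long threshold (here simply the midpoint of the component means, which may differ from the paper's rule), and a CART-style decision tree predicts the class from job features; the data and feature choices are invented.

        import numpy as np
        from sklearn.mixture import GaussianMixture
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(3)
        n = 2000
        log_rt = np.concatenate([rng.normal(2.0, 0.5, n // 2),    # short jobs
                                 rng.normal(6.0, 1.0, n // 2)])   # long jobs
        features = np.column_stack([log_rt + rng.normal(0, 1, n), # informative feature
                                    rng.normal(0, 1, n)])         # pure noise feature

        gmm = GaussianMixture(n_components=2, random_state=0).fit(log_rt.reshape(-1, 1))
        threshold = np.sort(gmm.means_.ravel()).mean()  # split between the two means
        labels = (log_rt > threshold).astype(int)

        tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(features, labels)
        print("threshold (log s):", round(threshold, 2),
              "| training accuracy:", round(tree.score(features, labels), 3))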

  1. Accurate phase-shift velocimetry in rock.

    Science.gov (United States)

    Shukla, Matsyendra Nath; Vallatos, Antoine; Phoenix, Vernon R; Holmes, William M

    2016-06-01

    Spatially resolved Pulsed Field Gradient (PFG) velocimetry techniques can provide precious information concerning flow through opaque systems, including rocks. This velocimetry data is used to enhance flow models in a wide range of systems, from oil behaviour in reservoir rocks to contaminant transport in aquifers. Phase-shift velocimetry is the fastest way to produce velocity maps but critical issues have been reported when studying flow through rocks and porous media, leading to inaccurate results. Combining PFG measurements for flow through Bentheimer sandstone with simulations, we demonstrate that asymmetries in the molecular displacement distributions within each voxel are the main source of phase-shift velocimetry errors. We show that when flow-related average molecular displacements are negligible compared to self-diffusion ones, symmetric displacement distributions can be obtained while phase measurement noise is minimised. We elaborate a complete method for the production of accurate phase-shift velocimetry maps in rocks and low porosity media and demonstrate its validity for a range of flow rates. This development of accurate phase-shift velocimetry now enables more rapid and accurate velocity analysis, potentially helping to inform both industrial applications and theoretical models.

  2. Accurate Calculation of Electric Fields Inside Enzymes.

    Science.gov (United States)

    Wang, X; He, X; Zhang, J Z H

    2016-01-01

    The specific electric field generated by a protease at its active site is considered as an important source of the catalytic power. Accurate calculation of the electric field at the active site of an enzyme has both fundamental and practical importance. Measuring site-specific changes of the electric field at internal sites of proteins due to, e.g., mutation has been realized by using molecular probes with CO or CN groups in the context of the vibrational Stark effect. However, theoretical prediction of changes in the electric field inside a protein based on a conventional force field, such as AMBER or OPLS, is often inadequate. For such calculations, a quantum chemical approach or a quantum-based polarizable or polarized force field is highly preferable. Compared with the result from a conventional force field, significant improvement is found in predicting experimentally measured mutation-induced electric field changes using quantum-based methods, indicating that quantum effects such as polarization play an important role in the accurate description of electric fields inside proteins. In comparison, the best theoretical prediction comes from fully quantum mechanical calculation in which both polarization and inter-residue charge transfer effects are included for accurate prediction of electrostatics in proteins.

  3. Dynamic Search and Working Memory in Social Recall

    Science.gov (United States)

    Hills, Thomas T.; Pachur, Thorsten

    2012-01-01

    What are the mechanisms underlying search in social memory (e.g., remembering the people one knows)? Do the search mechanisms involve dynamic local-to-global transitions similar to semantic search, and are these transitions governed by the general control of attention, associated with working memory span? To find out, we asked participants to…

  4. Large Neighborhood Search

    DEFF Research Database (Denmark)

    Pisinger, David; Røpke, Stefan

    2010-01-01

    Heuristics based on large neighborhood search have recently shown outstanding results in solving various transportation and scheduling problems. Large neighborhood search methods explore a complex neighborhood by use of heuristics. Using large neighborhoods makes it possible to find better candidate solutions...
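
    The destroy-and-repair loop at the core of the method is short. The sketch below applies it to a toy traveling-salesman instance with a random-removal destroy step and greedy-insertion repair; real LNS heuristics use richer, problem-specific operators and acceptance rules.

        import math
        import random

        random.seed(4)
        CITIES = [(random.random(), random.random()) for _ in range(40)]

        def tour_length(tour):
            return sum(math.dist(CITIES[tour[i]], CITIES[tour[(i + 1) % len(tour)]])
                       for i in range(len(tour)))

        def destroy(tour, k=5):
            removed = random.sample(tour, k)        # drop k random cities
            return [c for c in tour if c not in removed], removed

        def repair(partial, removed):
            for c in removed:                       # greedy cheapest insertion
                best_pos = min(range(len(partial) + 1),
                               key=lambda p: tour_length(partial[:p] + [c] + partial[p:]))
                partial = partial[:best_pos] + [c] + partial[best_pos:]
            return partial

        best = list(range(len(CITIES)))
        for _ in range(500):
            candidate = repair(*destroy(best))
            if tour_length(candidate) < tour_length(best):  # accept improvements only
                best = candidate
        print("best tour length:", round(tour_length(best), 3))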

  5. New particle searches

    CERN Document Server

    Ruhlmann-Kleider, V

    2000-01-01

    This review presents recent results on new particle searches achieved at the Tevatron, HERA and LEP. After a brief outline of the searches for exotic particles, results on supersymmetric particles and Higgs bosons are detailed. Near-future prospects are also given.

  6. Cerebral fat embolism: Use of MR spectroscopy for accurate diagnosis

    Directory of Open Access Journals (Sweden)

    Laxmi Kokatnur

    2015-01-01

    Full Text Available Cerebral fat embolism (CFE) is an uncommon but serious complication following orthopedic procedures. It usually presents with altered mental status, and can be a part of fat embolism syndrome (FES) if associated with cutaneous and respiratory manifestations. Because of the presence of other common factors affecting the mental status, particularly in the postoperative period, the diagnosis of CFE can be challenging. Magnetic resonance imaging (MRI) of the brain typically shows multiple lesions distributed predominantly in the subcortical region, which appear as hyperintense lesions on T2 and diffusion weighted images. Although the location offers a clue, the MRI findings are not specific for CFE. Watershed infarcts, hypoxic encephalopathy, disseminated infections, demyelinating disorders, and diffuse axonal injury can also show similar changes on MRI of the brain. The presence of fat in these hyperintense lesions, identified by MR spectroscopy as raised lipid peaks, will help in the accurate diagnosis of CFE. Normal brain tissue or conditions producing similar MRI changes will not show any lipid peak on MR spectroscopy. We present a case of CFE initially misdiagnosed as brain stem stroke based on clinical presentation and cranial computed tomography (CT) scan; later, MR spectroscopy elucidated the accurate diagnosis.

  7. Integrated vs. Federated Search

    DEFF Research Database (Denmark)

    Løvschall, Kasper

    2009-01-01

    Talk on the differences and similarities between integrated and federated search in a library context. Given at the theme day "Integrated Search - samsøgning i alle kilder" at Danmarks Biblioteksskole on 22 January 2009.

  8. Searching with Kids.

    Science.gov (United States)

    Valenza, Joyce Kasman

    2000-01-01

    Discusses student search tools for the World Wide Web and describes and compares seven that are appropriate for pre-kindergarten through secondary school. Highlights include access; age appropriateness; directories versus full search engines; filters; limits of some tools; and adult-oriented search tools that can be limited for students' access.…

  9. Measurement and Improvement of Subject Specialists Performance Searching Chemical Abstracts Online as Available on SDC Search Systems.

    Science.gov (United States)

    Copeland, Richard F.; And Others

    The first phase of a project to design a prompting system to help semi-experienced end users search Chemical Abstracts online, this study focused on the differences and similarities in the search approaches used by experienced users and those with less expertise. Four online searches on topics solicited from chemistry professors in small…

  10. FACT: Functional annotation transfer between proteins with similar feature architectures

    Directory of Open Access Journals (Sweden)

    Koestler Tina

    2010-08-01

    Full Text Available Background: The increasing number of sequenced genomes provides the basis for exploring the genetic and functional diversity within the tree of life. Only a tiny fraction of the encoded proteins undergoes a thorough experimental characterization. For the remainder, bioinformatics annotation tools are the only means to infer their function. Exploiting significant sequence similarities to already characterized proteins, commonly taken as evidence for homology, is the prevalent method to deduce functional equivalence. Such methods fail when homologs are too diverged, or when they have assumed a different function. Finally, due to convergent evolution, functional equivalence is not necessarily linked to common ancestry. Therefore complementary approaches are required to identify functional equivalents. Results: We present the Feature Architecture Comparison Tool http://www.cibiv.at/FACT to search for functionally equivalent proteins. FACT uses the similarity between feature architectures of two proteins, i.e., the arrangements of functional domains, secondary structure elements and compositional properties, as a proxy for their functional equivalence. A scoring function measures feature architecture similarities, which enables searching for functional equivalents in entire proteomes. Our evaluation of 9,570 EC-classified enzymes revealed that FACT, using the full feature set, outperformed the existing architecture-based approaches by identifying significantly more functional equivalents as highest scoring proteins. We show that FACT can identify functional equivalents that share no significant sequence similarity. However, when the highest scoring protein of FACT is also the protein with the highest local sequence similarity, it is in 99% of the cases functionally equivalent to the query. We demonstrate the versatility of FACT by identifying a missing link in the yeast glutathione metabolism and also by searching for the human GolgA5
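
    The scoring idea - similarity of feature architectures rather than of sequences - can be mimicked in miniature. In the sketch below an architecture is reduced to an ordered list of domain names, scored by an invented blend of set overlap and order-aware similarity; FACT's real scoring function weights many more feature types.

        from difflib import SequenceMatcher

        def architecture_similarity(arch1, arch2, w_set=0.5, w_order=0.5):
            s1, s2 = set(arch1), set(arch2)
            jaccard = len(s1 & s2) / len(s1 | s2) if s1 | s2 else 1.0
            order = SequenceMatcher(None, arch1, arch2).ratio()  # order-aware
            return w_set * jaccard + w_order * order

        query = ["Pkinase", "SH2", "SH3"]
        candidates = {
            "protA": ["SH3", "SH2", "Pkinase"],      # same domains, other order
            "protB": ["Pkinase", "SH2"],             # one domain missing
            "protC": ["ABC_tran", "ABC_membrane"],   # unrelated architecture
        }
        for name in sorted(candidates, reverse=True,
                           key=lambda n: architecture_similarity(query, candidates[n])):
            print(name, round(architecture_similarity(query, candidates[name]), 2))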

  11. Predicting consumer behavior with Web search.

    Science.gov (United States)

    Goel, Sharad; Hofman, Jake M; Lahaie, Sébastien; Pennock, David M; Watts, Duncan J

    2010-10-12

    Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
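
    The modeling pattern behind the "boost over baseline" comparison is a plain regression with one extra covariate. The sketch below contrasts an AR(1) baseline with the same model augmented by lagged search volume, on synthetic data that has a search signal built in, so only the model structure, not the numbers, is meaningful.

        import numpy as np

        rng = np.random.default_rng(5)
        weeks = 120
        search = rng.normal(0, 1, weeks)            # standardized query volume
        sales = np.zeros(weeks)
        for t in range(1, weeks):                   # sales driven by last week's search
            sales[t] = 0.6 * sales[t - 1] + 0.8 * search[t - 1] + rng.normal(0, 0.5)

        y = sales[1:]
        X_base = np.column_stack([np.ones(weeks - 1), sales[:-1]])   # AR(1)
        X_srch = np.column_stack([X_base, search[:-1]])              # AR(1) + search

        def r2(X, y):
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            return 1 - (y - X @ beta).var() / y.var()

        print("baseline R^2:", round(r2(X_base, y), 3),
              "| with search:", round(r2(X_srch, y), 3))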

  12. Visual search in barn owls: Task difficulty and saccadic behavior.

    Science.gov (United States)

    Orlowski, Julius; Ben-Shahar, Ohad; Wagner, Hermann

    2018-01-01

    How do we find what we are looking for? A target can be in plain view, but it may be detected only after extensive search. During a search we make directed attentional deployments like saccades to segment the scene until we detect the target. Depending on difficulty, the search may be fast with few attentional deployments or slow with many, shorter deployments. Here we study visual search in barn owls by tracking their overt attentional deployments-that is, their head movements-with a camera. We conducted a low-contrast feature search, a high-contrast orientation conjunction search, and a low-contrast orientation conjunction search, each with set sizes varying from 16 to 64 items. The barn owls were able to learn all of these tasks and showed serial search behavior. In a subsequent step, we analyzed how search behavior of owls changes with search complexity. We compared the search mechanisms in these three serial searches with results from pop-out searches our group had reported earlier. Saccade amplitude shortened and fixation duration increased in difficult searches. Also, in conjunction search saccades were guided toward items with shared target features. These data suggest that during visual search, barn owls utilize mechanisms similar to those that humans use.

  13. Scaling, Similarity, and the Fourth Paradigm for Hydrology

    Science.gov (United States)

    Peters-Lidard, Christa D.; Clark, Martyn; Samaniego, Luis; Verhoest, Niko E. C.; van Emmerik, Tim; Uijlenhoet, Remko; Achieng, Kevin; Franz, Trenton E.; Woods, Ross

    2017-01-01

    In this synthesis paper addressing hydrologic scaling and similarity, we posit that the search for universal laws of hydrology is hindered by our focus on computational simulation (the third paradigm), and assert that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Advances in information-based hydrologic science, coupled with an explosion of hydrologic data and advances in parameter estimation and modelling, have laid the foundation for a data-driven framework for scrutinizing hydrological scaling and similarity hypotheses. We summarize important scaling and similarity concepts (hypotheses) that require testing, describe a mutual information framework for testing these hypotheses, describe boundary condition, state, flux, and parameter data requirements across scales to support testing these hypotheses, and discuss some challenges to overcome while pursuing the fourth hydrological paradigm. We call upon the hydrologic sciences community to develop a focused effort towards adopting the fourth paradigm and apply this to outstanding challenges in scaling and similarity.

  14. Scaling, similarity, and the fourth paradigm for hydrology

    Science.gov (United States)

    Peters-Lidard, Christa D.; Clark, Martyn; Samaniego, Luis; Verhoest, Niko E. C.; van Emmerik, Tim; Uijlenhoet, Remko; Achieng, Kevin; Franz, Trenton E.; Woods, Ross

    2017-07-01

    In this synthesis paper addressing hydrologic scaling and similarity, we posit that the search for universal laws of hydrology is hindered by our focus on computational simulation (the third paradigm) and assert that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Advances in information-based hydrologic science, coupled with an explosion of hydrologic data and advances in parameter estimation and modeling, have laid the foundation for a data-driven framework for scrutinizing hydrological scaling and similarity hypotheses. We summarize important scaling and similarity concepts (hypotheses) that require testing; describe a mutual information framework for testing these hypotheses; describe boundary condition, state, flux, and parameter data requirements across scales to support testing these hypotheses; and discuss some challenges to overcome while pursuing the fourth hydrological paradigm. We call upon the hydrologic sciences community to develop a focused effort towards adopting the fourth paradigm and apply this to outstanding challenges in scaling and similarity.

  15. Log analysis to understand medical professionals' image searching behaviour.

    Science.gov (United States)

    Tsikrika, Theodora; Müller, Henning; Kahn, Charles E

    2012-01-01

    This paper reports on the analysis of the query logs of a visual medical information retrieval system that provides access to radiology resources. Our analysis shows that, despite sharing similarities with general Web search and also with biomedical text search, query formulation and query modification when searching for visual biomedical information have unique characteristics that need to be taken into account in order to enhance the effectiveness of the search support offered by such systems. Typical information needs of medical professionals searching radiology resources are also identified with the goal to create realistic search tasks for a medical image retrieval evaluation benchmark.

  16. Accurate measurement of unsteady state fluid temperature

    Science.gov (United States)

    Jaremkiewicz, Magdalena

    2017-03-01

    In this paper, two accurate methods for determining the transient fluid temperature were presented. Measurements were conducted for boiling water since its temperature is known. Initially the thermometers are at ambient temperature; they are then immediately immersed into saturated water. The measurements were carried out with two thermometers of different construction but with the same housing outer diameter equal to 15 mm. One of them is a K-type industrial thermometer widely available commercially. The temperature indicated by the thermometer was corrected considering the thermometers as first- or second-order inertia devices. A new design of thermometer was proposed and also used to measure the temperature of boiling water. Its characteristic feature is a cylinder-shaped housing with the sheath thermocouple located in its center. The temperature of the fluid was determined based on measurements taken in the axis of the solid cylindrical element (housing) using the inverse space marching method. Measurements of the transient temperature of the air flowing through the wind tunnel using the same thermometers were also carried out. The proposed measurement technique provides more accurate results compared with measurements using industrial thermometers in conjunction with simple temperature correction using the inertial thermometer model of the first or second order. By comparing the results, it was demonstrated that the new thermometer allows obtaining the fluid temperature much faster and with higher accuracy in comparison to the industrial thermometer. Accurate measurements of the fast-changing fluid temperature are possible due to the low-inertia thermometer and the fast space marching method applied for solving the inverse heat conduction problem.
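
    The simplest of the corrections mentioned above has a one-line form: for a first-order inertia device with time constant tau, the fluid temperature follows from the indicated temperature as T_fluid = T_ind + tau * dT_ind/dt. The sketch below demonstrates it on a simulated immersion step; the time constant and noise level are assumed, and the paper's inverse space marching method for the new sensor is considerably more involved.

        import numpy as np

        tau, dt = 6.0, 0.05                         # assumed time constant (s), step (s)
        t = np.arange(0, 30, dt)
        true_fluid = np.where(t < 1, 20.0, 100.0)   # step: immersion in boiling water

        # Simulated first-order thermometer response plus a little noise.
        indicated = np.empty_like(t)
        indicated[0] = 20.0
        for k in range(1, len(t)):
            indicated[k] = indicated[k - 1] + dt / tau * (true_fluid[k] - indicated[k - 1])
        indicated += np.random.default_rng(6).normal(0, 0.02, len(t))

        # First-order correction; in practice the derivative is smoothed first.
        recovered = indicated + tau * np.gradient(indicated, dt)
        i = round(5 / dt)                           # sample at t = 5 s
        print("indicated:", round(indicated[i], 1), "| recovered:", round(recovered[i], 1))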

  17. Adaptive Large Neighbourhood Search

    DEFF Research Database (Denmark)

    Røpke, Stefan

    Large neighborhood search is a metaheuristic that has gained popularity in recent years. The heuristic repeatedly moves from solution to solution by first partially destroying the solution and then repairing it. The best solution observed during this search is presented as the final solution. This tutorial introduces the large neighborhood search metaheuristic and the variant adaptive large neighborhood search that dynamically tunes parameters of the heuristic while it is running. Both heuristics belong to a broader class of heuristics that are searching a solution space using very large neighborhoods...
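
    The adaptive layer can be sketched independently of any particular problem: operators are drawn with probability proportional to weights, and the weights are smoothed toward recent success scores. The scores and decay below are typical of the ALNS literature rather than prescribed, and the toy demo (shrinking a vector toward zero) is invented.

        import random

        def select(weights):
            # Roulette-wheel selection proportional to the weights.
            r, acc = random.uniform(0, sum(weights)), 0.0
            for i, w in enumerate(weights):
                acc += w
                if r <= acc:
                    return i
            return len(weights) - 1

        def alns(initial, destroy_ops, repair_ops, cost, iterations=1000, decay=0.8):
            best = current = initial
            wd, wr = [1.0] * len(destroy_ops), [1.0] * len(repair_ops)
            for _ in range(iterations):
                d, r = select(wd), select(wr)
                candidate = repair_ops[r](destroy_ops[d](current))
                if cost(candidate) < cost(best):
                    score, best = 3.0, candidate     # new global best
                elif cost(candidate) < cost(current):
                    score = 1.0                      # improved current solution
                else:
                    score = 0.0
                if score > 0 or random.random() < 0.05:  # rarely accept worse
                    current = candidate
                wd[d] = decay * wd[d] + (1 - decay) * score   # weight update
                wr[r] = decay * wr[r] + (1 - decay) * score
            return best

        def _apply(p, f):
            s, i = p
            s[i] = f(s[i])
            return s

        random.seed(7)
        destroy = [lambda s: (s[:], random.randrange(len(s)))]           # pick a coordinate
        repair = [lambda p: _apply(p, lambda v: 0.5 * v),                # shrink it (useful)
                  lambda p: _apply(p, lambda v: random.uniform(-1, 1))]  # randomize it
        x0 = [random.uniform(-1, 1) for _ in range(10)]
        result = alns(x0, destroy, repair, cost=lambda s: sum(v * v for v in s))
        print(round(sum(v * v for v in result), 6))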

  18. Keyword Search in Databases

    CERN Document Server

    Yu, Jeffrey Xu; Chang, Lijun

    2009-01-01

    It has become highly desirable to provide users with flexible ways to query/search information over databases as simply as keyword search in Google. This book surveys the recent developments in keyword search over databases, and focuses on finding structural information among objects in a database using a set of keywords. Such structural information to be returned can be either trees or subgraphs representing how the objects, which contain the required keywords, are interconnected in a relational database or in an XML database. The structural keyword search is completely different from

  19. Semi-supervised hashing for large-scale search.

    Science.gov (United States)

    Wang, Jun; Kumar, Sanjiv; Chang, Shih-Fu

    2012-12-01

    Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. On the contrary, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are small or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing. Particularly, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
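
    For context, the unsupervised baseline those projection-based methods build on fits in a few lines: random-projection LSH hashes each vector to the sign pattern of a handful of random projections, so nearby vectors collide in most bits. This is the classic scheme the paper contrasts with, not the proposed SSH.

        import numpy as np

        rng = np.random.default_rng(8)
        dim, n_bits = 64, 16
        planes = rng.standard_normal((n_bits, dim))      # random hyperplanes

        def hash_code(x):
            return tuple((planes @ x > 0).astype(int))   # one bit per hyperplane

        def hamming(a, b):
            return sum(u != v for u, v in zip(a, b))

        base = rng.standard_normal(dim)
        near = base + 0.05 * rng.standard_normal(dim)    # small perturbation
        far = rng.standard_normal(dim)                   # unrelated vector
        print("near:", hamming(hash_code(base), hash_code(near)),
              "| far:", hamming(hash_code(base), hash_code(far)))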

  20. Climate Models have Accurately Predicted Global Warming

    Science.gov (United States)

    Nuccitelli, D. A.

    2016-12-01

    Climate model projections of global temperature changes over the past five decades have proven remarkably accurate, and yet the myth that climate models are inaccurate or unreliable has formed the basis of many arguments denying anthropogenic global warming and the risks it poses to the climate system. Here we compare average global temperature predictions made by both mainstream climate scientists using climate models, and by contrarians using less physically-based methods. We also explore the basis of the myth by examining specific arguments against climate model accuracy and their common characteristics of science denial.

  1. A toolbox for representational similarity analysis.

    Directory of Open Access Journals (Sweden)

    Hamed Nili

    2014-04-01

    Full Text Available Neuronal population codes are increasingly being investigated with multivariate pattern-information analyses. A key challenge is to use measured brain-activity patterns to test computational models of brain information processing. One approach to this problem is representational similarity analysis (RSA), which characterizes a representation in a brain or computational model by the distance matrix of the response patterns elicited by a set of stimuli. The representational distance matrix encapsulates what distinctions between stimuli are emphasized and what distinctions are de-emphasized in the representation. A model is tested by comparing the representational distance matrix it predicts to that of a measured brain region. RSA also enables us to compare representations between stages of processing within a given brain or model, between brain and behavioral data, and between individuals and species. Here, we introduce a Matlab toolbox for RSA. The toolbox supports an analysis approach that is simultaneously data- and hypothesis-driven. It is designed to help integrate a wide range of computational models into the analysis of multichannel brain-activity measurements as provided by modern functional imaging and neuronal recording techniques. Tools for visualization and inference enable the user to relate sets of models to sets of brain regions and to statistically test and compare the models using nonparametric inference methods. The toolbox supports searchlight-based RSA, to continuously map a measured brain volume in search of a neuronal population code with a specific geometry. Finally, we introduce the linear-discriminant t value as a measure of representational discriminability that bridges the gap between linear decoding analyses and RSA. In order to demonstrate the capabilities of the toolbox, we apply it to both simulated and real fMRI data. The key functions are equally applicable to other modalities of brain-activity measurement.
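
    The core RSA computation is compact, and a miniature version may clarify it; the toolbox itself adds visualization, searchlight mapping, and nonparametric inference on top. The sketch below builds a representational dissimilarity matrix (RDM) as 1 minus the Pearson correlation between condition patterns and compares two RDMs by Spearman correlation of their upper triangles; the data are simulated.

        import numpy as np
        from scipy.stats import spearmanr

        def rdm(patterns):
            # patterns: conditions x channels -> conditions x conditions RDM.
            return 1.0 - np.corrcoef(patterns)

        def rdm_similarity(rdm1, rdm2):
            iu = np.triu_indices_from(rdm1, k=1)         # off-diagonal entries
            rho, _ = spearmanr(rdm1[iu], rdm2[iu])
            return rho

        rng = np.random.default_rng(9)
        brain = rng.standard_normal((12, 100))           # 12 stimuli x 100 voxels
        model = brain @ rng.standard_normal((100, 50))   # a linear "model" readout
        print(round(rdm_similarity(rdm(brain), rdm(model)), 2))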

  2. Quantum Search Algorithms

    Science.gov (United States)

    Korepin, Vladimir E.; Xu, Ying

    This article reviews recent progress in quantum database search algorithms. The subject is presented in a self-contained and pedagogical way. The problem of searching a large database (a Hilbert space) for a target item is performed by the famous Grover algorithm which locates the target item with high probability and a quadratic speed-up compared with the corresponding classical algorithm. If the database is partitioned into blocks and one is searching for the block containing the target item instead of the target item itself, then the problem is referred to as partial search. Partial search trades accuracy for speed and the most efficient version is the Grover-Radhakrishnan-Korepin (GRK) algorithm. The target block can be further partitioned into sub-blocks so that GRK's can be performed in a sequence called a hierarchy. We study the Grover search and GRK partial search in detail and prove that a GRK hierarchy is less efficient than a direct GRK partial search. Both the Grover search and the GRK partial search can be generalized to the case with several target items (or target blocks for a GRK). The GRK partial search algorithm can also be represented in terms of group theory.
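
    The quadratic speed-up quoted above has a compact closed form: with N items of which M are targets, each Grover iteration rotates the state by 2*theta where sin(theta) = sqrt(M/N), so the success probability after k iterations is sin((2k+1)*theta)^2, maximized near (pi/4)*sqrt(N/M). The snippet below evaluates this standard formula; it does not simulate the partial-search (GRK) variants.

        import math

        def grover_success(N, M, k):
            # Success probability after k Grover iterations with M targets in N.
            theta = math.asin(math.sqrt(M / N))
            return math.sin((2 * k + 1) * theta) ** 2

        N, M = 1_000_000, 1
        k_opt = math.floor(math.pi / 4 * math.sqrt(N / M))   # ~785 iterations
        print("iterations:", k_opt,
              "| success probability:", round(grover_success(N, M, k_opt), 6))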

  3. The Influence of Contour on Similarity Perception of Star Glyphs.

    Science.gov (United States)

    Fuchs, Johannes; Isenberg, Petra; Bezerianos, Anastasia; Fischer, Fabian; Bertini, Enrico

    2014-12-01

    We conducted three experiments to investigate the effects of contours on the detection of data similarity with star glyph variations. A star glyph is a small, compact, data graphic that represents a multi-dimensional data point. Star glyphs are often used in small-multiple settings, to represent data points in tables, on maps, or as overlays on other types of data graphics. In these settings, an important task is the visual comparison of the data points encoded in the star glyph, for example to find other similar data points or outliers. We hypothesized that for data comparisons, the overall shape of a star glyph--enhanced through contour lines--would aid the viewer in making accurate similarity judgments. To test this hypothesis, we conducted three experiments. In our first experiment, we explored how the use of contours influenced how visualization experts and trained novices chose glyphs with similar data values. Our results showed that glyphs without contours make the detection of data similarity easier. Given these results, we conducted a second study to understand intuitive notions of similarity. Star glyphs without contours most intuitively supported the detection of data similarity. In a third experiment, we tested the effect of star glyph reference structures (i.e., tickmarks and gridlines) on the detection of similarity. Surprisingly, our results show that adding reference structures does improve the correctness of similarity judgments for star glyphs with contours, but not for the standard star glyph. As a result of these experiments, we conclude that the simple star glyph without contours performs best under several criteria, reinforcing its practice and popularity in the literature. Contours seem to enhance the detection of other types of similarity, e.g., shape similarity, and are distracting when data similarity has to be judged. Based on these findings we provide design considerations regarding the use of contours and reference structures on star

  4. Durham Zoo: Powering a Search-&-Innovation Engine with Collective Intelligence

    Directory of Open Access Journals (Sweden)

    Richard Absalom

    2015-02-01

    … graphical representations has been used. Practical implications – Concept searching is seen as having two categories: prior art searching, which is searching for what already exists, and solution searching: a search for a novel solution to an existing problem. Prior art searching is not as efficient a process, as all encompassing in scope, or as accurate in result, as it could and probably should be. The prior art includes library collections, journals, conference proceedings and everything else that has been written, drawn, spoken or made public in any way. Much technical information is only published in patents. There is a good reason to improve prior art searching: research, industry, and indeed humanity faces the spectre of patent thickets: an impenetrable legal space that effectively hinders innovation rather than promotes it. Improved prior-art searching would help with the gardening and result in fewer and higher-quality patents. Poor-quality patents can reward patenting activity per se, which is not what the system was designed for. Improved prior-art searching could also result in less duplication in research, and/or lead to improved collaboration. As regards solution search, the authors of the paper believe that much better use could be made of the existing literature to find solutions from non-obvious areas of science and technology. The so-called cross industry innovation could be joined by biomimetics, the inspiration of solutions from nature. Crowdsourcing the concept shorthand could produce a system ‘by the people, for the people’, to quote Abraham Lincoln out of context. A Citizen Science and Technology initiative that developed a working search engine could generate revenue for academia. Any monies accruing could be invested in research for the common good, such as the development of climate change mitigation technologies, or the discovery of new antibiotics. Originality – The authors know of no similar systems in development

  5. Keep Searching and You’ll Find

    DEFF Research Database (Denmark)

    Laursen, Keld

    2012-01-01

    This article critically reviews and synthesizes the contributions found in theoretical and empirical studies of firm-level innovation search processes. It explores the advantages and disadvantages of local and non-local search, discusses organizational responses, and identifies potential exogenous triggers for different kinds of search. It argues that the initial focus on local search was a consequence, in part, of the attention in evolutionary economics to path-dependent behavior, but that as localized behavior was increasingly accepted as the standard mode, studies began to question whether local … different search strategies, but end up with very similar technological profiles in fast-growing technologies. The article concludes by highlighting what we have learnt from the literature and suggesting some new avenues for research.

  6. Searching and Indexing Genomic Databases via Kernelization

    Directory of Open Access Journals (Sweden)

    Travis eGagie

    2015-02-01

    Full Text Available The rapid advance of DNA sequencing technologies has yielded databases of thousands of genomes. To search and index these databases effectively, it is important that we take advantage of the similarity between those genomes. Several authors have recently suggested searching or indexing only one reference genome and the parts of the other genomes where they differ. In this paper we survey the twenty-year history of this idea and discuss its relation to kernelization in parameterized complexity.
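
    The idea the survey traces can be shown in miniature: store one reference genome in full and every other genome only as its differences from the reference. The substitution-only encoding below is a deliberately simplified stand-in for the compressed indexes the literature actually builds.

        REFERENCE = "ACGTACGTACGTACGT"

        def encode(genome):
            # Same-length genome -> list of (position, alternate base).
            return [(i, b) for i, (a, b) in enumerate(zip(REFERENCE, genome)) if a != b]

        def decode(diffs):
            seq = list(REFERENCE)
            for i, b in diffs:
                seq[i] = b
            return "".join(seq)

        genome2 = "ACGTACCTACGTACGA"
        diffs = encode(genome2)          # 2 entries instead of 16 bases
        assert decode(diffs) == genome2
        print(diffs)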

  7. Interactions of visual odometry and landmark guidance during food search in honeybees

    NARCIS (Netherlands)

    Vladusich, T; Hemmi, JM; Srinivasan, MV; Zeil, J

    2005-01-01

    How do honeybees use visual odometry and goal-defining landmarks to guide food search? In one experiment, bees were trained to forage in an optic-flow-rich tunnel with a landmark positioned directly above the feeder. Subsequent food-search tests indicated that bees searched much more accurately when

  8. Book Review: Web Search-Public Searching of the Web

    OpenAIRE

    Yazdan Mansourian

    2004-01-01

    The book consists of four sections: (1) the context of web search, (2) how people search the web, (3) subjects of web search, and (4) conclusion: trends and future directions. The first section includes three chapters providing a brief but informative introduction to the main elements of the web search process and of web search research, including search engine mechanisms, human-computer interaction in web searching, and research design in web search studies.

  9. Automatic Planning of External Search Engine Optimization

    Directory of Open Access Journals (Sweden)

    Vita Jasevičiūtė

    2015-07-01

    Full Text Available This paper describes an investigation of the external search engine optimization (SEO) action planning tool, dedicated to automatically extracting a small set of the most important keywords for each month during a whole-year period. The keywords in the set are extracted according to externally measured parameters, such as the average number of searches during the year and for every month individually. Additionally, the position of the optimized web site for each keyword is taken into account. The generated optimization plan is similar to the optimization plans prepared manually by SEO professionals and can be successfully used as a support tool for web site search engine optimization.
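
    A guessed-at skeleton of such a planner is shown below: each keyword is scored per month from its monthly search volume and the site's current rank, and the top few per month form the plan. The scoring rule and data are invented for illustration; the paper's parameters and weighting are its own.

        def monthly_plan(keywords, top_k=3):
            # keywords: keyword -> (list of 12 monthly volumes, current rank).
            plan = {}
            for month in range(12):
                scored = sorted(
                    keywords,
                    # High volume raises the score; a poor current rank (large
                    # number) means more to gain, so it raises the score too.
                    key=lambda kw: keywords[kw][0][month] * min(keywords[kw][1], 50),
                    reverse=True,
                )
                plan[month + 1] = scored[:top_k]
            return plan

        kws = {
            "ski rental": ([90, 80, 40, 10, 5, 2, 2, 3, 10, 30, 60, 85], 12),
            "bike tours": ([5, 8, 20, 45, 70, 95, 95, 90, 55, 25, 10, 6], 30),
            "city hotel": ([50, 50, 55, 60, 60, 65, 70, 70, 60, 55, 50, 50], 4),
        }
        plan = monthly_plan(kws, top_k=2)
        print("January:", plan[1], "| July:", plan[7])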

  10. [Advanced online search techniques and dedicated search engines for physicians].

    Science.gov (United States)

    Nahum, Yoav

    2008-02-01

    In recent years search engines have become an essential tool in the work of physicians. This article will review advanced search techniques from the world of information specialists, as well as some advanced search engine operators that may help physicians improve their online search capabilities, and maximize the yield of their searches. This article also reviews popular dedicated scientific and biomedical literature search engines.

  11. University Students' Online Information Searching Strategies in Different Search Contexts

    Science.gov (United States)

    Tsai, Meng-Jung; Liang, Jyh-Chong; Hou, Huei-Tse; Tsai, Chin-Chung

    2012-01-01

    This study investigates the role search context plays in university students' online information searching strategies. A total of 304 university students in Taiwan were surveyed with questionnaires in which two search contexts were defined: searching for learning, and searching for daily-life information. Students' online search strategies…

  12. Toward Accurate and Quantitative Comparative Metagenomics.

    Science.gov (United States)

    Nayfach, Stephen; Pollard, Katherine S

    2016-08-25

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Accurate renormalization group analyses in neutrino sector

    Energy Technology Data Exchange (ETDEWEB)

    Haba, Naoyuki [Graduate School of Science and Engineering, Shimane University, Matsue 690-8504 (Japan); Kaneta, Kunio [Kavli IPMU (WPI), The University of Tokyo, Kashiwa, Chiba 277-8568 (Japan); Takahashi, Ryo [Graduate School of Science and Engineering, Shimane University, Matsue 690-8504 (Japan); Yamaguchi, Yuya [Department of Physics, Faculty of Science, Hokkaido University, Sapporo 060-0810 (Japan)

    2014-08-15

    We investigate accurate renormalization group analyses in the neutrino sector between the ν-oscillation and seesaw energy scales. We consider the decoupling effects of the top quark and the Higgs boson on the renormalization group equations of the light neutrino mass matrix. Since the decoupling effects arise at the standard-model scale and are independent of high-energy physics, our method can be applied to essentially any model beyond the standard model. We find that the decoupling effects of the Higgs boson are negligible, while those of the top quark are not. In particular, the decoupling effects of the top quark affect the neutrino mass eigenvalues, which are important for analyzing predictions such as mass squared differences and neutrinoless double beta decay in an underlying theory existing at a high energy scale.

  14. Accurate predictions for the LHC made easy

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    The data recorded by the LHC experiments is of a very high quality. To get the most out of the data, precise theory predictions, including uncertainty estimates, are needed to reduce as much as possible theoretical bias in the experimental analyses. Recently, significant progress has been made in computing Next-to-Leading Order (NLO) computations, including matching to the parton shower, that allow for these accurate, hadron-level predictions. I shall discuss one of these efforts, the MadGraph5_aMC@NLO program, that aims at the complete automation of predictions at the NLO accuracy within the SM as well as New Physics theories. I’ll illustrate some of the theoretical ideas behind this program, show some selected applications to LHC physics, as well as describe the future plans.

  15. Fast and accurate multicomponent transport property evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Ern, A. [CERMICS-ENPC, Le Grand (France)]|[Yale Univ., New Haven, CT (United States); Giovangigli, V. [CMAP-CNRS, Palaiseau (France)

    1995-08-01

    We investigate iterative methods for solving linear systems arising from the kinetic theory and providing transport coefficients of dilute polyatomic gas mixtures. These linear systems are obtained in their naturally constrained, singular, and symmetric form, using the formalism of Waldmann and Truebenbacher. The transport coefficients associated with the systems obtained by Monchick, Yun, and Mason are also recovered, if two misprints are corrected in the work of these authors. Using the recent theory of Ern and Giovangigli, all the transport coefficients are expressed as convergent series. By truncating these series, new, accurate, approximate expressions are obtained for all the transport coefficients. Finally, the computational efficiency of the present transport algorithms in multicomponent flow applications is illustrated with several numerical experiments. 38 refs., 12 tabs.

  16. Apparatus for accurately measuring high temperatures

    Science.gov (United States)

    Smith, D.D.

    The present invention is a thermometer used for measuring furnace temperatures in the range of about 1800° to 2700°C. The thermometer comprises a broadband multicolor thermal radiation sensor positioned to be in optical alignment with the end of a blackbody sight tube extending into the furnace. A valve-shutter arrangement is positioned between the radiation sensor and the sight tube, and a chamber for containing a charge of high-pressure gas is positioned between the valve-shutter arrangement and the radiation sensor. A momentary opening of the valve-shutter arrangement allows a pulse of the high-pressure gas to purge the sight tube of airborne thermal radiation contaminants, which permits the radiation sensor to accurately measure the thermal radiation emanating from the end of the sight tube.

  17. Genetic algorithms as global random search methods

    Science.gov (United States)

    Peck, Charles C.; Dhawan, Atam P.

    1995-01-01

    Genetic algorithm behavior is described in terms of the construction and evolution of the sampling distributions over the space of candidate solutions. This novel perspective is motivated by analysis indicating that the schema theory is inadequate for completely and properly explaining genetic algorithm behavior. Based on the proposed theory, it is argued that the similarities of candidate solutions should be exploited directly, rather than encoding candidate solutions and then exploiting their similarities. Proportional selection is characterized as a global search operator, and recombination is characterized as the search process that exploits similarities. Sequential algorithms and many deletion methods are also analyzed. It is shown that by properly constraining the search breadth of recombination operators, convergence of genetic algorithms to a global optimum can be ensured.
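    The selection and recombination roles described above can be made concrete with a minimal sketch (a generic genetic algorithm on a toy objective, not the authors' formulation): fitness-proportional selection acts as the global search operator, and one-point crossover exploits similarities between the selected parents.

        import random

        def fitness(bits):
            """Toy objective (OneMax): maximize the number of 1-bits."""
            return sum(bits)

        def proportional_select(pop):
            """Roulette-wheel selection: the global search operator."""
            total = sum(fitness(ind) for ind in pop)
            r = random.uniform(0, total)
            acc = 0.0
            for ind in pop:
                acc += fitness(ind)
                if acc >= r:
                    return ind
            return pop[-1]

        def recombine(a, b):
            """One-point crossover: exploits similarities of candidate solutions."""
            cut = random.randrange(1, len(a))
            return a[:cut] + b[cut:]

        pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
        for _ in range(50):
            pop = [recombine(proportional_select(pop), proportional_select(pop))
                   for _ in range(len(pop))]
        print("best fitness:", max(fitness(ind) for ind in pop))

    The sketch deliberately omits mutation; as the abstract notes, it is the properly constrained search breadth of the recombination operator that is argued to ensure convergence to a global optimum.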

  18. Noesis: Ontology based Scoped Search Engine and Resource Aggregator for Atmospheric Science

    Science.gov (United States)

    Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.

    2006-12-01

    The goal for search engines is to return results that are both accurate and complete. The search engines should find only what you really want and find everything you really want. Search engines (even meta search engines) lack semantics. Search is based simply on string matching between the user's query term and the resource database; the semantics associated with the search string are not captured. For example, if an atmospheric scientist is searching for "pressure"-related web resources, most search engines return inaccurate results such as web resources related to blood pressure. In this presentation Noesis, a meta-search engine and resource aggregator that uses domain ontologies to provide scoped search capabilities, will be described. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide the user to refine their search query and thereby reduce the user's burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations, and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources, such as education materials, publications, datasets, and web search engines, that might be of interest to the user.
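    A minimal sketch of ontology-based query scoping in the spirit described above (the ontology fragment, relation names, and matching rule are illustrative assumptions, not Noesis internals):

        # Illustrative ontology fragment: each concept lists synonyms,
        # specializations (narrower), generalizations (broader), and related terms.
        ontology = {
            "pressure": {
                "synonyms": ["atmospheric pressure", "barometric pressure"],
                "narrower": ["sea level pressure", "vapor pressure"],
                "broader":  ["atmospheric state"],
                "related":  ["geopotential height"],
            }
        }

        def scope_query(term, include=("synonyms", "narrower", "related")):
            """Replace a bare keyword with ontology-scoped domain phrases."""
            entry = ontology.get(term, {})
            scoped = set()
            for relation in include:
                scoped.update(entry.get(relation, []))
            return scoped or {term}

        documents = [
            "Sea level pressure maps for the North Atlantic",
            "Managing high blood pressure through diet",
        ]

        for doc in documents:
            # Matching on scoped phrases keeps "blood pressure" pages out of an
            # atmospheric-science search for "pressure".
            if any(phrase in doc.lower() for phrase in scope_query("pressure")):
                print("match:", doc)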

  19. Search on Rugged Landscapes

    DEFF Research Database (Denmark)

    Billinger, Stephan; Stieglitz, Nils; Schumacher, Terry

    2014-01-01

    This paper presents findings from a laboratory experiment on human decision-making in a complex combinatorial task. We find strong evidence for a behavioral model of adaptive search. Success narrows down search to the neighborhood of the status quo, while failure promotes gradually more explorative search. Task complexity does not have a direct effect on behavior, but systematically affects the feedback conditions that guide success-induced exploitation and failure-induced exploration. The analysis also shows that human participants were prone to over-exploration, since they broke off the search for local improvements too early. We derive stylized decision rules that generate the search behavior observed in the experiment and discuss the implications of our findings for individual decision-making and organizational search.

  20. Supporting complex search tasks

    DEFF Research Database (Denmark)

    Gäde, Maria; Hall, Mark; Huurdeman, Hugo

    2015-01-01

    There is broad consensus in the field of IR that search is complex in many use cases and applications, both on the Web and in domain-specific collections, and both professionally and in our daily life. Yet our understanding of complex search tasks, in comparison to simple look-up tasks, is fragmented at best. The workshop addressed the many open research questions: What are the obvious use cases and applications of complex search? What are essential features of work tasks and search tasks to take into account? And how do these evolve over time? With a multitude of information, varying from introductory to specialized, and from authoritative to speculative or opinionated, when to show what sources of information? How does the information seeking process evolve and what are relevant differences between different stages? With complex task and search process management, blending searching, browsing…

  1. The Development of a Combined Search for a Heterogeneous Chemistry Database

    Directory of Open Access Journals (Sweden)

    Lulu Jiang

    2015-05-01

    Full Text Available A combined search, which joins a slow molecule-structure search with a fast compound-property search, yields more accurate results and has been applied in several chemistry databases. However, the difference in search speeds and the merging of the two separate result sets are two major challenges. In this paper, two kinds of search strategies, synchronous search and asynchronous search, are proposed to solve these problems in the heterogeneous structure database and the property database found in ChemDB, a chemistry database owned by the Institute of Process Engineering, CAS. Their advantages and disadvantages under different conditions are discussed in detail. Furthermore, we applied these two searches to ChemDB and used them to screen for potential molecules that can work as CO2 absorbents. The results reveal that this combined search discovers reasonable target molecules within an acceptable time frame.
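    One way to picture the asynchronous strategy (a sketch with assumed stub searches; the paper's actual ChemDB interfaces are not shown): launch both searches concurrently, surface the fast property hits immediately, and intersect with the structure hits once they arrive.

        import time
        from concurrent.futures import ThreadPoolExecutor

        def property_search(query):
            """Fast compound-property search (stub returning candidate IDs)."""
            time.sleep(0.1)
            return {"mol-1", "mol-2", "mol-3"}

        def structure_search(query):
            """Slow molecule-structure search (stub returning candidate IDs)."""
            time.sleep(2.0)
            return {"mol-2", "mol-3", "mol-9"}

        def combined_search(query):
            with ThreadPoolExecutor(max_workers=2) as pool:
                fast = pool.submit(property_search, query)
                slow = pool.submit(structure_search, query)
                preliminary = fast.result()
                print("preliminary (property only):", sorted(preliminary))
                # The slow search finishes later; combine results when it does.
                return preliminary & slow.result()

        print("final (combined):", sorted(combined_search("CO2 absorbent")))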

  2. ElasticSearch cookbook

    CERN Document Server

    Paro, Alberto

    2013-01-01

    Written in an engaging, easy-to-follow style, the recipes will help you to extend the capabilities of ElasticSearch to manage your data effectively. If you are a developer who implements ElasticSearch in your web applications, manages data, or has decided to start using ElasticSearch, this book is ideal for you. It assumes that you have a working knowledge of JSON and Java.

  3. Mastering ElasticSearch

    CERN Document Server

    Kuc, Rafal

    2013-01-01

    A practical tutorial that covers the difficult design, implementation, and management of search solutions. Mastering ElasticSearch is aimed at intermediate users who want to extend their knowledge of ElasticSearch. The topics described in the book are detailed, but we assume that you already know the basics, like the query DSL or data indexing. Advanced users will also find this book useful, as the examples get deep into the internals where needed.

  4. Demystifying the Search Button

    Science.gov (United States)

    McKeever, Liam; Nguyen, Van; Peterson, Sarah J.; Gomez-Perez, Sandra

    2015-01-01

    A thorough review of the literature is the basis of all research and evidence-based practice. A gold-standard efficient and exhaustive search strategy is needed to ensure all relevant citations have been captured and that the search performed is reproducible. The PubMed database comprises both the MEDLINE and non-MEDLINE databases. MEDLINE-based search strategies are robust but capture only 89% of the total available citations in PubMed. The remaining 11% include the most recent and possibly relevant citations but are only searchable through less efficient techniques. An effective search strategy must employ both the MEDLINE and the non-MEDLINE portion of PubMed to ensure all studies have been identified. The robust MEDLINE search strategies are used for the MEDLINE portion of the search. Usage of the less robust strategies is then efficiently confined to search only the remaining 11% of PubMed citations that have not been indexed for MEDLINE. The current article offers step-by-step instructions for building such a search exploring methods for the discovery of medical subject heading (MeSH) terms to search MEDLINE, text-based methods for exploring the non-MEDLINE database, information on the limitations of convenience algorithms such as the “related citations feature,” the strengths and pitfalls associated with commonly used filters, the proper usage of Boolean operators to organize a master search strategy, and instructions for automating that search through “MyNCBI” to receive search query updates by email as new citations become available. PMID:26129895
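    The two-part master strategy described above can be written out concretely. A hedged sketch (the topic and terms are illustrative; [mh], [tiab], and the medline[sb] subset filter are standard PubMed field tags, but any real review must build its own strategy per topic):

        # Two-part PubMed strategy: robust MeSH terms for the MEDLINE-indexed
        # portion, plus title/abstract text words confined to the ~11% of
        # citations not (yet) indexed for MEDLINE via NOT medline[sb].
        mesh_part = "malnutrition[mh]"
        textword_part = "(malnutrition[tiab] OR undernutrition[tiab]) NOT medline[sb]"
        master_query = f"({mesh_part}) OR ({textword_part})"
        print(master_query)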

  5. Interest in Anesthesia as Reflected by Keyword Searches using Common Search Engines.

    Science.gov (United States)

    Liu, Renyu; García, Paul S; Fleisher, Lee A

    2012-01-23

    Since current general interest in anesthesia is unknown, we analyzed internet keyword searches to gauge general interest in anesthesia in comparison with surgery and pain. The trend of keyword searches from 2004 to 2010 related to anesthesia and anaesthesia was investigated using Google Insights for Search. The trend of the number of peer-reviewed articles on anesthesia cited in PubMed and Medline from 2004 to 2010 was investigated. The average cost of advertising on anesthesia, surgery, and pain was estimated using Google AdWords. Search results in other common search engines were also analyzed. Correlation between year and relative number of searches was determined with p [...]. Although different search engines may provide different total numbers of search results (available posts), the ratios of search results between some common keywords related to perioperative care are comparable, indicating a similar trend. The peer-reviewed manuscripts on "anesthesia" and the proportion of papers on "anesthesia and outcome" are trending up. Estimates for spending of advertising dollars are lower for anesthesia-related terms than for pain or surgery, due to the relatively smaller search traffic. General interest in anesthesia (anaesthesia) as measured by internet searches appears to be decreasing. Pain, preanesthesia evaluation, anesthesia and outcome, and side effects of anesthesia are the critical areas that anesthesiologists should focus on to address the increasing concerns.

  6. Interest in Anesthesia as Reflected by Keyword Searches using Common Search Engines

    Science.gov (United States)

    Liu, Renyu; García, Paul S.; Fleisher, Lee A.

    2012-01-01

    Background Since current general interest in anesthesia is unknown, we analyzed internet keyword searches to gauge general interest in anesthesia in comparison with surgery and pain. Methods The trend of keyword searches from 2004 to 2010 related to anesthesia and anaesthesia was investigated using Google Insights for Search. The trend of the number of peer-reviewed articles on anesthesia cited in PubMed and Medline from 2004 to 2010 was investigated. The average cost of advertising on anesthesia, surgery, and pain was estimated using Google AdWords. Search results in other common search engines were also analyzed. Correlation between year and relative number of searches was determined with p [...]. Although different search engines may provide different total numbers of search results (available posts), the ratios of search results between some common keywords related to perioperative care are comparable, indicating a similar trend. The peer-reviewed manuscripts on “anesthesia” and the proportion of papers on “anesthesia and outcome” are trending up. Estimates for spending of advertising dollars are lower for anesthesia-related terms than for pain or surgery, due to the relatively smaller search traffic. Conclusions General interest in anesthesia (anaesthesia) as measured by internet searches appears to be decreasing. Pain, preanesthesia evaluation, anesthesia and outcome, and side effects of anesthesia are the critical areas that anesthesiologists should focus on to address the increasing concerns. PMID:23853739

  7. When Gravity Fails: Local Search Topology

    Science.gov (United States)

    Frank, Jeremy; Cheeseman, Peter; Stutz, John; Lau, Sonie (Technical Monitor)

    1997-01-01

    Local search algorithms for combinatorial search problems frequently encounter a sequence of states in which it is impossible to improve the value of the objective function; moves through these regions, called plateau moves, dominate the time spent in local search. We analyze and characterize plateaus for three different classes of randomly generated Boolean Satisfiability problems. We identify several interesting features of plateaus that impact the performance of local search algorithms. We show that local minima tend to be small but occasionally may be very large. We also show that local minima can be escaped without unsatisfying a large number of clauses, but that systematically searching for an escape route may be computationally expensive if the local minimum is large. We show that plateaus with exits, called benches, tend to be much larger than minima, and that some benches have very few exit states which local search can use to escape. We show that the solutions (i.e. global minima) of randomly generated problem instances form clusters, which behave similarly to local minima. We revisit several enhancements of local search algorithms and explain their performance in light of our results. Finally we discuss strategies for creating the next generation of local search algorithms.
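    A small sketch of the setting (a greedy SAT local search on a toy instance, with sideways moves counted as plateau moves; illustrative only, not the paper's experimental code):

        import random

        # Toy 3-SAT instance: literal i > 0 means variable i true, i < 0 false.
        clauses = [(1, -2, 3), (-1, 2, -3), (2, 3, -1), (-2, -3, 1)]

        def unsatisfied(assign):
            return sum(
                not any((lit > 0) == assign[abs(lit)] for lit in clause)
                for clause in clauses
            )

        assign = {v: random.choice([True, False]) for v in (1, 2, 3)}
        plateau_moves = 0
        for _ in range(100):
            current = unsatisfied(assign)
            if current == 0:
                break
            deltas = {}
            for v in assign:                      # evaluate every single flip
                assign[v] = not assign[v]
                deltas[v] = unsatisfied(assign) - current
                assign[v] = not assign[v]
            best = min(deltas, key=deltas.get)
            if deltas[best] > 0:
                break                             # strict local minimum
            if deltas[best] == 0:
                plateau_moves += 1                # sideways move on a plateau
            assign[best] = not assign[best]
        print("unsatisfied:", unsatisfied(assign), "plateau moves:", plateau_moves)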

  8. Automatization and training in visual search.

    Science.gov (United States)

    Czerwinski, M; Lightfoot, N; Shiffrin, R M

    1992-01-01

    In several search tasks, the amount of practice on particular combinations of targets and distractors was equated in varied-mapping (VM) and consistent-mapping (CM) conditions. The results indicate the importance of distinguishing between memory and visual search tasks, and implicate a number of factors that play important roles in visual search and its learning. Visual search was studied in Experiment 1. VM and CM performance were almost equal, and slope reductions occurred during practice for both, suggesting the learning of efficient attentive search based on features, and no important role for automatic attention attraction. However, positive transfer effects occurred when previous CM targets were re-paired with previous CM distractors, even though these targets and distractors had not been trained together. Also, the introduction of a demanding simultaneous task produced advantages of CM over VM. These latter two results demonstrated the operation of automatic attention attraction. Visual search was further studied in Experiment 2, using novel characters for which feature overlap and similarity were controlled. The design and many of the findings paralleled Experiment 1. In addition, enormous search improvement was seen over 35 sessions of training, suggesting the operation of perceptual unitization for the novel characters. Experiment 3 showed a large, persistent advantage for CM over VM performance in memory search, even when practice on particular combinations of targets and distractors was equated in the two training conditions. A multifactor theory of automatization and attention is put forth to account for these findings and others in the literature.

  9. Intrinsic position uncertainty impairs overt search performance.

    Science.gov (United States)

    Semizer, Yelda; Michel, Melchi M

    2017-08-01

    Uncertainty regarding the position of the search target is a fundamental component of visual search. However, due to perceptual limitations of the human visual system, this uncertainty can arise from intrinsic, as well as extrinsic, sources. The current study sought to characterize the role of intrinsic position uncertainty (IPU) in overt visual search and to determine whether it significantly limits human search performance. After completing a preliminary detection experiment to characterize sensitivity as a function of visual field position, observers completed a search task that required localizing a Gabor target within a field of synthetic luminance noise. The search experiment included two clutter conditions designed to modulate the effect of IPU across search displays of varying set size. In the Cluttered condition, the display was tiled uniformly with feature clutter to maximize the effects of IPU. In the Uncluttered condition, the clutter at irrelevant locations was removed to attenuate the effects of IPU. Finally, we derived an IPU-constrained ideal searcher model, limited by the IPU measured in human observers. Ideal searchers were simulated based on the detection sensitivity and fixation sequences measured for individual human observers. The IPU-constrained ideal searcher predicted performance trends similar to those exhibited by the human observers. In the Uncluttered condition, performance decreased steeply as a function of increasing set size. However, in the Cluttered condition, the effect of IPU dominated and performance was approximately constant as a function of set size. Our findings suggest that IPU substantially limits overt search performance, especially in crowded displays.

  10. Exotic Higgs searches

    CERN Document Server

    Pelliccioni, Mario

    2016-01-01

    Exotic Higgs searches cover a wide range of signatures and can thus provide indications of new physics beyond the Standard Model. We review exotic Higgs searches for lepton-flavour-violating Higgs decays, for "mono-Higgs" signatures, for Higgs decays to invisible final states, and for high-mass Higgs bosons. Both ATLAS and CMS results are shown, for the Run-1 data collected at energies of $\sqrt s$ = 7 and 8 TeV and for the first data collected during the Run-2 phase at $\sqrt s$ = 13 TeV.

  11. Google Power Search

    CERN Document Server

    Spencer, Stephan

    2011-01-01

    Behind Google's deceptively simple interface is immense power for both market and competitive research, if you know how to use it well. Sure, basic searches are easy, but complex searches require specialized skills. This concise book takes you through the full range of Google's powerful search-refinement features, so you can quickly find the specific information you need. Learn techniques ranging from simple Boolean logic to URL parameters and other advanced tools, and see how they're applied to real-world market research examples. Incorporate advanced search operators such as filetype:, intit…

  12. Delaying information search

    Directory of Open Access Journals (Sweden)

    Yaniv Shani

    2012-11-01

    Full Text Available In three studies, we examined factors that may temporarily attenuate information search. People are generally curious and dislike uncertainty, which typically encourages them to look for relevant information. Despite these strong forces that promote information search, people sometimes deliberately delay obtaining valuable information. We find they may do so when they are concerned that the information might interfere with future pleasurable activities. Interestingly, the decision to search or to postpone searching for information is influenced not only by the value and importance of the information itself but also by well-being maintenance goals related to possible detrimental effects that negative knowledge may have on unrelated future plans.

  13. Accurate measurement of streamwise vortices using dual-plane PIV

    Science.gov (United States)

    Waldman, Rye M.; Breuer, Kenneth S.

    2012-11-01

    Low Reynolds number aerodynamic experiments with flapping animals (such as bats and small birds) are of particular interest due to their application to micro air vehicles which operate in a similar parameter space. Previous PIV wake measurements described the structures left by bats and birds and provided insight into the time history of their aerodynamic force generation; however, these studies have faced difficulty drawing quantitative conclusions based on said measurements. The highly three-dimensional and unsteady nature of the flows associated with flapping flight are major challenges for accurate measurements. The challenge of animal flight measurements is finding small flow features in a large field of view at high speed with limited laser energy and camera resolution. Cross-stream measurement is further complicated by the predominately out-of-plane flow that requires thick laser sheets and short inter-frame times, which increase noise and measurement uncertainty. Choosing appropriate experimental parameters requires compromise between the spatial and temporal resolution and the dynamic range of the measurement. To explore these challenges, we do a case study on the wake of a fixed wing. The fixed model simplifies the experiment and allows direct measurements of the aerodynamic forces via load cell. We present a detailed analysis of the wake measurements, discuss the criteria for making accurate measurements, and present a solution for making quantitative aerodynamic load measurements behind free-flyers.

  14. Target similarity effects: support for the parallel distributed processing assumptions.

    Science.gov (United States)

    Humphreys, M S; Tehan, G; O'Shea, A; Bolland, S W

    2000-07-01

    Recent research has begun to provide support for the assumptions that memories are stored as a composite and are accessed in parallel (Tehan & Humphreys, 1998). New predictions derived from these assumptions and from the Chappell and Humphreys (1994) implementation of these assumptions were tested. In three experiments, subjects studied relatively short lists of words. Some of the lists contained two similar targets (thief and theft) or two dissimilar targets (thief and steal) associated with the same cue (robbery). As predicted, target similarity affected performance in cued recall but not free association. Contrary to predictions, two spaced presentations of a target did not improve performance in free association. Two additional experiments confirmed and extended this finding. Several alternative explanations for the target similarity effect, which incorporate assumptions about separate representations and sequential search, are rejected. The importance of the finding that, in at least one implicit memory paradigm, repetition does not improve performance is also discussed.

  15. The Hofmethode: Computing Semantic Similarities between E-Learning Products

    Directory of Open Access Journals (Sweden)

    Oliver Michel

    2009-11-01

    Full Text Available The key task in building useful e-learning repositories is to develop a system with an algorithm allowing users to retrieve information that corresponds to their specific requirements. To achieve this, products (or their verbal descriptions, i.e. as presented in metadata) need to be compared and structured according to the results of this comparison. Such structuring is crucial insofar as there are many search results that correspond to the entered keyword. The Hofmethode is an algorithm (based on psychological considerations) to compute semantic similarities between texts and therefore offers a way to compare e-learning products. The computed similarity values are used to build semantic maps in which the products are visually arranged according to their similarities. The paper describes how the Hofmethode is implemented in the online database edulap, and how it helps the user to explore the data in which he or she is interested.

  16. Application of Similarity in Fault Diagnosis of Power Electronics Circuits

    Science.gov (United States)

    Rongjie, Wang; Yiju, Zhan; Meiqian, Chen; Haifeng, Zhou; Kewei, Guo

    A method of fault diagnosis was proposed for power electronics circuits based on S-transform similarity. At first, the standard module time-frequency matrices of the S-transforms of all fault signals were constructed; then the similarity of each fault signal's module time-frequency matrix to the standard module time-frequency matrices was calculated, and, according to the principle of maximum similarity, the faults were diagnosed. The simulation result of fault diagnosis of a thyristor in a three-phase full-bridge controlled rectifier shows that the method can accurately diagnose faults and locate the faulty element in power electronics circuits, and it has excellent performance in terms of noise robustness and computational complexity; thus it also has good practical engineering value for solving fault problems in power electronics circuits.
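    The maximum-similarity decision rule lends itself to a compact sketch (random stand-in matrices and a cosine similarity measure are assumptions; in the paper the matrices come from S-transforms of reference fault signals):

        import numpy as np

        def similarity(a, b):
            """Cosine similarity between two module time-frequency matrices."""
            a, b = a.ravel(), b.ravel()
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        # Hypothetical standard module time-frequency matrices, one per fault
        # class (stand-ins for S-transform magnitudes of reference signals).
        rng = np.random.default_rng(0)
        standards = {f"fault_{k}": rng.random((16, 32)) for k in range(3)}

        # A measured signal's matrix: here, a noisy copy of fault_1's template.
        measured = standards["fault_1"] + 0.05 * rng.random((16, 32))

        scores = {name: similarity(measured, tmpl)
                  for name, tmpl in standards.items()}
        diagnosis = max(scores, key=scores.get)   # principle of maximum similarity
        print(scores, "->", diagnosis)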

  17. Self-similarity of complex networks and hidden metric spaces.

    Science.gov (United States)

    Serrano, M Angeles; Krioukov, Dmitri; Boguñá, Marián

    2008-02-22

    We demonstrate that the self-similarity of some scale-free networks with respect to a simple degree-thresholding renormalization scheme finds a natural interpretation in the assumption that network nodes exist in hidden metric spaces. Clustering, i.e., cycles of length three, plays a crucial role in this framework as a topological reflection of the triangle inequality in the hidden geometry. We prove that a class of hidden variable models with underlying metric spaces are able to accurately reproduce the self-similarity properties that we measured in the real networks. Our findings indicate that hidden geometries underlying these real networks are a plausible explanation for their observed topologies and, in particular, for their self-similarity with respect to the degree-based renormalization.

  18. Modified cuckoo search algorithm in microscopic image segmentation of hippocampus.

    Science.gov (United States)

    Chakraborty, Shouvik; Chatterjee, Sankhadeep; Dey, Nilanjan; Ashour, Amira S; Ashour, Ahmed S; Shi, Fuqian; Mali, Kalyani

    2017-10-01

    Microscopic image analysis is a challenging task due to the presence of weak correlation and different segments of interest that may lead to ambiguity. It is also valuable in foremost fields of technology and medicine. Identification and counting of cells play a vital role in feature extraction to diagnose particular diseases precisely. Different segments should be identified accurately in order to identify and count cells in a microscope image. Consequently, in the current work, a novel method for cell segmentation and identification has been proposed that incorporates marking cells. Thus, a novel method based on cuckoo search, applied after a pre-processing step, is employed. The method is developed and evaluated on light microscope images of rats' hippocampus, which serve as a sample of brain cells. The proposed method can be applied to color images directly. The proposed approach incorporates McCulloch's method for Lévy flight generation in the cuckoo search (CS) algorithm. Several objective functions, namely Otsu's method, Kapur entropy, and Tsallis entropy, are used for segmentation. In the cuckoo search process, Otsu's between-class variance, Kapur's entropy, and Tsallis entropy are employed as the objective functions to be optimized. Experimental results are validated by different metrics, namely the peak signal to noise ratio (PSNR), mean square error, feature similarity index, and CPU running time for all the test cases. The experimental results established that the Kapur's entropy segmentation method based on the modified CS required the least computational time compared to Otsu's between-class variance segmentation method and the Tsallis entropy segmentation method. Nevertheless, the Tsallis entropy method with optimized multi-threshold levels achieved superior performance compared to the other two segmentation methods in terms of the PSNR. © 2017 Wiley Periodicals, Inc.
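    A heavily simplified sketch of the optimization core (Otsu's between-class variance as the objective, with heavy-tailed random steps standing in for McCulloch's Lévy-flight generator; illustrative, not the authors' implementation):

        import numpy as np

        rng = np.random.default_rng(1)
        image = rng.integers(0, 256, size=(64, 64))     # stand-in grayscale image
        hist = np.bincount(image.ravel(), minlength=256) / image.size

        def between_class_variance(thresholds):
            """Otsu's multilevel objective (to be maximized)."""
            bounds = [0] + sorted(int(t) for t in thresholds) + [256]
            total_mean = (np.arange(256) * hist).sum()
            var = 0.0
            for lo, hi in zip(bounds[:-1], bounds[1:]):
                w = hist[lo:hi].sum()
                if w > 0:
                    mu = (np.arange(lo, hi) * hist[lo:hi]).sum() / w
                    var += w * (mu - total_mean) ** 2
            return var

        # Cuckoo-style search, heavily simplified: nests hold candidate
        # threshold vectors; new candidates come from heavy-tailed steps.
        nests = [rng.uniform(1, 255, size=2) for _ in range(15)]
        for _ in range(200):
            i = rng.integers(len(nests))
            step = rng.standard_cauchy(2) * 5.0          # Levy-like heavy tail
            candidate = np.clip(nests[i] + step, 1, 255)
            j = rng.integers(len(nests))
            if between_class_variance(candidate) > between_class_variance(nests[j]):
                nests[j] = candidate                     # replace a worse nest
        best = max(nests, key=between_class_variance)
        print("thresholds:", sorted(int(t) for t in best))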

  19. Stonehenge: A Simple and Accurate Predictor of Lunar Eclipses

    Science.gov (United States)

    Challener, S.

    1999-12-01

    Over the last century, much has been written about the astronomical significance of Stonehenge. The rage peaked in the mid to late 1960s when new computer technology enabled astronomers to make the first complete search for celestial alignments. Because there are hundreds of rocks or holes at Stonehenge and dozens of bright objects in the sky, the quest was fraught with obvious statistical problems. A storm of controversy followed and the subject nearly vanished from print. Only a handful of these alignments remain compelling. Today, few astronomers and still fewer archaeologists would argue that Stonehenge served primarily as an observatory. Instead, Stonehenge probably served as a sacred meeting place, which was consecrated by certain celestial events. These would include the sun's risings and settings at the solstices and possibly some lunar risings as well. I suggest that Stonehenge was also used to predict lunar eclipses. While Hawkins and Hoyle also suggested that Stonehenge was used in this way, their methods are complex and they make use of only early, minor, or outlying areas of Stonehenge. In contrast, I suggest a way that makes use of the imposing, central region of Stonehenge; the area built during the final phase of activity. To predict every lunar eclipse without predicting eclipses that do not occur, I use the less familiar lunar cycle of 47 lunar months. By moving markers about the Sarsen Circle, the Bluestone Circle, and the Bluestone Horseshoe, all umbral lunar eclipses can be predicted accurately.

  20. Location-based Services using Image Search

    DEFF Research Database (Denmark)

    Vertongen, Pieter-Paulus; Hansen, Dan Witzner

    2008-01-01

    Recent developments in image search have made it sufficiently efficient to be used in real-time applications. GPS has become a popular navigation tool. While GPS information provides reasonably good accuracy, it is not always present in all hand-held devices, nor is it accurate in all situations, for example in urban environments. We propose a system to provide location-based services using image searches without requiring GPS. The goal of this system is to assist tourists in cities with additional information using their mobile phones and built-in cameras. Based upon the results of the image search engine and database image location knowledge, the location of the query image is determined and associated data can be presented to the user.

  1. Selective scanpath repetition during memory-guided visual search.

    Science.gov (United States)

    Wynn, Jordana S; Bone, Michael B; Dragan, Michelle C; Hoffman, Kari L; Buchsbaum, Bradley R; Ryan, Jennifer D

    2016-01-02

    Visual search efficiency improves with repetition of a search display, yet the mechanisms behind these processing gains remain unclear. According to Scanpath Theory, memory retrieval is mediated by repetition of the pattern of eye movements or "scanpath" elicited during stimulus encoding. Using this framework, we tested the prediction that scanpath recapitulation reflects relational memory guidance during repeated search events. Younger and older subjects were instructed to find changing targets within flickering naturalistic scenes. Search efficiency (search time, number of fixations, fixation duration) and scanpath similarity (repetition) were compared across age groups for novel (V1) and repeated (V2) search events. Younger adults outperformed older adults on all efficiency measures at both V1 and V2, while the search time benefit for repeated viewing (V1-V2) did not differ by age. Fixation-binned scanpath similarity analyses revealed repetition of initial and final (but not middle) V1 fixations at V2, with older adults repeating more initial V1 fixations than young adults. In young adults only, early scanpath similarity correlated negatively with search time at test, indicating increased efficiency, whereas the similarity of V2 fixations to middle V1 fixations predicted poor search performance. We conclude that scanpath compression mediates increased search efficiency by selectively recapitulating encoding fixations that provide goal-relevant input. Extending Scanpath Theory, results suggest that scanpath repetition varies as a function of time and memory integrity.
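    A minimal sketch of fixation-binned scanpath comparison in this spirit (ordinal binning and a normalized-distance score are assumptions for illustration; the study's actual similarity metric is not reproduced here):

        import math

        def scanpath_similarity(v1, v2, screen_diag):
            """Compare fixations at the same ordinal position across two
            viewings; 1 = identical location, 0 = maximally far apart."""
            n = min(len(v1), len(v2))
            scores = []
            for (x1, y1), (x2, y2) in zip(v1[:n], v2[:n]):
                d = math.hypot(x2 - x1, y2 - y1)
                scores.append(1.0 - min(d / screen_diag, 1.0))
            return scores   # per-bin scores: inspect initial/middle/final bins

        first_view  = [(100, 120), (400, 300), (640, 480), (200, 150)]
        second_view = [(110, 130), (150, 400), (630, 470)]
        sims = scanpath_similarity(first_view, second_view,
                                   screen_diag=math.hypot(1024, 768))
        print([round(s, 2) for s in sims])  # high, low, high: the ends repeat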

  2. A similarity-based data warehousing environment for medical images.

    Science.gov (United States)

    Teixeira, Jefferson William; Annibal, Luana Peixoto; Felipe, Joaquim Cezar; Ciferri, Ricardo Rodrigues; Ciferri, Cristina Dutra de Aguiar

    2015-11-01

    A core issue of the decision-making process in the medical field is to support the execution of analytical (OLAP) similarity queries over images in data warehousing environments. In this paper, we focus on this issue. We propose imageDWE, a non-conventional data warehousing environment that enables the storage of intrinsic features taken from medical images in a data warehouse and supports OLAP similarity queries over them. To comply with this goal, we introduce the concept of perceptual layer, which is an abstraction used to represent an image dataset according to a given feature descriptor in order to enable similarity search. Based on this concept, we propose the imageDW, an extended data warehouse with dimension tables specifically designed to support one or more perceptual layers. We also detail how to build an imageDW and how to load image data into it. Furthermore, we show how to process OLAP similarity queries composed of a conventional predicate and a similarity search predicate that encompasses the specification of one or more perceptual layers. Moreover, we introduce an index technique to improve the OLAP query processing over images. We carried out performance tests over a data warehouse environment that consolidated medical images from exams of several modalities. The results demonstrated the feasibility and efficiency of our proposed imageDWE to manage images and to process OLAP similarity queries. The results also demonstrated that the use of the proposed index technique guaranteed a great improvement in query processing. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction

    DEFF Research Database (Denmark)

    Wichterich, Marc; Assent, Ira; Philipp, Kranen

    2008-01-01

    … to medium dimensionality. However, multimedia applications commonly exhibit high-dimensional feature representations for which the computational complexity of the EMD hinders its adoption. An efficient query processing approach that mitigates and overcomes this effect is crucial. We propose novel …

  4. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool

    National Research Council Canada - National Science Library

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-01-01

    ... large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is...

  5. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool

    National Research Council Canada - National Science Library

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-01-01

    ... large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets...

  6. Personalized online information search and visualization

    Directory of Open Access Journals (Sweden)

    Orthner Helmuth F

    2005-03-01

    Full Text Available Abstract Background The rapid growth of online publications such as Medline and other sources raises the question of how to get relevant information efficiently. It is important, for a bench scientist, e.g., to monitor related publications constantly. It is also important, for a clinician, e.g., to access patient records anywhere and anytime. Although time-consuming, this kind of searching procedure is usually similar and simple. Typically, it involves a search engine and a visualization interface. Different words or combinations reflect different research topics. The objective of this study is to automate this tedious procedure by recording those words/terms in a database and online sources, and to use the information for automated search and retrieval. The retrieved information will be available anytime and anywhere through a secure web server. Results We developed such a database that stores search terms, journals, etc., and implemented software for searching medical subject heading-indexed sources such as Medline and other online sources automatically. The returned information is stored locally, as is, on a server and is visible through a Web-based interface. The search is performed daily or as otherwise scheduled, and users log on to the website anytime without typing any words. The system has the potential to retrieve information similarly from non-medical subject heading-indexed literature or from privileged information sources such as a clinical information system. Issues such as security, presentation, and visualization of the retrieved information are thus addressed. One of the presentation issues, wireless access, was also experimented with. A user survey showed that the personalized online searches saved time and increased relevancy. Handheld devices could also be used to access the stored information, but less satisfactorily. Conclusion The Web-searching software or a similar system has the potential to be an efficient …

  7. Strain Elastography of Breast and Prostata Cancer: Similarities and Differences.

    Science.gov (United States)

    Daniaux, M; Auer, T; De Zordo, T; Junker, D; Santner, W; Hubalek, M; Jaschke, W; Aigner, F

    2016-03-01

    Typically both breast and prostate cancer present as tissue with decreased elasticity. Palpation is the oldest technique of tumor detection in both organs and is based on this principle. Thus an operator can grade a palpable mass as suspicious for cancer. Strain elastography, as a modern ultrasound technique, allows the visualization of tissue elasticity in a color-coded elastogram and can be understood as a technical finger. The following article shows similarities and differences of ultrasound strain elastography in the diagnosis of breast and prostate cancer. • In prostate cancer, elastography is the primary sonographic search modality; in breast cancer, it is B-mode. • The diagnostic value of the search modalities changes with increasing age. • A cut-off value for a strain ratio is hard to obtain in elastography of the prostate, because there is no stable reference tissue in the prostate. © Georg Thieme Verlag KG Stuttgart · New York.

  8. A stiffly accurate integrator for elastodynamic problems

    KAUST Repository

    Michels, Dominik L.

    2017-07-21

    We present a new integration algorithm for the accurate and efficient solution of stiff elastodynamic problems governed by the second-order ordinary differential equations of structural mechanics. Current methods have the shortcoming that their performance is highly dependent on the numerical stiffness of the underlying system that often leads to unrealistic behavior or a significant loss of efficiency. To overcome these limitations, we present a new integration method which is based on a mathematical reformulation of the underlying differential equations, an exponential treatment of the full nonlinear forcing operator as opposed to more standard partially implicit or exponential approaches, and the utilization of the concept of stiff accuracy which ensures that the efficiency of the simulations is significantly less sensitive to increased stiffness. As a consequence, we are able to tremendously accelerate the simulation of stiff systems compared to established integrators and significantly increase the overall accuracy. The advantageous behavior of this approach is demonstrated on a broad spectrum of complex examples like deformable bodies, textiles, bristles, and human hair. Our easily parallelizable integrator enables more complex and realistic models to be explored in visual computing without compromising efficiency.

  9. Accurate Theoretical Thermochemistry for Fluoroethyl Radicals.

    Science.gov (United States)

    Ganyecz, Ádám; Kállay, Mihály; Csontos, József

    2017-02-09

    An accurate coupled-cluster (CC) based model chemistry was applied to calculate reliable thermochemical quantities for hydrofluorocarbon derivatives including radicals 1-fluoroethyl (CH3-CHF), 1,1-difluoroethyl (CH3-CF2), 2-fluoroethyl (CH2F-CH2), 1,2-difluoroethyl (CH2F-CHF), 2,2-difluoroethyl (CHF2-CH2), 2,2,2-trifluoroethyl (CF3-CH2), 1,2,2,2-tetrafluoroethyl (CF3-CHF), and pentafluoroethyl (CF3-CF2). The model chemistry used contains iterative triple and perturbative quadruple excitations in CC theory, as well as scalar relativistic and diagonal Born-Oppenheimer corrections. To obtain heat of formation values with better than chemical accuracy perturbative quadruple excitations and scalar relativistic corrections were inevitable. Their contributions to the heats of formation steadily increase with the number of fluorine atoms in the radical reaching 10 kJ/mol for CF3-CF2. When discrepancies were found between the experimental and our values it was always possible to resolve the issue by recalculating the experimental result with currently recommended auxiliary data. For each radical studied here this study delivers the best heat of formation as well as entropy data.

  10. Accurate lineshape spectroscopy and the Boltzmann constant.

    Science.gov (United States)

    Truong, G-W; Anstie, J D; May, E F; Stace, T M; Luiten, A N

    2015-10-14

    Spectroscopy has an illustrious history delivering serendipitous discoveries and providing a stringent testbed for new physical predictions, including applications from trace materials detection, to understanding the atmospheres of stars and planets, and even constraining cosmological models. Reaching fundamental-noise limits permits optimal extraction of spectroscopic information from an absorption measurement. Here, we demonstrate a quantum-limited spectrometer that delivers high-precision measurements of the absorption lineshape. These measurements yield a very accurate measurement of the excited-state (6P1/2) hyperfine splitting in Cs, and reveals a breakdown in the well-known Voigt spectral profile. We develop a theoretical model that accounts for this breakdown, explaining the observations to within the shot-noise limit. Our model enables us to infer the thermal velocity dispersion of the Cs vapour with an uncertainty of 35 p.p.m. within an hour. This allows us to determine a value for Boltzmann's constant with a precision of 6 p.p.m., and an uncertainty of 71 p.p.m.

  11. Accurate Emission Line Diagnostics at High Redshift

    Science.gov (United States)

    Jones, Tucker

    2017-08-01

    How do the physical conditions of high redshift galaxies differ from those seen locally? Spectroscopic surveys have invested hundreds of nights of 8- and 10-meter telescope time as well as hundreds of Hubble orbits to study evolution in the galaxy population at redshifts z 0.5-4 using rest-frame optical strong emission line diagnostics. These surveys reveal evolution in the gas excitation with redshift but the physical cause is not yet understood. Consequently there are large systematic errors in derived quantities such as metallicity. We have used direct measurements of gas density, temperature, and metallicity in a unique sample at z=0.8 to determine reliable diagnostics for high redshift galaxies. Our measurements suggest that offsets in emission line ratios at high redshift are primarily caused by high N/O abundance ratios. However, our ground-based data cannot rule out other interpretations. Spatially resolved Hubble grism spectra are needed to distinguish between the remaining plausible causes such as active nuclei, shocks, diffuse ionized gas emission, and HII regions with escaping ionizing flux. Identifying the physical origin of evolving excitation will allow us to build the necessary foundation for accurate measurements of metallicity and other properties of high redshift galaxies. Only then can we exploit the wealth of data from current surveys and near-future JWST spectroscopy to understand how galaxies evolve over time.

  12. Fast and accurate exhaled breath ammonia measurement.

    Science.gov (United States)

    Solga, Steven F; Mudalel, Matthew L; Spacek, Lisa A; Risby, Terence H

    2014-06-11

    This exhaled breath ammonia method uses a fast and highly sensitive spectroscopic technique known as quartz enhanced photoacoustic spectroscopy (QEPAS) with a quantum cascade based laser. The monitor is coupled to a sampler that measures mouth pressure and carbon dioxide. The system is temperature controlled and specifically designed to address the reactivity of this compound. The sampler provides immediate feedback to the subject and the technician on the quality of the breath effort. Together with the quick response time of the monitor, this system is capable of accurately measuring exhaled breath ammonia representative of deep lung systemic levels. Because the system is easy to use and produces real time results, it has enabled experiments to identify factors that influence measurements. For example, mouth rinse and oral pH reproducibly and significantly affect results and therefore must be controlled. Temperature and mode of breathing are other examples. As our understanding of these factors evolves, error is reduced, and clinical studies become more meaningful. This system is very reliable and individual measurements are inexpensive. The sampler is relatively inexpensive and quite portable, but the monitor is neither. This limits options for some clinical studies and provides rationale for future innovations.

  13. Learning-Based Video Superresolution Reconstruction Using Spatiotemporal Nonlocal Similarity

    Directory of Open Access Journals (Sweden)

    Meiyu Liang

    2015-01-01

    Full Text Available Aiming at improving video visual resolution quality and detail clarity, a novel learning-based video superresolution reconstruction algorithm using spatiotemporal nonlocal similarity is proposed in this paper. Objective high-resolution (HR) estimations of low-resolution (LR) video frames can be obtained by learning LR-HR correlation mappings and fusing spatiotemporal nonlocal similarities between video frames. With the objective of improving algorithm efficiency while guaranteeing superresolution quality, a novel visual saliency-based LR-HR correlation mapping strategy between LR and HR patches is proposed based on semicoupled dictionary learning. Moreover, aiming at improving the performance and efficiency of spatiotemporal similarity matching and fusion, an improved spatiotemporal nonlocal fuzzy registration scheme is established using a similarity weighting strategy based on pseudo-Zernike moment feature similarity and structural similarity, and a self-adaptive regional correlation evaluation strategy. The proposed spatiotemporal fuzzy registration scheme does not rely on accurate estimation of subpixel motion, and therefore it can be adapted to complex motion patterns and is robust to noise and rotation. Experimental results demonstrate that the proposed algorithm achieves competitive superresolution quality compared to other state-of-the-art algorithms in terms of both subjective and objective evaluations.

  14. Spying on Search Strategies

    Science.gov (United States)

    Tenopir, Carol

    2004-01-01

    Only the most dedicated super-searchers are motivated to learn and control command systems, like DialogClassic, that rely on the user to input complex search strategies. Infrequent searchers and most end users choose interfaces that do some of the work for them and make the search process appear easy. However, the easier a good interface seems to…

  15. Agronomie Africaine: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  16. Scientia Africana: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  17. With News Search Engines

    Science.gov (United States)

    Gunn, Holly

    2005-01-01

    Although there are many news search engines on the Web, finding the news items one wants can be challenging. Choosing appropriate search terms is one of the biggest challenges. Unless one has seen the article that one is seeking, it is often difficult to select words that were used in the headline or text of the article. The limited archives of…

  18. Biokemistri: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  19. Searches for Supersymmetry

    CERN Document Server

    Ventura, Andrea; The ATLAS collaboration

    2017-01-01

    New and recent results on Supersymmetry searches are shown for the ATLAS and the CMS experiments. Analyses with about 36 fb$^{-1}$ are considered for searches concerning light squarks and gluinos, direct pair production of 3$^{rd}$ generation squarks, electroweak production of charginos, neutralinos, sleptons, R-parity violating scenarios and long-lived particles.

  20. Search and Recommendation

    DEFF Research Database (Denmark)

    Bogers, Toine

    2014-01-01

…-scale application by companies like Amazon, Facebook, and Netflix. But are search and recommendation really two different fields of research that address different problems with different sets of algorithms in papers published at distinct conferences? In my talk, I want to argue that search and recommendation…

  1. Search and switching costs

    NARCIS (Netherlands)

    Siekman, Wilhelm Henricus

    2016-01-01

    This thesis analyses markets with search and with switching costs. It provides insights in several important issues in search markets, including how loss aversion may affect consumer behavior and firm conduct, and how prices, welfare, and profits may change when an intermediating platform orders

  2. Sciences & Nature: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  3. Afrika Statistika: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  4. Quaestiones Mathematicae: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  5. Acta Theologica: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  6. Nigerian Libraries: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  7. Water SA: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  8. Lexikos: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  9. Philosophical Papers: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  10. Kiswahili: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  11. Afrimedic Journal: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  12. Kenya Veterinarian: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  13. Building Internet Search Engines

    Directory of Open Access Journals (Sweden)

    Mustafa Akgül

    1996-09-01

Full Text Available Internet search engines are powerful tools to find electronic objects such as addresses of individuals and institutions, documents, statistics of all kinds, dictionaries, catalogs, product information etc. This paper explains how to build and run some very common search engines on Unix platforms, so as to serve documents through the Web.

  14. ElasticSearch cookbook

    CERN Document Server

    Paro, Alberto

    2015-01-01

    If you are a developer who implements ElasticSearch in your web applications and want to sharpen your understanding of the core elements and applications, this is the book for you. It is assumed that you've got working knowledge of JSON and, if you want to extend ElasticSearch, of Java and related technologies.

  15. Searches for Supersymmetry

    CERN Document Server

    Ventura, Andrea; The ATLAS collaboration

    2017-01-01

New and recent results on Supersymmetry searches are shown for the ATLAS and the CMS experiments. Analyses with about 36 fb$^{-1}$ are considered for searches concerning light squarks and gluinos, direct pair production of 3$^{rd}$ generation squarks, electroweak production of charginos, neutralinos, sleptons, R-parity violating scenarios and long-lived particles.

  16. African Environment: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  17. Human memory search

    NARCIS (Netherlands)

    Davelaar, E.J.; Raaijmakers, J.G.W.; Hills, T.T.; Robbins, T.W.; Todd, P.M.

    2012-01-01

The importance of understanding human memory search is hard to exaggerate: we build and live our lives based on what we remember. This chapter explores the characteristics of memory search, with special emphasis on the use of retrieval cues. We introduce the dependent measures that are obtained

  18. Fixing Dataset Search

    Science.gov (United States)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  19. Vulture News: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  20. Ergonomics SA: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  1. Zede Journal: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  2. African Anthropologist: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  3. Africa Development: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  4. Innovation: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  5. Acta Structilia: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  6. Agrosearch: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  7. African Zoology: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  8. Rwanda Journal: Advanced Search

    African Journals Online (AJOL)

    Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...

  9. Distributed Deep Web Search

    NARCIS (Netherlands)

    Tjin-Kam-Jet, Kien

    2013-01-01

    The World Wide Web contains billions of documents (and counting); hence, it is likely that some document will contain the answer or content you are searching for. While major search engines like Bing and Google often manage to return relevant results to your query, there are plenty of situations in

  10. Hit expansion approaches using multiple similarity methods and virtualized query structures.

    Science.gov (United States)

    Bergner, Andreas; Parel, Serge P

    2013-05-24

    Ligand-based virtual screening and computational hit expansion methods undoubtedly facilitate the finding of novel active chemical entities, utilizing already existing knowledge of active compounds. It has been demonstrated that the parallel execution of complementary similarity search methods enhances the performance of such virtual screening campaigns. In this article, we examine the use of virtualized template (query, seed) structures as an extension to common search methods, such as fingerprint and pharmacophore graph-based similarity searches. We demonstrate that template virtualization by bioisosteric enumeration and other rule-based methods, in combination with standard similarity search techniques, represents a powerful approach for hit expansion following high-throughput screening campaigns. The reliability of the methods is demonstrated by four different test data sets representing different target classes and two hit finding case studies on the epigenetic targets G9a and LSD1.
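As a rough illustration of the template-virtualization idea, the sketch below enumerates a query structure plus one crude bioisosteric variant and runs a standard Morgan-fingerprint Tanimoto search with each. It assumes the open-source RDKit toolkit; the replacement rule and similarity cutoff are invented for the example (real systems use curated rule sets, e.g. reaction SMARTS).

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

def virtualize(template_smiles):
    """Toy 'virtualization': the template plus one crude bioisosteric
    variant (carboxylic acid -> tetrazole, via naive string replacement)."""
    yield template_smiles
    if "C(=O)O" in template_smiles:
        yield template_smiles.replace("C(=O)O", "c1nnn[nH]1")

def expand_hits(template_smiles, library, cutoff=0.6):
    hits = set()
    for query in virtualize(template_smiles):
        qfp = fingerprint(query)
        if qfp is None:
            continue                 # enumerated variant was not valid SMILES
        for s in library:
            fp = fingerprint(s)
            if fp is not None and DataStructs.TanimotoSimilarity(qfp, fp) >= cutoff:
                hits.add(s)
    return hits
```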

  11. FitSearch: a robust way to interpret a yeast fitness profile in terms of drug's mode-of-action.

    Science.gov (United States)

    Lee, Minho; Han, Sangjo; Chang, Hyeshik; Kwak, Youn-Sig; Weller, David M; Kim, Dongsup

    2013-01-01

Yeast deletion-mutant collections have been successfully used to infer the mode-of-action of drugs, especially by profiling chemical-genetic and genetic-genetic interactions on a genome-wide scale. Although tens of thousands of those profiles are publicly available, the lack of an accurate method for mining such data has been a major bottleneck to more widespread use of these useful resources. For general usage of those public resources, we designed FitRankDB as a general repository of fitness profiles, and developed a new search algorithm, FitSearch, for identifying the profiles that have a high similarity score with statistical significance for a given fitness profile. We demonstrated that our new repository and algorithm are highly beneficial to researchers attempting to build hypotheses regarding the unknown modes-of-action of bioactive compounds, regardless of the type of experiment performed with a yeast deletion-mutant collection or the measurement platform, especially non-chip-based platforms. In particular, the approach is useful when constructing a hypothesis regarding the unknown function of a bioactive compound through small-scale experiments with a yeast deletion collection, in a platform-independent manner. FitRankDB and FitSearch enhance the ease of searching public yeast fitness profiles and obtaining insights into unknown mechanisms of drug action. FitSearch is freely available at http://fitsearch.kaist.ac.kr.
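A minimal sketch of the kind of scoring such a search performs (the actual FitSearch algorithm is more elaborate): a rank-based similarity between two fitness profiles, plus an empirical permutation p-value for statistical significance. Profiles are assumed to be numeric arrays aligned gene-by-gene.

```python
import numpy as np

def profile_similarity(query, target):
    """Rank-based (Spearman-like) similarity between two gene-fitness profiles."""
    qr = np.argsort(np.argsort(query))
    tr = np.argsort(np.argsort(target))
    return np.corrcoef(qr, tr)[0, 1]

def significance(query, target, n_perm=1000, seed=0):
    """Empirical p-value: how often does a shuffled query score as high?"""
    rng = np.random.default_rng(seed)
    observed = profile_similarity(query, target)
    perm = np.asarray(query, dtype=float).copy()
    count = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if profile_similarity(perm, target) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)   # add-one smoothing
```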

  12. Search and Disrupt

    DEFF Research Database (Denmark)

    Ørding Olsen, Anders

This paper analyzes how external search is affected by strategic interest alignment among knowledge sources. I focus on misalignment arising from the heterogeneous effects of disruptive technologies by analyzing the influence of incumbents on 2,855 non-incumbents' external knowledge search efforts. The efforts most likely to solve innovation problems obtained funding from the European Commission's 7th Framework Program (2007-2013). The results show that involving incumbents improves search in complementary technologies, while demoting it when strategic interests are misaligned in disruptive technologies. However, incumbent sources engaged in capability reconfiguration to accommodate disruption improve search efforts in disruptive technologies. The paper concludes that the value of external sources is contingent on more than their knowledge. Specifically, interdependence of sources in search gives rise…

  13. Semantic search among heterogeneous biological databases based on gene ontology.

    Science.gov (United States)

    Cao, Shun-Liang; Qin, Lei; He, Wei-Zhong; Zhong, Yang; Zhu, Yang-Yong; Li, Yi-Xue

    2004-05-01

Semantic search is a key issue in the integration of heterogeneous biological databases. In this paper, we present a methodology for implementing semantic search in BioDW, an integrated biological data warehouse. Two tables are presented: the DB2GO table, to correlate Gene Ontology (GO) annotated entries from BioDW data sources with GO, and the semantic similarity table, to record similarity scores derived from any pair of GO terms. Based on the two tables, multifarious ways of semantic search are provided, and the corresponding entries in heterogeneous biological databases can be expediently searched in semantic terms.
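A toy illustration of how such a precomputed semantic similarity table might be built, using Jaccard similarity over GO ancestor sets. BioDW's actual similarity measure may differ, and the GO fragment here is fabricated for the example.

```python
from itertools import product

# Toy GO fragment: term -> direct parents (is_a edges). Fabricated IDs.
PARENTS = {
    "GO:a": set(),
    "GO:b": {"GO:a"},
    "GO:c": {"GO:a"},
    "GO:d": {"GO:b", "GO:c"},
}

def ancestors(term):
    """The term itself plus all transitive is_a ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def jaccard_similarity(t1, t2):
    a1, a2 = ancestors(t1), ancestors(t2)
    return len(a1 & a2) / len(a1 | a2)

# Precompute the pairwise table once, as a semantic similarity table would be.
TERMS = sorted(PARENTS)
SIM_TABLE = {(x, y): jaccard_similarity(x, y) for x, y in product(TERMS, TERMS)}
```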

  14. Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search

    Science.gov (United States)

    Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain

    2016-01-01

Background Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis, and in turn, the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these “experts.” Such interfaces hark back to a time when searches needed to be accurate the first time, as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research. Objective The cross-disciplinary nature of data science can make no assumptions regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with the search needs of the “Google generation” than with their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive. Methods Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is “Google-like,” enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multioption user interface. Results Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F1,19=37.3, P<…) … search interface (F1,19=18.0, P<…) … search interface received significantly higher ratings than the traditional

  15. Adaptive Lévy processes and area-restricted search in human foraging.

    Directory of Open Access Journals (Sweden)

    Thomas T Hills

Full Text Available A considerable amount of research has claimed that animals' foraging behaviors display movement lengths with power-law distributed tails, characteristic of Lévy flights and Lévy walks. Though these claims have recently come into question, the proposal that many animals forage using Lévy processes nonetheless remains. A Lévy process does not consider when or where resources are encountered, and samples movement lengths independently of past experience. However, Lévy processes too have come into question based on the observation that in patchy resource environments resource-sensitive foraging strategies, like area-restricted search, perform better than Lévy flights yet can still generate heavy-tailed distributions of movement lengths. To investigate these questions further, we tracked humans as they searched for hidden resources in an open-field virtual environment, with either patchy or dispersed resource distributions. Supporting previous research, for both conditions logarithmic binning methods were consistent with Lévy flights, and rank-frequency methods (comparing alternative distributions using maximum likelihood methods) showed the strongest support for bounded power-law distributions (truncated Lévy flights). However, goodness-of-fit tests found that even bounded power-law distributions only accurately characterized movement behavior for 4 (out of 32) participants. Moreover, paths in the patchy environment (but not the dispersed environment) showed a transition to intensive search following resource encounters, characteristic of area-restricted search. Transferring paths between environments revealed that paths generated in the patchy environment were adapted to that environment. Our results suggest that though power-law distributions do not accurately reflect human search, Lévy processes may still describe movement in dispersed environments, but not in patchy environments, where search was area-restricted. Furthermore, our results
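The maximum-likelihood comparison mentioned above can be sketched as follows: fit a continuous power law (Clauset-style estimator) and a shifted exponential above a common xmin, then compare log-likelihoods. This is a simplification of the full model-selection procedure used in such studies.

```python
import numpy as np

def powerlaw_fit(x, xmin):
    """MLE for a continuous power law p(x) ~ x^(-alpha) above xmin."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    n = len(x)
    alpha = 1.0 + n / np.sum(np.log(x / xmin))
    loglik = n * np.log((alpha - 1.0) / xmin) - alpha * np.sum(np.log(x / xmin))
    return alpha, loglik

def exponential_fit(x, xmin):
    """MLE for a shifted exponential above xmin, as a rival hypothesis."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    lam = 1.0 / np.mean(x - xmin)
    return lam, len(x) * np.log(lam) - lam * np.sum(x - xmin)

# The model with the higher log-likelihood is preferred; a formal analysis
# would use a likelihood-ratio (e.g. Vuong) test rather than a raw comparison.
```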

  16. Fast Structural Search in Phylogenetic Databases

    Directory of Open Access Journals (Sweden)

    William H. Piel

    2005-01-01

Full Text Available As the size of phylogenetic databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The “closeness” is a measure of the topological relationships in P that are found to be the same or similar in a tree in D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees, where the trees can be weighted or unweighted. Experimental results on comparing the similarity measure with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising.
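One simple instance of such a topological "closeness" measure (not the paper's exact definition) is the Jaccard overlap of the clade sets two trees induce; cheap set statistics like this also make natural filters for pruning the database before a full comparison.

```python
def clade_sets(tree):
    """tree is a nested tuple of leaf labels, e.g. (("A", "B"), ("C", "D")).
    Returns the set of clades (frozensets of leaves) the topology induces."""
    clades = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        s = frozenset().union(*(leaves(child) for child in node))
        clades.add(s)
        return s

    leaves(tree)
    return clades

def topological_similarity(t1, t2):
    """Jaccard overlap of induced clades; 1.0 means identical topologies."""
    c1, c2 = clade_sets(t1), clade_sets(t2)
    union = c1 | c2
    return 1.0 if not union else len(c1 & c2) / len(union)

# Example: identical trees score 1.0, a rearranged tree scores lower.
print(topological_similarity((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))))
```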

  17. Quantifying the Search Behaviour of Different Demographics Using Google Correlate

    Science.gov (United States)

    Letchford, Adrian; Preis, Tobias; Moat, Helen Susannah

    2016-01-01

    Vast records of our everyday interests and concerns are being generated by our frequent interactions with the Internet. Here, we investigate how the searches of Google users vary across U.S. states with different birth rates and infant mortality rates. We find that users in states with higher birth rates search for more information about pregnancy, while those in states with lower birth rates search for more information about cats. Similarly, we find that users in states with higher infant mortality rates search for more information about credit, loans and diseases. Our results provide evidence that Internet search data could offer new insight into the concerns of different demographics. PMID:26910464

  18. Quantifying the Search Behaviour of Different Demographics Using Google Correlate.

    Directory of Open Access Journals (Sweden)

    Adrian Letchford

    Full Text Available Vast records of our everyday interests and concerns are being generated by our frequent interactions with the Internet. Here, we investigate how the searches of Google users vary across U.S. states with different birth rates and infant mortality rates. We find that users in states with higher birth rates search for more information about pregnancy, while those in states with lower birth rates search for more information about cats. Similarly, we find that users in states with higher infant mortality rates search for more information about credit, loans and diseases. Our results provide evidence that Internet search data could offer new insight into the concerns of different demographics.

  19. Towards Accurate Application Characterization for Exascale (APEX)

    Energy Technology Data Exchange (ETDEWEB)

    Hammond, Simon David [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

    2015-09-01

Sandia National Laboratories has been engaged in hardware and software codesign activities for a number of years; indeed, it might be argued that prototyping of clusters as far back as the CPLANT machines and many large capability resources including ASCI Red and RedStorm were examples of codesigned solutions. As the research supporting our codesign activities has moved closer to investigating on-node runtime behavior, a natural hunger has grown for detailed analysis of both hardware and algorithm performance from the perspective of low-level operations. The Application Characterization for Exascale (APEX) LDRD was a project conceived to address some of these concerns. Primarily the research was intended to focus on generating accurate and reproducible low-level performance metrics using tools that could scale to production-class code bases. Alongside this research was an advocacy and analysis role associated with evaluating tools for production use, working with leading industry vendors to develop and refine solutions required by our code teams, and directly engaging with production code developers to form a context for the application analysis and a bridge to the research community within Sandia. On each of these accounts significant progress has been made, particularly, as this report will cover, in the low-level analysis of operations for important classes of algorithms. This report summarizes the development of a collection of tools under the APEX research program and leaves to other SAND and L2 milestone reports the description of codesign progress with Sandia's production users/developers.

  20. Accurate Thermal Conductivities from First Principles

    Science.gov (United States)

    Carbogno, Christian

    2015-03-01

    In spite of significant research efforts, a first-principles determination of the thermal conductivity at high temperatures has remained elusive. On the one hand, Boltzmann transport techniques that include anharmonic effects in the nuclear dynamics only perturbatively become inaccurate or inapplicable under such conditions. On the other hand, non-equilibrium molecular dynamics (MD) methods suffer from enormous finite-size artifacts in the computationally feasible supercells, which prevent an accurate extrapolation to the bulk limit of the thermal conductivity. In this work, we overcome this limitation by performing ab initio MD simulations in thermodynamic equilibrium that account for all orders of anharmonicity. The thermal conductivity is then assessed from the auto-correlation function of the heat flux using the Green-Kubo formalism. Foremost, we discuss the fundamental theory underlying a first-principles definition of the heat flux using the virial theorem. We validate our approach and in particular the techniques developed to overcome finite time and size effects, e.g., by inspecting silicon, the thermal conductivity of which is particularly challenging to converge. Furthermore, we use this framework to investigate the thermal conductivity of ZrO2, which is known for its high degree of anharmonicity. Our calculations shed light on the heat resistance mechanism active in this material, which eventually allows us to discuss how the thermal conductivity can be controlled by doping and co-doping. This work has been performed in collaboration with R. Ramprasad (University of Connecticut), C. G. Levi and C. G. Van de Walle (University of California Santa Barbara).
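The Green-Kubo step is compact enough to sketch. Assuming a single Cartesian heat-flux component J sampled every dt from an equilibrium trajectory (SI units, with the equilibrium average of J near zero), the conductivity follows from integrating the autocorrelation function; production analyses additionally average over components and trajectories and truncate the integral where the correlation has decayed.

```python
import numpy as np

def green_kubo_kappa(J, dt, volume, temperature, kB=1.380649e-23):
    """Thermal conductivity via the Green-Kubo relation:
        kappa = V / (kB T^2) * integral_0^inf <J(0) J(t)> dt
    Sketch only: J is one heat-flux component, sampled every dt seconds."""
    J = np.asarray(J, dtype=float)
    n = len(J)
    lags = n // 2
    # heat-flux autocorrelation function up to n//2 lags
    acf = np.array([np.mean(J[: n - k] * J[k:]) for k in range(lags)])
    # trapezoid-rule integral of the autocorrelation
    integral = dt * (acf.sum() - 0.5 * (acf[0] + acf[-1]))
    return volume / (kB * temperature ** 2) * integral
```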

  1. Optimizing cell arrays for accurate functional genomics.

    Science.gov (United States)

    Fengler, Sven; Bastiaens, Philippe I H; Grecco, Hernán E; Roda-Navarro, Pedro

    2012-07-17

Cellular responses emerge from a complex network of dynamic biochemical reactions. In order to investigate them it is necessary to develop methods that allow perturbing a high number of gene products in a flexible and fast way. Cell arrays (CA) enable such experiments on microscope slides via reverse transfection of cellular colonies growing on spotted genetic material. In contrast to multi-well plates, CA are susceptible to contamination among neighboring spots, hindering accurate quantification in cell-based screening projects. Here we have developed a quality control protocol for quantifying and minimizing contamination in CA. We imaged checkered CA that express two distinct fluorescent proteins and segmented images into single cells to quantify the transfection efficiency and interspot contamination. Compared with standard procedures, we measured a 3-fold reduction of contaminants when arrays containing HeLa cells were washed shortly after cell seeding. We proved that nucleic acid uptake during cell seeding, rather than migration among neighboring spots, was the major source of contamination. Arrays of MCF7 cells developed without the washing step showed a 7-fold lower percentage of contaminant cells, demonstrating that contamination depends on specific cell properties. Previously published methodological works have focused on achieving a high transfection rate in densely packed CA. Here, we focused on an equally important parameter: the interspot contamination. The presented quality control is essential for estimating the rate of contamination, a major source of false positives and negatives in current microscopy-based functional genomics screenings. We have demonstrated that a washing step after seeding enhances CA quality for HeLa cells but is not necessary for MCF7. The described method provides a way to find optimal seeding protocols for cell lines intended to be used for the first time in CA.

  2. Optimizing cell arrays for accurate functional genomics

    Directory of Open Access Journals (Sweden)

    Fengler Sven

    2012-07-01

Full Text Available Abstract Background Cellular responses emerge from a complex network of dynamic biochemical reactions. In order to investigate them it is necessary to develop methods that allow perturbing a high number of gene products in a flexible and fast way. Cell arrays (CA) enable such experiments on microscope slides via reverse transfection of cellular colonies growing on spotted genetic material. In contrast to multi-well plates, CA are susceptible to contamination among neighboring spots, hindering accurate quantification in cell-based screening projects. Here we have developed a quality control protocol for quantifying and minimizing contamination in CA. Results We imaged checkered CA that express two distinct fluorescent proteins and segmented images into single cells to quantify the transfection efficiency and interspot contamination. Compared with standard procedures, we measured a 3-fold reduction of contaminants when arrays containing HeLa cells were washed shortly after cell seeding. We proved that nucleic acid uptake during cell seeding, rather than migration among neighboring spots, was the major source of contamination. Arrays of MCF7 cells developed without the washing step showed a 7-fold lower percentage of contaminant cells, demonstrating that contamination depends on specific cell properties. Conclusions Previously published methodological works have focused on achieving a high transfection rate in densely packed CA. Here, we focused on an equally important parameter: the interspot contamination. The presented quality control is essential for estimating the rate of contamination, a major source of false positives and negatives in current microscopy-based functional genomics screenings. We have demonstrated that a washing step after seeding enhances CA quality for HeLa cells but is not necessary for MCF7. The described method provides a way to find optimal seeding protocols for cell lines intended to be used for the first time in CA.

  3. How flatbed scanners upset accurate film dosimetry.

    Science.gov (United States)

    van Battum, L J; Huizenga, H; Verdaasdonk, R M; Heukelom, S

    2016-01-21

Film is an excellent dosimeter for verification of dose distributions due to its high spatial resolution. Irradiated film can be digitized with low-cost, transmission, flatbed scanners. However, a disadvantage is their lateral scan effect (LSE): a scanner readout change over its lateral scan axis. Although anisotropic light scattering was presented as the origin of the LSE, this paper presents an alternative cause. To this end, the LSE for two flatbed scanners (Epson 1680 Expression Pro and Epson 10000XL) and Gafchromic film (EBT, EBT2, EBT3) was investigated, focusing on three effects: cross talk, optical path length and polarization. Cross talk was examined using triangular sheets of various optical densities. The optical path length effect was studied using absorptive and reflective neutral density filters with well-defined optical characteristics (OD range 0.2-2.0). Linear polarizer sheets were used to investigate light polarization on the CCD signal in the absence and presence of (un)irradiated Gafchromic film. Film dose values ranged from 0.2 to 9 Gy, i.e. an optical density range from 0.25 to 1.1. Measurements were performed in the scanner's transmission mode, with red-green-blue channels. The LSE was found to depend on scanner construction and film type. Its magnitude depends on dose: for 9 Gy it increases up to 14% at the maximum lateral position. Cross talk was only significant in high-contrast regions, up to 2% for very small fields. The optical path length effect introduced by film on the scanner causes a 3% effect for pixels in the extreme lateral position. Light polarization due to film and the scanner's optical mirror system is the main contributor, different in magnitude for the red, green and blue channels. We concluded that any Gafchromic EBT-type film scanned with a flatbed scanner will face these optical effects. Accurate dosimetry requires correction of the LSE, and therefore determination of the LSE per color channel and dose delivered to the film.

  4. Important Nearby Galaxies without Accurate Distances

    Science.gov (United States)

    McQuinn, Kristen

    2014-10-01

The Spitzer Infrared Nearby Galaxies Survey (SINGS) and its offspring programs (e.g., THINGS, HERACLES, KINGFISH) have resulted in a fundamental change in our view of star formation and the ISM in galaxies, and together they represent the most complete multi-wavelength data set yet assembled for a large sample of nearby galaxies. These great investments of observing time have been dedicated to the goal of understanding the interstellar medium, the star formation process, and, more generally, galactic evolution at the present epoch. Nearby galaxies provide the basis on which we interpret the distant universe, and the SINGS sample represents the best-studied nearby galaxies. Accurate distances are fundamental to interpreting observations of galaxies. Surprisingly, many of the SINGS spiral galaxies have numerous conflicting distance estimates, resulting in confusion. We can rectify this situation for 8 of the SINGS spiral galaxies within 10 Mpc at a very low cost through measurements of the tip of the red giant branch. The proposed observations will provide an accuracy of better than 0.1 in distance modulus. Our sample includes such well-known galaxies as M51 (the Whirlpool), M63 (the Sunflower), M104 (the Sombrero), and M74 (the archetypal grand-design spiral). We are also proposing coordinated parallel WFC3 UV observations of the central regions of the galaxies, rich with high-mass UV-bright stars. As a secondary science goal we will compare the resolved UV stellar populations with integrated UV emission measurements used in calibrating star formation rates. Our observations will complement the growing HST UV atlas of high-resolution images of nearby galaxies.

  5. Accurate Holdup Calculations with Predictive Modeling & Data Integration

    Energy Technology Data Exchange (ETDEWEB)

    Azmy, Yousry [North Carolina State Univ., Raleigh, NC (United States). Dept. of Nuclear Engineering; Cacuci, Dan [Univ. of South Carolina, Columbia, SC (United States). Dept. of Mechanical Engineering

    2017-04-03

    In facilities that process special nuclear material (SNM) it is important to account accurately for the fissile material that enters and leaves the plant. Although there are many stages and processes through which materials must be traced and measured, the focus of this project is material that is “held-up” in equipment, pipes, and ducts during normal operation and that can accumulate over time into significant quantities. Accurately estimating the holdup is essential for proper SNM accounting (vis-à-vis nuclear non-proliferation), criticality and radiation safety, waste management, and efficient plant operation. Usually it is not possible to directly measure the holdup quantity and location, so these must be inferred from measured radiation fields, primarily gamma and less frequently neutrons. Current methods to quantify holdup, i.e. Generalized Geometry Holdup (GGH), primarily rely on simple source configurations and crude radiation transport models aided by ad hoc correction factors. This project seeks an alternate method of performing measurement-based holdup calculations using a predictive model that employs state-of-the-art radiation transport codes capable of accurately simulating such situations. Inverse and data assimilation methods use the forward transport model to search for a source configuration that best matches the measured data and simultaneously provide an estimate of the level of confidence in the correctness of such configuration. In this work the holdup problem is re-interpreted as an inverse problem that is under-determined, hence may permit multiple solutions. A probabilistic approach is applied to solving the resulting inverse problem. This approach rates possible solutions according to their plausibility given the measurements and initial information. This is accomplished through the use of Bayes’ Theorem that resolves the issue of multiple solutions by giving an estimate of the probability of observing each possible solution. To use
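The probabilistic treatment can be illustrated with a discrete sketch: given forward-model predictions for a set of candidate holdup configurations and Gaussian measurement noise, Bayes' theorem with a uniform prior reduces to normalizing the likelihoods. The array shapes and noise model here are assumptions made for the example, not details of the project's transport codes.

```python
import numpy as np

def posterior_over_configs(candidate_predictions, measured, sigma):
    """Bayes' theorem over a discrete set of candidate source configurations.

    candidate_predictions: (n_configs, n_detectors) forward-model readings
    measured:              (n_detectors,) observed radiation measurements
    sigma:                 Gaussian measurement noise (scalar or per-detector)

    Returns normalized posterior probabilities under a uniform prior, giving
    an estimate of how plausible each configuration is given the data."""
    resid = candidate_predictions - measured
    loglik = -0.5 * np.sum((resid / sigma) ** 2, axis=1)
    loglik -= loglik.max()          # subtract max for numerical stability
    post = np.exp(loglik)
    return post / post.sum()
```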

  6. Similarity of TIMSS Math and Science Achievement of Nations

    Directory of Open Access Journals (Sweden)

    Algirdas Zabulionis

    2001-09-01

Full Text Available In 1991-97, the International Association for the Evaluation of Educational Achievement (IEA) undertook the Third International Mathematics and Science Study (TIMSS), in which data about the mathematics and science achievement of thirteen-year-old students in more than 40 countries were collected. These data provided the opportunity to search for patterns in students' answers to the test items: which group of items was relatively more difficult (or easier) for the students from a particular country (or group of countries). Using this massive data set, an attempt was made to measure the similarities among country profiles of how students responded to the test items.

  7. Asthma and COPD: Differences and Similarities

    Science.gov (United States)

Asthma and COPD: Differences and Similarities. This article has been reviewed… or you could have Chronic Obstructive Pulmonary Disease (COPD), such as emphysema or chronic bronchitis. Because asthma…

  8. How Users Search the Library from a Single Search Box

    Science.gov (United States)

    Lown, Cory; Sierra, Tito; Boyer, Josh

    2013-01-01

    Academic libraries are turning increasingly to unified search solutions to simplify search and discovery of library resources. Unfortunately, very little research has been published on library user search behavior in single search box environments. This study examines how users search a large public university library using a prominent, single…

  9. Image copy-move forgery detection based on polar cosine transform and approximate nearest neighbor searching.

    Science.gov (United States)

    Li, Yuenan

    2013-01-10

Copy-move is one of the most commonly used image tampering operations, where a part of the image content is copied and then pasted to another part of the same image. In order to make the forgery visually convincing and conceal its trace, the copied part may be subjected to post-processing operations such as rotation and blur. In this paper, we propose a polar cosine transform and approximate nearest neighbor searching based copy-move forgery detection algorithm. The algorithm starts by dividing the image into overlapping patches. Robust and compact features are extracted from patches by taking advantage of the rotationally-invariant and orthogonal properties of the polar cosine transform. Potential copy-move pairs are then detected by identifying the patches with similar features, which is formulated as approximate nearest neighbor searching and accomplished by means of locality-sensitive hashing (LSH). Finally, post-verifications are performed on potential pairs to filter out false matches and improve the accuracy of forgery detection. To sum up, the LSH based similar patch identification and the post-verification methods are the two major novelties of the proposed work. Experimental results reveal that the proposed work produces accurate detection results and exhibits high robustness to various post-processing operations. In addition, the LSH based similar patch detection scheme is much more effective than the widely used lexicographical sorting. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
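A minimal sketch of the LSH step (random-hyperplane hashing; the paper's exact feature pipeline and hash family are not reproduced here): patches whose feature vectors receive the same sign pattern land in the same bucket and become candidate copy-move pairs, avoiding an exhaustive pairwise scan.

```python
import numpy as np

def lsh_buckets(features, n_bits=16, seed=0):
    """Random-hyperplane LSH over patch feature vectors.

    features: (n_patches, n_dims) array. Patches sharing a bucket key are
    candidate near-duplicates; they still require post-verification (e.g.
    offset consistency between patch pairs) to reject false matches."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_bits))
    codes = (features @ planes) > 0          # one sign bit per hyperplane
    buckets = {}
    for idx, code in enumerate(codes):
        buckets.setdefault(code.tobytes(), []).append(idx)
    return buckets

# Candidate pairs are all index pairs within a bucket; compare only those.
```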

  10. Skewed Binary Search Trees

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Moruz, Gabriel

    2006-01-01

It is well-known that to minimize the number of comparisons a binary search tree should be perfectly balanced. Previous work has shown that a dominating factor in the running time of a search is the number of cache faults performed, and that an appropriate memory layout of a binary search tree can reduce the number of cache faults by several hundred percent. Motivated by the fact that during a search, branching to the left or right at a node does not necessarily have the same cost, e.g. because of branch prediction schemes, we in this paper study the class of skewed binary search trees. For all nodes in a skewed binary search tree the ratio between the size of the left subtree and the size of the tree is a fixed constant (a ratio of 1/2 gives perfectly balanced trees). In this paper we present an experimental study of various memory layouts of static skewed binary search trees, where each…
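A skewed tree of this kind is easy to construct from sorted keys: place a fixed fraction of the nodes in every left subtree. The sketch below uses plain dictionaries as nodes; skew=0.5 reproduces the perfectly balanced layout.

```python
def build_skewed(sorted_keys, skew=0.5):
    """Build a BST where each left subtree holds a fraction `skew` of the
    nodes (skew=0.5 gives a perfectly balanced tree)."""
    if not sorted_keys:
        return None
    split = min(len(sorted_keys) - 1, max(0, int(skew * len(sorted_keys))))
    return {
        "key": sorted_keys[split],
        "left": build_skewed(sorted_keys[:split], skew),
        "right": build_skewed(sorted_keys[split + 1:], skew),
    }

def search(node, key):
    """Standard BST lookup; with skew != 0.5, left and right branches are
    taken with different frequencies, which is what the paper exploits."""
    while node is not None:
        if key == node["key"]:
            return True
        node = node["left"] if key < node["key"] else node["right"]
    return False

tree = build_skewed(list(range(100)), skew=0.3)
assert search(tree, 42) and not search(tree, 100)
```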

  11. Efficient Compressed Domain Target Image Search and Retrieval

    OpenAIRE

    Bracamonte, Javier; Ansorge, Michael; Pellandini, Fausto; Farine, Pierre-André

    2005-01-01

In this paper we introduce a low complexity and accurate technique for target image search and retrieval. This method, which operates directly in the compressed JPEG domain, addresses two of the CBIR challenges stated by The Benchathlon Network regarding the search of a specific image: finding out if an exact same image exists in a database, and identifying this occurrence even when the database image has been compressed with a different coding bit-rate. The proposed technique can be applie...

  12. JSC Search System Usability Case Study

    Science.gov (United States)

    Meza, David; Berndt, Sarah

    2014-01-01

The advanced nature of "search" has facilitated the movement from keyword match to the delivery of every conceivable information topic from career, commerce, entertainment, learning... the list is infinite. At NASA Johnson Space Center (JSC) the search interface is an important means of knowledge transfer. By indexing multiple sources between directorates and organizations, the system's potential is culture-changing, in that through search, knowledge of the unique accomplishments in engineering and science can be seamlessly passed between generations. This paper reports the findings of an initial survey, the first of a four-part study to help determine user sentiment on the intranet, or local (JSC) enterprise search environment, as well as the larger NASA enterprise. The survey is a means through which end users provide direction on the development and transfer of knowledge by way of the search experience. The ideal is to identify what is working and what needs to be improved from the users' vantage point by documenting: (1) where users are satisfied/dissatisfied, (2) the perceived value of interface components, and (3) gaps which cause disappointment in the search experience. The near-term goal is to inform JSC search in order to improve users' ability to utilize existing services and infrastructure to perform tasks with a shortened life cycle. Continuing steps include an agency-based focus with modified questions to accomplish a similar purpose.

  13. Interactive searching of facial image databases

    Science.gov (United States)

    Nicholls, Robert A.; Shepherd, John W.; Shepherd, Jean

    1995-09-01

A set of psychological facial descriptors has been devised to enable computerized searching of criminal photograph albums. The descriptors have been used to encode image databases of up to twelve thousand images. Using a system called FACES, the databases are searched by translating a witness' verbal description into corresponding facial descriptors. Trials of FACES have shown that this coding scheme is more productive and efficient than searching traditional photograph albums. An alternative method of searching the encoded database using a genetic algorithm is currently being tested. The genetic search method does not require the witness to verbalize a description of the target but merely to indicate a degree of similarity between the target and a limited selection of images from the database. The major drawback of FACES is that it requires manual encoding of images. Research is being undertaken to automate the process; however, it will require an algorithm which can predict human descriptive values. Alternatives to human-derived coding schemes exist using statistical classifications of images. Since databases encoded using statistical classifiers do not have an obvious direct mapping to human-derived descriptors, a search method which does not require the entry of human descriptors is required. A genetic search algorithm is being tested for such a purpose.
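A sketch of how such a genetic search might loop, under the assumption that faces are stored as feature vectors and that rate(idx) returns the witness's similarity rating for database item idx (both assumptions of this example, not details of FACES itself): the top-rated faces are "bred" by blending their features, and each child jumps to the nearest database entry.

```python
import numpy as np

def genetic_face_search(db, rate, pop_size=8, generations=20, seed=0):
    """db: (n_faces, n_features) array; rate(idx) -> similarity score in [0, 1].
    Evolves a shown population of database indices toward the target face
    using only the witness's ratings, never a verbal description."""
    rng = np.random.default_rng(seed)
    pop = list(rng.choice(len(db), size=pop_size, replace=False))
    for _ in range(generations):
        pop.sort(key=rate, reverse=True)
        parents = pop[: pop_size // 2]          # keep the best-rated faces
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents, size=2, replace=False)
            # "crossover" + "mutation": noisy blend of two parent features
            blend = (db[a] + db[b]) / 2 + rng.normal(0.0, 0.1, db.shape[1])
            # snap back to the nearest real database face
            children.append(int(np.argmin(np.linalg.norm(db - blend, axis=1))))
        pop = parents + children
    return max(pop, key=rate)
```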

  14. 7 CFR 51.1997 - Similar type.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Similar type. 51.1997 Section 51.1997 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing... Standards for Grades of Filberts in the Shell 1 Definitions § 51.1997 Similar type. Similar type means that...

  15. Molecular quantum similarity using conceptual DFT descriptors

    Indian Academy of Sciences (India)

This paper reports a Molecular Quantum Similarity study for a set of congeneric steroid molecules, using as basic similarity descriptors the electron density ρ(r), the shape function σ(r), the Fukui functions f+(r) and f-(r), and the local softnesses s+(r) and s-(r). Correlations are investigated between similarity indices for each couple of ...

  16. Data mining technique for fast retrieval of similar waveforms in Fusion massive databases

    Energy Technology Data Exchange (ETDEWEB)

    Vega, J. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain)], E-mail: jesus.vega@ciemat.es; Pereira, A.; Portas, A. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain); Dormido-Canto, S.; Farias, G.; Dormido, R.; Sanchez, J.; Duro, N. [Departamento de Informatica y Automatica, UNED, Madrid (Spain); Santos, M. [Departamento de Arquitectura de Computadores y Automatica, UCM, Madrid (Spain); Sanchez, E. [Asociacion EURATOM/CIEMAT Para Fusion, Madrid (Spain); Pajares, G. [Departamento de Arquitectura de Computadores y Automatica, UCM, Madrid (Spain)

    2008-01-15

Fusion measurement systems generate similar waveforms for reproducible behavior. A major difficulty related to data analysis is the identification, in a rapid and automated way, of a set of discharges with comparable behaviour, i.e. discharges with 'similar' waveforms. Here we introduce a new technique for rapid searching and retrieval of 'similar' signals. The approach consists of building a classification system that avoids traversing the whole database looking for similarities. The classification system diminishes the problem dimensionality (by means of waveform feature extraction) and reduces the searching space to just the most probable 'similar' waveforms (clustering techniques). In the searching procedure, the input waveform is classified into one of the existing clusters. Then, a similarity measure is computed between the input signal and all cluster elements in order to identify the most similar waveforms. The inner product of normalized vectors is used as the similarity measure as it allows the searching process to be independent of signal gain and polarity. This development has been applied recently to TJ-II stellarator databases and has been integrated into its remote participation system.
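The searching procedure described above is straightforward to sketch: classify the query into the nearest cluster, then score only that cluster's members with the inner product of normalized vectors (taking the absolute value makes the score independent of both gain and polarity). Cluster centroids and labels are assumed to come from a prior clustering step; the names here are illustrative.

```python
import numpy as np

def _unit_rows(M):
    """Normalize each row to unit length (removes gain differences)."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def find_similar(query, waveforms, centroids, labels, top_k=5):
    """waveforms: (n, d) signals; centroids: (k, d); labels: (n,) cluster ids.
    Returns the indices and scores of the top_k most similar waveforms in
    the query's cluster, ranked by |cosine| (polarity-insensitive)."""
    q = query / np.linalg.norm(query)
    cluster = int(np.argmax(np.abs(_unit_rows(centroids) @ q)))
    members = np.where(labels == cluster)[0]
    scores = np.abs(_unit_rows(waveforms[members]) @ q)
    order = np.argsort(scores)[::-1][:top_k]
    return members[order], scores[order]
```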

  17. A New Method for Measuring Text Similarity in Learning Management Systems Using WordNet

    Science.gov (United States)

    Alkhatib, Bassel; Alnahhas, Ammar; Albadawi, Firas

    2014-01-01

As text sources are getting broader, measuring text similarity is becoming more compelling. Automatic text classification, search engines and auto-answering systems are examples of applications that rely on text similarity. Learning management systems (LMS) are becoming more important since electronic media is getting more publicly available. As…
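As an illustration of WordNet-based text similarity (one of many possible schemes, not necessarily the paper's; assumes NLTK with the WordNet corpus installed via nltk.download("wordnet")), the sketch below takes the best path similarity over all synset pairs per word and averages a greedy word-to-word matching.

```python
from nltk.corpus import wordnet as wn

def word_similarity(w1, w2):
    """Best path similarity over all synset pairs; 0.0 if unrelated/unknown."""
    best = 0.0
    for s1 in wn.synsets(w1):
        for s2 in wn.synsets(w2):
            sim = s1.path_similarity(s2)   # None across incompatible POS
            if sim is not None and sim > best:
                best = sim
    return best

def text_similarity(t1, t2):
    """Greedy word-to-word matching, averaged over the first text's words.
    Assumes both texts are non-empty; no stemming or stopword removal."""
    words1, words2 = t1.lower().split(), t2.lower().split()
    return sum(max(word_similarity(a, b) for b in words2)
               for a in words1) / len(words1)

print(text_similarity("dog barks loudly", "a hound howls"))
```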

  18. Using the Dual-Target Cost to Explore the Nature of Search Target Representations

    Science.gov (United States)

    Stroud, Michael J.; Menneer, Tamaryn; Cave, Kyle R.; Donnelly, Nick

    2012-01-01

    Eye movements were monitored to examine search efficiency and infer how color is mentally represented to guide search for multiple targets. Observers located a single color target very efficiently by fixating colors similar to the target. However, simultaneous search for 2 colors produced a dual-target cost. In addition, as the similarity between…

  19. Dual Target Search is Neither Purely Simultaneous nor Purely Successive.

    Science.gov (United States)

    Cave, Kyle R; Menneer, Tamaryn; Nomani, Mohammad S; Stroud, Michael J; Donnelly, Nick

    2017-08-31

    Previous research shows that visual search for two different targets is less efficient than search for a single target. Stroud, Menneer, Cave and Donnelly (2012) concluded that two target colours are represented separately based on modeling the fixation patterns. Although those analyses provide evidence for two separate target representations, they do not show whether participants search simultaneously for both targets, or first search for one target and then the other. Some studies suggest that multiple target representations are simultaneously active, while others indicate that search can be voluntarily simultaneous, or switching, or a mixture of both. Stroud et al.'s participants were not explicitly instructed to use any particular strategy. These data were revisited to determine which strategy was employed. Each fixated item was categorised according to whether its colour was more similar to one target or the other. Once an item similar to one target is fixated, the next fixated item is more likely to be similar to that target than the other, showing that at a given moment during search, one target is generally favoured. However, the search for one target is not completed before search for the other begins. Instead, there are often short runs of one or two fixations to distractors similar to one target, with each run followed by a switch to the other target. Thus, the results suggest that one target is more highly weighted than the other at any given time, but not to the extent that search is purely successive.

  20. GeoSearch: A lightweight broking middleware for geospatial resources discovery

    Science.gov (United States)

    Gui, Z.; Yang, C.; Liu, K.; Xia, J.

    2012-12-01

With petabytes of geodata and thousands of geospatial web services available over the Internet, it is critical to support geoscience research and applications by finding the best-fit geospatial resources from the massive and heterogeneous resources. Past decades' developments witnessed the operation of many service components to facilitate geospatial resource management and discovery. However, efficient and accurate geospatial resource discovery is still a big challenge for the following reasons: 1) Entry barriers (also called "learning curves") hinder the usability of discovery services for end users. Different portals and catalogues adopt various access protocols, metadata formats and GUI styles to organize, present and publish metadata, and it is hard for end users to learn all these technical details and differences. 2) The cost of federating heterogeneous services is high. To provide sufficient resources and facilitate data discovery, many registries adopt a periodic harvesting mechanism to retrieve metadata from other federated catalogues. These time-consuming processes lead to network and storage burdens, data redundancy, and the overhead of maintaining data consistency. 3) Heterogeneous semantics complicate data discovery. Since keyword matching is still the primary search method in many operational discovery services, search accuracy (precision and recall) is hard to guarantee. Semantic technologies (such as semantic reasoning and similarity evaluation) offer a solution to these issues, but integrating them with existing services is challenging due to expandability limitations of the service frameworks and metadata templates. 4) The capabilities to help users make a final selection are inadequate. Most existing search portals lack intuitive and diverse information visualization methods and functions (sort, filter) to present, explore and analyze search results. Furthermore, the presentation of the value

  1. Chemical-text hybrid search engines.

    Science.gov (United States)

    Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

    2010-01-01

As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy.
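The canonicalization idea can be sketched as follows, assuming the RDKit toolkit and a hypothetical name-to-SMILES lookup (ECKI itself was built on SharePoint Server Search; this shows only the indexing-time transformation): every surface form of a compound maps to one canonical token, so a text engine indexes them identically.

```python
from rdkit import Chem

def canonical_keyword(name_to_smiles, entity):
    """Map a chemical entity found in a document to one canonical keyword.
    name_to_smiles is a hypothetical name-resolution lookup; entities that
    do not resolve to a structure are indexed as plain text."""
    smiles = name_to_smiles.get(entity.lower())
    if smiles is None:
        return entity                    # not a recognized chemical entity
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol)         # canonical SMILES as the index token

# Two surface forms of the same compound yield the same index token.
lookup = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "acetylsalicylic acid": "CC(=O)Oc1ccccc1C(=O)O",
}
assert canonical_keyword(lookup, "Aspirin") == \
       canonical_keyword(lookup, "acetylsalicylic acid")
```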

  2. Modified harmony search

    Science.gov (United States)

    Mohamed, Najihah; Lutfi Amri Ramli, Ahmad; Majid, Ahmad Abd; Piah, Abd Rahni Mt

    2017-09-01

    A metaheuristic algorithm called Harmony Search (HS) is widely applied to parameter optimization in many areas. HS is a derivative-free real-parameter optimization algorithm that draws its inspiration from the musical improvisation process of searching for a perfect state of harmony. This paper proposes a Modified Harmony Search (MHS) for solving optimization problems, which employs concepts from the genetic algorithm method and particle swarm optimization to generate new solution vectors, enhancing the performance of the HS algorithm. The performances of MHS and HS are investigated on ten benchmark optimization problems in order to make a comparison reflecting the efficiency of MHS in terms of final accuracy, convergence speed and robustness.
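
    The abstract does not spell out the update rules, so the following is a minimal sketch of the classical HS improvisation step (harmony memory consideration, pitch adjustment, random selection) rather than the MHS variant; the parameter names hmcr, par and bw are the conventional ones, chosen here for illustration.

      import random

      def harmony_search(f, bounds, hm_size=10, hmcr=0.9, par=0.3, bw=0.05, iters=2000):
          """Minimal Harmony Search: minimize f over box-constrained variables."""
          dim = len(bounds)
          # Initialize harmony memory with random solutions.
          hm = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(hm_size)]
          scores = [f(x) for x in hm]
          for _ in range(iters):
              new = []
              for d, (lo, hi) in enumerate(bounds):
                  if random.random() < hmcr:            # memory consideration
                      value = hm[random.randrange(hm_size)][d]
                      if random.random() < par:         # pitch adjustment
                          value += random.uniform(-bw, bw) * (hi - lo)
                  else:                                 # random selection
                      value = random.uniform(lo, hi)
                  new.append(min(hi, max(lo, value)))
              worst = max(range(hm_size), key=scores.__getitem__)
              if (s := f(new)) < scores[worst]:         # replace worst harmony
                  hm[worst], scores[worst] = new, s
          best = min(range(hm_size), key=scores.__getitem__)
          return hm[best], scores[best]

      # Example: minimize the sphere function in 5 dimensions.
      x, fx = harmony_search(lambda v: sum(t * t for t in v), [(-5, 5)] * 5)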

  3. ElasticSearch server

    CERN Document Server

    Rogozinski, Marek

    2014-01-01

    This book is a detailed, practical, hands-on guide packed with real-life scenarios and examples which will show you how to implement an ElasticSearch search engine on your own websites. If you are a web developer or a user who wants to learn more about ElasticSearch, then this is the book for you. You do not need to know anything about ElasticSearch, Java, or Apache Lucene in order to use this book, though basic knowledge about databases and queries is required.

  4. Entropy, Search, Complexity

    CERN Document Server

    Katona, Gyula O H; Tardos, Gábor

    2007-01-01

    The present volume is a collection of survey papers in the fields of entropy, search and complexity. They summarize the latest developments in their respective areas. More than half of the papers belong to search theory, which lies on the borderline of mathematics and computer science, information theory and combinatorics. Search theory has variegated applications, among others in bioinformatics. Some of these papers also have links to linear statistics and communication complexity. Further works survey the fundamentals of information theory and quantum source coding. The volume is recommended to experienced researchers as well as young scientists and students, both in mathematics and computer science.

  5. Are patients referred to rehabilitation diagnosed accurately?

    Science.gov (United States)

    Tederko, Piotr; Krasuski, Marek; Nyka, Izabella; Mycielski, Jerzy; Tarnacka, Beata

    2017-07-17

    An accurate diagnosis of the leading health condition and comorbidities is a prerequisite for safe and effective rehabilitation. The problem of diagnostic errors in physical and rehabilitation medicine (PRM) has not been addressed sufficiently. The responsibility of a referring physician is to determine indications and contraindications for rehabilitation. To assess the rate of and risk factors for inaccurate referral diagnoses (RD) in patients referred to a rehabilitation facility. We hypothesized that inaccurate RD would be more common in 1) patients referred by non-PRM physicians; 2) patients waiting longer for admission; 3) older patients. Retrospective observational study. 1000 randomly selected patients admitted between 2012 and 2016 to a day-rehabilitation center (DRC). University DRC specialized in musculoskeletal diseases. On admission all cases underwent clinical verification of RD. Inaccuracies regarding primary diagnoses and comorbidities were noted. The influence of several factors affecting the probability of inaccurate RD was analyzed with a multiple binary regression model applied to 6 categories of diseases. The rate of inaccurate RD was 25.2%. A higher frequency of inaccurate RD was noted among patients referred by non-PRM specialists (30.3% vs 17.3% in cases referred by PRM specialists). Application of logit regression showed a highly significant influence of the specialty of the referring physician on the odds of inaccurate RD (joint Wald test chi2(6) = 38.98, p-value = 0.000), controlling for the influence of other variables. This may reflect a suboptimal knowledge of the rehabilitation process and a tendency to neglect comorbidities by non-PRM specialists. The rate of inaccurate RD did not correlate with the time between referral and admission (joint Wald test of all odds ratios equal to 1, chi2(6) = 5.62, p-value = 0.467); however, mean and median waiting times were relatively short (35.7 and 25 days respectively). A high risk of overlooked multimorbidity was

  6. An automated method for accurate vessel segmentation

    Science.gov (United States)

    Yang, Xin; Liu, Chaoyue; Le Minh, Hung; Wang, Zhiwei; Chien, Aichi; Cheng, Kwang-Ting (Tim)

    2017-05-01

    Vessel segmentation is a critical task for various medical applications, such as diagnosis assistance of diabetic retinopathy, quantification of cerebral aneurysm growth, and guiding surgery in neurosurgical procedures. Despite technology advances in image segmentation, existing methods still suffer from low accuracy for vessel segmentation in two challenging yet common scenarios in clinical usage: (1) regions with a low signal-to-noise ratio (SNR), and (2) vessel boundaries disturbed by adjacent non-vessel pixels. In this paper, we present an automated system which can achieve highly accurate vessel segmentation for both 2D and 3D images even under these challenging scenarios. Three key contributions achieved by our system are: (1) a progressive contrast enhancement method to adaptively enhance the contrast of challenging pixels that were otherwise indistinguishable, (2) a boundary refinement method to effectively improve segmentation accuracy at vessel borders based on Canny edge detection, and (3) a content-aware region-of-interest (ROI) adjustment method to automatically determine the locations and sizes of ROIs which contain ambiguous pixels and demand further verification. Extensive evaluation of our method is conducted on both 2D and 3D datasets. On a public 2D retinal dataset (named DRIVE (Staal 2004 IEEE Trans. Med. Imaging 23 501-9)) and our 2D clinical cerebral dataset, our approach achieves superior performance to the state-of-the-art methods including a vesselness based method (Frangi 1998 Int. Conf. on Medical Image Computing and Computer-Assisted Intervention) and an optimally oriented flux (OOF) based method (Law and Chung 2008 European Conf. on Computer Vision). An evaluation on 11 clinical 3D CTA cerebral datasets shows that our method can achieve 94% average accuracy with respect to the manual segmentation reference, which is 23% to 33% better than the five baseline methods (Yushkevich 2006 Neuroimage 31 1116-28; Law and Chung 2008

  7. Efficient Set Similarity Joins Using Min-prefixes

    Science.gov (United States)

    Ribeiro, Leonardo A.; Härder, Theo

    Identification of all objects in a dataset whose similarity is not less than a specified threshold is of major importance for management, search, and analysis of data. Set similarity joins are commonly used to implement this operation; they scale to large datasets and are versatile to represent a variety of similarity notions. Most set similarity join methods proposed so far present two main phases at a high level of abstraction: candidate generation producing a set of candidate pairs and verification applying the actual similarity measure to the candidates and returning the correct answer. Previous work has primarily focused on the reduction of candidates, where candidate generation presented the major effort to obtain better pruning results. Here, we propose an opposite approach. We drastically decrease the computational cost of candidate generation by dynamically reducing the number of indexed objects at the expense of increasing the workload of the verification phase. Our experimental findings show that this trade-off is advantageous: we consistently achieve substantial speed-ups as compared to previous algorithms.
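
    As a minimal illustration of the two phases described above, the sketch below performs a self-join under a Jaccard threshold: candidates are generated from an inverted index built over token prefixes, then verified with the exact measure. It uses the textbook prefix length rather than the min-prefix scheme this paper proposes, so it shows the baseline the authors start from, not their contribution.

      from collections import defaultdict
      from math import ceil

      def jaccard(a, b):
          a, b = set(a), set(b)
          return len(a & b) / len(a | b)

      def ssjoin(records, t):
          """All pairs (i, j) with Jaccard(records[i], records[j]) >= t.
          records: token lists, each sorted by a global token order
          (here simply sorted by value)."""
          index = defaultdict(list)      # token -> ids whose prefix holds it
          result = []
          for i, r in enumerate(records):
              # Prefix filtering: two sets can only reach threshold t if their
              # prefixes of length |r| - ceil(t*|r|) + 1 share some token.
              prefix_len = len(r) - ceil(t * len(r)) + 1
              candidates = set()
              for token in r[:prefix_len]:
                  candidates.update(index[token])    # candidate generation
                  index[token].append(i)
              for j in candidates:                   # verification phase
                  if jaccard(r, records[j]) >= t:
                      result.append((j, i))
          return result

      recs = [sorted(s) for s in [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]]
      print(ssjoin(recs, 0.6))   # -> [(0, 1)]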

  8. A Quantum-Based Similarity Method in Virtual Screening

    Directory of Open Access Journals (Sweden)

    Mohammed Mumtaz Al-Dabbagh

    2015-10-01

    Full Text Available One of the most widely-used techniques for ligand-based virtual screening is similarity searching. This study adopted concepts of quantum mechanics to present a state-of-the-art similarity method for molecules, inspired by quantum theory. The representation of molecular compounds in a mathematical quantum space plays a vital role in the development of the quantum-based similarity approach. One of the key concepts of quantum theory is the use of complex numbers; hence, this study proposed three techniques to embed and re-represent molecular compounds in complex-number format. The quantum-based similarity method developed in this study depends on a complex pure Hilbert space of molecules and is called Standard Quantum-Based (SQB). The recall of retrieved active molecules was measured at the top 1% and top 5%, and a significance test was used to evaluate the proposed methods. The MDL Drug Data Report (MDDR), Maximum Unbiased Validation (MUV) and Directory of Useful Decoys (DUD) data sets were used for the experiments and were represented by 2D fingerprints. Simulated virtual screening experiments show that the effectiveness of the SQB method increased significantly, owing to the representational power of molecular compounds in complex-number form, compared to the Tanimoto benchmark similarity measure.

  9. Application II: Search Arts and Sports Photography

    Indian Academy of Sciences (India)

    Depth variation induces defocus. Defocus blur is severe in nature and sports photography. Given a low-DOF image, segment the focused region and retrieve images based on similarities in the focused region.

  10. A literature search tool for intelligent extraction of disease-associated genes.

    Science.gov (United States)

    Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P

    2014-01-01

    To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.

  11. Search Engine Bias and the Demise of Search Engine Utopianism

    Science.gov (United States)

    Goldman, E.

    Due to search engines' automated operations, people often assume that search engines display search results neutrally and without bias. However, this perception is mistaken. Like any other media company, search engines affirmatively control their users' experiences, which has the consequence of skewing search results (a phenomenon called "search engine bias"). Some commentators believe that search engine bias is a defect requiring legislative correction. Instead, this chapter argues that search engine bias is the beneficial consequence of search engines optimizing content for their users. The chapter further argues that the most problematic aspect of search engine bias, the "winner-take-all" effect caused by top placement in search results, will be mooted by emerging personalized search technology.

  12. Contextual factors for finding similar experts

    NARCIS (Netherlands)

    Hofmann, K.; Balog, K.; Bogers, T.; de Rijke, M.

    2010-01-01

    Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. An important outcome are models that identify factors that influence expert finding. Expertise retrieval addresses the same problem, expert finding, but from a

  13. A fingerprint based metric for measuring similarities of crystalline structures.

    Science.gov (United States)

    Zhu, Li; Amsler, Maximilian; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samare; Ghasemi, S Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Chris; Goedecker, Stefan

    2016-01-21

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.
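
    The abstract does not give the distance formula. Below is a minimal sketch consistent with the description, assuming per-atom fingerprint vectors have already been computed from the local environments: the configurational distance is defined through an optimal one-to-one matching of atomic fingerprints (Hungarian assignment), which makes the measure invariant to atom ordering. The details are illustrative, not the paper's exact construction.

      import numpy as np
      from scipy.optimize import linear_sum_assignment

      def config_distance(fp_a, fp_b):
          """Permutation-invariant distance between two structures, given one
          fingerprint vector per atom (rows). Assumes equal atom counts."""
          # Pairwise Euclidean distances between atomic fingerprints.
          cost = np.linalg.norm(fp_a[:, None, :] - fp_b[None, :, :], axis=-1)
          # Optimal one-to-one assignment of atoms in A to atoms in B.
          rows, cols = linear_sum_assignment(cost ** 2)
          return np.sqrt((cost[rows, cols] ** 2).sum() / len(fp_a))

      a = np.random.rand(8, 16)            # 8 atoms, 16-dim fingerprints
      b = a[np.random.permutation(8)]      # same environments, atoms permuted
      print(config_distance(a, b))         # ~0: atom order does not matter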

  14. A fingerprint based metric for measuring similarities of crystalline structures

    Energy Technology Data Exchange (ETDEWEB)

    Zhu, Li; Fuhrer, Tobias; Schaefer, Bastian; Grauzinyte, Migle; Goedecker, Stefan, E-mail: stefan.goedecker@unibas.ch [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Amsler, Maximilian [Department of Physics, Universität Basel, Klingelbergstr. 82, 4056 Basel (Switzerland); Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States); Faraji, Somayeh; Rostami, Samare; Ghasemi, S. Alireza [Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan (Iran, Islamic Republic of); Sadeghi, Ali [Physics Department, Shahid Beheshti University, G. C., Evin, 19839 Tehran (Iran, Islamic Republic of); Wolverton, Chris [Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208 (United States)

    2016-01-21

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  15. Infrared image super-resolution via transformed self-similarity

    Science.gov (United States)

    Qi, Wei; Han, Jing; Zhang, Yi; Bai, Lian-fa

    2017-03-01

    Single image super-resolution is of great importance in computer vision. Various methods (e.g. learning methods) have been successfully developed in recent years. Despite the demonstrated success on natural images, less research focuses on infrared images. In this paper, we present a transformed self-similarity based super-resolution method, without any learning priors, that restores high-resolution infrared images from low-resolution ones. We exploit appearance similarity, dense error, and region covariances, and use the detected cues to guide the patch search process. We also add a scale cue to account for local scale variations. We then present a compositional framework to simultaneously accommodate the four different cues. Experimental results demonstrate that our method performs better than previous methods and restores pleasant results, and high evaluation scores further show the effectiveness and robustness of our method for infrared images.

  16. A fingerprint based metric for measuring similarities of crystalline structures

    Science.gov (United States)

    Zhu, Li; Amsler, Maximilian; Fuhrer, Tobias; Schaefer, Bastian; Faraji, Somayeh; Rostami, Samare; Ghasemi, S. Alireza; Sadeghi, Ali; Grauzinyte, Migle; Wolverton, Chris; Goedecker, Stefan

    2016-01-01

    Measuring similarities/dissimilarities between atomic structures is important for the exploration of potential energy landscapes. However, the cell vectors together with the coordinates of the atoms, which are generally used to describe periodic systems, are quantities not directly suitable as fingerprints to distinguish structures. Based on a characterization of the local environment of all atoms in a cell, we introduce crystal fingerprints that can be calculated easily and define configurational distances between crystalline structures that satisfy the mathematical properties of a metric. This distance between two configurations is a measure of their similarity/dissimilarity and, in particular, it allows structures to be distinguished. The new method can be a useful tool within various energy landscape exploration schemes, such as minima hopping, random search, swarm intelligence algorithms, and high-throughput screenings.

  17. Branch length similarity entropy-based descriptors for shape representation

    Science.gov (United States)

    Kwon, Ohsung; Lee, Sang-Hee

    2017-11-01

    In previous studies, we showed that the branch length similarity (BLS) entropy profile could be successfully used for shape recognition, such as battle tanks, facial expressions, and butterflies. In the present study, we propose new descriptors for the recognition task, roundness, symmetry, and surface roughness, which are more accurate and faster to compute than the previous descriptors. The roundness represents how closely a shape resembles a circle, the symmetry characterizes how similar one shape is to another when it is flipped, and the surface roughness quantifies the degree of vertical deviations of a shape boundary. To evaluate the performance of the descriptors, we used a database of leaf images from 12 species. Each species consisted of 10-20 leaf images and the total number of images was 160. The evaluation showed that the new descriptors successfully discriminated the leaf species. We believe that the descriptors can be a useful tool in the field of pattern recognition.

  18. Chemical Search Web Utility

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Chemical Search Web Utility is an intuitive web application that allows the public to easily find the chemical that they are interested in using, and which...

  19. Searches at LEP

    CERN Document Server

    Junk, Tom

    2000-01-01

    Searches have been conducted for a broad range of new phenomena by the four experiments ALEPH, DELPHI, L3, and OPAL, at LEP2. Each experiment contributes approximately 150 pb-1 of e+e- annihilation data with a mean sqrt(s) of 205.9 GeV in 2000 to these searches (data prepared for the September 5 LEPC meeting). The statistical procedure for setting limits and evaluating the significance of excesses observed in the data is reviewed. Search results are presented for the Standard Model Higgs boson, the neutral Higgs bosons in the MSSM, charged Higgs bosons, invisibly decaying Higgs bosons produced by Higgs-strahlung, and fermiophobic Higgs bosons. Search results are briefly summarized for gauginos, stops, and staus. The photon recoil spectrum is checked for hints of new physics.

  20. Beyond the search process

    DEFF Research Database (Denmark)

    Hyldegård, Jette

    2009-01-01

    This paper reports on the findings from a longitudinal case study exploring Kuhlthau's information search process (ISP) model in a group-based academic setting. The research focus is on group members' activities and cognitive and emotional experiences during the task process of writing an assignment. ... seeking, the cognitive pattern associated with focus formulation and the tendency towards an increase in writing activities while searching activities decreased. Differences in behavior were also found, which were associated with contextual and social factors beyond the mere search process...

  1. The determination of accurate dipole polarizabilities alpha and gamma for the noble gases

    Science.gov (United States)

    Rice, Julia E.; Taylor, Peter R.; Lee, Timothy J.; Almlof, Jan

    1991-01-01

    Accurate static dipole polarizabilities alpha and gamma of the noble gases He through Xe were determined using wave functions of similar quality for each system. Good agreement with experimental data for the static polarizability gamma was obtained for Ne and Xe, but not for Ar and Kr. Calculations suggest that the experimental values for these latter systems are too low.

  2. Supersymmetry searches in ATLAS

    CERN Document Server

    Torro Pastor, Emma; The ATLAS collaboration

    2016-01-01

    Weak scale supersymmetry remains one of the best motivated and studied Standard Model extensions. This talk summarises recent ATLAS results for searches for supersymmetric (SUSY) particles. Weak and strong production in both R-Parity conserving and R-Parity violating SUSY scenarios are considered. The searches involved final states including jets, missing transverse momentum, light leptons, taus or photons, as well as long-lived particle signatures.

  3. Accelerated Evidence Search Report

    Science.gov (United States)

    2014-07-01

    Accelerated Multi-Camera Evidence Search and Retrieval, CSSP Project #: CSSP-2013...CD-1063, was supported by the Canadian Safety and Security Program (CSSP), which is led by Defence Research and Development Canada's Centre for ... Border Technology Division. The CSSP is a federally-funded program to strengthen Canada's ability to anticipate, prevent/mitigate, prepare for, respond to

  4. Accurate prediction of peptide binding sites on protein surfaces.

    Directory of Open Access Journals (Sweden)

    Evangelia Petsalaki

    2009-03-01

    Full Text Available Many important protein-protein interactions are mediated by the binding of a short peptide stretch in one protein to a large globular segment in another. Recent efforts have provided hundreds of examples of new peptides binding to proteins for which a three-dimensional structure is available (either known experimentally or readily modeled) but where no structure of the protein-peptide complex is known. To address this gap, we present an approach that can accurately predict peptide binding sites on protein surfaces. For peptides known to bind a particular protein, the method predicts binding sites with great accuracy, and the specificity of the approach means that it can also be used to predict whether or not a putative or predicted peptide partner will bind. We used known protein-peptide complexes to derive preferences, in the form of spatial position specific scoring matrices, which describe the binding-site environment in globular proteins for each type of amino acid in bound peptides. We then scan the surface of a putative binding protein for sites for each of the amino acids present in a peptide partner and search for combinations of high-scoring amino acid sites that satisfy constraints deduced from the peptide sequence. The method performed well in a benchmark and largely agreed with experimental data mapping binding sites for several recently discovered interactions mediated by peptides, including RG-rich proteins with SMN domains, Epstein-Barr virus LMP1 with TRADD domains, DBC1 with Sir2, and the Ago hook with Argonaute PIWI domain. The method, and associated statistics, is an excellent tool for predicting and studying binding sites for newly discovered peptides mediating critical events in biology.

  5. Toward accurate and precise estimates of lion density.

    Science.gov (United States)

    Elliot, Nicholas B; Gopalaswamy, Arjun M

    2017-08-01

    Reliable estimates of animal density are fundamental to understanding ecological processes and population dynamics. Furthermore, their accuracy is vital to conservation because wildlife authorities rely on estimates to make decisions. However, it is notoriously difficult to accurately estimate density for wide-ranging carnivores that occur at low densities. In recent years, significant progress has been made in density estimation of Asian carnivores, but the methods have not been widely adapted to African carnivores, such as lions (Panthera leo). Although abundance indices for lions may produce poor inferences, they continue to be used to estimate density and inform management and policy. We used sighting data from a 3-month survey and adapted a Bayesian spatially explicit capture-recapture (SECR) model to estimate spatial lion density in the Maasai Mara National Reserve and surrounding conservancies in Kenya. Our unstructured spatial capture-recapture sampling design incorporated search effort to explicitly estimate detection probability and density on a fine spatial scale, making our approach robust in the context of varying detection probabilities. Overall posterior mean lion density was estimated to be 17.08 (posterior SD 1.310) lions >1 year old per 100 km², and the sex ratio was estimated at 2.2 females to 1 male. Our modeling framework and narrow posterior SD demonstrate that SECR methods can produce statistically rigorous and precise estimates of population parameters, and we argue that they should be favored over less reliable abundance indices. Furthermore, our approach is flexible enough to incorporate different data types, which enables robust population estimates over relatively short survey periods in a variety of systems. Trend analyses are essential to guide conservation decisions but are frequently based on surveys of differing reliability. We therefore call for a unified framework to assess lion numbers in key populations to improve management and

  6. Testing Self-Similarity Through Lamperti Transformations

    KAUST Repository

    Lee, Myoungji

    2016-07-14

    Self-similar processes have been widely used in modeling real-world phenomena occurring in environmetrics, network traffic, image processing, and stock pricing, to name but a few. The estimation of the degree of self-similarity has been studied extensively, while statistical tests for self-similarity are scarce and limited to processes indexed in one dimension. This paper proposes a statistical hypothesis test procedure for self-similarity of a stochastic process indexed in one dimension and multi-self-similarity for a random field indexed in higher dimensions. If self-similarity is not rejected, our test provides a set of estimated self-similarity indexes. The key is to test stationarity of the inverse Lamperti transformations of the process. The inverse Lamperti transformation of a self-similar process is a strongly stationary process, revealing a theoretical connection between the two processes. To demonstrate the capability of our test, we test self-similarity of fractional Brownian motions and sheets, their time deformations and mixtures with Gaussian white noise, and the generalized Cauchy family. We also apply the self-similarity test to real data: annual minimum water levels of the Nile River, network traffic records, and surface heights of food wrappings. © 2016, International Biometric Society.
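
    The theoretical connection the test rests on can be stated compactly (a standard result, not specific to this paper): a process X is self-similar with index H exactly when its inverse Lamperti transformation is strictly stationary. In LaTeX notation:

      X(at) \stackrel{d}{=} a^{H} X(t) \;\; \forall a > 0
      \quad\Longleftrightarrow\quad
      Y(t) = e^{-Ht}\, X(e^{t}) \;\; \text{is strictly stationary},

    with the converse direction given by $X(t) = t^{H} Y(\log t)$. Testing self-similarity of $X$ at a candidate index $H$ therefore reduces to testing stationarity of $Y$, which is the strategy the abstract describes.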

  7. Similarity increases altruistic punishment in humans.

    Science.gov (United States)

    Mussweiler, Thomas; Ockenfels, Axel

    2013-11-26

    Humans are attracted to similar others. As a consequence, social networks are homogeneous in sociodemographic, intrapersonal, and other characteristics, a principle called homophily. Despite abundant evidence showing the importance of interpersonal similarity and homophily for human relationships, their behavioral correlates and cognitive foundations are poorly understood. Here, we show that perceived similarity substantially increases altruistic punishment, a key mechanism underlying human cooperation. We induced (dis)similarity perception by manipulating basic cognitive mechanisms in an economic cooperation game that included a punishment phase. We found that similarity-focused participants were more willing to punish others' uncooperative behavior. This influence of similarity is not explained by group identity, which has the opposite effect on altruistic punishment. Our findings demonstrate that pure similarity promotes reciprocity in ways known to encourage cooperation. At the same time, the increased willingness to punish norm violations among similarity-focused participants provides a rationale for why similar people are more likely to build stable social relationships. Finally, our findings show that altruistic punishment is differentially involved in encouraging cooperation under pure similarity vs. in-group conditions.

  8. Supporting Book Search

    DEFF Research Database (Denmark)

    Bogers, Toine; Petras, Vivien

    2017-01-01

    Book search is far from a solved problem. Complex information needs often go beyond bibliographic facts and cover a combination of different aspects, such as specific genres or plot elements, engagement or novelty. Conventional book metadata may not be sufficient to address these kinds of information needs. ... in-depth analysis of topical metadata, comparing controlled vocabularies with social tags. Tags perform better overall in this setting, but controlled vocabulary terms provide complementary information, which will improve a search. We analyze potential underlying factors that contribute to search performance, such as the relevance aspect(s) mentioned in a request or the type of book. In addition, we investigate the possible causes of search failure. We conclude that neither tags nor controlled vocabularies are wholly suited to handling the complex information needs in book search, which means that different approaches...

  9. Similarity boosted quantitative structure-activity relationship--a systematic study of enhancing structural descriptors by molecular similarity.

    Science.gov (United States)

    Girschick, Tobias; Almeida, Pedro R; Kramer, Stefan; Stålring, Jonna

    2013-05-24

    The concept of molecular similarity is one of the most central in the fields of predictive toxicology and quantitative structure-activity relationship (QSAR) research. Many toxicological responses result from a multimechanistic process and, consequently, structural diversity among the active compounds is likely. Combining this knowledge, we introduce similarity boosted QSAR modeling, where we calculate molecular descriptors using similarities with respect to representative reference compounds to aid a statistical learning algorithm in distinguishing between different structural classes. We present three approaches for the selection of reference compounds, one by literature search and two by clustering. Our experimental evaluation on seven publicly available data sets shows that the similarity descriptors used on their own perform quite well compared to structural descriptors. We show that the combination of similarity and structural descriptors enhances the performance and that a simple stacking approach is able to use the complementary information encoded by the different descriptor sets to further improve predictive results. All software necessary for our experiments is available within the cheminformatics software framework AZOrange.
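
    A minimal sketch of the general idea, not of the authors' AZOrange implementation: each compound's structural fingerprint is augmented with its Tanimoto similarities to a handful of reference compounds, and the combined vector is handed to any statistical learner. Fingerprints are modeled as plain Python sets of on-bit indices, and the reference compounds are hypothetical.

      def tanimoto(a, b):
          """Tanimoto coefficient of two binary fingerprints (sets of on-bits)."""
          return len(a & b) / len(a | b) if a | b else 0.0

      def similarity_descriptors(fp, references):
          """One similarity feature per representative reference compound."""
          return [tanimoto(fp, ref) for ref in references]

      def boosted_features(fp, n_bits, references):
          """Structural bits plus similarity descriptors, combined as one vector."""
          structural = [1.0 if i in fp else 0.0 for i in range(n_bits)]
          return structural + similarity_descriptors(fp, references)

      refs = [{0, 3, 5}, {1, 2, 7}]          # hypothetical reference compounds
      print(boosted_features({0, 3, 7}, 8, refs))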

  10. User Oriented Trajectory Search for Trip Recommendation

    KAUST Repository

    Ding, Ruogu

    2012-07-08

    Trajectory sharing and searching have received significant attention in recent years. In this thesis, we propose and investigate methods to find and recommend the best trajectory to the traveler, and mainly focus on a novel technique named User Oriented Trajectory Search (UOTS) query processing. In contrast to conventional trajectory search by locations (spatial domain only), we consider both spatial and textual domains in the new UOTS query. Given a trajectory data set, the query input contains a set of intended places given by the traveler and a set of textual attributes describing the traveler's preference. If a trajectory is connecting/close to the specified query locations, and the textual attributes of the trajectory are similar to the traveler's preference, it will be recommended to the traveler. This type of query can enable many popular applications such as trip planning and recommendation. There are two challenges in UOTS query processing: (i) how to constrain the searching range in two domains and (ii) how to schedule multiple query sources effectively. To overcome the challenges and answer the UOTS query efficiently, a novel collaborative searching approach is developed. Conceptually, the UOTS query processing is conducted in the spatial and textual domains alternately. A pair of upper and lower bounds are devised to constrain the searching range in two domains. In the meantime, a heuristic searching strategy based on priority ranking is adopted for scheduling the multiple query sources, which can further reduce the searching range and enhance the query efficiency notably. Furthermore, the devised collaborative searching approach can be extended to situations where the query locations are ordered. Extensive experiments are conducted on both real and synthetic trajectory data in road networks. Our approach is verified to be effective in reducing both CPU time and disk I/O time.

  11. User oriented trajectory search for trip recommendation

    KAUST Repository

    Shang, Shuo

    2012-01-01

    Trajectory sharing and searching have received significant attention in recent years. In this paper, we propose and investigate a novel problem called User Oriented Trajectory Search (UOTS) for trip recommendation. In contrast to conventional trajectory search by locations (spatial domain only), we consider both spatial and textual domains in the new UOTS query. Given a trajectory data set, the query input contains a set of intended places given by the traveler and a set of textual attributes describing the traveler's preference. If a trajectory is connecting/close to the specified query locations, and the textual attributes of the trajectory are similar to the traveler's preference, it will be recommended to the traveler for reference. This type of query can bring significant benefits to travelers in many popular applications such as trip planning and recommendation. There are two challenges in the UOTS problem, (i) how to constrain the searching range in two domains and (ii) how to schedule multiple query sources effectively. To overcome the challenges and answer the UOTS query efficiently, a novel collaborative searching approach is developed. Conceptually, the UOTS query processing is conducted in the spatial and textual domains alternately. A pair of upper and lower bounds are devised to constrain the searching range in two domains. In the meantime, a heuristic searching strategy based on priority ranking is adopted for scheduling the multiple query sources, which can further reduce the searching range and enhance the query efficiency notably. Furthermore, the devised collaborative searching approach can be extended to situations where the query locations are ordered. The performance of the proposed UOTS query is verified by extensive experiments based on real and synthetic trajectory data in road networks. © 2012 ACM.

  12. Routine development of objectively derived search strategies

    Directory of Open Access Journals (Sweden)

    Hausner Elke

    2012-02-01

    Full Text Available Abstract Background Over the past few years, information retrieval has become more and more professionalized, and information specialists are considered full members of a research team conducting systematic reviews. Research groups preparing systematic reviews and clinical practice guidelines have been the driving force in the development of search strategies, but open questions remain regarding the transparency of the development process and the available resources. An empirically guided approach to the development of a search strategy provides a way to increase transparency and efficiency. Methods Our aim in this paper is to describe the empirically guided development process for search strategies as applied by the German Institute for Quality and Efficiency in Health Care (Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen, or "IQWiG"). This strategy consists of the following steps: generation of a test set, as well as the development, validation and standardized documentation of the search strategy. Results We illustrate our approach by means of an example, that is, a search for literature on brachytherapy in patients with prostate cancer. For this purpose, a test set was generated, including a total of 38 references from 3 systematic reviews. The development set for the generation of the strategy included 25 references. After application of textual analytic procedures, a strategy was developed that included all references in the development set. To test the search strategy on an independent set of references, the remaining 13 references in the test set (the validation set) were used. The validation set was also completely identified. Discussion Our conclusion is that an objectively derived approach similar to that used in search filter development is a feasible way to develop and validate reliable search strategies. Besides creating high-quality strategies, the widespread application of this approach will result in a

  13. Search Tips: MedlinePlus

    Science.gov (United States)

    ... of this page: https://medlineplus.gov/searchtips.html Search Tips How do I search MedlinePlus? The search box appears at the top ...

  14. Axion Searches, Old and New

    CERN Multimedia

    CERN. Geneva

    2005-01-01

    Outline of the lecture: Constraints from laboratory searches and astrophysics, axion cosmology, the cavity detector of dark matter axions, solar axion searches, laser experiments, a telescope search, macroscopic forces mediated by axions.

  15. Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search.

    Science.gov (United States)

    Jay, Caroline; Harper, Simon; Dunlop, Ian; Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain

    2016-01-14

    Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis, and, in turn, to the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these "experts." Such interfaces hark back to a time when searches needed to be accurate the first time, as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research. The cross-disciplinary nature of data science means that no assumptions can be made regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with the search needs of the "Google generation" than with their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive. Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is "Google-like," enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multioption user interface. Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F1,19 = 37.3, P < .001) ... serendipity as part of the refinement. The results provide clear evidence that data science should adopt single-field natural language search interfaces for

  16. Bridging Database Applications and Declarative Similarity Matching

    OpenAIRE

    Andrade Ribeiro, Leonardo; Schneider, Natália Cristina; de Souza Inácio, Andrei; Wagner, Harley Michel; von Wangenheim, Aldo

    2017-01-01

    Effective manipulation of string data is of fundamental importance to modern database applications. Very often, textual inconsistencies render equality comparisons meaningless and strings have to be matched in terms of their similarity. Previous work has proposed techniques to express similarity operations using declarative SQL statements. However, the non-trivial issue of embedding similarity support into object-oriented applications has received little attention. Particularly, declarative s...

  17. Integrating Triangle and Jaccard similarities for recommendation

    Science.gov (United States)

    Sun, Shuang-Bo; Zhang, Zhi-Heng; Dong, Xin-Ling; Zhang, Heng-Ru; Li, Tong-Jun; Zhang, Lin

    2017-01-01

    This paper proposes a new measure for recommendation through integrating Triangle and Jaccard similarities. The Triangle similarity considers both the length and the angle of rating vectors between them, while the Jaccard similarity considers non co-rating users. We compare the new similarity measure with eight state-of-the-art ones on four popular datasets under the leave-one-out scenario. Results show that the new measure outperforms all the counterparts in terms of the mean absolute error and the root mean square error. PMID:28817692
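
    A minimal sketch of how such a combined measure can be computed from two users' rating dictionaries. The exact formulas below (Triangle similarity over co-rated items, multiplied by Jaccard over the rated-item sets) follow the common definition of the Triangle measure and may differ in detail from the paper; positive ratings are assumed.

      from math import sqrt

      def triangle_jaccard(ratings_u, ratings_v):
          """Combined similarity of two users given {item: rating} dicts."""
          common = ratings_u.keys() & ratings_v.keys()
          if not common:
              return 0.0
          ru = [ratings_u[i] for i in common]
          rv = [ratings_v[i] for i in common]
          diff = sqrt(sum((a - b) ** 2 for a, b in zip(ru, rv)))
          norm_u = sqrt(sum(a * a for a in ru))
          norm_v = sqrt(sum(b * b for b in rv))
          triangle = 1.0 - diff / (norm_u + norm_v)   # length and angle of vectors
          jaccard = len(common) / len(ratings_u.keys() | ratings_v.keys())
          return triangle * jaccard                   # penalize few co-ratings

      u = {"a": 5, "b": 3, "c": 4}
      v = {"a": 4, "b": 2, "d": 5}
      print(triangle_jaccard(u, v))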

  18. An advanced search engine for patent analytics in medicinal chemistry.

    Science.gov (United States)

    Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnykova, Dina; Lovis, Christian; Ruch, Patrick

    2012-01-01

    Patent collections contain an important amount of medical-related knowledge, but existing tools have been reported to lack useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve relevant patents given a short query, and a related patent search to retrieve similar patents given a patent. Both search modes rely on tuning experiments performed during several patent retrieval competitions. Moreover, TWINC is enhanced with interactive modules, such as chemical query expansion, which is of prime importance to cope with the various ways of naming biomedical entities. While the related patent search showed promising performance, the ad hoc search produced fairly mixed results. Nonetheless, TWINC performed well during the Chemathlon task of the PatOlympics competition, and experts appreciated its usability.

  19. SEARCHES FOR SUPERSYMMETRY IN ATLAS

    CERN Document Server

    Xu, Da; The ATLAS collaboration

    2017-01-01

    A wide range of supersymmetry searches are presented. All searches are based on the proton-proton collision dataset collected by the ATLAS experiment during the 2015 and 2016 (before summer) runs with a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 36.1 (36.7) fb-1. The searches are categorized into inclusive gluino and squark searches, third-generation searches, electroweak searches, prompt RPV searches and long-lived particle searches. No evidence of new physics is observed. The results are interpreted in various models and expressed in terms of limits on the masses of new particles.

  20. Haunted by a doppelgänger: irrelevant facial similarity affects rule-based judgments.

    Science.gov (United States)

    von Helversen, Bettina; Herzog, Stefan M; Rieskamp, Jörg

    2014-01-01

    Judging other people is a common and important task. Every day professionals make decisions that affect the lives of other people when they diagnose medical conditions, grant parole, or hire new employees. To prevent discrimination, professional standards require that decision makers render accurate and unbiased judgments solely based on relevant information. Facial similarity to previously encountered persons can be a potential source of bias. Psychological research suggests that people only rely on similarity-based judgment strategies if the provided information does not allow them to make accurate rule-based judgments. Our study shows, however, that facial similarity to previously encountered persons influences judgment even in situations in which relevant information is available for making accurate rule-based judgments and where similarity is irrelevant for the task and relying on similarity is detrimental. In two experiments in an employment context we show that applicants who looked similar to high-performing former employees were judged as more suitable than applicants who looked similar to low-performing former employees. This similarity effect was found despite the fact that the participants used the relevant résumé information about the applicants by following a rule-based judgment strategy. These findings suggest that similarity-based and rule-based processes simultaneously underlie human judgment.

  1. Scaling, similarity, and the fourth paradigm for hydrology

    Directory of Open Access Journals (Sweden)

    C. D. Peters-Lidard

    2017-07-01

    Full Text Available In this synthesis paper addressing hydrologic scaling and similarity, we posit that the search for universal laws of hydrology is hindered by our focus on computational simulation (the third paradigm), and assert that it is time for hydrology to embrace a fourth paradigm of data-intensive science. Advances in information-based hydrologic science, coupled with an explosion of hydrologic data and advances in parameter estimation and modeling, have laid the foundation for a data-driven framework for scrutinizing hydrological scaling and similarity hypotheses. We summarize important scaling and similarity concepts (hypotheses) that require testing; describe a mutual information framework for testing these hypotheses; describe the boundary condition, state, flux, and parameter data requirements across scales to support testing these hypotheses; and discuss some challenges to overcome while pursuing the fourth hydrological paradigm. We call upon the hydrologic sciences community to develop a focused effort towards adopting the fourth paradigm and apply this to outstanding challenges in scaling and similarity.

  2. Monte-Carlo Tree Search for Poly-Y

    NARCIS (Netherlands)

    Wevers, L.; te Brinke, Steven

    2014-01-01

    Monte-Carlo tree search (MCTS) is a heuristic search algorithm that has recently been very successful in the games of Go and Hex. In this paper, we describe an MCTS player for the game of Poly-Y, which is a connection game similar to Hex. Our player won the CodeCup 2014 AI programming competition.
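
    The abstract names the algorithm without detail, so here is a compact, generic UCT sketch (the standard MCTS variant). The game-state interface (moves, play, player, result) is an assumption made for illustration, and a trivial Nim game stands in for Poly-Y, whose rules are much richer.

      import math, random

      class Node:
          def __init__(self, state, parent=None, move=None):
              self.state, self.parent, self.move = state, parent, move
              self.children, self.wins, self.visits = [], 0, 0
              self.untried = state.moves()

      def uct_search(root_state, iters=2000, c=1.4):
          root = Node(root_state)
          for _ in range(iters):
              node = root
              # 1. Selection: descend while fully expanded, using UCB1.
              while not node.untried and node.children:
                  node = max(node.children, key=lambda n: n.wins / n.visits
                             + c * math.sqrt(math.log(n.parent.visits) / n.visits))
              # 2. Expansion: add one untried child, if any.
              if node.untried:
                  move = node.untried.pop()
                  child = Node(node.state.play(move), node, move)
                  node.children.append(child)
                  node = child
              # 3. Simulation: random playout to a terminal state.
              state = node.state
              while state.moves():
                  state = state.play(random.choice(state.moves()))
              # 4. Backpropagation: credit every node on the path.
              while node:
                  node.visits += 1
                  node.wins += state.result(node.state.player)
                  node = node.parent
          return max(root.children, key=lambda n: n.visits).move

      class Nim:
          """Stand-in game (take 1-3 stones, taking the last stone wins);
          Poly-Y would plug in via the same moves/play/result interface."""
          def __init__(self, stones=7, player=1):
              self.stones, self.player = stones, player
          def moves(self):
              return list(range(1, min(3, self.stones) + 1))
          def play(self, k):
              return Nim(self.stones - k, -self.player)
          def result(self, to_move):
              # At a terminal state the side to move has lost. A node's win
              # tally is kept from the viewpoint of the player who moved INTO
              # it (the opponent of that node's side to move), hence this test.
              return 1 if self.player == to_move else 0

      print(uct_search(Nim()))   # tends to print 3 (7 -> 4 is the winning reply)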

  3. Improved Gravitational Search Algorithm (GSA Using Fuzzy Logic

    Directory of Open Access Journals (Sweden)

    Omid Mokhlesi

    2013-04-01

    Full Text Available Researchers' tendency to use collective intelligence methods to optimize complex engineering problems has increased because of the high performance of these algorithms. The gravitational search algorithm (GSA) is among them. This algorithm is inspired by Newton's laws of physics and gravitational attraction: random masses act as agents that search the space. This paper presents a new fuzzy-population GSA model called FPGSA. The proposed method is a combination of a parametric fuzzy controller and the gravitational search algorithm, allowing the space to be searched in a reasonable and accurate manner. In collective intelligence algorithms, population size influences the final answer: with a large population a better response is obtained, but the algorithm's execution time is longer. To overcome this problem, a new parameter called the dispersion coefficient is added to the algorithm. Implementation results show that by controlling this factor, system performance can be improved.
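
    The abstract leaves the mechanics implicit. For reference, the standard GSA update (following Rashedi et al.'s original formulation, which FPGSA builds on) moves each agent i in dimension d under pairwise attractive forces; the fuzzy controller and dispersion coefficient of this paper then adapt the parameters in a way the abstract does not detail. In LaTeX notation:

      F_{ij}^{d}(t) = G(t)\,\frac{M_i(t)\,M_j(t)}{R_{ij}(t)+\varepsilon}\,\bigl(x_j^{d}(t)-x_i^{d}(t)\bigr), \qquad
      a_i^{d}(t) = \frac{1}{M_i(t)} \sum_{j \neq i} \mathrm{rand}_j \, F_{ij}^{d}(t),

      v_i^{d}(t+1) = \mathrm{rand}_i \, v_i^{d}(t) + a_i^{d}(t), \qquad
      x_i^{d}(t+1) = x_i^{d}(t) + v_i^{d}(t+1),

    where G(t) is a decreasing gravitational "constant" (e.g. G(t) = G_0 e^{-\alpha t/T}) and the masses M_i(t) are normalized from agent fitness, so better solutions attract more strongly.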

  4. Forecasting influenza outbreak dynamics in Melbourne from Internet search query surveillance data.

    Science.gov (United States)

    Moss, Robert; Zarebski, Alexander; Dawson, Peter; McCaw, James M

    2016-07-01

    Accurate forecasting of seasonal influenza epidemics is of great concern to healthcare providers in temperate climates, as these epidemics vary substantially in their size, timing and duration from year to year, making it a challenge to deliver timely and proportionate responses. Previous studies have shown that Bayesian estimation techniques can accurately predict when an influenza epidemic will peak many weeks in advance, using existing surveillance data, but these methods must be tailored both to the target population and to the surveillance system. Our aim was to evaluate whether forecasts of similar accuracy could be obtained for metropolitan Melbourne (Australia). We used the bootstrap particle filter and a mechanistic infection model to generate epidemic forecasts for metropolitan Melbourne (Australia) from weekly Internet search query surveillance data reported by Google Flu Trends for 2006-14. Optimal observation models were selected from hundreds of candidates using a novel approach that treats forecasts akin to receiver operating characteristic (ROC) curves. We show that the timing of the epidemic peak can be accurately predicted 4-6 weeks in advance, but that the magnitude of the epidemic peak and the overall burden are much harder to predict. We then discuss how the infection and observation models and the filtering process may be refined to improve forecast robustness, thereby improving the utility of these methods for healthcare decision support. © 2016 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
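
    A schematic of the bootstrap particle filter the study relies on: propagate each particle through a stochastic epidemic model, weight it by the likelihood of the observed search-query signal, and resample. The SIR step, the observation model, and every numeric constant below are stand-ins; the paper's mechanistic model and its Google Flu Trends observation model are more detailed.

      import random

      def step_sir(state, beta=0.3, gamma=0.1, n=4_000_000):
          """One (weekly) step of a crude stochastic SIR model; a stand-in."""
          s, i = state
          new_inf = beta * s * i / n * random.lognormvariate(0, 0.2)
          new_rec = gamma * i
          return (s - new_inf, i + new_inf - new_rec)

      def likelihood(obs, state, scale=0.01):
          """Crude observation model: search volume tracks prevalence."""
          expected = scale * state[1]
          return 1.0 / (1.0 + abs(obs - expected))   # stand-in for a density

      def bootstrap_filter(observations, n_particles=500):
          particles = [(3_990_000.0, 10_000.0 * random.random())
                       for _ in range(n_particles)]
          for obs in observations:
              particles = [step_sir(p) for p in particles]           # propagate
              weights = [likelihood(obs, p) for p in particles]      # weight
              particles = random.choices(particles, weights, k=n_particles)  # resample
          return particles   # posterior sample of (S, I); forecast by propagating onward

      post = bootstrap_filter([120, 340, 910, 2100])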

  5. Mining Diagnostic Assessment Data for Concept Similarity

    Science.gov (United States)

    Madhyastha, Tara; Hunt, Earl

    2009-01-01

    This paper introduces a method for mining multiple-choice assessment data for similarity of the concepts represented by the multiple choice responses. The resulting similarity matrix can be used to visualize the distance between concepts in a lower-dimensional space. This gives an instructor a visualization of the relative difficulty of concepts…

  6. Similarity Structure of Wave-Collapse

    DEFF Research Database (Denmark)

    Rypdal, Kristoffer; Juul Rasmussen, Jens; Thomsen, Kenneth

    1985-01-01

    Similarity transformations of the cubic Schrödinger equation (CSE) are investigated. The transformations are used to remove the explicit time variation in the CSE and reduce it to differential equations in the spatial variables only. Two different methods for similarity reduction are employed and...

  7. Similarity indices I: what do they measure.

    Energy Technology Data Exchange (ETDEWEB)

    Johnston, J.W.

    1976-11-01

    A method for estimating the effects of environmental effusions on ecosystems is described. The characteristics of 25 similarity indices used in studies of ecological communities were investigated. The type of data structure, to which these indices are frequently applied, was described as consisting of vectors of measurements on attributes (species) observed in a set of samples. A general similarity index was characterized as the result of a two-step process defined on a pair of vectors. In the first step an attribute similarity score is obtained for each attribute by comparing the attribute values observed in the pair of vectors. The result is a vector of attribute similarity scores. These are combined in the second step to arrive at the similarity index. The operation in the first step was characterized as a function, g, defined on pairs of attribute values. The second operation was characterized as a function, F, defined on the vector of attribute similarity scores from the first step. Usually, F was a simple sum or weighted sum of the attribute similarity scores. It is concluded that similarity indices should not be used as the test statistic to discriminate between two ecological communities.
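
    The two-step characterization translates directly into code. Below, g compares a single attribute (species) across two samples and F aggregates the attribute scores; instantiating g = min and F as a normalized sum recovers the familiar Bray-Curtis (quantitative Sorensen) similarity, used here purely as an example of the framework.

      def general_similarity(x, y, g, F):
          """Two-step similarity index over paired attribute (species) vectors."""
          return F([g(a, b) for a, b in zip(x, y)], x, y)

      # Example instantiation: g = min counts shared abundance per species,
      # F normalizes by total abundance, yielding the Bray-Curtis form.
      g = min
      F = lambda scores, x, y: 2 * sum(scores) / (sum(x) + sum(y))

      print(general_similarity([3, 0, 2], [1, 1, 2], g, F))   # 0.666...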

  8. Some Effects of Similarity Self-Disclosure

    Science.gov (United States)

    Murphy, Kevin C.; Strong, Stanley R.

    1972-01-01

    College males were interviewed about how college had altered their friendships, values, and plans. The interviewers diclosed experiences and feelings similar to those revealed by the students. Results support Byrne's Law of Similarity in generating interpersonal attraction in the interview and suggest that the timing of self-disclosures is…

  9. Correlation between social proximity and mobility similarity.

    Science.gov (United States)

    Fan, Chao; Liu, Yiding; Huang, Junming; Rong, Zhihai; Zhou, Tao

    2017-09-20

    Human behaviors exhibit ubiquitous correlations in many aspects, such as individual and collective levels, temporal and spatial dimensions, content, social and geographical layers. With rich Internet data on online behaviors becoming available, it has attracted academic interest to explore human mobility similarity from the perspective of social network proximity. Existing analysis shows a strong correlation between online social proximity and offline mobility similarity; namely, mobile records between friends are significantly more similar than those between strangers, and those between friends with common neighbors are even more similar. We argue the importance of the number and diversity of common friends, with a counterintuitive finding that the number of common friends has no positive impact on mobility similarity while the diversity plays a key role, disagreeing with previous studies. Our analysis provides a novel view for better understanding the coupling between human online and offline behaviors, and will help model and predict human behaviors based on social proximity.

  10. Similarity and a Duality for Fullerenes

    Directory of Open Access Journals (Sweden)

    Jennifer J. Edmond

    2015-11-01

    Full Text Available Fullerenes are molecules of carbon that are modeled by trivalent plane graphs with only pentagonal and hexagonal faces. Scaling up a fullerene gives a notion of similarity, and fullerenes are partitioned into similarity classes. In this expository article, we illustrate how the values of two important fullerene parameters can be deduced for all fullerenes in a similarity class by computing the values of these parameters for just the three smallest representatives of that class. In addition, it turns out that there is a natural duality theory for similarity classes of fullerenes based on one of the most important fullerene construction techniques: leapfrog construction. The literature on fullerenes is very extensive, and since this is a general interest journal, we will summarize and illustrate the fundamental results that we will need to develop similarity and this duality.

  11. SearchResultFinder: federated search made easy

    NARCIS (Netherlands)

    Trieschnigg, Rudolf Berend; Tjin-Kam-Jet, Kien; Hiemstra, Djoerd

    Building a federated search engine based on a large number of existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, a browser plugin which speeds up

  12. Market Dominance and Search Quality in the Search Engine Market

    NARCIS (Netherlands)

    Lianos, I.; Motchenkova, E.I.

    2013-01-01

    We analyze a search engine market from a law and economics perspective and incorporate the choice of quality-improving innovations by a search engine platform into a two-sided model of an Internet search engine. In the proposed framework, we first discuss the legal issues the search engine market raises

  13. IBRI-CASONTO: Ontology-based semantic search engine

    Directory of Open Access Journals (Sweden)

    Awny Sayed

    2017-11-01

    Full Text Available The vast availability of information, added at a very fast pace to data repositories, creates a challenge in extracting correct and accurate information. This has increased the competition among developers to gain access to technology that seeks to understand the researcher's intent and the contextual meaning of terms. The competition for developing Arabic semantic search systems is still in its infancy, which can be traced back to the complexity of the Arabic language: it has complex morphological, grammatical and semantic aspects, as it is a highly inflectional and derivational language. In this paper, we present an ontological search engine called IBRI-CASONTO for the Colleges of Applied Sciences, Oman. Our proposed engine supports both the Arabic and English languages and employs two types of search: a keyword-based search and a semantics-based search. IBRI-CASONTO is based on technologies such as Resource Description Framework (RDF) data and an ontological graph. The experiments are presented in two sections: first, a comparison between the Entity-Search and the Classical-Search inside IBRI-CASONTO itself; second, a comparison of the Entity-Search of IBRI-CASONTO with currently used search engines, such as Kngine, Wolfram Alpha and the most popular engine nowadays, Google, in order to measure their performance and efficiency.
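
    As a rough illustration of the kind of semantics-based (entity) search the abstract describes, the sketch below builds a tiny RDF graph with the Python rdflib package and answers a structured SPARQL query over it; the namespace and triples are hypothetical and are not taken from IBRI-CASONTO.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/cas#")  # hypothetical ontology namespace

g = Graph()
g.add((EX.course1, RDF.type, EX.Course))
g.add((EX.course1, EX.title, Literal("Semantic Web Technologies")))
g.add((EX.course1, EX.taughtBy, EX.lecturer1))

# Entity search: retrieve courses and their titles, not mere keyword matches.
results = g.query("""
    PREFIX ex: <http://example.org/cas#>
    SELECT ?course ?title WHERE { ?course a ex:Course ; ex:title ?title . }
""")
for course, title in results:
    print(course, title)
```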

  14. Using Clinicians’ Search Query Data to Monitor Influenza Epidemics

    Science.gov (United States)

    Santillana, Mauricio; Nsoesie, Elaine O.; Mekaru, Sumiko R.; Scales, David; Brownstein, John S.

    2014-01-01

    Search query information from a clinician's database, UpToDate, is shown to predict influenza epidemics in the United States in a timely manner. Our results show that digital disease surveillance tools based on experts' databases may be able to provide an alternative, reliable, and stable signal for accurate predictions of influenza outbreaks. PMID:25115873

  15. Fast and accurate hashing via iterative nearest neighbors expansion.

    Science.gov (United States)

    Jin, Zhongming; Zhang, Debing; Hu, Yao; Lin, Shiding; Cai, Deng; He, Xiaofei

    2014-11-01

    Recently, hashing techniques have been widely applied to the approximate nearest neighbor search problem in many real applications. The basic idea of these approaches is to generate binary codes for data points that preserve the similarity between any two of them. Given a query, instead of performing a linear scan of the entire database, the hashing method can perform a linear scan of the points whose Hamming distance to the query is not greater than $r_h$, where $r_h$ is a constant. However, in order to find the true nearest neighbors, both the locating time and the linear scan time are proportional to $O(\sum_{i=0}^{r_h} \binom{c}{i})$ ($c$ is the code length), which increases exponentially as $r_h$ increases. To address this limitation, we propose a novel algorithm named iterative expanding hashing in this paper, which builds an auxiliary index based on an offline-constructed nearest neighbor table to avoid a large $r_h$. This auxiliary index can easily be combined with all the traditional hashing methods. Extensive experimental results over various real large-scale datasets demonstrate the superiority of the proposed approach.
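
    The exponential term comes from enumerating every bucket inside the Hamming ball of the query. A minimal sketch of that enumeration (index contents hypothetical):

```python
from itertools import combinations

def hamming_ball(code: int, length: int, radius: int):
    """Yield every binary code whose Hamming distance to `code` is <= radius."""
    for r in range(radius + 1):
        for bits in combinations(range(length), r):
            flipped = code
            for b in bits:
                flipped ^= 1 << b
            yield flipped

# Hypothetical index mapping each c-bit hash code to the points stored under it.
index = {0b1010: ["p1"], 0b1011: ["p2"], 0b0010: ["p3"], 0b0101: ["p4"]}

query = 0b1010
candidates = [p for code in hamming_ball(query, length=4, radius=1)
              for p in index.get(code, [])]
print(candidates)  # ['p1', 'p2', 'p3'] -- codes within 1 bit of the query
```

    For $c = 32$ and $r_h = 3$ this already probes $1 + 32 + 496 + 4960 = 5489$ buckets, which is why the proposed auxiliary index tries to keep $r_h$ small.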

  16. PSnAP: Accurate Synthetic Address Streams through Memory Profiles

    Science.gov (United States)

    Olschanowsky, Catherine Mills; Tikir, Mustafa M.; Carrington, Laura; Snavely, Allan

    Memory address traces are an important information source; they drive memory simulations for performance modeling, systems design and application tuning. For long running applications, the direct use of an address trace is complicated by its size. Previous attempts to reduce trace size incurred a substantial penalty with respect to trace accuracy. We propose a novel method of memory profiling that enables the generation of highly accurate synthetic traces with space requirements typically under 1% of the original traces. We demonstrate the synthetic trace accuracy in terms of cache hit rates, spatial-temporal locality scores and locality surfaces. Simulated cache hit rates from synthetic traces are within 3.5% of observed and on average are within 1.0% for L1 cache. Our profiles are on average 60 times smaller than compressed traces. The combination of small profile sizes and high similarity to original traces makes our technique uniquely applicable to performance modeling and trace driven simulation of large-scale parallel scientific applications.

  17. Two types of motor strategy for accurate dart throwing.

    Science.gov (United States)

    Nasu, Daiki; Matsuo, Tomoyuki; Kadota, Koji

    2014-01-01

    This study investigated whether expert dart players utilize hand trajectory patterns that can compensate for the inherent variability in their release timing. In this study, we compared the timing error and hand trajectory patterns of expert players with those of novices. Eight experts and eight novices each made 60 dart throws, aiming at the bull's-eye. The movements of the dart and index finger were captured using seven 480-Hz cameras. The data were interpolated using a cubic spline function and analyzed by the millisecond. The estimated vertical errors on the dartboard were calculated as a time-series by using the state variables of the index finger (position, velocity, and direction of motion). This time-series error represents the hand trajectory pattern. Two variables assessing the performance outcome in the vertical plane and two variables related to the timing control were quantified on the basis of the time-series error. The results revealed two typical types of motor strategies in the expert group. The timing error of some experts was similar to that of novices; however, these experts had a longer window of time in which to release an accurately thrown dart. These subjects selected hand trajectory patterns that could compensate for the timing error. Other experts did not select the complementary hand trajectories, but greatly reduced their error in release timing.

  18. Accurate measurement of streamwise vortices in low speed aerodynamic flows

    Science.gov (United States)

    Waldman, Rye M.; Kudo, Jun; Breuer, Kenneth S.

    2010-11-01

    Low Reynolds number experiments with flapping animals (such as bats and small birds) are of current interest in understanding biological flight mechanics, and due to their application to Micro Air Vehicles (MAVs) which operate in a similar parameter space. Previous PIV wake measurements have described the structures left by bats and birds, and provided insight to the time history of their aerodynamic force generation; however, these studies have faced difficulty drawing quantitative conclusions due to significant experimental challenges associated with the highly three-dimensional and unsteady nature of the flows, and the low wake velocities associated with lifting bodies that only weigh a few grams. This requires the high-speed resolution of small flow features in a large field of view using limited laser energy and finite camera resolution. Cross-stream measurements are further complicated by the high out-of-plane flow which requires thick laser sheets and short interframe times. To quantify and address these challenges we present data from a model study on the wake behind a fixed wing at conditions comparable to those found in biological flight. We present a detailed analysis of the PIV wake measurements, discuss the criteria necessary for accurate measurements, and present a new dual-plane PIV configuration to resolve these issues.

  19. Superpixel-based graph cuts for accurate stereo matching

    Science.gov (United States)

    Feng, Liting; Qin, Kaihuai

    2017-06-01

    Estimating the surface normal vector and disparity of a pixel simultaneously, also known as the three-dimensional label method, has been widely used in recent continuous stereo matching problems to achieve sub-pixel accuracy. However, due to the infinite label space, it is extremely hard to assign each pixel an appropriate label. In this paper, we present an accurate and efficient algorithm, integrating PatchMatch with graph cuts, to approach this critical computational problem. Besides, to get a robust and precise matching cost, we use a convolutional neural network to learn a similarity measure on small image patches. Compared with other MRF-related methods, our method has several advantages: its submodular property ensures a sub-problem optimality which is easy to perform in parallel; graph cuts can simultaneously update multiple pixels, avoiding local minima caused by sequential optimizers like belief propagation; it uses segmentation results for better local expansion moves; local propagation and randomization can easily generate the initial solution without using external methods. Middlebury experiments show that our method can achieve higher accuracy than other MRF-based algorithms.

  20. Mastering Search Analytics Measuring SEO, SEM and Site Search

    CERN Document Server

    Chaters, Brent

    2011-01-01

    Many companies still approach Search Engine Optimization (SEO) and paid search as separate initiatives. This in-depth guide shows you how to use these programs as part of a comprehensive strategy-not just to improve your site's search rankings, but to attract the right people and increase your conversion rate. Learn how to measure, test, analyze, and interpret all of your search data with a wide array of analytic tools. Gain the knowledge you need to determine the strategy's return on investment. Ideal for search specialists, webmasters, and search marketing managers, Mastering Search Analyt

  1. Search techniques in intelligent classification systems

    CERN Document Server

    Savchenko, Andrey V

    2016-01-01

    A unified methodology for categorizing various complex objects is presented in this book. Through probability theory, novel asymptotically minimax criteria suitable for practical applications in imaging and data analysis are examined including the special cases such as the Jensen-Shannon divergence and the probabilistic neural network. An optimal approximate nearest neighbor search algorithm, which allows faster classification of databases is featured. Rough set theory, sequential analysis and granular computing are used to improve performance of the hierarchical classifiers. Practical examples in face identification (including deep neural networks), isolated commands recognition in voice control system and classification of visemes captured by the Kinect depth camera are included. This approach creates fast and accurate search procedures by using exact probability densities of applied dissimilarity measures. This book can be used as a guide for independent study and as supplementary material for a technicall...

  2. Application of fast orthogonal search to linear and nonlinear stochastic systems

    DEFF Research Database (Denmark)

    Chon, K H; Korenberg, M J; Holstein-Rathlou, N H

    1997-01-01

    linear and nonlinear stochastic ARMA model parameters by using a method known as fast orthogonal search, with an extended model containing prediction errors as part of the model estimation process. The extended algorithm uses fast orthogonal search in a two-step procedure in which deterministic terms...... and accurately. The model order selection criteria developed for the extended algorithm are also crucial in obtaining accurate parameter estimates. Several simulated examples are presented to demonstrate the efficacy of the algorithm....

  3. Two perspectives on similarity between words

    Science.gov (United States)

    Frisch, Stefan A.

    2003-10-01

    This presentation examines the similarity between words from both bottom-up (phonetic) and top-down (phonological/psycholinguistic) perspectives. From the phonological perspective, the influence of structure on similarity is explored using metalinguistic acceptability judgments for multisyllabic nonwords. Results from an experiment suggest that subjects try to align novel words with known words in order to maximize similarities while minimizing dissimilarities. This finding parallels results from psychology on similarity judgments for visual scenes. From the phonetic perspective, the influence of similar gestures on speech error rates is examined using ultrasound measurement of tongue position. In a pilot experiment, subjects produced tongue twisters containing words where onset and vowel phonemes had similar gestures (e.g., tip, comb) and where the onset and vowel had dissimilar gestures (e.g., tube, keep). Preliminary results suggest that misarticulations are more frequent in the context of dissimilar gestures (e.g., in the tongue twister tip cape keep tape, error rates are higher for /k/ than /t/). These errors appear to be gestural interactions rather than errors at the phonemic or featural level of phonological spellout. Together, these two experiments indicate that similarity relations between words are found at multiple levels, any of which are potentially relevant to the structure of phonological systems.

  4. Electrostatic Similarity Determination Using Multiresolution Analysis.

    Science.gov (United States)

    Hakkoymaz, Huseyin; Kieslich, Chris A; Gorham, Ronald D, Jr; Gunopulos, Dimitrios; Morikis, Dimitrios

    2011-08-01

    Molecular similarity is an important tool in protein and drug design for analyzing the quantitative relationships between physicochemical properties of two molecules. We present a family of similarity measures which exploits the ability of wavelet transformation to analyze the spectral components of physicochemical properties and suggests a sensitive way for measuring similarities of biological molecules. In order to investigate how effective wavelet-based similarity measures were against conventional measures, we defined several patterns which involve scalar or topological changes in the distribution of electrostatic properties. The wavelet-based measures were more successful in discriminating these patterns in contrast to the current state-of-art similarity measures. We also present the validity of wavelet-based similarity measures through the hierarchical clustering of two protein datasets consisting of families of homologous domains and alanine scan mutants. This type of similarity analysis is useful for protein structure-function studies and protein design. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Measure of Node Similarity in Multilayer Networks.

    Directory of Open Access Journals (Sweden)

    Anders Mollgaard

    Full Text Available The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analysing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographic variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dissimilarity for the nodes connected by the strongest links. We finally analyze the overlap between layers in the network for different levels of acquaintanceship.

  6. An accurate and rapid continuous wavelet dynamic time warping algorithm for unbalanced global mapping in nanopore sequencing

    KAUST Repository

    Han, Renmin

    2017-12-24

    Long-reads, point-of-care, and PCR-free are the promises brought by nanopore sequencing. Among various steps in nanopore data analysis, the global mapping between the raw electrical current signal sequence and the expected signal sequence from the pore model serves as the key building block to base calling, reads mapping, variant identification, and methylation detection. However, the ultra-long reads of nanopore sequencing and an order of magnitude difference in the sampling speeds of the two sequences make the classical dynamic time warping (DTW) and its variants infeasible to solve the problem. Here, we propose a novel multi-level DTW algorithm, cwDTW, based on continuous wavelet transforms with different scales of the two signal sequences. Our algorithm starts from low-resolution wavelet transforms of the two sequences, such that the transformed sequences are short and have similar sampling rates. Then the peaks and nadirs of the transformed sequences are extracted to form feature sequences with similar lengths, which can be easily mapped by the original DTW. Our algorithm then recursively projects the warping path from a lower-resolution level to a higher-resolution one by building a context-dependent boundary and enabling a constrained search for the warping path in the latter. Comprehensive experiments on two real nanopore datasets on human and on Pandoraea pnomenusa, as well as two benchmark datasets from previous studies, demonstrate the efficiency and effectiveness of the proposed algorithm. In particular, cwDTW can almost always generate warping paths that are very close to the original DTW, which are remarkably more accurate than the state-of-the-art methods including FastDTW and PrunedDTW. Meanwhile, on the real nanopore datasets, cwDTW is about 440 times faster than FastDTW and 3000 times faster than the original DTW. Our program is available at https://github.com/realbigws/cwDTW.
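
    For reference, the classical DTW that cwDTW builds on fills an O(nm) dynamic-programming table, which is exactly what becomes infeasible for ultra-long reads; a minimal sketch:

```python
import numpy as np

def dtw(x: np.ndarray, y: np.ndarray) -> float:
    """Classical dynamic time warping distance between two 1-D signals."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw(np.array([0., 1., 2., 1.]), np.array([0., 1., 1., 2., 1.])))  # 0.0
```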

  7. SUSY Searches at ATLAS

    CERN Document Server

    Mamuzic, Judita; The ATLAS collaboration

    2017-01-01

    Supersymmetry (SUSY) is considered one of the best motivated extensions of the Standard Model. It postulates a fundamental symmetry between fermions and bosons, and introduces a set of new supersymmetric particles at the electroweak scale. It addresses the hierarchy and naturalness problem, gives a solution to the gauge coupling unification, and offers a cold dark matter candidate. Different aspects of SUSY searches, using strong, electroweak, third generation production, and R-parity violation and long lived particles are being studied at the LHC. An overview of most recent SUSY searches results using the 13 TeV ATLAS RUN2 data will be presented.

  8. Searching with iterated maps

    Science.gov (United States)

    Elser, V.; Rankenburg, I.; Thibault, P.

    2007-01-01

    In many problems that require extensive searching, the solution can be described as satisfying two competing constraints, where satisfying each independently does not pose a challenge. As an alternative to tree-based and stochastic searching, for these problems we propose using an iterated map built from the projections to the two constraint sets. Algorithms of this kind have been the method of choice in a large variety of signal-processing applications; we show here that the scope of these algorithms is surprisingly broad, with applications as diverse as protein folding and Sudoku. PMID:17202267

  9. Upgrading Enterprise Search

    Energy Technology Data Exchange (ETDEWEB)

    McDunn, R

    2005-04-28

    This presentation will describe the process we went through this past year to upgrade our enterprise search tool from a very old version of Inktomi to the latest version of Verity Ultraseek. We started with requirements gathering and then compared requirements against several available products to determine which product to choose. After purchasing the product, we worked through several defined phases of implementation and customization, with initial rollout late January 2004. Finally, we will show you where we are today and describe future search plans.

  10. Search for $2\

    CERN Document Server

    Auty, D J; Barbeau, P S; Beck, D; Belov, V; Breidenbach, M; Brunner, T; Burenkov, A; Cao, G F; Chambers, C; Chaves, J; Cleveland, B; Coon, M; Craycraft, A; Daniels, T; Danilov, M; Daugherty, S J; Davis, J; Delaquis, S; Der Mesrobian-Kabakian, A; DeVoe, R; Didberidze, T; Dilling, J; Dolgolenko, A; Dolinski, M J; Dunford, M; Fairbank, W; Farine, J; Feldmeier, W; Feyzbakhsh, S; Fierlinger, P; Fudenberg, D; Gornea, R; Graham, K; Gratta, G; Hall, C; Hughes, M; Jewell, M J; Johnson, A; Johnson, T N; Johnston, S; Karelin, A; Kaufman, L J; Killick, R; King, J; Koffas, T; Kravitz, S; Krücken, R; Kuchenkov, A; Kumar, K S; Leonard, D S; Licciardi, C; Lin, Y H; Ling, J; MacLellan, R; Marino, M G; Mong, B; Moore, D; Njoya, O; Nelson, R; Odian, A; Ostrovskiy, I; Piepke, A; Pocar, A; Prescott, C Y; Retière, F; Rowson, P C; Russell, J J; Schubert, A; Sinclair, D; Smith, E; Stekhanov, V; Tarka, M; Tolba, T; Tsang, R; Twelker, K; Vogel, P; Vuilleumier, J -L; Waite, A; Walton, J; Walton, T; Weber, M; Wen, L J; Wichoski, U; Winick, T A; Wood, J; Xu, Q Y; Yang, L; Yen, Y -R; Zeldovich, O Ya

    2015-01-01

    EXO-200 is a single phase liquid xenon detector designed to search for neutrinoless double-beta decay of $^{136}$Xe to the ground state of $^{136}$Ba. We report here on a search for the two-neutrino double-beta decay of $^{136}$Xe to the first $0^+$ excited state, $0^+_1$, of $^{136}$Ba based on a 100 kg$\cdot$yr exposure of $^{136}$Xe. Using a specialized analysis employing a machine learning algorithm, we obtain a 90% CL half-life sensitivity of $1.7 \times 10^{24}$ yr. We find no statistically significant evidence for the $2\

  11. Search for glueballs

    Energy Technology Data Exchange (ETDEWEB)

    Toki, W. [Colorado State Univ., Ft. Collins, CO (United States). Dept. of Physics

    1997-06-01

    In these Summer School lectures, the author reviews the results of recent glueball searches. He begins with a brief review of glueball phenomenology and meson spectroscopy, including a discussion of resonance behavior. The results on the f_0(1500) and f_J(1700) resonances from proton-antiproton experiments and radiative J/ψ decays are discussed. Finally, ππ and ηπ studies from D_s decays and exotic meson searches are reviewed. 46 refs., 40 figs.

  12. Searching for excellence

    Science.gov (United States)

    Yager, Robert E.

    Visits to six school districts which were identified by the National Science Teachers Association's Search for Excellence program were made during 1983 by teams of 17 researchers. The reports were analyzed in search for common characteristics that can explain the requirements necessary for excellent science programs. The results indicate that creative ideas, administrative and community involvement, local ownership and pride, and well-developed in-service programs and implementation strategies are vital. Exceptional teachers with boundless energies also seem to exist where exemplary science programs are found.

  13. The Search for Extraterrestrial Life

    Science.gov (United States)

    Ulmschneider, Peter

    Looking at the nature, origin, and evolution of life on Earth is one way of assessing whether extraterrestrial life exists on Earth-like planets elsewhere (see Chaps. 5 and 6). A more direct approach is to search for favorable conditions and traces of life on other celestial bodies, both in the solar system and beyond. Clearly, there is little chance of encountering nonhuman intelligent beings in the solar system. But there could well be primitive life on Mars, particularly as in the early history of the solar system the conditions on Mars were quite similar to those on Earth. In addition, surprisingly favorable conditions for life once existed on the moons of Jupiter. Yet even if extraterrestrial life is not encountered in forthcoming space missions, it would be of utmost importance to recover fossils of past organisms as such traces would greatly contribute to our basic understanding of the formation of life. In addition to the planned missions to Mars and Europa, there are extensive efforts to search for life outside the solar system. Rapid advances in the detection of extrasolar planets, outlined in Chap. 3, are expected to lead to the discovery of Earth-like planets in the near future. But how can we detect life on these distant bodies?

  14. Similarity and Decay Laws of Momentumless Wakes.

    Science.gov (United States)

    1977-05-26

    Similarity and Decay Laws of Momentumless Wakes, by Samuel Hassid (Naval Sea Systems Command, Code NSEA-03133, 26 May 1977). The analysis seeks self-similar solutions of the form $k = K h(\eta)$, $\varepsilon = E e(\eta)$, and $U_d = U_D f(\eta)$, with the neglected terms to be justified a posteriori.

  15. Similarity-based pattern analysis and recognition

    CERN Document Server

    Pelillo, Marcello

    2013-01-01

    This accessible text/reference presents a coherent overview of the emerging field of non-Euclidean similarity learning. The book presents a broad range of perspectives on similarity-based pattern analysis and recognition methods, from purely theoretical challenges to practical, real-world applications. The coverage includes both supervised and unsupervised learning paradigms, as well as generative and discriminative models. Topics and features: explores the origination and causes of non-Euclidean (dis)similarity measures, and how they influence the performance of traditional classification alg

  16. Joint Frequency Ambiguity Resolution and Accurate Timing Estimation in OFDM Systems with Multipath Fading

    Directory of Open Access Journals (Sweden)

    Ouyang Shan

    2006-01-01

    Full Text Available A serious disadvantage of orthogonal frequency-division multiplexing (OFDM) is its sensitivity to carrier frequency offset (CFO) and timing offset (TO). For many low-complexity algorithms, estimation ambiguity exists when the CFO is greater than one or two subcarrier spacings, and the estimated TO is also prone to exceeding the ISI-free interval within the cyclic prefix (CP). This paper presents a method for joint CFO ambiguity resolution and accurate TO estimation in multipath fading. The maximum-likelihood (ML) principle is employed and only one pilot symbol is needed. Frequency ambiguity is resolved and an accurate TO can be obtained simultaneously by using the fast Fourier transform (FFT) and a one-dimensional (1D) search. Both known and unknown channel order cases are considered. Computer simulations show that the proposed algorithm outperforms some others in multipath fading channels.

  17. SUSY Searches at the Tevatron

    Energy Technology Data Exchange (ETDEWEB)

    Zivkovic, L.

    2011-07-01

    In this article results from supersymmetry searches at D0 and CDF are reported. Searches for third generation squarks, searches for gauginos, and searches for models with R-parity violation are described. As no signs of supersymmetry for these models are observed, the most stringent limits to date are presented.

  18. Complete local search with memory

    NARCIS (Netherlands)

    Ghosh, D.; Sierksma, G.

    2000-01-01

    Neighborhood search heuristics like local search and its variants are some of the most popular approaches to solve discrete optimization problems of moderate to large size. Apart from tabu search, most of these heuristics are memoryless. In this paper we introduce a new neighborhood search heuristic

  19. Complete local search with memory

    NARCIS (Netherlands)

    Ghosh, D.; Sierksma, G.

    2002-01-01

    Neighborhood search heuristics like local search and its variants are some of the most popular approaches to solve discrete optimization problems of moderate to large size. Apart from tabu search, most of these heuristics are memoryless. In this paper we introduce a new neighborhood search heuristic

  20. The Evolution of Web Searching.

    Science.gov (United States)

    Green, David

    2000-01-01

    Explores the interrelation between Web publishing and information retrieval technologies and lists new approaches to Web indexing and searching. Highlights include Web directories; search engines; portalisation; Internet service providers; browser providers; meta-search engines; popularity-based analysis; natural language searching; links-based…

  1. Prices and heterogeneous search costs

    NARCIS (Netherlands)

    Moraga Gonzalez, J.L.; Sandor, Z.; Wildenbeest, M.R.

    2017-01-01

    We study price formation in a model of consumer search for differentiated products in which consumers have heterogeneous search costs. We provide conditions under which a pure-strategy symmetric Nash equilibrium exists and is unique. Search costs affect two margins—the intensive search margin (or

  2. Self-learning search engines

    NARCIS (Netherlands)

    Schuth, A.

    How does a search engine such as Google know which search results to display? There are many competing algorithms that generate search results, but which one works best? We developed a new probabilistic method for quickly comparing large numbers of search algorithms by examining the results users

  3. Standardization of Keyword Search Mode

    Science.gov (United States)

    Su, Di

    2010-01-01

    In spite of its popularity, keyword search mode has not been standardized. Though information professionals are quick to adapt to various presentations of keyword search mode, novice end-users may find keyword search confusing. This article compares keyword search mode in some major reference databases and calls for standardization. (Contains 3…

  4. Distances and similarities in intuitionistic fuzzy sets

    CERN Document Server

    Szmidt, Eulalia

    2014-01-01

    This book presents the state-of-the-art in theory and practice regarding similarity and distance measures for intuitionistic fuzzy sets. Quantifying similarity and distances is crucial for many applications, e.g. data mining, machine learning, decision making, and control. The work provides readers with a comprehensive set of theoretical concepts and practical tools for both defining and determining similarity between intuitionistic fuzzy sets. It describes an automatic algorithm for deriving intuitionistic fuzzy sets from data, which can aid in the analysis of information in large databases. The book also discusses other important applications, e.g. the use of similarity measures to evaluate the extent of agreement between experts in the context of decision making.

  5. Comparing methods for single paragraph similarity analysis.

    Science.gov (United States)

    Stone, Benjamin; Dennis, Simon; Kwantes, Peter J

    2011-01-01

    The focus of this paper is two-fold. First, similarities generated from six semantic models were compared to human ratings of paragraph similarity on two datasets: 23 World Entertainment News Network paragraphs and 50 ABC newswire paragraphs. Contrary to findings on smaller textual units such as word associations (Griffiths, Tenenbaum, & Steyvers, 2007), our results suggest that when single paragraphs are compared, simple nonreductive models (word overlap and vector space) can provide better similarity estimates than more complex models (LSA, Topic Model, SpNMF, and CSM). Second, various methods of corpus creation were explored to facilitate the semantic models' similarity estimates. Removing numeric and single characters, and also truncating document length, improved performance. Automated construction of smaller Wikipedia-based corpora proved to be very effective, even improving upon the performance of corpora that had been chosen for the domain. Model performance was further improved by augmenting corpora with dataset paragraphs. Copyright © 2010 Cognitive Science Society, Inc.
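
    A minimal sketch of the two simple nonreductive baselines the study found competitive, word overlap and a bag-of-words vector-space (cosine) measure; the example paragraphs are hypothetical:

```python
from collections import Counter
from math import sqrt

def cosine_bow(a: str, b: str) -> float:
    """Vector-space similarity: cosine between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm

def word_overlap(a: str, b: str) -> float:
    """Word-overlap similarity: shared vocabulary relative to the smaller paragraph."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / min(len(sa), len(sb))

p1 = "the actor released a new film this week"
p2 = "a new film by the actor opened this week"
print(cosine_bow(p1, p2), word_overlap(p1, p2))  # ~0.825, 0.875
```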

  6. Discovering Music Structure via Similarity Fusion

    DEFF Research Database (Denmark)

    Arenas-García, Jerónimo; Parrado-Hernandez, Emilio; Meng, Anders

    Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get a satisfactory performance from such systems, one should incorporate as much information about song similarity as possible; however...... semantics”, in such a way that all observed similarities can be satisfactorily explained using the latent semantics. Therefore, one can think of these semantics as the real structure in music, in the sense that they can explain the observed similarities among songs. The suitability of the PLSA model...... for representing music structure is studied in a simplified scenario consisting of 4412 songs and two similarity measures among them. The results suggest that the PLSA model is a useful framework to combine different sources of information, and provides a reasonable space for song representation.

  7. Abundance estimation of spectrally similar minerals

    CSIR Research Space (South Africa)

    Debba, Pravesh

    2009-07-01

    Full Text Available This paper evaluates a spectral unmixing method for estimating the partial abundance of spectrally similar minerals in complex mixtures. The method requires formulation of a linear function of individual spectra of individual minerals. The first...

  8. A Hybrid Model Ranking Search Result for Research Paper Searching on Social Bookmarking

    Directory of Open Access Journals (Sweden)

    pijitra jomsri

    2015-11-01

    Full Text Available Social bookmarking and publication sharing systems are essential tools for web resource discovery. The performance and capabilities of search results from research paper bookmarking systems are vital. Many researchers use social bookmarking for searching papers related to their topics of interest. This paper proposes a combination of similarity-based indexing on “tag, title and abstract” and static ranking to improve search results. In this particular study, the year of the published paper and the type of publication are combined with similarity ranking, called HybridRank. Different weighting scores are employed. The retrieval performance of these weighted combination rankings is evaluated using mean values of NDCG. The results suggest that HybridRank with similarity and static ranks weighted 75:25 has the highest NDCG scores. From the preliminary results of the experiment, the combination ranking technique provides more relevant research paper search results. Furthermore, the chosen heuristic ranking can improve the efficiency of research paper searching on social bookmarking websites.
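
    The weighted combination itself is a one-liner; a sketch of a HybridRank-style score with the best-performing 75:25 weighting (function and variable names hypothetical):

```python
def hybrid_rank(similarity: float, static_score: float, w: float = 0.75) -> float:
    """HybridRank-style score: weighted sum of similarity and static ranking signals.

    static_score would encode publication year and publication type,
    normalized to [0, 1]; the 75:25 weighting gave the best NDCG in the study.
    """
    return w * similarity + (1 - w) * static_score

print(hybrid_rank(similarity=0.82, static_score=0.40))  # 0.715
```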

  9. Comparison of PubMed and Google Scholar literature searches.

    Science.gov (United States)

    Anders, Michael E; Evans, Dennis P

    2010-05-01

    Literature searches are essential to evidence-based respiratory care. To conduct literature searches, respiratory therapists rely on search engines to retrieve information, but there is a dearth of literature on the comparative efficiencies of search engines for researching clinical questions in respiratory care. The objective was to compare PubMed and Google Scholar search results for clinical topics in respiratory care against a benchmark. We performed literature searches with PubMed and Google Scholar on 3 clinical topics. In PubMed we used the Clinical Queries search filter. In Google Scholar we used the search filters in the Advanced Scholar Search option. We used the reference list of a related Cochrane Collaboration evidence-based systematic review as the benchmark for each of the search results. We calculated recall (sensitivity) and precision (positive predictive value) with 2 x 2 contingency tables. We compared the results with the chi-square test of independence and Fisher's exact test. PubMed and Google Scholar had similar recall for both overall search results (71% vs 69%) and full-text results (43% vs 51%). PubMed had better precision than Google Scholar for both overall search results (13% vs 0.07%). PubMed searches with the Clinical Queries filter are more precise than searches with the Advanced Scholar Search in Google Scholar for respiratory care topics. PubMed appears to be more practical to conduct efficient, valid searches for informing evidence-based patient-care protocols, for guiding the care of individual patients, and for educational purposes.
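
    Recall and precision here come straight from the 2 x 2 contingency table of benchmark references versus retrieved results; a sketch with hypothetical counts:

```python
def recall_precision(tp: int, fn: int, fp: int) -> tuple[float, float]:
    """Recall (sensitivity) and precision (positive predictive value).

    tp: benchmark references the search retrieved
    fn: benchmark references the search missed
    fp: retrieved results not in the benchmark
    """
    return tp / (tp + fn), tp / (tp + fp)

# Hypothetical counts for one search engine on one topic.
recall, precision = recall_precision(tp=71, fn=29, fp=475)
print(f"recall={recall:.0%} precision={precision:.1%}")  # recall=71% precision=13.0%
```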

  10. On distributional assumptions and whitened cosine similarities

    DEFF Research Database (Denmark)

    Loog, Marco

    2008-01-01

    Recently, an interpretation of the whitened cosine similarity measure as a Bayes decision rule was proposed (C. Liu, "The Bayes Decision Rule Induced Similarity Measures,'' IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1086-1090, June 2007. This communication makes...... the observation that some of the distributional assumptions made to derive this measure are very restrictive and, considered simultaneously, even inconsistent....

  11. Privacy-preserving matching of similar patients.

    Science.gov (United States)

    Vatsalan, Dinusha; Christen, Peter

    2016-02-01

    The identification of similar entities represented by records in different databases has drawn considerable attention in many application areas, including in the health domain. One important type of entity matching application that is vital for quality healthcare analytics is the identification of similar patients, known as similar patient matching. A key component of identifying similar records is the calculation of similarity of the values in attributes (fields) between these records. Due to increasing privacy and confidentiality concerns, using the actual attribute values of patient records to identify similar records across different organizations is becoming non-trivial because the attributes in such records often contain highly sensitive information such as personal and medical details of patients. Therefore, the matching needs to be based on masked (encoded) values while being effective and efficient to allow matching of large databases. Bloom filter encoding has widely been used as an efficient masking technique for privacy-preserving matching of string and categorical values. However, no work on Bloom filter-based masking of numerical data, such as integer (e.g. age), floating point (e.g. body mass index), and modulus (numbers wrap around upon reaching a certain value, e.g. date and time), which are commonly required in the health domain, has been presented in the literature. We propose a framework with novel methods for masking numerical data using Bloom filters, thereby facilitating the calculation of similarities between records. We conduct an empirical study on publicly available real-world datasets which shows that our framework provides efficient masking and achieves similar matching accuracy compared to the matching of actual unencoded patient records. Copyright © 2015 Elsevier Inc. All rights reserved.
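
    For the string case on which the paper builds, Bloom filter encoding hashes a value's q-grams to bit positions and compares the masked records with a set similarity such as the Dice coefficient. The sketch below (all parameters hypothetical) shows that flavor; masking numerical values, the paper's contribution, requires the additional constructions it proposes.

```python
import hashlib

def bloom_encode(value: str, q: int = 2, num_hashes: int = 2, num_bits: int = 64) -> set:
    """Encode a string as the set of Bloom-filter bit positions set by its q-grams."""
    grams = [value[i:i + q] for i in range(len(value) - q + 1)]
    bits = set()
    for gram in grams:
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % num_bits)
    return bits

def dice_similarity(a: set, b: set) -> float:
    """Dice coefficient between two sets of Bloom-filter bit positions."""
    return 2 * len(a & b) / (len(a) + len(b))

# Two name variants compared on masked values only.
print(dice_similarity(bloom_encode("jonathan"), bloom_encode("jonothan")))
```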

  12. Crossing Rivers Problem Solution with Breadth-First Search Approach

    Science.gov (United States)

    Ratnadewi, R.; Sartika, E. M.; Rahim, R.; Anwar, B.; Syahril, M.; Winata, H.

    2018-01-01

    The river-crossing problem is a classic state-space problem: states and transition rules define the problem, and solutions can be found with the breadth-first search (BFS) method by searching according to the given rules and conditions. It is expected that this research solves the river-crossing problem properly and optimally, and that it can serve as a basis for solving other similar problems using the breadth-first search method. To model the situation and the problem space, an application was built in a programming language using breadth-first search.
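
    A minimal BFS solution to the classic farmer-wolf-goat-cabbage instance of the river-crossing problem, with a state encoded as the set of items on the left bank (this encoding is illustrative, not taken from the paper):

```python
from collections import deque

# A state is the frozenset of items on the left bank; everyone starts on the left.
ITEMS = {"farmer", "wolf", "goat", "cabbage"}
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]  # pairs that cannot be left alone

def safe(bank):
    return "farmer" in bank or not any(pair <= bank for pair in UNSAFE)

def moves(left):
    here = left if "farmer" in left else ITEMS - left
    for item in [None] + sorted(here - {"farmer"}):
        crossing = {"farmer"} | ({item} if item else set())
        new_left = left - crossing if "farmer" in left else left | crossing
        if safe(new_left) and safe(ITEMS - new_left):
            yield frozenset(new_left)

def bfs(start=frozenset(ITEMS), goal=frozenset()):
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path  # BFS returns a shortest sequence of crossings
        for nxt in moves(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])

for state in bfs():
    print(sorted(state))  # items on the left bank after each crossing
```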

  13. An effective collaborative movie recommender system with cuckoo search

    Directory of Open Access Journals (Sweden)

    Rahul Katarya

    2017-07-01

    Full Text Available Recommender systems are information filtering tools that aspire to predict ratings for users and items, predominantly from big data, in order to recommend what users will like. Movie recommendation systems provide a mechanism for classifying users with similar interests, which makes recommender systems a central part of websites and e-commerce applications. This article focuses on movie recommendation systems, with the primary objective of suggesting a recommender system based on data clustering and computational intelligence. In this research article, a novel recommender system is discussed which makes use of k-means clustering with the cuckoo search optimization algorithm, applied to the Movielens dataset. Our approach is explained systematically, and the subsequent results are discussed. It is also compared with existing approaches, and the results are analyzed and interpreted. On evaluation metrics such as mean absolute error (MAE), standard deviation (SD), root mean square error (RMSE) and t-value, the proposed movie recommender system delivers better results, offering lower values of the mean absolute error, standard deviation, and root mean square error. The experimental results obtained on the Movielens dataset indicate that the proposed approach may provide high performance regarding reliability and efficiency, and may deliver accurate personalized movie recommendations compared with existing methods. Our proposed system (K-mean Cuckoo) has 0.68 MAE, which is superior to existing work (0.78 MAE) [1] and also improves on our previous work (0.75 MAE) [2].
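
    The quoted evaluation metrics are straightforward to compute; a sketch with hypothetical held-out ratings:

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

ratings = [4.0, 3.5, 5.0, 2.0]      # hypothetical held-out ratings
predictions = [3.6, 3.9, 4.5, 2.4]  # hypothetical recommender output
print(mae(ratings, predictions), rmse(ratings, predictions))
```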

  14. Searching for Movies

    DEFF Research Database (Denmark)

    Bogers, Toine

    2015-01-01

    Despite a surge in popularity of work on casual leisure search, some leisure domains are still relatively underrepresented. Movies are good example of such a domain, which is peculiar given the popularity of movie-centered websites and discovery services such as IMDB, RottenTomatoes, and Netflix...

  15. SUSY Search at LHC

    CERN Document Server

    Xu, Da; The ATLAS collaboration

    2018-01-01

    Despite the absence of experimental evidence, weak scale supersymmetry remains one of the best motivated and studied Standard Model extensions. This talk gives an overview of the most recent SUSY searches in ATLAS and CMS experiments using 13 TeV ATLAS Run2 data.

  16. Next Generation Search Interfaces

    Science.gov (United States)

    Roby, W.; Wu, X.; Ly, L.; Goldina, T.

    2015-09-01

    Astronomers are constantly looking for easier ways to access multiple data sets. While much effort is spent on VO, little thought is given to the types of User Interfaces we need to effectively search this sort of data. For instance, an astronomer might need to search Spitzer, WISE, and 2MASS catalogs and images then see the results presented together in one UI. Moving seamlessly between data sets is key to presenting integrated results. Results need to be viewed using first class, web based, integrated FITS viewers, XY Plots, and advanced table display tools. These components should be able to handle very large datasets. To make a powerful Web based UI that can manage and present multiple searches to the user requires taking advantage of many HTML5 features. AJAX is used to start searches and present results. Push notifications (Server Sent Events) monitor background jobs. Canvas is required for advanced result displays. Lesser known CSS3 technologies makes it all flow seamlessly together. At IPAC, we have been developing our Firefly toolkit for several years. We are now using it to solve this multiple data set, multiple queries, and integrated presentation problem to create a powerful research experience. Firefly was created in IRSA, the NASA/IPAC Infrared Science Archive (http://irsa.ipac.caltech.edu). Firefly is the core for applications serving many project archives, including Spitzer, Planck, WISE, PTF, LSST and others. It is also used in IRSA's new Finder Chart and catalog and image displays.

  17. Internet video search

    NARCIS (Netherlands)

    Snoek, C.G.M.; Smeulders, A.W.M.

    2011-01-01

    In this tutorial, we focus on the challenges in internet video search, present methods how to achieve state-of-the-art performance while maintaining efficient execution, and indicate how to obtain improvements in the near future. Moreover, we give an overview of the latest developments and future

  18. The Great Superintendent Search.

    Science.gov (United States)

    Mayo, C. Russell

    2003-01-01

    Discusses three sequential questions boards should reach consensus on before beginning a superintendent search. Questions involve district's position related to education issues deemed important to the community, a new superintendent's personality traits and leadership skills required to address these issues, and whether the board should seek to…

  19. Google Search Mastery Basics

    Science.gov (United States)

    Hill, Paul; MacArthur, Stacey; Read, Nick

    2014-01-01

    Effective Internet search skills are essential with the continually increasing amount of information available on the Web. Extension personnel are required to find information to answer client questions and to conduct research on programs. Unfortunately, many lack the skills necessary to effectively navigate the Internet and locate needed…

  20. Search for interval models

    DEFF Research Database (Denmark)

    Høskuldsson, Agnar

    1996-01-01

    Methods are presented that carry out sorting of data according to some criteria, and investigate the possibilities of finding intervals that give separate models relative to the given data. The methods presented are more reliable than related clustering methods, because the search is carried out...

  1. Searching low and high

    DEFF Research Database (Denmark)

    Laursen, Keld; Salter, Ammon

    2003-01-01

    This paper examines the factors that influence whether firms draw from universities in their innovative activities. The link between the universities and industrial innovation, and the role of different search strategies in influencing the propensity of firms to use universities is explored

  2. Searching circular DNA strands

    Energy Technology Data Exchange (ETDEWEB)

    Eliazar, Iddo [Department of Technology Management, Holon Institute of Technology, Holon 58102 (Israel); Koren, Tal [School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978 (Israel); Klafter, Joseph [School of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978 (Israel)

    2007-02-14

    We introduce and explore a model of an ensemble of enzymes searching, in parallel, a circular DNA strand for a target site. The enzymes performing the search combine local scanning (conducted by a 1D motion along the strand) and random relocations on the strand (conducted via a confined motion in the medium containing the strand). Both the local scan mechanism and the relocation mechanism are considered general. The search durations are analysed, and their limiting probability distributions, for long DNA strands, are obtained in closed form. The results obtained (i) encompass the cases of single, parallel and massively parallel searches, taking place in the presence of either finite-mean or heavy-tailed relocation times, (ii) are applicable to a wide spectrum of local scan mechanisms including linear, Brownian, self-similar, and sub-diffusive motions, (iii) provide a quantitative theoretical justification for the necessity of the relocation mechanism, and (iv) facilitate the derivation of optimal relocation strategies.

  3. Biased predecessor search

    DEFF Research Database (Denmark)

    Bose, Prosenjit; Fagerberg, Rolf; Howat, John

    2014-01-01

    We consider the problem of performing predecessor searches in a bounded universe while achieving query times that depend on the distribution of queries. We obtain several data structures with various properties: in particular, we give data structures that achieve expected query times logarithmic...... structures with general weights on universe elements, as well as the case when the distribution is not known in advance....

  4. A data mining approach to selecting herbs with similar efficacy: Targeted selection methods based on medical subject headings (MeSH).

    Science.gov (United States)

    Yea, Sang-Jun; Seong, BoSeok; Jang, Yunji; Kim, Chul

    2016-04-22

    Natural products have long been the most important source of ingredients in the discovery of new drugs. Moreover, since the Nagoya Protocol, finding alternative herbs with similar efficacy in traditional medicine has become a very important issue. Although random selection is a common method of finding ethno-medicinal herbs of similar efficacy, it proved to be less effective; therefore, this paper proposes a novel targeted selection method using data mining approaches in the MEDLINE database in order to identify and select herbs with a similar degree of efficacy. From among sixteen categories of medical subject headings (MeSH) descriptors, three categories containing terms related to herbal compounds, efficacy, toxicity, and the metabolic process were selected. In order to select herbs of similar efficacy in a targeted way, we adopted the similarity measurement method based on MeSH. In order to evaluate the proposed algorithm, we built three different validation datasets which contain lists of original herbs and corresponding medicinal herbs of similar efficacy. The average area under the curve (AUC) of the proposed algorithm was found to be about 500% larger than that of the random selection method. We found that the proposed algorithm puts more hits at the front of the top-10 list than the random selection method, and precisely discerns the efficacy of the herbs. It was also found that the AUC of the experiments either remained the same or increased slightly in all three validation datasets as the search range was increased. This study reveals and proves that the proposed algorithm is significantly more accurate and efficient in finding alternative herbs of similar efficacy than the random selection method. As such, it is hoped that this approach will be used in diverse applications in the ethno-pharmacology field. Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
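
    The abstract does not spell out its MeSH-based similarity formula, so as a purely illustrative stand-in the sketch below scores two herbs by the Jaccard overlap of the MeSH descriptors mined for them (descriptor sets hypothetical):

```python
def mesh_similarity(herb_a: set, herb_b: set) -> float:
    """Jaccard similarity over the MeSH descriptors annotated to two herbs."""
    return len(herb_a & herb_b) / len(herb_a | herb_b)

# Hypothetical MeSH descriptor sets mined from MEDLINE abstracts.
ginseng = {"Anti-Inflammatory Agents", "Antioxidants", "Saponins"}
astragalus = {"Anti-Inflammatory Agents", "Antioxidants", "Polysaccharides"}
print(mesh_similarity(ginseng, astragalus))  # 0.5
```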

  5. How Accurately Can Emergency Department Providers Estimate Patient Satisfaction?

    Directory of Open Access Journals (Sweden)

    Lalena M. Yarris

    2012-09-01

    Full Text Available Introduction: Patient satisfaction is an important measure of emergency department (ED) quality of care. Little is known about providers' ability to estimate patient satisfaction. We aimed to measure providers' ability to assess patient satisfaction and hypothesized that providers could accurately estimate overall patient satisfaction. Methods: We surveyed ED patients regarding satisfaction with their care. Treating providers completed analogous surveys, estimating patients' responses. Sexual assault victims and non-English-speaking or severely ill patients were excluded. Satisfaction responses were categorized as "satisfied" or "not satisfied." Patient satisfaction scores were considered the "gold standard," and providers' perceptions of the patient satisfaction were considered tests. Measures of diagnostic accuracy, such as positive predictive value (PPV) and sensitivity, were used to assess how accurately the provider could estimate his or her patient's satisfaction. Results: Here, 242/457 eligible patients (53%) completed the survey; 227 providers (94%) completed a corresponding survey. Subject-reported overall satisfaction was 96.6%, compared with a provider-estimated rate of 94.4%. The sensitivity and PPV of the provider's estimate of the patient's satisfaction were 95.2 (95% confidence interval [CI] 91.4, 97.7) and 97.5 (95% CI 94.4, 99.2), respectively, for overall patient satisfaction. The PPV was similar for clarity of communication. The PPV was 78.9 for perceived length of ED stay (99% CI 70.8, 85.6) and 82.6 for quality of pain control (95% CI 68.6, 92.2). Accuracy of attending and resident estimates of patient satisfaction did not differ significantly. The agreement between patient-reported and provider-estimated patient satisfaction was not associated with age, gender, patient disposition, or ED divert status. Conclusion: Providers are able to assess overall patient satisfaction and clarity of

  6. Batch-Orthogonal Locality-Sensitive Hashing for Angular Similarity.

    Science.gov (United States)

    Ji, Jianqiu; Yan, Shuicheng; Li, Jianmin; Gao, Guangyu; Tian, Qi; Zhang, Bo

    2014-10-01

    Sign-random-projection locality-sensitive hashing (SRP-LSH) is a widely used hashing method, which provides an unbiased estimate of pairwise angular similarity, yet may suffer from its large estimation variance. We propose in this work batch-orthogonal locality-sensitive hashing (BOLSH), as a significant improvement of SRP-LSH. Instead of independent random projections, BOLSH makes use of batch-orthogonalized random projections, i.e., we divide the random projection vectors into several batches and orthogonalize the vectors in each batch. These batch-orthogonalized random projections partition the data space into regular regions, and thus provide a more accurate estimator. We prove theoretically that BOLSH still provides an unbiased estimate of pairwise angular similarity, with a smaller variance for any angle in (0, π), compared with SRP-LSH. Furthermore, we give a lower bound on the reduction of variance. Extensive experiments on real data validate that with the same length of binary code, BOLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, BOLSH shows its superiority in extensive approximate nearest neighbor (ANN) retrieval experiments.
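
    Plain SRP-LSH, which BOLSH refines, is compact enough to sketch: each random hyperplane contributes one bit, and the expected fraction of mismatching bits between two codes equals θ/π for vectors at angle θ (all parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

d, c = 64, 256                        # data dimension and binary code length
planes = rng.standard_normal((c, d))  # i.i.d. Gaussian projections (plain SRP-LSH)

def srp_hash(x):
    """One bit per hyperplane: which side of the hyperplane the vector falls on."""
    return planes @ x >= 0

x, y = rng.standard_normal(d), rng.standard_normal(d)
mismatch = np.mean(srp_hash(x) != srp_hash(y))  # E[mismatch] = angle / pi
estimated_angle = mismatch * np.pi
true_angle = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(estimated_angle, true_angle)
```

    BOLSH would replace the i.i.d. rows of planes with batches of orthogonalized vectors (e.g., a QR decomposition per batch of at most d rows), which the paper proves lowers the estimator's variance.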

  7. On the similarity of symbol frequency distributions with heavy tails

    CERN Document Server

    Gerlach, Martin; Altmann, Eduardo G

    2015-01-01

    Quantifying the similarity between symbolic sequences is a traditional problem in Information Theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA over music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf's law). The large number of low-frequency symbols in these distributions poses major difficulties to the estimation of the similarity between sequences; e.g., they hinder an accurate finite-size estimation of entropies. Here we show how the accuracy of estimations depends on the sample size $N$, not only for the Shannon entropy $(\alpha=1)$ and its corresponding similarity measures (e.g., the Jensen-Shannon divergence) but also for measures based on the generalized entropy of order $\alpha$. For small $\alpha$'s, including $\alpha=1$, the bias and fluctuations in the estimations decay slower than the $1/N$ decay observed in short-tailed distributions. For $\alpha$ larger ...
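
    For concreteness, the Jensen-Shannon divergence mentioned above is the entropy of the mixture minus the mean entropy of the parts; a minimal sketch for two hypothetical symbol-frequency distributions:

```python
import numpy as np

def shannon_entropy(p: np.ndarray) -> float:
    """Shannon entropy (alpha = 1) in bits, ignoring zero-probability symbols."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def jensen_shannon(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence: entropy of the mixture minus mean entropy."""
    m = (p + q) / 2
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2

p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.4, 0.2, 0.2, 0.2])
print(jensen_shannon(p, q))
```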

  8. Identifying multiple influential spreaders via local structural similarity

    Science.gov (United States)

    Liu, J.-G.; Wang, Z.-Y.; Guo, Q.; Guo, L.; Chen, Q.; Ni, Y.-Z.

    2017-07-01

    Identifying the nodes with the largest spreading influence is of significance for information diffusion, product exposure and contagious disease detection. In this letter, based on local structural similarity, we present a method (LSS) to identify multiple influential spreaders. Firstly, we choose the node with the largest degree as the first spreader. Then new spreaders are selected if they belong to the first- or second-order-neighbor node set of the existing spreaders and their local structural similarities with the other spreaders are smaller than the threshold parameter r. Evaluated with the susceptible-infected-recovered model, the experimental results for four empirical networks show that the spreading influence of the spreaders selected by the local structural similarity method is larger than that of the color method and of the degree, betweenness and closeness centralities. Further experimental results for Barabási-Albert and random networks show that the LSS method identifies multiple influential spreaders more accurately, which suggests that the local structural property plays a more important role than distance in identifying multiple influential spreaders.
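
    The selection rule can be sketched directly from the abstract. In the sketch below, the structural similarity is taken to be neighbor-set Jaccard overlap and ties among eligible candidates are broken by degree; both are assumptions, since the abstract does not pin down the exact similarity index or tie-break.

```python
import networkx as nx

def jaccard_similarity(G, u, v):
    """Local structural similarity as neighbor-set Jaccard overlap
    (one common choice; the paper's exact index may differ)."""
    Nu, Nv = set(G[u]), set(G[v])
    return len(Nu & Nv) / len(Nu | Nv) if Nu | Nv else 0.0

def lss_spreaders(G, k, r=0.2):
    """Sketch of the LSS selection rule described in the abstract."""
    spreaders = [max(G.nodes, key=G.degree)]      # start from the largest-degree node
    while len(spreaders) < k:
        # candidates: first- and second-order neighbors of current spreaders
        candidates = set()
        for s in spreaders:
            for u in G[s]:
                candidates.add(u)
                candidates.update(G[u])
        candidates -= set(spreaders)
        # keep only candidates sufficiently dissimilar to every chosen spreader
        ok = [c for c in candidates
              if all(jaccard_similarity(G, c, s) < r for s in spreaders)]
        if not ok:
            break
        spreaders.append(max(ok, key=G.degree))   # assumed tie-break: highest degree
    return spreaders

G = nx.barabasi_albert_graph(500, 3, seed=42)
print(lss_spreaders(G, k=5))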

  9. Accurate position estimation methods based on electrical impedance tomography measurements

    Science.gov (United States)

    Vergara, Samuel; Sbarbaro, Daniel; Johansen, T. A.

    2017-08-01

    Electrical impedance tomography (EIT) is a technology that estimates the electrical properties of a body or a cross section. Its main advantages are its non-invasiveness, low cost and operation free of radiation. The estimation of the conductivity field leads to low-resolution images compared with other technologies, and high computational cost. However, in many applications the target information lies in a low intrinsic dimensionality of the conductivity field. The estimation of this low-dimensional information is addressed in this work, which proposes optimization-based and data-driven approaches. The accuracy of the results obtained with these approaches depends on modelling and experimental conditions. Optimization approaches are sensitive to model discretization, the type of cost function and the search algorithm. Data-driven methods are sensitive to the assumed model structure and the data set used for parameter estimation. The system configuration and experimental conditions, such as the number of electrodes and the signal-to-noise ratio (SNR), also have an impact on the results. In order to illustrate the effects of all these factors, the position estimation of a circular anomaly is addressed. Optimization methods based on weighted error cost functions and derivative-free optimization algorithms provided the best results. Data-driven approaches based on linear models provided, in this case, good estimates, but the use of nonlinear models enhanced the estimation accuracy. The results obtained by optimization-based algorithms were less sensitive to experimental conditions, such as the number of electrodes and SNR, than those of the data-driven approaches. Position estimation mean squared errors for simulation and experimental conditions were more than twice as large for the optimization-based approaches as for the data-driven ones. The experimental position estimation mean squared error of the data-driven models using a 16-electrode setup was less
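
    The optimization-based route can be illustrated with a few lines: minimize a weighted squared error between measured and model-predicted boundary voltages over the anomaly position, using a derivative-free method. The forward model below is a smooth stand-in, not a real EIT solver (a real one would solve the conductivity PDE, e.g. with FEM); Nelder-Mead is one representative derivative-free algorithm, and the 16-electrode count follows the abstract.

```python
import numpy as np
from scipy.optimize import minimize

def forward_model(pos):
    """Placeholder for an EIT forward solver: boundary voltages predicted for a
    circular anomaly centred at pos = (x, y). Faked here as a smooth map."""
    x, y = pos
    angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)  # 16-electrode setup
    return np.cos(angles - np.arctan2(y, x)) * np.hypot(x, y)

def estimate_position(v_measured, w, x0=(0.1, 0.1)):
    """Weighted least-squares fit of the anomaly position, derivative-free."""
    cost = lambda p: np.sum(w * (forward_model(p) - v_measured) ** 2)
    res = minimize(cost, x0, method="Nelder-Mead")  # robust to noisy, non-smooth costs
    return res.x

true_pos = np.array([0.3, -0.2])
rng = np.random.default_rng(0)
v = forward_model(true_pos) + 0.01 * rng.standard_normal(16)  # noisy "measurements"
print(estimate_position(v, w=np.ones(16)))
```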

  10. Biomimetic Approach for Accurate, Real-Time Aerodynamic Coefficients Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Aerodynamic and structural reliability and efficiency depend critically on the ability to accurately assess the aerodynamic loads and moments for each lifting...

  11. Triplet correlations among similarly tuned cells impact population coding

    Directory of Open Access Journals (Sweden)

    Natasha Alexandra Cayco Gajic

    2015-05-01

    Full Text Available Which statistical features of spiking activity matter for how stimuli are encoded in neural populations? A vast body of work has explored how firing rates in individual cells and correlations in the spikes of cell pairs impact coding. Recent experiments have shown evidence for the existence of higher-order spiking correlations, which describe simultaneous firing in triplets and larger ensembles of cells; however, little is known about their impact on encoded stimulus information. Here, we take a first step toward closing this gap. We vary triplet correlations in small (approximately 10-cell) neural populations while keeping single-cell and pairwise statistics fixed at typically reported values. This connection with empirically observed lower-order statistics is important, as it places strong constraints on the level of triplet correlations that can occur. For each value of triplet correlations, we estimate the performance of the neural population on a two-stimulus discrimination task. We find that the allowed changes in the level of triplet correlations can significantly enhance coding, in particular if triplet correlations differ for the two stimuli. In this scenario, triplet correlations must be included in order to accurately quantify the functionality of neural populations. When both stimuli elicit similar triplet correlations, however, pairwise models provide relatively accurate descriptions of coding accuracy. We explain our findings geometrically via the skew that triplet correlations induce in population-wide distributions of neural responses. Finally, we calculate how many samples are necessary to accurately measure spiking correlations of this type, providing an estimate of the necessary recording times in future experiments.
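
    For concreteness, the sketch below measures pairwise and triplet statistics from a binary spike raster: the mean pairwise correlation coefficient and the mean third central moment across cell triplets. The normalization of triplet statistics varies between papers, so the raw third moment used here is only one simple choice, and the correlated toy raster is invented.

```python
import numpy as np

def pairwise_and_triplet_corr(spikes):
    """spikes: (n_cells, n_trials) binary array. Returns the mean pairwise
    correlation coefficient and the mean (unnormalized) third-order central
    moment over all cell triplets."""
    z = spikes - spikes.mean(axis=1, keepdims=True)   # center each cell
    n = spikes.shape[0]
    C = np.corrcoef(spikes)
    pair = np.mean(C[np.triu_indices(n, k=1)])
    triplets = [np.mean(z[i] * z[j] * z[k])
                for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n)]
    return pair, float(np.mean(triplets))

rng = np.random.default_rng(0)
# ~10-cell population, correlated through a shared latent input
latent = rng.random(5000) < 0.2
spikes = (rng.random((10, 5000)) < 0.1) | (latent & (rng.random((10, 5000)) < 0.5))
print(pairwise_and_triplet_corr(spikes.astype(float)))
```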

  12. Foraging patterns in online searches

    Science.gov (United States)

    Wang, Xiangwen; Pleimling, Michel

    2017-03-01

    Nowadays online searches are undeniably the most common form of information gathering, as witnessed by billions of clicks generated each day on search engines. In this work we describe online searches as foraging processes that take place on the semi-infinite line. Using a variety of quantities like probability distributions and complementary cumulative distribution functions of step length and waiting time as well as mean square displacements and entropies, we analyze three different click-through logs that contain the detailed information of millions of queries submitted to search engines. Notable differences between the logs reveal an increased efficiency of the search engines. In the language of foraging, the newer logs indicate that online searches overwhelmingly yield local searches (i.e., on one page of links provided by the search engine), whereas for the older logs the foraging processes are a combination of local searches and relocation phases that are power-law distributed. Our investigation of click logs of search engines therefore highlights the presence of intermittent search processes (where phases of local exploration are separated by power-law distributed relocation jumps) in online searches. It follows that good search engines enable users to find the information they are looking for through a local exploration of a single page of search results, whereas with poor search engines users are often forced into a broader exploration of different pages.
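
    One of the diagnostics named above, the complementary cumulative distribution function (CCDF) of step lengths, is simple to compute: on log-log axes a power-law tail appears as a straight line. The step lengths below are synthetic Pareto draws, used only to show the mechanics; they are not the click-log data.

```python
import numpy as np

def ccdf(samples):
    """Complementary cumulative distribution function P(X >= x) over sorted data."""
    x = np.sort(samples)
    p = 1.0 - np.arange(len(x)) / len(x)
    return x, p

# Illustrative power-law distributed "relocation jumps": P(X >= x) ~ x^-1
rng = np.random.default_rng(0)
steps = (1 - rng.random(100_000)) ** (-1.0)   # inverse-CDF sampling from a Pareto law
x, p = ccdf(steps)
# Fit the tail slope on log-log axes; a power law gives a straight line
slope = np.polyfit(np.log(x[-1000:]), np.log(p[-1000:]), 1)[0]
print(f"tail slope = {slope:.2f}")   # close to -1 for this toy distribution
```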

  13. Semantic similarity between ontologies at different scales

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Qingpeng; Haglin, David J.

    2016-04-01

    In the past decade, existing and new knowledge and datasets have been encoded in different ontologies for semantic web and biomedical research. The size of ontologies is often very large in terms of the number of concepts and relationships, which makes the analysis of ontologies and the represented knowledge graph computationally expensive and time-consuming. As the ontologies of various semantic web and biomedical applications usually show explicit hierarchical structures, it is interesting to explore the trade-offs between ontological scales and preservation/precision of results when we analyze ontologies. This paper presents the first effort to examine the viability of this idea by studying the relationship between scaling biomedical ontologies at different levels and the semantic similarity values. We evaluate the semantic similarity between three Gene Ontology slims (Plant, Yeast, and Candida, of which the latter two belong to the same kingdom, Fungi) using four popular measures commonly applied to biomedical ontologies (Resnik, Lin, Jiang-Conrath, and SimRel). The results of this study demonstrate that with proper selection of scaling levels and similarity measures, we can significantly reduce the size of ontologies without losing substantial detail. In particular, the performance of Jiang-Conrath and Lin is more reliable and stable than that of the other two in this experiment, as shown by (a) consistently finding that Yeast and Candida are more similar (as compared to Plant) at different scales, and (b) small deviations of the similarity values after excluding a majority of nodes from several lower scales. This study provides a deeper understanding of the application of semantic similarity to biomedical ontologies, and sheds light on how to choose appropriate semantic similarity measures for biomedical engineering.
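
    The measures compared above are all built from information content (IC). As a reference point, the sketch below computes the Lin similarity, sim = 2·IC(LCA)/(IC(c1)+IC(c2)); the GO-like terms and probabilities are hypothetical, and the lowest common ancestor is assumed precomputed from the ontology.

```python
import math

def lin_similarity(ic, lca, c1, c2):
    """Lin semantic similarity from information-content (IC) values:
        sim = 2 * IC(LCA) / (IC(c1) + IC(c2)).
    For comparison: Resnik similarity is IC(LCA) alone, and the
    Jiang-Conrath distance is IC(c1) + IC(c2) - 2 * IC(LCA)."""
    return 2.0 * ic[lca] / (ic[c1] + ic[c2])

# Hypothetical annotation probabilities; IC(c) = -log p(c)
p = {"biological_process": 1.0, "metabolic_process": 0.25,
     "glycolysis": 0.01, "gluconeogenesis": 0.02}
ic = {term: -math.log(prob) for term, prob in p.items()}

# glycolysis vs. gluconeogenesis, sharing 'metabolic_process' as their LCA
print(lin_similarity(ic, "metabolic_process", "glycolysis", "gluconeogenesis"))
```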

  14. Identifying mechanistic similarities in drug responses

    KAUST Repository

    Zhao, C.

    2012-05-15

    Motivation: In early drug development, it would be beneficial to be able to identify those dynamic patterns of gene response that indicate that drugs targeting a particular gene will be likely or not to elicit the desired response. One approach would be to quantitate the degree of similarity between the responses that cells show when exposed to drugs, so that consistencies in the regulation of cellular response processes that produce success or failure can be more readily identified. Results: We track drug response using fluorescent proteins as transcription activity reporters. Our basic assumption is that drugs inducing very similar alterations in transcriptional regulation will produce similar temporal trajectories on many of the reporter proteins and hence be identified as having similarities in their mechanisms of action (MOA). The main body of this work is devoted to characterizing similarity in temporal trajectories/signals. To do so, we must first identify the key points that determine mechanistic similarity between two drug responses. Directly comparing points on the two signals is unrealistic, as it cannot handle delays and speed variations on the time axis. Hence, to capture the similarities between reporter responses, we develop an alignment algorithm that is robust to noise and time delays and is able to find all the contiguous parts of signals centered about a core alignment (reflecting a core mechanism in drug response). Applying the proposed algorithm to a range of real drug experiments shows that the results agree well with prior drug MOA knowledge. © The Author 2012. Published by Oxford University Press. All rights reserved.
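
    The delay-and-speed-variation problem that motivates the paper's custom alignment is the same one dynamic time warping (DTW) addresses, so a DTW sketch is a useful reference point; note it is the standard baseline, not the authors' noise-robust local alignment. The toy trajectories below are invented.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D reporter trajectories:
    aligns the signals despite delays and speed variations on the time axis."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 1, 100)
resp_a = np.sin(2 * np.pi * t)            # reporter response to drug A
resp_b = np.sin(2 * np.pi * t ** 1.3)     # same dynamics, time-warped
resp_c = np.cos(4 * np.pi * t)            # different dynamics
print(dtw_distance(resp_a, resp_b), dtw_distance(resp_a, resp_c))  # smaller for similar MOA
```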

  15. Reference module selection criteria for accurate testing of photovoltaic (PV) panels

    Energy Technology Data Exchange (ETDEWEB)

    Roy, J.N.; Gariki, Govardhan Rao; Nagalakhsmi, V. [Solar Semiconductor Pvt. Ltd., Banjara Hills, Hyderabad (India)

    2010-01-15

    It is shown that for accurate testing of PV panels the correct selection of reference modules is important. A detailed description of the test methodology is given. Three different types of reference modules, having different I_SC (short-circuit current) and power (in Wp), have been used for this study. These reference modules have been calibrated by NREL. It has been found that for accurate testing, both I_SC and power of the reference module must be similar to or exceed those of the modules under test. In cases where the corresponding values for the test modules fall below a particular limit, the measurements may not be accurate. The experimental results obtained have been modeled using a simple equivalent-circuit model and the associated I-V equations. (author)
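
    A common choice for the "simple equivalent circuit model" mentioned above is the single-diode model; the sketch below evaluates its implicit I-V equation by fixed-point iteration. All parameter values are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

def single_diode_current(V, I_ph, I_0=1e-9, n=1.3, R_s=0.005, R_sh=10.0,
                         V_t=0.0257, n_iter=50):
    """Cell current from the single-diode equivalent-circuit model:
        I = I_ph - I_0*(exp((V + I*R_s)/(n*V_t)) - 1) - (V + I*R_s)/R_sh
    The equation is implicit in I; solved here by fixed-point iteration."""
    I = I_ph
    for _ in range(n_iter):
        I = I_ph - I_0 * (np.exp((V + I * R_s) / (n * V_t)) - 1) - (V + I * R_s) / R_sh
    return I

# I-V curve of a cell with I_SC ~ 8 A; I_SC is essentially the current at V = 0
V = np.linspace(0, 0.6, 7)
print([round(float(single_diode_current(v, I_ph=8.0)), 3) for v in V])
```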

  16. Accurate Prediction of Motor Failures by Application of Multi CBM Tools: A Case Study

    Science.gov (United States)

    Dutta, Rana; Singh, Veerendra Pratap; Dwivedi, Jai Prakash

    2018-02-01

    Motor failures are very difficult to predict accurately with a single condition-monitoring tool, as the electrical and mechanical systems are closely related. Electrical problems, such as phase unbalance and stator winding insulation failures, can at times lead to vibration problems, while mechanical failures, such as bearing failure, lead to rotor eccentricity. In this case study of a 550 kW blower motor, it is shown that a rotor bar crack was detected by current signature analysis and vibration monitoring confirmed it. In later months, in a similar motor, vibration monitoring predicted a bearing failure and current signature analysis confirmed it. In both cases, after dismantling the motor, the predictions were found to be accurate. This paper discusses the accurate prediction of motor failures through the use of multiple condition-monitoring tools, with two case studies.

  17. Epidemiology of Recurrent Acute and Chronic Pancreatitis: Similarities and Differences.

    Science.gov (United States)

    Machicado, Jorge D; Yadav, Dhiraj

    2017-07-01

    Emerging data from the past few years suggest that acute, recurrent acute (RAP), and chronic pancreatitis (CP) represent a disease continuum. This review discusses the similarities and differences in the epidemiology of RAP and CP. RAP is a high-risk group, composed of individuals at varying risk of progression. The premise is that RAP is an intermediary stage in the pathogenesis of CP, and that a subset of RAP patients transition to CP during their natural course. Although many clinical factors have been identified, accurately predicting the probability of disease course in individual patients remains difficult. Future studies should focus on providing more precise estimates of the risk of disease transition in a cohort of patients, quantification of clinical events during the natural course of disease, and discovery of biomarkers of the different stages of the disease continuum. Availability of clinically relevant endpoints and linked biomarkers will allow more accurate prediction of the natural course of disease over the intermediate or long term based on the characteristics of an individual patient. These endpoints will also provide objective measures for use in clinical trials of interventions that aim to alter the natural course of disease.

  18. Affinity Propagation Clustering Using Path Based Similarity

    Directory of Open Access Journals (Sweden)

    Yuan Jiang

    2016-07-01

    Full Text Available Clustering is a fundamental task in data mining. Affinity propagation clustering (APC) is an effective and efficient clustering technique that has been applied in various domains. APC iteratively propagates information between affinity samples, updates the responsibility matrix and availability matrix, and employs these matrices to choose the cluster centers (or exemplars) of the respective clusters. However, since it mainly uses the negative Euclidean distance between exemplars and samples as the similarity between them, it is difficult to identify clusters with complex structure, and the performance of APC deteriorates on samples distributed with complex structure. To mitigate this problem, we propose an improved APC based on a path-based similarity (APC-PS). APC-PS first utilizes negative Euclidean distance to find exemplars of clusters. Then, it employs the path-based similarity to measure the similarity between exemplars and samples and to explore the underlying structure of clusters. Next, it assigns non-exemplar samples to their respective clusters via that similarity. Our empirical study on synthetic and UCI datasets shows that the proposed APC-PS significantly outperforms the original APC and other related approaches.
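
    Path-based similarity is commonly defined in minimax form: the similarity of two samples is the best "weakest link" over all connecting paths, i.e. the maximum over paths of the minimum edge similarity along the path. The sketch below computes it with a Floyd-Warshall-style update; the paper's exact variant may differ, and the toy data are invented.

```python
import numpy as np

def path_based_similarity(S):
    """Path-based (minimax) similarity from a symmetric (n, n) pairwise
    similarity matrix S: max over paths of the min edge similarity."""
    P = S.copy()
    n = len(P)
    for k in range(n):
        # a path routed through k is capped by its weaker half
        P = np.maximum(P, np.minimum(P[:, k][:, None], P[k, :][None, :]))
    return P

# An elongated chain of samples: direct similarity between the chain ends is
# low, but the path through intermediate samples keeps it high.
pts = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
S = np.exp(-np.abs(pts - pts.T))           # similarity decaying with distance
P = path_based_similarity(S)
print(S[0, 2].round(3), P[0, 2].round(3))  # path-based >= direct similarity
```

    This is what lets APC-PS follow elongated or curved clusters: samples connected by a chain of close neighbors end up highly similar even when their direct Euclidean similarity is small.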

  19. When is a search not a search? A comparison of searching the AMED complementary health database via EBSCOhost, OVID and DIALOG.

    Science.gov (United States)

    Younger, Paula; Boddy, Kate

    2009-06-01

    The researchers involved in this study work at Exeter Health Library and at the Complementary Medicine Unit, Peninsula College of Medicine and Dentistry (PCMD). Within this collaborative environment it is possible to access the electronic resources of three institutions, including AMED and other databases via different interfaces. The aim of this study was to investigate whether searching different interfaces to the AMED allied health and complementary medicine database produced the same results when using identical search terms. The following internet-based AMED interfaces were searched: DIALOG DataStar, EBSCOhost and OVID SP_UI01.00.02. Search results from all three databases were saved in an EndNote database to facilitate analysis. A checklist was also compiled comparing interface features. In our initial search, DIALOG returned 29 hits, OVID 14 and EBSCOhost 8. If we assume that DIALOG returned 100% of potential hits, OVID initially returned only 48% of hits and EBSCOhost only 28%. In our search, a researcher using the EBSCOhost interface to carry out a simple search on AMED would miss over 70% of possible search hits. Subsequent EBSCOhost searches on different subjects failed to find between 21% and 86% of the hits retrieved using the same keywords via DIALOG DataStar. In two cases, the simple EBSCOhost search failed to find any of the results found via DIALOG DataStar. Depending on the interface, the number of hits retrieved from the same database with the same simple search can vary dramatically. Some simple searches fail to retrieve a substantial percentage of citations. This may result in an uninformed literature review, research funding application or treatment intervention. In addition to ensuring that keywords, spelling and medical subject headings (MeSH) accurately reflect the nature of the search, database users should include wildcards and truncation and adapt their search strategy substantially to retrieve the maximum number of appropriate

  20. Approaches for the accurate definition of geological time boundaries

    Science.gov (United States)

    Schaltegger, Urs; Baresel, Björn; Ovtcharova, Maria; Goudemand, Nicolas; Bucher, Hugo

    2015-04-01

    Which strategies lead to the most precise and accurate date for a given geological boundary? Geological units are usually defined by the occurrence of characteristic taxa, and hence boundaries between these geological units correspond to dramatic faunal and/or floral turnovers; they are primarily defined using first or last occurrences of index species, or ideally by the separation interval between two consecutive, characteristic associations of fossil taxa. These boundaries need to be defined in a way that enables their worldwide recognition and correlation across different stratigraphic successions, using tools as different as bio-, magneto-, and chemo-stratigraphy, and astrochronology. Sedimentary sequences can be dated in numerical terms by applying high-precision chemical-abrasion, isotope-dilution, thermal-ionization mass spectrometry (CA-ID-TIMS) U-Pb age determination to zircon (ZrSiO4) in intercalated volcanic ashes. But, though volcanic activity is common in geological history, ashes are not necessarily close to the boundary we would like to date precisely and accurately. In addition, U-Pb zircon data sets may be very complex and difficult to interpret in terms of the age of ash deposition. To overcome these difficulties we use a multi-proxy approach, which we applied to the precise and accurate dating of the Permo-Triassic and Early-Middle Triassic boundaries in South China. (a) Dense sampling of ashes across the critical time interval and a sufficiently large number of analysed zircons per ash sample can guarantee the recognition of all system complexities. Geochronological datasets from U-Pb dating of volcanic zircon may indeed combine the effects of (i) post-crystallization Pb loss from percolation of hydrothermal fluids (even when using chemical abrasion) with (ii) age dispersion from prolonged residence of earlier crystallized zircon in the magmatic system. As a result, U-Pb dates of individual zircons are both apparently younger and older than the depositional age